Semi-supervised Micro-blog Sentiment Classification Method Combining Active Learning and Co-training
Bi Qiumin1, Li Ming2, Zeng Zhiyong3
1. Faculty of Art and Communication, Kunming University of Science and Technology, Kunming 650093, China;
2. School of Information, Yunnan University of Finance and Economics, Kunming 650221, China;
3. Center of Information Management, Yunnan University of Finance and Economics, Kunming 650221, China
[Objective] To address the scarcity of labeled data and the abundance of unlabeled samples in micro-blog sentiment classification, this paper proposes a semi-supervised classification method that combines active learning with co-training. [Methods] Active learning is introduced into co-training: from the low-confidence samples, the method selects the most valuable ones, labels them, adds them to the training set, and retrains the classifiers. [Results] Experiments show that classifiers trained this way perform better and reach noticeably higher accuracy. In particular, when labeled data reaches 40%, accuracy increases by about 5%. [Limitations] During co-training, random feature subspace generation cannot build two strong classifiers, so the assumptions of co-training are not satisfied. [Conclusions] Introducing active learning remedies the defects of co-training and improves both the performance and the accuracy of the classifiers.
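The loop described under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the margin-based confidence score, the rule that "most valuable" means "least confident", the fixed feature views, and every name below (`co_train_with_active_learning`, `oracle`, `threshold`, `queries_per_round`) are assumptions chosen only to keep the example small and self-contained. It also assumes the seed set contains at least one sample of each class.

```python
import math

def project(x, view):
    """Keep only the feature indices that belong to one view."""
    return [x[j] for j in view]

def fit_centroids(rows, labels):
    """Tiny stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, row):
    """Return (label, confidence); confidence is a distance margin in [0, 1]."""
    dists = sorted((math.dist(row, c), label) for label, c in model.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    total = d0 + d1
    return label, ((d1 - d0) / total if total > 0 else 0.0)

def co_train_with_active_learning(X, seed_labels, oracle, view_a, view_b,
                                  rounds=5, threshold=0.5, queries_per_round=2):
    labeled = dict(seed_labels)                      # sample index -> label
    unlabeled = set(range(len(X))) - set(labeled)
    n_queries = 0
    for _ in range(rounds):
        if not unlabeled:
            break
        idx = sorted(labeled)
        y = [labeled[i] for i in idx]
        # Train one classifier per feature view on the current labeled set.
        model_a = fit_centroids([project(X[i], view_a) for i in idx], y)
        model_b = fit_centroids([project(X[i], view_b) for i in idx], y)
        scored = []
        for i in sorted(unlabeled):
            label_a, conf_a = predict(model_a, project(X[i], view_a))
            label_b, conf_b = predict(model_b, project(X[i], view_b))
            # Assumption: disagreement between the views counts as zero confidence.
            conf = min(conf_a, conf_b) if label_a == label_b else 0.0
            scored.append((conf, i, label_a))
        # Co-training step: confident, agreeing samples receive a pseudo-label.
        for conf, i, label in scored:
            if conf >= threshold:
                labeled[i] = label
                unlabeled.discard(i)
        # Active learning step: the least confident samples go to the annotator.
        uncertain = sorted((c, i) for c, i, _ in scored if c < threshold)
        for _, i in uncertain[:queries_per_round]:
            labeled[i] = oracle(i)
            unlabeled.discard(i)
            n_queries += 1
    return labeled, n_queries

# Toy data (invented for illustration): two clear clusters plus two ambiguous points.
X = [[0.0, 0.1, 0.2, 0.0], [10.0, 9.9, 10.1, 10.0],
     [0.3, -0.2, 0.1, 0.4], [-0.1, 0.2, -0.3, 0.0],
     [9.8, 10.2, 9.7, 10.1], [10.3, 9.9, 10.2, 9.6],
     [4.8, 4.9, 4.7, 4.8], [5.3, 5.2, 5.4, 5.1]]
truth = [0, 1, 0, 0, 1, 1, 0, 1]

def oracle(i):
    """Stands in for a human annotator by looking up the true label."""
    return truth[i]

final_labels, n_queries = co_train_with_active_learning(
    X, {0: 0, 1: 1}, oracle, view_a=(0, 1), view_b=(2, 3))
```

In this toy run the four clearly separated points are pseudo-labeled by the two view classifiers, while the two points near the class boundary fall below the confidence threshold and are sent to the oracle. The views here are fixed by hand; the abstract's [Limitations] concerns views produced by random feature subspace generation, which this sketch does not model.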
毕秋敏, 李明, 曾志勇. 一种主动学习和协同训练相结合的半监督微博情感分类方法[J]. 现代图书情报技术, 2015, 31(1): 38-44.
Bi Qiumin, Li Ming, Zeng Zhiyong. Semi-supervised Micro-blog Sentiment Classification Method Combining Active Learning and Co-training. New Technology of Library and Information Service, 2015, 31(1): 38-44.