Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (4): 28-38    DOI: 10.11925/infotech.2096-3467.2021.0545
News Classification with Semi-Supervised and Active Learning
Chen Guo1,2, Ye Chao1
1School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
2Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094, China
[Objective] This paper proposes a news classification scheme that combines semi-supervised learning and active learning, aiming to improve intelligence monitoring based on news mining. [Methods] First, we learned news text representations and ran K-means clustering on them, then selected a small number of representative samples from each cluster for manual judgment. The resulting clusters were merged and adjusted into sub-field categories. Next, we used the representative samples as the training set for an ensemble of several classification algorithms and trained an initial classifier. Finally, we used active learning to optimize the initial classifier. [Results] We tested the model on news about tanks and armored vehicles. After active learning, classification results improved: precision, recall and F1 value reached 83.68%, 83.35% and 83.17%, gains of 2.71, 2.52 and 2.81 percentage points respectively over the initial classifier. [Limitations] To reduce manual labeling work, we conducted only 2 iterations of active learning. [Conclusions] The proposed method can effectively classify news with little corpus annotation and no pre-trained classifier. It could also be applied to other fields.

Key words: Semi-Supervised Learning; Active Learning; Text Classification; Ensemble Learning
Received: 01 June 2021      Published: 12 May 2022
ZTFLH:  G350  
Fund: Youth Foundation of Social Science and Humanity, China Ministry of Education (21YJC870003); Social Science Fund of Jiangsu Province (21TQC002)
Corresponding Author: Chen Guo, ORCID: 0000-0003-2873-1051

Cite this article:

Chen Guo, Ye Chao. News Classification with Semi-Supervised and Active Learning. Data Analysis and Knowledge Discovery, 2022, 6(4): 28-38.

Flow Chart of News Classification in Subdivided Fields Based on Clustering and Active Learning
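The abstract says that a small number of representative samples are taken from each K-means cluster for manual judgment, but does not state how they are chosen. The sketch below assumes one plausible rule, namely picking the members closest to the cluster centroid. The function name `representatives` and the toy vectors are hypothetical and serve only as illustration.

```python
import math
from collections import defaultdict


def representatives(vectors, labels, per_cluster):
    """Pick, for each cluster, the members nearest to that cluster's centroid.

    ASSUMPTION: nearest-to-centroid selection; the source does not specify
    the selection rule. `vectors` are text representations and `labels` are
    cluster assignments (e.g. from K-means).
    """
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    chosen = {}
    for label, members in groups.items():
        dim = len(vectors[members[0]])
        # Centroid = per-dimension mean of the cluster members.
        centroid = [
            sum(vectors[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        # Rank members by Euclidean distance to the centroid.
        ranked = sorted(members, key=lambda i: math.dist(vectors[i], centroid))
        chosen[label] = ranked[:per_cluster]
    return chosen


# Toy 2-D "representations" in two clusters (hypothetical data).
toy_vectors = [[0, 0], [1, 0], [4, 0], [10, 10], [10, 12], [10, 11]]
toy_labels = [0, 0, 0, 1, 1, 1]
print(representatives(toy_vectors, toy_labels, per_cluster=1))
```

The selected indices would then go to a human annotator, whose judgments give each cluster its topic label.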
Active Learning Process
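The query strategy used in the active learning rounds is not given in this summary. As an illustration only, the sketch below assumes least-confidence sampling: the samples whose highest predicted class probability is lowest are sent for manual labeling. The function name `least_confident` and the probabilities are hypothetical.

```python
def least_confident(probabilities, k):
    """Return indices of the k samples the classifier is least sure about.

    ASSUMPTION: least-confidence sampling, one common active learning query
    strategy; the source does not state which strategy the authors used.
    `probabilities` holds one list of class probabilities per unlabeled sample.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled news items.
toy_probabilities = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.5, 0.5]]
print(least_confident(toy_probabilities, k=2))  # → [3, 1]
```

In each round, the selected items would be labeled by hand, added to the training set, and the classifier retrained. The paper reports two such rounds.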
Topic category                        Count
Military operations and deployment    387
Weapons and equipment trade           185
Military exercises                    317
New equipment technology              311
Noise                                 687
Total                                 1,887
Statistics of News Topics in Tank and Armored Vehicle Field
Base classifier    Initial weight
                   Sub-training set 1    Sub-training set 2    Sub-training set 3
Random forest      0.68                  0.70                  0.70
SVM                0.89                  0.84                  0.86
Softmax            0.90                  0.87                  0.87
Weight of Initial Base Classifier
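How the initial weights enter the ensemble is not described in this summary. A minimal sketch, assuming simple weighted voting in which each base classifier trained on each sub-training set casts a vote scaled by its weight from the table above. The function name `weighted_vote` and the example predictions are hypothetical.

```python
from collections import defaultdict

# Initial weights from the table above: one value per sub-training set.
INITIAL_WEIGHTS = {
    "random_forest": [0.68, 0.70, 0.70],
    "svm": [0.89, 0.84, 0.86],
    "softmax": [0.90, 0.87, 0.87],
}


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    ASSUMPTION: weighted hard voting; the source does not say how the
    weights are combined. `predictions` maps each base classifier to the
    labels predicted by its three models (one per sub-training set).
    """
    scores = defaultdict(float)
    for name, labels in predictions.items():
        for label, weight in zip(labels, weights[name]):
            scores[label] += weight
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=scores.get)


# Hypothetical predictions for a single news item.
toy_predictions = {
    "random_forest": ["trade", "trade", "trade"],
    "svm": ["exercise", "exercise", "trade"],
    "softmax": ["exercise", "exercise", "exercise"],
}
print(weighted_vote(toy_predictions, INITIAL_WEIGHTS))  # → exercise
```

Here "trade" collects 0.68 + 0.70 + 0.70 + 0.86 = 2.94 and "exercise" collects 0.89 + 0.84 + 0.90 + 0.87 + 0.87 = 4.37, so the lower-weighted random forest models are outvoted.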
Confusion Matrix Analysis Based on Training Set
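Classification model                         Precision/%    Recall/%    F1 value/%
Initial classification model                 80.97          80.83       80.36
After the first round of active learning     83.38          83.00       82.51
After the second round of active learning    83.68          83.35       83.17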
Classification Results of Weapon Equipment News Based on Active Learning
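The gains quoted in the abstract can be checked directly against the table above. The short sketch below recomputes them as differences in percentage points; the names `RESULTS` and `gains` are illustrative only.

```python
# (precision, recall, F1) in percent, copied from the table above.
RESULTS = {
    "initial": (80.97, 80.83, 80.36),
    "round_1": (83.38, 83.00, 82.51),
    "round_2": (83.68, 83.35, 83.17),
}


def gains(before, after):
    """Difference in percentage points, rounded to two decimals."""
    return tuple(round(a - b, 2) for b, a in zip(before, after))


print(gains(RESULTS["initial"], RESULTS["round_2"]))  # → (2.71, 2.52, 2.81)
print(gains(RESULTS["initial"], RESULTS["round_1"]))
```

Most of the improvement comes from the first round; the second round adds well under one percentage point on each metric.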
Confusion Matrix Analysis Based on Test Set
