Data Analysis and Knowledge Discovery
The Research on Semi-Supervised Text Classification Method Based on DW-TCI
Yu Bengong,Ji Haomin
(School of Management, Hefei University of Technology, Hefei 230009, China)
(Key Laboratory of Process Optimization & Intelligent Decision-making, Ministry of Education, Hefei University of Technology, Hefei 230009, China)
Abstract  

[Objective] To classify text efficiently when only a small number of annotations are available, this paper proposes a new semi-supervised text classification method.

[Methods] The proposed DW-TCI method uses two-channel feature extraction to produce two sets of input feature vectors for a group of base classifiers, and combines divergence-based semi-supervised classification with ensemble learning. The resulting samples are fed back into model training, and the final class of each text to be predicted is obtained by equal-weight voting.
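
The voting step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `equal_weight_vote` and `unanimous_indices` are hypothetical, the tie-breaking rule (earliest classifier wins) is an assumption, and selecting pseudo-labeled samples by unanimous agreement is only one common heuristic in divergence-based learning, which the abstract does not confirm.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def equal_weight_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine base-classifier outputs by equal-weight (majority) voting.

    predictions[i][j] is the label that classifier i assigns to sample j.
    Ties go to the label proposed by the earliest classifier (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)  # every vote has weight 1
        top = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        winner = next(label for label in sample_votes if counts[label] == top)
        voted.append(winner)
    return voted


def unanimous_indices(predictions: Sequence[Sequence[Hashable]]) -> List[int]:
    """Indices of samples on which all classifiers agree.

    Assumed heuristic: such samples are candidates to be pseudo-labeled
    and added to the training set in the next round.
    """
    return [
        j for j, sample_votes in enumerate(zip(*predictions))
        if len(set(sample_votes)) == 1
    ]


if __name__ == "__main__":
    # Three hypothetical base classifiers, three unlabeled texts.
    preds = [
        ["sports", "tech", "finance"],
        ["sports", "finance", "finance"],
        ["tech", "tech", "finance"],
    ]
    print(equal_weight_vote(preds))
    print(unanimous_indices(preds))
```

Because every classifier contributes exactly one vote per sample, no classifier can dominate the ensemble; a weighted scheme would instead scale each vote by, for example, validation accuracy.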

[Results] On two different data sets, with 20% of the samples labeled for training, the DW-TCI method reaches classification accuracies of 92.32% and 87.01%, respectively, which is at least 5.54% and 5.65% higher than the other semi-supervised classification methods compared.

[Limitations] The study uses only a small number of data sets and has not been validated on more.

[Conclusion] The semi-supervised classification method proposed in this paper can greatly reduce the amount of training-sample labeling required and provides effective support for service providers that need efficient text classification.

Key words: semi-supervised classification; sample divergence; classifier divergence; ensemble learning; DW-TCI
Published: 28 July 2020
ZTFLH:  TP391  

Cite this article:

Yu Bengong, Ji Haomin. The Research on Semi-Supervised Text Classification Method Based on DW-TCI . Data Analysis and Knowledge Discovery, 0, (): 1-.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0219     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/1
