Data Analysis and Knowledge Discovery
A semi-supervised Chinese sentiment analysis method based on multi-level data augmentation
Liu Tong, Liu Chen, Ni Weijian
(Department of Computer Science and Engineering, Shandong University of Science and Technology, Shandong 266590, China)
[Objective] Because high-quality labeled data is difficult to obtain in natural language processing, this paper designs a semi-supervised Chinese sentiment analysis method based on multi-level data augmentation. [Methods] A large number of unlabeled samples are generated with simple data augmentation and back-translation text augmentation techniques, and a training signal is extracted from the unlabeled samples by computing a consistency regularization term over them. A pseudo-label is computed for each weakly augmented sample, a supervised training signal is constructed from the strongly augmented sample together with that pseudo-label, and the pseudo-labels are filtered by a confidence threshold so that the model produces high-confidence predictions. [Results] Experiments were conducted on three publicly available sentiment analysis datasets. The results show that, with only 1000 labeled documents, the method improves on BERT by 2.3% and 6.1% on the waimai and weibo datasets, respectively. [Limitations] All experiments were carried out on public general-purpose corpora; the effect on vertical-domain datasets was not tested. [Conclusions] The proposed method fully exploits the information in unlabeled samples, which alleviates the problem that labeled data is not easily accessible, and it shows strong predictive stability.
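The unlabeled-data objective described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the squared-difference form of the consistency term, and the threshold value of 0.95 are all assumptions made for illustration, and the logits stand in for the outputs of a classifier such as BERT on the original, weakly augmented, and strongly augmented views of each unlabeled text.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_l2(orig_logits, aug_logits):
    """Hypothetical consistency term: mean squared difference between the
    predicted distributions for each original text and its augmented copy."""
    if not orig_logits:
        return 0.0
    total = 0.0
    for o, a in zip(orig_logits, aug_logits):
        p, q = softmax(o), softmax(a)
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return total / len(orig_logits)


def pseudo_label_loss(weak_logits, strong_logits, threshold=0.95):
    """Cross-entropy of strong-view predictions against pseudo-labels taken
    from the weak view. Samples whose weak-view confidence is below the
    threshold (an assumed value) are masked out and contribute zero loss.

    Returns (loss averaged over the whole batch, number of samples kept).
    """
    total = 0.0
    kept = 0
    for w, s in zip(weak_logits, strong_logits):
        probs_w = softmax(w)
        confidence = max(probs_w)
        if confidence < threshold:
            continue  # low-confidence pseudo-label: skip this sample
        label = probs_w.index(confidence)  # hard pseudo-label from the weak view
        probs_s = softmax(s)
        total += -math.log(probs_s[label])
        kept += 1
    n = len(weak_logits)
    return (total / n if n else 0.0), kept


if __name__ == "__main__":
    # Two unlabeled samples, two sentiment classes (negative, positive).
    weak = [[5.0, 0.0], [0.1, 0.0]]    # first is confident, second is not
    strong = [[2.0, 0.0], [0.0, 3.0]]
    loss, kept = pseudo_label_loss(weak, strong)
    print(f"pseudo-label loss = {loss:.4f}, samples kept = {kept}")
    print(f"consistency term  = {consistency_l2(weak, strong):.4f}")
```

In this sketch only the first sample passes the confidence filter, so the second sample's uncertain pseudo-label never reaches the supervised signal. In a full training loop these two terms would be added, with weights chosen by the practitioner, to the ordinary cross-entropy loss on the labeled documents.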

Key words: Sentiment Analysis; Semi-supervised Learning; Consistency Regularization; Data Augmentation
Published: 08 March 2021
Cite this article:

Liu Tong, Liu Chen, Ni Weijian. A semi-supervised Chinese sentiment analysis method based on multi-level data augmentation. Data Analysis and Knowledge Discovery, 0, (): 1-.

URL: 2020.1170

