Data Analysis and Knowledge Discovery
A semi-supervised Chinese sentiment analysis method based on multi-level data augmentation
Liu Tong, Liu Chen, Ni Weijian
(Department of Computer Science and Engineering, Shandong University of Science and Technology, Shandong 266590, China)
Abstract  

[Objective] High-quality labeled data is difficult to obtain in natural language processing. To address this, this paper designs a semi-supervised Chinese sentiment analysis method based on multi-level data augmentation. [Methods] A large number of augmented unlabeled samples are generated with simple data augmentation and back-translation, and a training signal is extracted from the unlabeled samples by computing a consistency regularization term on them. In addition, a pseudo-label is computed from the weakly augmented sample, a supervised training signal is constructed from the strongly augmented sample together with this pseudo-label, and a confidence threshold is applied as a filter so that the model produces high-confidence predictions. [Results] Experiments are conducted on three publicly available sentiment analysis datasets. With only 1000 labeled documents, the method improves on BERT by 2.3% and 6.1% on the waimai and weibo datasets, respectively. [Limitations] All experiments were carried out on public general-domain corpora; the method's effect on vertical-domain datasets was not tested. [Conclusions] The proposed method fully exploits the information in unlabeled samples, alleviates the difficulty of obtaining labeled data, and shows strong predictive stability.
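
The two unlabeled-data signals described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss formulas, so the KL divergence used for the consistency term, the hard (argmax) pseudo-labels, the threshold value of 0.95, and the function names `consistency_loss` and `pseudo_label_loss` are all assumptions made for illustration. The inputs stand in for classifier logits on original, weakly augmented, and strongly augmented versions of the same unlabeled texts.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(orig_logits, aug_logits):
    """Mean KL(p_orig || p_aug) over a batch of unlabeled samples.

    Assumption: KL divergence is used as the consistency measure;
    the abstract only says a consistency term is computed.
    """
    if not orig_logits:
        return 0.0
    total = 0.0
    for orig, aug in zip(orig_logits, aug_logits):
        p = softmax(orig)
        q = softmax(aug)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return total / len(orig_logits)


def pseudo_label_loss(weak_logits, strong_logits, threshold=0.95):
    """Cross-entropy of strong-view predictions against weak-view pseudo-labels.

    Samples whose weak-view confidence is below `threshold` contribute zero.
    Returns (loss averaged over the whole batch, number of samples kept).
    The threshold value 0.95 is an illustrative assumption.
    """
    if not weak_logits:
        return 0.0, 0
    total = 0.0
    kept = 0
    for weak, strong in zip(weak_logits, strong_logits):
        weak_probs = softmax(weak)
        confidence = max(weak_probs)
        if confidence < threshold:
            continue  # low-confidence pseudo-label is filtered out
        label = weak_probs.index(confidence)  # hard pseudo-label
        strong_probs = softmax(strong)
        total += -math.log(strong_probs[label])
        kept += 1
    return total / len(weak_logits), kept


# Toy batch of two unlabeled samples with two sentiment classes.
weak = [[4.0, 0.0], [0.1, 0.0]]    # second sample is low-confidence
strong = [[2.0, 0.0], [0.0, 3.0]]
loss, kept = pseudo_label_loss(weak, strong)
```

In this toy batch only the first sample passes the threshold, so only it contributes to the pseudo-label loss. In a full training loop these terms would typically be added, with weighting coefficients, to the supervised cross-entropy on the labeled documents; the weighting is likewise not specified in the abstract.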

Key words: Sentiment Analysis; Semi-supervised Learning; Consistency Regularization; Data Augmentation
Published: 08 March 2021
Chinese Library Classification (CLC): TP393, G250

Cite this article:

Liu Tong, Liu Chen, Ni Weijian. A semi-supervised Chinese sentiment analysis method based on multi-level data augmentation. Data Analysis and Knowledge Discovery, 2021, 5(5): 51-58.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1170 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/1
