[Objective] This paper aims to improve the accuracy of cross-language sentiment classification by narrowing the distributional gap between bilingual text pairs in a shared space. [Methods] During sentiment knowledge transfer, we aligned word pairs and text pairs simultaneously, weighting the two with an adjustable balance coefficient. We then optimized the resulting transformation matrix adversarially against a language discriminator. Finally, we used a multi-feature fusion hierarchical neural network to represent the texts, their contexts, and the topic relevance of words and sentences, which addresses long-distance feature dependence in texts. [Results] On the NLP&CC 2013 standard datasets, the model reached an average cross-language sentiment classification accuracy of 83.66%, which was 2.30% higher than the benchmark model. [Limitations] The method was tested only on Chinese and English datasets; more research is needed to evaluate its effectiveness for other languages. [Conclusions] Increasing the similarity of bilingual texts can effectively improve the accuracy of cross-language sentiment classification.
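The abstract does not give the form of the joint alignment objective. As a rough illustration only, the sketch below assumes a convex combination of a word-level and a text-level alignment loss, each taken as the mean squared distance between mapped source vectors and their target-language counterparts. The names `joint_alignment_loss` and `alpha`, and the choice of squared distance, are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
Pairs = Tuple[Sequence[Vector], Sequence[Vector]]  # (source vectors, target vectors)


def apply_map(w: Matrix, x: Vector) -> List[float]:
    """Project a source-language vector x into the shared space with matrix w."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def mean_sq_dist(w: Matrix, src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """Mean squared Euclidean distance between mapped source vectors and targets."""
    total = 0.0
    for x, y in zip(src, tgt):
        mapped = apply_map(w, x)
        total += sum((m - t) ** 2 for m, t in zip(mapped, y))
    return total / len(src)


def joint_alignment_loss(w: Matrix, word_pairs: Pairs, text_pairs: Pairs,
                         alpha: float) -> float:
    """Assumed objective: alpha weights word alignment, (1 - alpha) text alignment."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    word_loss = mean_sq_dist(w, *word_pairs)
    text_loss = mean_sq_dist(w, *text_pairs)
    return alpha * word_loss + (1.0 - alpha) * text_loss


# Toy usage with a 2x2 identity map (hypothetical vectors, not real embeddings).
identity = [[1.0, 0.0], [0.0, 1.0]]
words = ([[1.0, 0.0]], [[1.0, 0.0]])   # already aligned, so word loss is 0
texts = ([[0.0, 1.0]], [[0.0, 3.0]])   # off by 2 in one dimension, so text loss is 4
print(joint_alignment_loss(identity, words, texts, alpha=0.25))
```

Under this assumed form, moving `alpha` toward 1 emphasizes word-pair alignment and moving it toward 0 emphasizes text-pair alignment; the adversarial optimization against the language discriminator is not modeled here.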
杨文丽, 李娜娜. 基于对抗网络的文本对齐跨语言情感分类方法*[J]. 数据分析与知识发现, 2022, 6(7): 141-151.
Yang Wenli, Li Nana. A Text-Aligned Cross-Language Sentiment Classification Method Based on Adversarial Networks. Data Analysis and Knowledge Discovery, 2022, 6(7): 141-151.
The water in the hotel is too hot and burns people.
Table 1 Review examples
Fig.6 Bilingual sentiment lexicon
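| Dataset | Language | DVD | Book | Music |
|---|---|---|---|---|
| Training set | English | 35 000 | 41 000 | 41 000 |
| Training set | Chinese | 30 000 | 35 000 | 40 000 |
| Test set | Chinese | 500 | 500 | 500 |

Table 2 Datasets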
Fig.7 Accuracy as a function of similarity
Fig.8 Classification results under varying text proportions
Fig.9 Results before and after optimization
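| Dataset | Metric | SD | SWD | SWAB | SWC | SS-BiDocv |
|---|---|---|---|---|---|---|
| DVD | Acc/% | 75.22 | 76.00 | 76.39 | 80.97 | 83.52 |
| DVD | Pre/% | 74.15 | 74.89 | 76.60 | 76.54 | 82.02 |
| DVD | Rec/% | 77.12 | 78.22 | 76.00 | 76.54 | 82.00 |
| Book | Acc/% | 76.47 | 78.25 | 80.93 | 78.26 | 84.26 |
| Book | Pre/% | 74.68 | 75.33 | 89.58 | 79.28 | 83.62 |
| Book | Rec/% | 80.10 | 84.00 | 70.00 | 76.52 | 85.18 |
| Music | Acc/% | 75.25 | 78.31 | 77.39 | 79.26 | 83.20 |
| Music | Pre/% | 76.34 | 77.25 | 75.00 | 80.96 | 81.50 |
| Music | Rec/% | 73.96 | 80.26 | 82.18 | 76.52 | 85.90 |

Table 3 Classification results with feature fusion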
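Table 3 reports precision and recall separately. For readers who want a single combined figure, the sketch below derives the F1 score (the harmonic mean of precision and recall) from the SS-BiDocv column. F1 is not reported in the table; the values printed here are derived quantities, and the helper name `f1_score` is chosen for this example.

```python
def f1_score(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision_pct + recall_pct == 0:
        return 0.0
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# (Pre/%, Rec/%) for the SS-BiDocv column, copied from Table 3.
ss_bidocv = {
    "DVD": (82.02, 82.00),
    "Book": (83.62, 85.18),
    "Music": (81.50, 85.90),
}

for dataset, (pre, rec) in ss_bidocv.items():
    print(f"{dataset}: F1 = {f1_score(pre, rec):.2f}%")
```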
Fig.10 Classification accuracy under different thresholds
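| Method | DVD | Book | Music | Average |
|---|---|---|---|---|
| MT(En-Ch) | 70.32 | 75.40 | 74.26 | 73.33 |
| MT(Ch-En) | 76.25 | 76.00 | 73.21 | 75.15 |
| SCL-CLSC | 82.60 | 82.90 | 78.95 | 81.48 |
| CLWEs | 82.92 | 83.00 | 81.13 | 82.35 |
| BLSE | 79.36 | 77.95 | 81.20 | 79.50 |
| AttLSTM-CLSC | 81.22 | 82.50 | 80.66 | 81.46 |
| ACNN-AMT | 80.58 | 81.85 | 81.22 | 81.22 |
| BSWE | 81.60 | 81.05 | 79.40 | 80.68 |
| Proposed method | 83.52 | 84.26 | 83.20 | 83.66 |

Table 4 Classification accuracy (%) of different cross-language sentiment classification models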
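The Average column of Table 4 is the arithmetic mean of the three per-domain accuracies. The short check below recomputes it from the DVD, Book, and Music values and compares it with the printed averages. The dictionary names are chosen for this example, and the paper's own method (the last row of Table 4) is labeled "Proposed method" here.

```python
from statistics import mean

# Per-domain accuracy (%) copied from Table 4: (DVD, Book, Music).
ACCURACY = {
    "MT(En-Ch)": (70.32, 75.40, 74.26),
    "MT(Ch-En)": (76.25, 76.00, 73.21),
    "SCL-CLSC": (82.60, 82.90, 78.95),
    "CLWEs": (82.92, 83.00, 81.13),
    "BLSE": (79.36, 77.95, 81.20),
    "AttLSTM-CLSC": (81.22, 82.50, 80.66),
    "ACNN-AMT": (80.58, 81.85, 81.22),
    "BSWE": (81.60, 81.05, 79.40),
    "Proposed method": (83.52, 84.26, 83.20),
}

# Average column as printed in Table 4.
REPORTED_AVERAGE = {
    "MT(En-Ch)": 73.33, "MT(Ch-En)": 75.15, "SCL-CLSC": 81.48,
    "CLWEs": 82.35, "BLSE": 79.50, "AttLSTM-CLSC": 81.46,
    "ACNN-AMT": 81.22, "BSWE": 80.68, "Proposed method": 83.66,
}

# Recompute each average and round to the table's two decimal places.
averages = {name: round(mean(scores), 2) for name, scores in ACCURACY.items()}

# Collect any rows whose recomputed average differs from the printed one.
mismatches = {
    name: (averages[name], REPORTED_AVERAGE[name])
    for name in ACCURACY
    if abs(averages[name] - REPORTED_AVERAGE[name]) > 0.005
}

best = max(averages, key=averages.get)
print("mismatched rows:", mismatches)
print("highest average:", best, averages[best])
```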