Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue 7: 141-151   https://doi.org/10.11925/infotech.2096-3467.2021.1462
Research Paper
A Text-Aligned Cross-Language Sentiment Classification Method Based on Adversarial Networks*
Yang Wenli, Li Nana
School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
Abstract

[Objective] This paper aims to improve the accuracy of cross-language sentiment classification by narrowing the distribution gap between bilingual text pairs in a shared space. [Methods] During sentiment knowledge transfer, we align word pairs and text pairs simultaneously by adjusting a balance coefficient, and we optimize the transformation matrix with a generative adversarial network that includes a language discriminator. In addition, we represent texts with a multi-feature-fusion hierarchical neural network that accounts for the contextual and topical relevance of both words and sentences, which effectively addresses long-distance feature dependencies in texts. [Results] On the NLP&CC 2013 standard dataset, the proposed method achieves an average cross-language sentiment classification accuracy of 83.66%, which is 2.30 percentage points higher than the baseline models on average. [Limitations] Experiments were conducted only on Chinese and English datasets; the method's effectiveness for other language pairs needs further verification. [Conclusions] Increasing the similarity of bilingual texts effectively improves the accuracy of cross-language sentiment classification.

Keywords: Word Alignment; Text Alignment; Generative Adversarial Network; Multi-Feature Fusion; Hierarchical Neural Network
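The method aligns word pairs and text pairs at the same time, trading the two off with a balance coefficient. A minimal NumPy sketch of such a joint objective follows. The squared Euclidean distance, mean pooling of word vectors into text vectors, the linear matrix `W`, and the names `joint_alignment_loss` and `lam` are assumptions made for illustration; the paper's exact loss is not reproduced here.

```python
import numpy as np


def text_vector(word_vecs: np.ndarray) -> np.ndarray:
    """Mean-pool a (num_words, dim) matrix of word vectors into one text vector (assumed pooling)."""
    return word_vecs.mean(axis=0)


def joint_alignment_loss(W, src_words, tgt_words, src_texts, tgt_texts, lam=0.5):
    """Hypothetical joint objective: lam * word-pair loss + (1 - lam) * text-pair loss.

    W         : (dim, dim) matrix mapping source-language vectors into the target space
    src_words : (n, dim) source vectors of seed word pairs
    tgt_words : (n, dim) target vectors, row-aligned with src_words
    src_texts : list of (len_i, dim) word-vector matrices, one per source text
    tgt_texts : list of word-vector matrices for the parallel target texts
    lam       : balance coefficient in [0, 1]
    """
    # Word-pair term: mean squared Euclidean distance after mapping the source side.
    word_term = np.mean(np.sum((src_words @ W - tgt_words) ** 2, axis=1))
    # Text-pair term: the same distance between mapped source and target text vectors.
    src_t = np.stack([text_vector(t) for t in src_texts]) @ W
    tgt_t = np.stack([text_vector(t) for t in tgt_texts])
    text_term = np.mean(np.sum((src_t - tgt_t) ** 2, axis=1))
    return lam * word_term + (1.0 - lam) * text_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    words = rng.normal(size=(5, 4))
    texts = [rng.normal(size=(3, 4)) for _ in range(2)]
    # Identical source and target spaces with W = I give zero loss.
    print(joint_alignment_loss(np.eye(4), words, words, texts, texts))
```

Setting `lam` close to 1 emphasizes word-level alignment, while smaller values push whole texts closer together in the shared space.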
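Received: 2021-12-28      Published: 2022-08-24
CLC number: TP391
Funding: *One of the research outcomes of the Youth Program of the National Natural Science Foundation of China (Grant No. 61806072).
Corresponding author: Li Nana, ORCID: 0000-0002-5517-6033, E-mail: linana@scse.hebut.edu.cn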
Cite this article:
Yang Wenli, Li Nana. A Text-Aligned Cross-Language Sentiment Classification Method Based on Adversarial Networks. Data Analysis and Knowledge Discovery, 2022, 6(7): 141-151.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1462   or   https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I7/141
Fig.1  Cross-language sentiment classification model
Fig.2  Feature extractor
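The feature extractor (Fig.2) is described as a hierarchical network that fuses several features and considers word- and sentence-level context. The sketch below shows only the generic hierarchical pattern: word vectors are attention-pooled into sentence vectors, sentence vectors are attention-pooled into a document vector, and the result is concatenated with other feature vectors. Dot-product attention with fixed query vectors, concatenation as the fusion step, and the hypothetical `topic_vec` are assumptions, not the paper's architecture.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_pool(vectors: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Combine rows of `vectors` (n, dim) with weights softmax(vectors @ query)."""
    weights = softmax(vectors @ query)
    return weights @ vectors


def hierarchical_doc_vector(doc, word_query, sent_query):
    """doc: list of sentences, each an (num_words, dim) array of word vectors.

    Level 1 pools words into sentence vectors; level 2 pools sentences into a document vector.
    """
    sent_vecs = np.stack([attention_pool(s, word_query) for s in doc])
    return attention_pool(sent_vecs, sent_query)


def fuse(doc_vec, extra_features):
    """Assumed fusion step: concatenate the document vector with other feature vectors."""
    return np.concatenate([doc_vec, *extra_features])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    doc = [rng.normal(size=(5, dim)), rng.normal(size=(3, dim))]
    doc_vec = hierarchical_doc_vector(doc, rng.normal(size=dim), rng.normal(size=dim))
    topic_vec = rng.normal(size=4)  # hypothetical topic feature vector
    print(fuse(doc_vec, [topic_vec]).shape)  # (12,)
```

Pooling sentence by sentence keeps each attention step short, which is one common way hierarchical models cope with long documents.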
Fig.3  Vector space
Fig.4  Bilingual vector space
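Fig.4 depicts source- and target-language vectors placed in one bilingual space by a transformation matrix. The paper optimizes this matrix adversarially; a common supervised alternative, shown here only for illustration and not as the authors' method, learns an orthogonal matrix from seed word pairs in closed form (orthogonal Procrustes). The function name and the toy data are assumptions.

```python
import numpy as np


def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||X @ W - Y||_F for row-aligned X, Y of shape (n, dim).

    Closed form: if X.T @ Y = U @ diag(S) @ Vt (SVD), then W = U @ Vt.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))                  # hypothetical source-word vectors
    R, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal "true" mapping
    Y = X @ R                                     # target-word vectors of the seed pairs
    W = procrustes(X, Y)
    print(np.allclose(W, R))                      # True: the rotation is recovered
```

Constraining the matrix to be orthogonal preserves distances and angles within the source space, so monolingual neighborhoods survive the mapping.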
Fig.5  Adversarial model structure
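Fig.5 shows the adversarial structure in which a language discriminator tries to tell mapped source-language vectors from target-language vectors while the transformation matrix is trained to fool it. The NumPy sketch below alternates one gradient step for a logistic-regression discriminator and one for a linear mapping `W`, with hand-written gradients. The logistic discriminator, plain gradient descent, the learning rates, and all function names are assumptions. A purely linear discriminator can only separate the two distributions when their means differ, so a deeper discriminator is typically used in practice.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def discriminator_loss(W, u, b, X, Y, eps=1e-12):
    """Binary cross-entropy of a logistic discriminator: label 1 = mapped source, 0 = target."""
    p_src = sigmoid(X @ W @ u + b)
    p_tgt = sigmoid(Y @ u + b)
    return -np.mean(np.log(p_src + eps)) - np.mean(np.log(1.0 - p_tgt + eps))


def adversarial_step(W, u, b, X, Y, lr_d=0.1, lr_g=0.1):
    """One alternating gradient step for the discriminator (u, b) and the mapping W."""
    # Discriminator update: the gradient of BCE w.r.t. a logit is (p - label).
    Z = X @ W
    g_src = sigmoid(Z @ u + b) - 1.0     # mapped source vectors, label 1
    g_tgt = sigmoid(Y @ u + b)           # target vectors, label 0
    grad_u = Z.T @ g_src / len(Z) + Y.T @ g_tgt / len(Y)
    grad_b = g_src.mean() + g_tgt.mean()
    u = u - lr_d * grad_u
    b = b - lr_d * grad_b
    # Mapping update: minimize -mean(log(1 - D(X @ W))), i.e. make mapped source look like target.
    p_src = sigmoid(X @ W @ u + b)
    grad_W = X.T @ (p_src[:, None] * u[None, :]) / len(X)
    W = W - lr_g * grad_W
    return W, u, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) + np.array([2.0, 0.0, 0.0])  # hypothetical source vectors
    Y = rng.normal(size=(200, 3)) + np.array([0.0, 2.0, 0.0])  # hypothetical target vectors
    W, u, b = np.eye(3), np.zeros(3), 0.0
    for _ in range(200):
        W, u, b = adversarial_step(W, u, b, X, Y)
    print(discriminator_loss(W, u, b, X, Y))
```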
No.  Review example
1    This is a hot movie recently.
2    The water in the hotel is too hot and burns people.
Table 1  Review examples
Fig.6  Bilingual sentiment lexicon
Set        Language   DVD      Book     Music
Training   English    35 000   41 000   41 000
Training   Chinese    30 000   35 000   40 000
Test       Chinese    500      500      500
Table 2  Datasets
Fig.7  Trend of accuracy as similarity varies
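Fig.7 relates classification accuracy to the similarity of bilingual texts. The page does not say how that similarity is measured; assuming cosine similarity between mapped source-text vectors and their parallel target-text vectors, a sketch of the average pair similarity is below. The function names and the mapping matrix `W` are illustrative assumptions.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_pair_similarity(src_vecs: np.ndarray, tgt_vecs: np.ndarray, W: np.ndarray) -> float:
    """Average cosine similarity of row-aligned bilingual text vectors after mapping src by W."""
    mapped = src_vecs @ W
    return float(np.mean([cosine(s, t) for s, t in zip(mapped, tgt_vecs)]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(10, 6))               # hypothetical source-text vectors
    tgt = src + 0.1 * rng.normal(size=(10, 6))   # hypothetical, nearly parallel target vectors
    print(mean_pair_similarity(src, tgt, np.eye(6)))  # close to 1.0
```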
Fig.8  Classification results as the text proportion varies
Fig.9  Results before and after optimization
Dataset  Metric  SD     SWD    SWAB   SWC    SS-BiDocv
DVD      Acc/%   75.22  76.00  76.39  80.97  83.52
DVD      Pre/%   74.15  74.89  76.60  76.54  82.02
DVD      Rec/%   77.12  78.22  76.00  76.54  82.00
Book     Acc/%   76.47  78.25  80.93  78.26  84.26
Book     Pre/%   74.68  75.33  89.58  79.28  83.62
Book     Rec/%   80.10  84.00  70.00  76.52  85.18
Music    Acc/%   75.25  78.31  77.39  79.26  83.20
Music    Pre/%   76.34  77.25  75.00  80.96  81.50
Music    Rec/%   73.96  80.26  82.18  76.52  85.90
Table 3  Classification results with feature fusion (Acc: accuracy; Pre: precision; Rec: recall)
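Table 3 reports accuracy, precision, and recall but not F1. As a quick derived check, the F1 score (the harmonic mean of precision and recall) for the SS-BiDocv column can be computed from the reported values; this is arithmetic on the table, not a result reported by the authors, and the names are illustrative.

```python
# Precision and recall (%) of SS-BiDocv, copied from Table 3.
SS_BIDOCV = {
    "DVD": (82.02, 82.00),
    "Book": (83.62, 85.18),
    "Music": (81.50, 85.90),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for domain, (p, r) in SS_BIDOCV.items():
        print(f"{domain}: F1 = {f1(p, r):.2f}%")
```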
Fig.10  Trend of classification accuracy under different thresholds
Method            DVD    Book   Music  Average
MT(En-Ch)         70.32  75.40  74.26  73.33
MT(Ch-En)         76.25  76.00  73.21  75.15
SCL-CLSC          82.60  82.90  78.95  81.48
CLWEs             82.92  83.00  81.13  82.35
BLSE              79.36  77.95  81.20  79.50
AttLSTM-CLSC      81.22  82.50  80.66  81.46
ACNN-AMT          80.58  81.85  81.22  81.22
BSWE              81.60  81.05  79.40  80.68
Proposed method   83.52  84.26  83.20  83.66
Table 4  Classification accuracy (%) of different cross-language sentiment classification models
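The Average column of Table 4 is the mean of the three domain accuracies. The short check below recomputes it from the table values and finds the strongest baseline by average accuracy; the dictionary simply transcribes Table 4, and the function names are illustrative.

```python
# Accuracies (%) copied from Table 4: (DVD, Book, Music, reported Average).
TABLE4 = {
    "MT(En-Ch)": (70.32, 75.40, 74.26, 73.33),
    "MT(Ch-En)": (76.25, 76.00, 73.21, 75.15),
    "SCL-CLSC": (82.60, 82.90, 78.95, 81.48),
    "CLWEs": (82.92, 83.00, 81.13, 82.35),
    "BLSE": (79.36, 77.95, 81.20, 79.50),
    "AttLSTM-CLSC": (81.22, 82.50, 80.66, 81.46),
    "ACNN-AMT": (80.58, 81.85, 81.22, 81.22),
    "BSWE": (81.60, 81.05, 79.40, 80.68),
    "Proposed method": (83.52, 84.26, 83.20, 83.66),
}


def average_mismatches(table):
    """Methods whose reported Average is not the two-decimal rounding of the domain mean."""
    return [name for name, (dvd, book, music, avg) in table.items()
            if abs((dvd + book + music) / 3 - avg) > 0.005]


def best_baseline(table, proposed="Proposed method"):
    """Baseline with the highest reported Average accuracy."""
    return max((n for n in table if n != proposed), key=lambda n: table[n][3])


if __name__ == "__main__":
    print(average_mismatches(TABLE4))  # []
    best = best_baseline(TABLE4)
    margin = TABLE4["Proposed method"][3] - TABLE4[best][3]
    print(best, round(margin, 2))      # CLWEs 1.31
```

All reported averages match the rounded means, and the script prints the margin of the proposed method over the strongest baseline in percentage points.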
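Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn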