School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper aims to identify sentiment metaphors in Chinese idioms and to build an idiom knowledge graph that links external things (the source) with users' internal attitudes or sentiments (the target). [Methods] We proposed a metaphor recognition scheme for Chinese idioms based on transfer learning and text augmentation. First, we retrieved idioms and their external categories to obtain external knowledge, and used a sentiment dictionary to build the learning corpus. Second, we matched the idioms against the dictionary. The matched idioms were used for the first round of transfer learning, and all other sentiment words in the dictionary formed the training set for the second round. Third, idioms carry weak surface sentiment semantics because of their metaphorical character, so we introduced Chinese linguistic knowledge to augment these texts. Fourth, we compared the [CLS] vector of the BERT text embedding with average pooling across mainstream deep learning models. Finally, we hierarchically classified the unmatched idioms with the best-performing model and merged them with the matched idioms to obtain internal knowledge. [Results] Average pooling achieved 4.69% higher accuracy than [CLS]. Adding idiom interpretations raised accuracy by a further 13%. In the second round of transfer, sentiment classification accuracy reached 80% at every level, with the largest improvement (up to 6.25%) on small corpora. [Limitations] Classification accuracy for the sentiment categories could be improved with a larger corpus. [Conclusions] Our scheme can effectively identify the sentiment metaphor knowledge of Chinese idioms. Associating internal with external knowledge lays the foundation for better knowledge services.
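The comparison between the [CLS] vector and average pooling in the Methods can be illustrated with a minimal sketch. It assumes the BERT encoder has already produced one vector per token position, together with an attention mask that marks real tokens (1) versus padding (0). For readability it uses toy 2-dimensional vectors instead of real hidden states. The function names `cls_pooling` and `mean_pooling` are hypothetical.

The abstract does not state whether the authors' average includes the [CLS] and [SEP] positions. This sketch therefore follows a common convention: it averages every non-padding position, special tokens included.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Return the vector at position 0, where BERT places its [CLS] token."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1, so padding is excluded.

    Assumption: special tokens ([CLS], [SEP]) count as real tokens here,
    because their mask entries are 1.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                totals[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in totals]


if __name__ == "__main__":
    # Toy sequence: [CLS], one idiom character, [SEP], then one padding slot.
    embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [0.0, 0.0]]
    mask = [1, 1, 1, 0]
    print(cls_pooling(embeddings))         # [1.0, 0.0]
    print(mean_pooling(embeddings, mask))  # [3.0, 2.0]
```

The [CLS] strategy keeps a single position's vector. Mean pooling spreads the representation over every real token, which is one plausible reason it could retain more of an idiom's weak, distributed sentiment signal. That explanation is offered as an interpretation, not as the authors' stated reasoning.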
Zhang Wei, Wang Hao, Chen Yuetong, Fan Tao, Deng Sanhong. Identifying Metaphors and Association of Chinese Idioms with Transfer Learning and Text Augmentation[J]. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 167-183.