Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 167-183    DOI: 10.11925/infotech.2096-3467.2021.1020
Identifying Metaphors and Association of Chinese Idioms with Transfer Learning and Text Augmentation
Zhang Wei, Wang Hao, Chen Yuetong, Fan Tao, Deng Sanhong
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract  

[Objective] This paper identifies sentiment metaphors in Chinese idioms and builds an idiom knowledge graph that links external things (the metaphor source) with users’ internal attitudes or sentiments (the metaphor target). [Methods] We propose a metaphor recognition scheme for Chinese idioms based on transfer learning and text augmentation. First, we collected idioms and their external categories to obtain external knowledge, and used a sentiment dictionary to build the learning corpus. Second, we matched the idioms against the dictionary; the matched idioms were used for the first round of transfer learning, and all other sentiment words in the dictionary formed the training set for the second round. Third, because metaphorical idioms carry only weak explicit sentiment semantics, we augmented the texts with Chinese language knowledge. Fourth, we compared the [CLS] vector of the BERT text embedding with average pooling across mainstream deep learning models. Finally, we classified the unmatched idioms hierarchically with the best model and merged them with the matched idioms to obtain internal knowledge. [Results] Average pooling was 4.69 percentage points more accurate than the [CLS] vector, and adding idiom interpretations improved accuracy by a further 13%. After the second round of transfer, sentiment accuracy reached 80% at every level, with the largest gain, 6.25%, on small corpora. [Limitations] A larger corpus could improve the classification accuracy of the sentiment categories. [Conclusions] The scheme effectively identifies the sentiment metaphor knowledge of Chinese idioms, and associating internal with external knowledge lays a foundation for better knowledge services.
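
The comparison between the [CLS] vector and average pooling described in [Methods] can be illustrated with a minimal sketch. It assumes the encoder output for one text is a list of per-token vectors with the [CLS] token in first position, plus an attention mask that flags padding. The function names and the toy numbers are illustrative assumptions, not the paper's implementation.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_vectors: List[Vector]) -> Vector:
    """Use the first token's vector (the [CLS] position) as the text embedding."""
    return list(token_vectors[0])


def average_pooling(token_vectors: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the vectors of all unmasked tokens, ignoring padded positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask removes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy encoder output (assumed): 4 token positions, hidden size 3, last one is padding.
tokens = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

cls_vec = cls_pooling(tokens)
avg_vec = average_pooling(tokens, mask)
```

The [CLS] variant keeps a single position, while the averaged variant draws on every real token, which is one plausible reason it could help on four-character idioms.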

Keywords: Idiom Knowledge Graph; Metaphor Knowledge; Transfer Learning; Text Augmentation; Multi-Layer Sentiment Classification
Received: 11 September 2021      Published: 14 April 2022
CLC number: G202
Fund: National Natural Science Foundation of China (72074108); Graduate Research and Innovation Projects of Jiangsu Province (KYCX21_0026); Fundamental Research Funds for the Central Universities (010814370113)
Corresponding author: Wang Hao, ORCID: 0000-0002-0131-0823, E-mail: ywhaowang@nju.edu.cn

Cite this article:

Zhang Wei, Wang Hao, Chen Yuetong, Fan Tao, Deng Sanhong. Identifying Metaphors and Association of Chinese Idioms with Transfer Learning and Text Augmentation. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 167-183.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.1020     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/167

A Framework for Metaphor Knowledge Recognition and Association of Chinese Idioms
Chinese Idiom Ontology Modelling and Application Patterns of Internal and External Associations
Level-1 class  Level-2 class  Level-3 class         Idiom examples
                              Joy (PA)              春风得意 逍遥自得 无拘无束 逍遥自在
                              Ease (PE)             安闲自得 乐天知命 独善其身 清风朗月
                              Respect (PD)          永垂不朽 王公大人 非同寻常 至高无上
                              Praise (PH)           百折不挠 义无反顾 建功立业 文质彬彬
                              Trust (PG)            季布一诺 肝胆相照 山盟海誓 心领神会
                              Fondness (PB)         如痴如醉 手足之情 依依不舍 一往情深
                              Wishing (PK)          万寿无疆 千秋万岁 马到成功 鹏程万里
               Surprise*      Surprise (PC)         惊天动地 光怪陆离 不期而遇 匪夷所思
                              Sadness (NB)          百业萧条 切肤之痛 逝者如斯 肝肠寸断
                              Disappointment (NJ)   心灰意懒 付之东流 一蹶不振 心若死灰
                              Guilt (NH)            一念之差 引咎自责 后悔莫及 负荆请罪
                              Missing (PF)          睹物思人 牵肠挂肚 白云亲舍 望穿秋水
                              Panic (NI)            不知所措 失魂落魄 心神不定 燃眉之急
                              Fear (NC)             战战兢兢 惶恐不安 危在旦夕 不寒而栗
                              Shame (NG)            无地自处 面红耳赤 羞面见人 狼狈万状
               Anger*         Anger (NA)            怒发冲冠 愤愤不平 气势汹汹 怒不可遏
                              Vexation (NE)         百无聊赖 忧心忡忡 辗转反侧 怅然若失
                              Hatred (ND)           阿谀谄媚 乌合之众 欺世盗名 不屑一顾
                              Blame (NN)            目光短浅 争名夺利 不可一世 穷奢极欲
                              Jealousy (NK)         妒火中烧 爱毛反裘 拈酸吃醋 避面尹邢
                              Doubt (NL)            迟疑不决 莫测高深 众口纷纭 弓影杯蛇
Transfer Learning Corpus of Idiom Sentiment Metaphor Recognition
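
How such a corpus might be split for the two rounds of transfer learning can be sketched as follows. The abstract states that idioms matched in the sentiment dictionary feed the first round, the remaining dictionary words feed the second, and unmatched idioms are left for prediction. The function name, the label for "高兴", and the idiom "白云苍狗" are hypothetical additions for illustration only.

```python
from typing import Dict, List, Tuple


def split_corpus(
    idioms: List[str], sentiment_dict: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Partition idioms and dictionary entries into the three sets the abstract describes."""
    idiom_set = set(idioms)
    # Round 1: idioms that already carry a label in the sentiment dictionary.
    matched = {w: sentiment_dict[w] for w in idioms if w in sentiment_dict}
    # Round 2: every other labelled sentiment word in the dictionary.
    other_words = {w: lab for w, lab in sentiment_dict.items() if w not in idiom_set}
    # To predict: idioms the dictionary does not cover.
    unmatched = [w for w in idioms if w not in sentiment_dict]
    return matched, other_words, unmatched


# Toy data (assumed): two idioms with the codes listed in the table above,
# one ordinary sentiment word, and one idiom missing from the dictionary.
idioms = ["怒发冲冠", "望穿秋水", "白云苍狗"]
sentiment_dict = {"怒发冲冠": "NA", "望穿秋水": "PF", "高兴": "PA"}

round1_train, round2_train, to_predict = split_corpus(idioms, sentiment_dict)
```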
Text Data Augmentation of Idioms/Sentiment Words Based on Chinese Language Knowledge
A Deep Learning-Based Sentiment Metaphor Recognition Model for Chinese Idioms
                                                         Positive (4,513)      Negative (3,568)
Model           Acc/%  Macro_P/%  Macro_R/%  Macro_F1/%  P/%    R/%    F1/%    P/%    R/%    F1/%
CNN_[CLS]       73.07  73.07      72.93      72.96       73.01  69.47  71.20   73.12  76.38  74.72
CNN_[AVG]       77.76  78.10      77.49      77.55       80.33  70.95  75.35   75.87  84.03  79.74
CNN_AS          80.84  81.00      80.65      80.72       82.46  76.21  79.21   79.55  85.09  82.23
CNN_AE          90.02  90.08      90.13      90.01       87.15  92.84  89.91   93.00  87.42  90.12
CNN_AES         90.42  90.79      90.23      90.35       93.78  85.68  89.55   87.80  94.77  91.15
RNN_AES         89.56  89.75      89.42      89.51       91.60  86.11  88.77   87.89  92.74  90.25
LSTM_AES        90.67  90.78      90.57      90.63       92.08  88.11  90.05   89.48  93.03  91.22
LSTM_AES        90.87  90.85      90.87      90.86       90.18  90.84  90.51   91.52  90.90  91.21
BiLSTM_AES      91.43  91.41      91.42      91.41       90.97  91.16  91.06   91.85  91.67  91.76
BiLSTM_Att_AES  91.73  91.70      91.74      91.72       90.77  92.11  91.43   92.64  91.38  92.01
Results of Idiom Sentiment Metaphor Recognition Based on Text Data Augmentation
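
The macro columns in the table above appear to be unweighted means of the per-class scores. The short check below uses the BiLSTM_Att_AES row; the helper names are illustrative, and agreement is only expected to within rounding because the inputs are themselves rounded to two decimals.

```python
from typing import Sequence


def macro_average(per_class_scores: Sequence[float]) -> float:
    """Unweighted mean over classes, so each class counts equally regardless of size."""
    return sum(per_class_scores) / len(per_class_scores)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Per-class values reported for BiLSTM_Att_AES (positive class, negative class).
precision = [90.77, 92.64]
recall = [92.11, 91.38]
f1 = [91.43, 92.01]

macro_p = macro_average(precision)
macro_r = macro_average(recall)
macro_f1 = macro_average(f1)
positive_f1 = f1_score(precision[0], recall[0])
```

Under these assumptions the three macro values land within 0.01 of the reported 91.70, 91.74 and 91.72, and the recomputed positive-class F1 stays close to the reported 91.43.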
Performance of Transfer Learning-Based Hierarchical Sentiment Classification of Idioms
                                                 ST1                  ST2
Layer    Parent class            Subclass        P/%    R/%    F1/%   P/%    R/%    F1/%
Layer 1  (9,923/27,377)                          90.77  92.11  91.43  91.82  90.95  91.38
                                                 92.64  91.38  92.01  91.75  92.55  92.14
Layer 2  (4,755/13,257)                          92.23  95.72  93.94  91.49  97.48  94.39
                                                 82.35  77.78  80.00  78.95  83.33  81.08
                                                 70.37  55.47  62.04  79.76  48.91  60.63
         (5,168/14,120)                          72.19  63.87  67.78  77.02  64.92  70.45
                                                 67.44  60.42  63.74  73.68  58.33  65.12
                                                 77.78  29.17  42.42  84.62  45.83  59.46
                                                 85.92  91.53  88.63  86.30  93.61  89.81
Layer 3  (690/1,955)             Ease            75.68  57.14  65.12  82.50  67.35  74.16
                                 Joy             79.00  89.77  84.04  83.51  92.05  87.57
         (3,972/11,074)          Fondness        69.23  31.03  42.86  63.16  41.38  50.00
                                 Trust           75.00  9.68   17.14  40.00  6.45   11.11
                                 Praise          86.34  98.94  92.21  87.03  97.87  92.13
                                 Wishing         25.00  14.29  18.18  100.00 14.29  25.00
                                 Respect         100.00 8.11   15.00  71.43  13.51  22.73
         Surprise (93/228)       Surprise        82.35  77.78  80.00  78.95  83.33  81.08
         (960/2,307)             Sadness         77.86  90.83  83.85  80.45  89.17  84.58
                                 Guilt           75.00  37.50  50.00  80.00  50.00  61.54
                                 Disappointment  64.29  42.86  51.43  63.89  54.76  58.97
                                 Missing         89.47  80.95  85.00  94.12  76.19  84.21
         (485/1,177)             Panic           71.88  69.70  70.77  80.65  75.76  78.12
                                 Fear            81.67  84.48  83.05  86.67  89.66  88.14
                                 Shame           100.00 80.00  88.89  100.00 100.00 100.00
         Anger (121/387)         Anger           77.78  29.17  42.42  84.62  45.83  59.46
         (3,602/10,249)          Blame           81.34  96.23  88.16  83.60  94.25  88.61
                                 Vexation        66.67  53.57  59.41  75.76  44.64  56.18
                                 Doubt           50.00  14.29  22.22  40.00  28.57  33.33
                                 Jealousy        0.00   0.00   0.00   100.00 40.00  57.14
                                 Hatred          62.50  10.31  17.70  50.00  27.84  35.76
Specific Performance of Transfer Learning-Based Hierarchical Sentiment Classification of Idioms
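
Layer-by-layer classification of this kind is often implemented as top-down routing: the prediction at one layer selects which classifier runs at the next. The sketch below shows that general pattern only. The tree shape, the node names such as "anger_group", and the keyword rule standing in for trained models are all hypothetical and are not the authors' models.

```python
from typing import Callable, Dict, List

Classifier = Callable[[str], str]


def classify_top_down(
    text: str, classifiers: Dict[str, Classifier], root: str = "root"
) -> List[str]:
    """Follow predictions from the root until a node with no classifier (a leaf) is reached."""
    path: List[str] = []
    node = root
    while node in classifiers:
        node = classifiers[node](text)  # the parent's prediction picks the next node
        path.append(node)
    return path


# Stub classifiers (assumed): a real system would hold one trained model per parent node.
classifiers: Dict[str, Classifier] = {
    "root": lambda t: "negative" if "怒" in t else "positive",
    "negative": lambda t: "anger_group",
    "anger_group": lambda t: "Anger (NA)",
    "positive": lambda t: "surprise_group",
    "surprise_group": lambda t: "Surprise (PC)",
}

anger_path = classify_top_down("怒发冲冠", classifiers)
surprise_path = classify_top_down("惊天动地", classifiers)
```

One consequence of this design is that a mistake at an upper layer cannot be corrected further down, so lower-layer scores depend on upper-layer accuracy.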
Prediction Results of Sentiment Metaphor of Unlabeled Idioms
A Knowledge Graph of Chinese Idioms with Internal and External Features
Detecting External Things by Internal Sentiments (Query Keyword “Miss”)
Detecting Internal Sentiments by External Things (Query Keyword “Cloud”)
Humanistic Knowledge Service by Internal and External Knowledge (Keyword “Cloud” and “Miss”)