Data Analysis and Knowledge Discovery, 2022, Vol. 6, Issue 2/3: 167-183. https://doi.org/10.11925/infotech.2096-3467.2021.1020
Identifying Metaphors and Association of Chinese Idioms with Transfer Learning and Text Augmentation*
Zhang Wei, Wang Hao, Chen Yuetong, Fan Tao, Deng Sanhong
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract

[Objective] This paper identifies sentiment metaphors in Chinese idioms and builds an idiom knowledge graph that integrates the literal external things an idiom names (source-domain knowledge) with the internal attitudes or sentiments of the people who use it (target-domain knowledge). [Methods] We propose a scheme for recognizing idiom sentiment metaphors that combines transfer learning and text augmentation. First, we crawl idioms and their external-thing categories to obtain external knowledge. Next, we transfer a sentiment dictionary to obtain a learning corpus: idioms matched against the sentiment dictionary are used for the first round of transfer learning, and all sentiment words in the dictionary except those in the first-round test set serve as the training set for the second round. We then introduce Chinese language knowledge to augment the text data and offset the weak sentiment semantics caused by the metaphorical nature of idioms, compare the [CLS] and average-pooling schemes for BERT embeddings, and validate them with mainstream deep learning models. Finally, we use the best model to hierarchically classify the unmatched idioms and merge them with the matched idioms to obtain internal knowledge. [Results] Average pooling improved accuracy by 4.69 percentage points over [CLS], and adding idiom interpretations improved accuracy by more than 13 percentage points. In the second round of transfer, sentiment accuracy at each level was mostly above 80%, and sentiment categories whose corpora were originally small improved by up to 6.25 percentage points. [Limitations] Classification accuracy is constrained by the small corpus available for some sentiment categories and needs further improvement. [Conclusions] The proposed scheme effectively identifies the sentiment metaphor knowledge of Chinese idioms, and the association of internal and external knowledge lays a foundation for idiom knowledge services.

Keywords: Idiom Knowledge Graph; Metaphor Knowledge; Transfer Learning; Text Augmentation; Multi-Layer Sentiment Classification
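Received: 2021-09-11      Published: 2022-04-14
ZTFLH (Chinese Library Classification): G202
Funding: *Supported by the National Natural Science Foundation of China (72074108), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX21_0026), and the Fundamental Research Funds for the Central Universities (010814370113).
Corresponding author: Wang Hao, ORCID: 0000-0002-0131-0823     E-mail: ywhaowang@nju.edu.cn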
Cite this article:
Zhang Wei, Wang Hao, Chen Yuetong, Fan Tao, Deng Sanhong. Identifying Metaphors and Association of Chinese Idioms with Transfer Learning and Text Augmentation. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 167-183.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1020      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I2/3/167
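Fig.1  Framework for identifying and associating metaphor knowledge in Chinese idioms
Fig.2  Ontology modeling of Chinese idioms and the application model for internal-external association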
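| Level 1 | Level 2 | Level 3 | Example idioms |
|---|---|---|---|
| Positive (正) | 乐 (joy) | 快乐 Happiness (PA) | 春风得意 逍遥自得 无拘无束 逍遥自在 |
| | | 安心 Peace of mind (PE) | 安闲自得 乐天知命 独善其身 清风朗月 |
| | 好 (liking) | 尊敬 Respect (PD) | 永垂不朽 王公大人 非同寻常 至高无上 |
| | | 赞扬 Praise (PH) | 百折不挠 义无反顾 建功立业 文质彬彬 |
| | | 相信 Trust (PG) | 季布一诺 肝胆相照 山盟海誓 心领神会 |
| | | 喜爱 Fondness (PB) | 如痴如醉 手足之情 依依不舍 一往情深 |
| | | 祝愿 Wishing (PK) | 万寿无疆 千秋万岁 马到成功 鹏程万里 |
| | 惊* (surprise) | 惊奇 Surprise (PC) | 惊天动地 光怪陆离 不期而遇 匪夷所思 |
| Negative (负) | 哀 (sorrow) | 悲伤 Sadness (NB) | 百业萧条 切肤之痛 逝者如斯 肝肠寸断 |
| | | 失望 Disappointment (NJ) | 心灰意懒 付之东流 一蹶不振 心若死灰 |
| | | 内疚 Guilt (NH) | 一念之差 引咎自责 后悔莫及 负荆请罪 |
| | | 思念 Longing (PF) | 睹物思人 牵肠挂肚 白云亲舍 望穿秋水 |
| | 惧 (fear) | 慌乱 Panic (NI) | 不知所措 失魂落魄 心神不定 燃眉之急 |
| | | 恐惧 Fear (NC) | 战战兢兢 惶恐不安 危在旦夕 不寒而栗 |
| | | 羞愧 Shame (NG) | 无地自处 面红耳赤 羞面见人 狼狈万状 |
| | 怒* (anger) | 愤怒 Anger (NA) | 怒发冲冠 愤愤不平 气势汹汹 怒不可遏 |
| | 恶 (disgust) | 烦闷 Vexation (NE) | 百无聊赖 忧心忡忡 辗转反侧 怅然若失 |
| | | 憎恶 Loathing (ND) | 阿谀谄媚 乌合之众 欺世盗名 不屑一顾 |
| | | 贬责 Censure (NN) | 目光短浅 争名夺利 不可一世 穷奢极欲 |
| | | 妒忌 Jealousy (NK) | 妒火中烧 爱毛反裘 拈酸吃醋 避面尹邢 |
| | | 怀疑 Doubt (NL) | 迟疑不决 莫测高深 众口纷纭 弓影杯蛇 |

Table 1  Transfer-learning corpus for idiom sentiment metaphor recognition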
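The two-round corpus construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_rounds`, the 20% test ratio, the random seed, and the toy word lists are all assumptions, since the page does not state how the split was made.

```python
import random
from typing import Dict, List


def build_rounds(idioms: List[str], lexicon: Dict[str, str],
                 test_ratio: float = 0.2, seed: int = 0) -> Dict[str, List[str]]:
    """Split data for two rounds of transfer (ratio and seed are assumed values)."""
    # Idioms found in the sentiment lexicon inherit its labels; the rest stay unlabeled.
    matched = sorted(w for w in set(idioms) if w in lexicon)
    unmatched = sorted(w for w in set(idioms) if w not in lexicon)

    rng = random.Random(seed)
    rng.shuffle(matched)
    n_test = max(1, round(len(matched) * test_ratio))
    test = sorted(matched[:n_test])
    round1_train = sorted(matched[n_test:])

    # Round 2 trains on every lexicon entry except the round-1 test idioms.
    held_out = set(test)
    round2_train = sorted(w for w in lexicon if w not in held_out)
    return {"test": test, "round1_train": round1_train,
            "round2_train": round2_train, "unmatched": unmatched}


# Toy data: the idiom labels follow Table 1; the other entries are illustrative.
LEXICON = {"望穿秋水": "PF", "怒发冲冠": "NA", "心灰意懒": "NJ",
           "马到成功": "PK", "高兴": "PA", "悲伤": "NB"}
IDIOMS = ["望穿秋水", "怒发冲冠", "心灰意懒", "马到成功", "画蛇添足"]

rounds = build_rounds(IDIOMS, LEXICON)
print(rounds)
```

Keeping the same held-out test idioms for both rounds is what makes the first-round and second-round scores comparable.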
Fig.3  Text data augmentation for idioms and sentiment words based on Chinese language knowledge
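A minimal sketch of this kind of augmentation, assuming it amounts to appending dictionary knowledge (such as the idiom's interpretation) to the bare term before encoding. The function `augment`, the field name `explanation`, and the `[SEP]` delimiter are hypothetical choices for illustration; the page does not specify how the authors join the fields.

```python
from typing import Dict, Sequence


def augment(term: str, knowledge: Dict[str, str],
            fields: Sequence[str], sep: str = "[SEP]") -> str:
    """Append the selected knowledge fields to a term (delimiter is an assumption)."""
    parts = [term]
    for field in fields:
        value = knowledge.get(field, "").strip()
        if value:  # skip fields that are missing or empty
            parts.append(value)
    return sep.join(parts)


# Illustrative entry: the gloss is a common dictionary explanation of the idiom.
entry = {"explanation": "形容对远地亲友的殷切盼望"}
augmented = augment("望穿秋水", entry, ["explanation"])
print(augmented)
```

A four-character idiom carries little explicit sentiment vocabulary on its own, so the appended gloss gives the encoder literal wording to work with.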
Fig.4  Deep-learning-based model for identifying sentiment metaphors in Chinese idioms
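The two BERT embedding schemes compared in the abstract differ only in how token vectors are reduced to one text vector. The sketch below shows that difference with plain Python lists and made-up numbers; the function names and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions, and no actual BERT model is involved.

```python
from typing import List, Sequence


def cls_embedding(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """[CLS] scheme: take the vector at position 0 as the text representation."""
    return list(token_vectors[0])


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average-pooling scheme: mean of all non-padding token vectors."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Toy 2-dimensional "token embeddings"; the last position is padding.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(cls_embedding(vectors), mean_pool(vectors, mask))
```

Masking matters here: averaging over padding positions would pull every short text toward the same padding vector.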
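| Model | Acc/% | Macro_P/% | Macro_R/% | Macro_F1/% | Positive (4 513) P/% | Positive R/% | Positive F1/% | Negative (3 568) P/% | Negative R/% | Negative F1/% |
|---|---|---|---|---|---|---|---|---|---|---|
| CNN_[CLS] | 73.07 | 73.07 | 72.93 | 72.96 | 73.01 | 69.47 | 71.20 | 73.12 | 76.38 | 74.72 |
| CNN_[AVG] | 77.76 | 78.10 | 77.49 | 77.55 | 80.33 | 70.95 | 75.35 | 75.87 | 84.03 | 79.74 |
| CNN_AS | 80.84 | 81.00 | 80.65 | 80.72 | 82.46 | 76.21 | 79.21 | 79.55 | 85.09 | 82.23 |
| CNN_AE | 90.02 | 90.08 | 90.13 | 90.01 | 87.15 | 92.84 | 89.91 | 93.00 | 87.42 | 90.12 |
| CNN_AES | 90.42 | 90.79 | 90.23 | 90.35 | 93.78 | 85.68 | 89.55 | 87.80 | 94.77 | 91.15 |
| RNN_AES | 89.56 | 89.75 | 89.42 | 89.51 | 91.60 | 86.11 | 88.77 | 87.89 | 92.74 | 90.25 |
| LSTM_AES | 90.67 | 90.78 | 90.57 | 90.63 | 92.08 | 88.11 | 90.05 | 89.48 | 93.03 | 91.22 |
| LSTM_AES | 90.87 | 90.85 | 90.87 | 90.86 | 90.18 | 90.84 | 90.51 | 91.52 | 90.90 | 91.21 |
| BiLSTM_AES | 91.43 | 91.41 | 91.42 | 91.41 | 90.97 | 91.16 | 91.06 | 91.85 | 91.67 | 91.76 |
| BiLSTM_Att_AES | 91.73 | 91.70 | 91.74 | 91.72 | 90.77 | 92.11 | 91.43 | 92.64 | 91.38 | 92.01 |

Table 2  Idiom sentiment metaphor recognition results with text data augmentation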
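The macro columns in Table 2 are unweighted means of the per-class columns, and each F1 is the harmonic mean of precision and recall. The short check below recomputes them for the BiLSTM_Att_AES row; the helper names are illustrative, and small differences from the table are expected because the published precision and recall values are already rounded.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean over classes, regardless of class size."""
    return sum(values) / len(values)


# Per-class precision and recall from the BiLSTM_Att_AES row (percent).
positive = {"P": 90.77, "R": 92.11}
negative = {"P": 92.64, "R": 91.38}

macro_p = macro_average([positive["P"], negative["P"]])
macro_r = macro_average([positive["R"], negative["R"]])
f1_positive = f1_score(positive["P"], positive["R"])
f1_negative = f1_score(negative["P"], negative["R"])
macro_f1 = macro_average([f1_positive, f1_negative])

print(f"Macro_P={macro_p:.2f} Macro_R={macro_r:.2f} Macro_F1={macro_f1:.2f}")
```

Because macro averaging ignores class size, the 4 513 positive and 3 568 negative samples contribute equally to these three columns.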
Fig.5  Overall performance of transfer-learning-based hierarchical sentiment classification of idioms
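| Layer | Parent class | Subclass | ST1 P/% | ST1 R/% | ST1 F1/% | ST2 P/% | ST2 R/% | ST2 F1/% |
|---|---|---|---|---|---|---|---|---|
| Layer 1 | (9 923/27 377) | 正 Positive | 90.77 | 92.11 | 91.43 | 91.82 | 90.95 | 91.38 |
| | | 负 Negative | 92.64 | 91.38 | 92.01 | 91.75 | 92.55 | 92.14 |
| Layer 2 | 正 Positive (4 755/13 257) | | 92.23 | 95.72 | 93.94 | 91.49 | 97.48 | 94.39 |
| | | 惊 | 82.35 | 77.78 | 80.00 | 78.95 | 83.33 | 81.08 |
| | | | 70.37 | 55.47 | 62.04 | 79.76 | 48.91 | 60.63 |
| | 负 Negative (5 168/14 120) | | 72.19 | 63.87 | 67.78 | 77.02 | 64.92 | 70.45 |
| | | | 67.44 | 60.42 | 63.74 | 73.68 | 58.33 | 65.12 |
| | | 怒 | 77.78 | 29.17 | 42.42 | 84.62 | 45.83 | 59.46 |
| | | | 85.92 | 91.53 | 88.63 | 86.30 | 93.61 | 89.81 |
| Layer 3 | 乐 (690/1 955) | 安心 Peace of mind | 75.68 | 57.14 | 65.12 | 82.50 | 67.35 | 74.16 |
| | | 快乐 Happiness | 79.00 | 89.77 | 84.04 | 83.51 | 92.05 | 87.57 |
| | 好 (3 972/11 074) | 喜爱 Fondness | 69.23 | 31.03 | 42.86 | 63.16 | 41.38 | 50.00 |
| | | 相信 Trust | 75.00 | 9.68 | 17.14 | 40.00 | 6.45 | 11.11 |
| | | 赞扬 Praise | 86.34 | 98.94 | 92.21 | 87.03 | 97.87 | 92.13 |
| | | 祝愿 Wishing | 25.00 | 14.29 | 18.18 | 100.00 | 14.29 | 25.00 |
| | | 尊敬 Respect | 100.00 | 8.11 | 15.00 | 71.43 | 13.51 | 22.73 |
| | 惊 (93/228) | 惊奇 Surprise | 82.35 | 77.78 | 80.00 | 78.95 | 83.33 | 81.08 |
| | 哀 (960/2 307) | 悲伤 Sadness | 77.86 | 90.83 | 83.85 | 80.45 | 89.17 | 84.58 |
| | | 内疚 Guilt | 75.00 | 37.50 | 50.00 | 80.00 | 50.00 | 61.54 |
| | | 失望 Disappointment | 64.29 | 42.86 | 51.43 | 63.89 | 54.76 | 58.97 |
| | | 思念 Longing | 89.47 | 80.95 | 85.00 | 94.12 | 76.19 | 84.21 |
| | 惧 (485/1 177) | 慌乱 Panic | 71.88 | 69.70 | 70.77 | 80.65 | 75.76 | 78.12 |
| | | 恐惧 Fear | 81.67 | 84.48 | 83.05 | 86.67 | 89.66 | 88.14 |
| | | 羞愧 Shame | 100.00 | 80.00 | 88.89 | 100.00 | 100.00 | 100.00 |
| | 怒 (121/387) | 愤怒 Anger | 77.78 | 29.17 | 42.42 | 84.62 | 45.83 | 59.46 |
| | 恶 (3 602/10 249) | 贬责 Censure | 81.34 | 96.23 | 88.16 | 83.60 | 94.25 | 88.61 |
| | | 烦闷 Vexation | 66.67 | 53.57 | 59.41 | 75.76 | 44.64 | 56.18 |
| | | 怀疑 Doubt | 50.00 | 14.29 | 22.22 | 40.00 | 28.57 | 33.33 |
| | | 妒忌 Jealousy | 0.00 | 0.00 | 0.00 | 100.00 | 40.00 | 57.14 |
| | | 憎恶 Loathing | 62.50 | 10.31 | 17.70 | 50.00 | 27.84 | 35.76 |

Table 3  Per-class performance of transfer-learning-based hierarchical sentiment classification of idioms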
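A top-down pass over the three layers can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: the grouping of third-layer classes follows the blocks of Table 3, the parent names 乐, 好, 哀, 惧 and 恶 are assumed from the seven-class affective lexicon scheme that the category codes appear to follow, and `toy_choose` is a hypothetical stand-in for the trained classifiers.

```python
from typing import Callable, Dict, List, Tuple

# polarity -> second-layer class -> third-layer classes (grouping is an assumption)
TREE: Dict[str, Dict[str, List[str]]] = {
    "正": {"乐": ["安心", "快乐"],
           "好": ["喜爱", "相信", "赞扬", "祝愿", "尊敬"],
           "惊": ["惊奇"]},
    "负": {"哀": ["悲伤", "内疚", "失望", "思念"],
           "惧": ["慌乱", "恐惧", "羞愧"],
           "怒": ["愤怒"],
           "恶": ["贬责", "烦闷", "怀疑", "妒忌", "憎恶"]},
}

Chooser = Callable[[str, List[str]], str]


def classify_top_down(text: str, choose: Chooser) -> Tuple[str, str, str]:
    """Pick polarity, then a second-layer class, then a third-layer class."""
    polarity = choose(text, list(TREE))
    second = choose(text, list(TREE[polarity]))
    leaves = TREE[polarity][second]
    # A parent with a single child needs no third-layer classifier.
    third = leaves[0] if len(leaves) == 1 else choose(text, leaves)
    return polarity, second, third


# Hypothetical lookup standing in for per-layer models.
TOY_LABELS = {"怒不可遏": {"负", "怒"}, "望穿秋水": {"负", "哀", "思念"}}


def toy_choose(text: str, candidates: List[str]) -> str:
    for label in candidates:
        if label in TOY_LABELS.get(text, set()):
            return label
    return candidates[0]


print(classify_top_down("怒不可遏", toy_choose))
print(classify_top_down("望穿秋水", toy_choose))
```

The single-child shortcut is consistent with the table, where the 惊 and 怒 rows in the second layer carry the same scores as 惊奇 and 愤怒 in the third.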
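Fig.6  Predicted sentiment metaphors for unlabeled idioms
Fig.7  Chinese idiom knowledge graph integrating internal and external knowledge
Fig.8  Mapping internal sentiment to external things (query keyword “思念”, longing)
Fig.9  External things as metaphors for internal sentiment (query keyword “云”, cloud)
Fig.10  Humanities knowledge service model integrating internal and external knowledge (query keywords “云” and “思念”)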
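The three query patterns named in Fig.8 to Fig.10 can be illustrated with a tiny in-memory triple store. Everything here is a sketch: the predicate names `has_thing` and `has_sentiment` are hypothetical, the sentiment labels follow Table 1, and the external things assigned to each idiom are illustrative guesses rather than the crawled categories used in the paper.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("白云亲舍", "has_thing", "云"),
    ("白云亲舍", "has_sentiment", "思念"),
    ("望穿秋水", "has_thing", "水"),
    ("望穿秋水", "has_sentiment", "思念"),
    ("清风朗月", "has_thing", "风"),
    ("清风朗月", "has_sentiment", "安心"),
]


def subjects(predicate: str, obj: str) -> Set[str]:
    """Idioms linked to the given object by the given predicate."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(idioms: Set[str], predicate: str) -> Set[str]:
    """Objects reached from a set of idioms through one predicate."""
    return {o for s, p, o in TRIPLES if p == predicate and s in idioms}


def things_for_sentiment(sentiment: str) -> Set[str]:
    """Internal sentiment -> external things."""
    return objects(subjects("has_sentiment", sentiment), "has_thing")


def sentiments_for_thing(thing: str) -> Set[str]:
    """External thing -> internal sentiments."""
    return objects(subjects("has_thing", thing), "has_sentiment")


def idioms_for(thing: str, sentiment: str) -> Set[str]:
    """Combined query: idioms that pair this thing with this sentiment."""
    return subjects("has_thing", thing) & subjects("has_sentiment", sentiment)


print(things_for_sentiment("思念"))
print(sentiments_for_thing("云"))
print(idioms_for("云", "思念"))
```

In this representation the idiom is the bridge node: both directions of lookup go through it, which is why a combined query reduces to a set intersection.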