Extracting Drama Terms with GCN Long-distance Constrain
Ren Qiutong, Wang Hao, Xiong Xin, Fan Tao
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China

Abstract [Objective] This study proposes a new term extraction model for traditional drama, a category of intangible cultural heritage, and uses it to help construct a drama term database. [Methods] First, we analyzed the linguistic characteristics of drama texts from the perspectives of term category, semantic structure, and text length. Then, we added part-of-speech and domain features to the character representations produced by the BERT-BiLSTM-CRF model. Finally, we incorporated a graph convolutional network (GCN) into the model to capture constraint relationships between distant words. [Results] The proposed model achieved an F1 score of 91.11%, 1.3 percentage points higher than the baseline BERT-BiLSTM-CRF model. [Limitations] The experimental data came only from Baidu Baike and the official Intangible Cultural Heritage website; future work should include free text from other sources, more categories of drama terms, and external features. [Conclusions] The proposed model and the resulting traditional drama term database will support the construction of a knowledge graph for traditional drama.
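The abstract names two additions to BERT-BiLSTM-CRF: appending part-of-speech and domain features to each character representation, and a GCN layer that lets characters exchange information along graph edges regardless of how far apart they sit in the sentence. The following is a minimal sketch of those two ideas in plain Python, under loudly stated assumptions: the toy vectors stand in for BERT/BiLSTM output, the POS tag set and the dictionary-based domain flag are hypothetical, and the graph edges are supplied by hand because the abstract does not say how the paper builds its graph. The layer uses the common normalization ReLU(D^-1/2 (A + I) D^-1/2 H W), which may differ from the paper's exact formulation, and the CRF decoding step of the full model is omitted.

```python
import math

# Hypothetical POS tag inventory; the paper's actual tag set is not given here.
POS_TAGS = ["n", "v", "a", "other"]


def one_hot(index, size):
    """Return a list of `size` zeros with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def fuse_features(char_vecs, pos_tags, in_domain_dict):
    """Append a POS one-hot and a 0/1 domain-dictionary flag to each character vector."""
    fused = []
    for vec, tag, flag in zip(char_vecs, pos_tags, in_domain_dict):
        pos_vec = one_hot(POS_TAGS.index(tag), len(POS_TAGS))
        fused.append(vec + pos_vec + [1.0 if flag else 0.0])
    return fused


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph on n nodes."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply an m x k matrix by a k x p matrix (both nested lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(h, edges, weight):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    a_hat = normalized_adjacency(len(h), edges)
    out = matmul(matmul(a_hat, h), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Toy example: four characters with 2-dim vectors standing in for BERT/BiLSTM output.
chars = [[0.2, 0.1], [0.5, 0.4], [0.3, 0.9], [0.8, 0.7]]
tags = ["n", "v", "other", "n"]
in_dict = [True, False, False, True]  # hypothetical term-dictionary lookup
h = fuse_features(chars, tags, in_dict)  # each row: 2 + 4 + 1 = 7 features
w = [[0.1, 0.2] for _ in range(7)]  # toy 7 x 2 weight matrix
local_only = gcn_layer(h, [], w)
with_long_edge = gcn_layer(h, [(0, 3)], w)  # hand-added edge between positions 0 and 3
print(local_only[0], with_long_edge[0])
```

With the hand-added edge (0, 3), position 0's output averages in position 3's features (its first column rises from about 0.23 to about 0.29), while position 1, which has only its self-loop, is unchanged. This is the sense in which a graph edge can carry a constraint between characters that a fixed local window would treat as far apart.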

Received: 13 April 2021
Published: 29 June 2021

Fund: National Natural Science Foundation of China (72074108); Nanjing University “Special Funds for Fundamental Scientific Research of Central Universities” Project (010814370113); Jiangsu Youth Social Science Talent Training Program
Corresponding Author: Wang Hao, ORCID: 0000-0002-0131-0823
E-mail: ywhaowang@nju.edu.cn