Extracting Drama Terms with GCN Long-distance Constraints
Ren Qiutong, Wang Hao, Xiong Xin, Fan Tao
School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This study proposes a term extraction model for traditional drama, a category of intangible cultural heritage, and uses it to support the construction of a drama term database. [Methods] First, we analyzed the linguistic characteristics of drama texts in terms of term category, semantic structure, and text length. Then, we added part-of-speech and domain features to the character representations obtained by the BERT-BiLSTM-CRF model. Finally, we incorporated a graph convolutional network (GCN) into the model to capture the constraint relationships between distant words. [Results] The F1 value of the proposed model reached 91.11%, 1.3 percentage points higher than that of the baseline BERT-BiLSTM-CRF model. [Limitations] The experimental data came only from Baidu Baike and the official website of Intangible Cultural Heritage; future work should include more free texts from other sources, more categories of drama terms, and additional external features. [Conclusions] The proposed model and the resulting database of traditional drama terms will support the construction of a knowledge graph for traditional drama.
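The two modelling ideas in the Methods, concatenating part-of-speech and domain features onto character representations and then passing them through a GCN so that distant characters can influence each other, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration: the random `context` matrix merely stands in for BERT-BiLSTM outputs, the POS ids and domain flags are invented, and the single long-distance edge `(0, 5)` is hypothetical. The abstract does not state how the authors build their graph or where the GCN sits in their pipeline, so this should not be read as their implementation.

```python
import numpy as np

def build_adjacency(n, long_edges):
    """Link each character to its neighbours, plus extra long-distance edges."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    for i, j in long_edges:  # hypothetical edges, e.g. from a dependency parse
        adj[i, j] = adj[j, i] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used in GCNs."""
    a_tilde = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(h, a_hat, w):
    """One graph convolution: aggregate over the graph, project, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)

rng = np.random.default_rng(0)
n_chars, d_ctx, n_pos, d_out = 6, 8, 4, 5

# Stand-in for contextual character vectors (assumption: random values).
context = rng.normal(size=(n_chars, d_ctx))
# Invented part-of-speech ids, encoded as one-hot vectors.
pos_onehot = np.eye(n_pos)[np.array([0, 1, 1, 2, 3, 0])]
# Invented binary domain feature, e.g. "character appears in a drama lexicon".
domain_flag = np.array([[1], [1], [0], [0], [1], [1]], dtype=float)

# Feature fusion by concatenation: 8 + 4 + 1 = 13 dimensions per character.
features = np.concatenate([context, pos_onehot, domain_flag], axis=1)

adj = build_adjacency(n_chars, long_edges=[(0, 5)])
a_hat = normalize_adjacency(adj)
w = rng.normal(size=(features.shape[1], d_out))  # untrained weights
hidden = gcn_layer(features, a_hat, w)
print(hidden.shape)
```

In this toy graph, characters 0 and 5 exchange information in a single layer even though they are five positions apart, which is the sense in which a graph edge can act as a long-distance constraint. In a trained model the weights would be learned and the output would feed a tagging layer such as a CRF.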