1 Computer School, Beijing Information Science & Technology University, Beijing 100185, China
2 Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
[Objective] This study realizes automatic extraction of journal keywords by exploiting the strengths of the CRF model in sequence labeling and incorporating part-of-speech information and a CRF layer into the BiLSTM network. [Methods] Keyword extraction is treated as a sequence labeling problem. The journal text is first pre-processed with word segmentation and part-of-speech tagging; the pre-processed text is then vectorized with a Word2Vec word embedding model to obtain vector representations of words; finally, the BiLSTM-CRF model performs automatic keyword extraction. [Results] In experiments on text collected from China National Knowledge Infrastructure, the BiLSTM-CRF network with part-of-speech information improves accuracy over the original BiLSTM model by 3% on Simple Word and by 12% on Complex Word. [Limitations] The model still cannot extract complex keywords accurately, and future work needs to further improve its performance on complex keywords. [Conclusions] Compared with the traditional method, the BiLSTM-CRF model with part-of-speech integration achieves higher recognition accuracy and is an effective keyword extraction method.
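To make the sequence labeling formulation in the Methods concrete, the following minimal sketch shows how segmented tokens, their part-of-speech tags, and a set of gold keywords could be turned into per-token training labels. It assumes a B/I/O tagging scheme, which the abstract does not specify; the function names `bio_labels` and `with_pos_features`, the example tokens, and the part-of-speech tags are all hypothetical placeholders, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def bio_labels(tokens: Sequence[str], keywords: Sequence[Sequence[str]]) -> List[str]:
    """Assign a B/I/O label to every token (assumed scheme, not from the paper).

    Each keyword is given as a sequence of already-segmented tokens.
    """
    labels = ["O"] * len(tokens)
    # Match longer keywords first so multi-token keywords are not broken up
    # by shorter keywords that overlap them.
    for kw in sorted(keywords, key=len, reverse=True):
        n = len(kw)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(label == "O" for label in labels[i:i + n])
            if window_free and list(tokens[i:i + n]) == list(kw):
                labels[i] = "B"  # first token of a keyword
                for j in range(i + 1, i + n):
                    labels[j] = "I"  # continuation of the same keyword
    return labels


def with_pos_features(
    tokens: Sequence[str], pos_tags: Sequence[str], labels: Sequence[str]
) -> List[Tuple[str, str, str]]:
    """Pair each token with its part-of-speech tag and its label."""
    if not (len(tokens) == len(pos_tags) == len(labels)):
        raise ValueError("tokens, pos_tags and labels must have equal length")
    return list(zip(tokens, pos_tags, labels))


# Hypothetical pre-processed sentence: tokens from word segmentation and
# illustrative part-of-speech tags from a tagger.
tokens = ["keyword", "extraction", "uses", "a", "neural", "network"]
pos_tags = ["n", "n", "v", "m", "a", "n"]
keywords = [["keyword", "extraction"], ["network"]]

labels = bio_labels(tokens, keywords)
rows = with_pos_features(tokens, pos_tags, labels)
for row in rows:
    print(row)
```

In a full pipeline, each (token, part-of-speech, label) row would be mapped to a word vector plus a part-of-speech feature before being fed to the tagger; that step is omitted here because the abstract does not describe how the two are combined.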
Cheng Bin, Shi Shuicai, Du Yuncheng, Xiao Shibin. Keyword Extraction for Journals Based on Part-of-Speech and BiLSTM-CRF Combined Model[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 101-108.