Keyword Extraction for Journals Based on Part-of-Speech and BiLSTM-CRF Combined Model

Cheng Bin1, Shi Shuicai1,2, Du Yuncheng1,2, Xiao Shibin1,2

1 Computer School, Beijing Information Science & Technology University, Beijing 100185, China
2 Beijing TRS Information Technology Co., Ltd., Beijing 100101, China

Abstract [Objective] This study realizes automatic extraction of journal keywords by incorporating part-of-speech information and a CRF layer into the BiLSTM network, exploiting the strength of the CRF model in sequence labeling. [Methods] Keyword extraction is treated as a sequence labeling problem. The journal text is first pre-processed with word segmentation and part-of-speech tagging; the pre-processed text is then vectorized with Word2Vec word embeddings to obtain vector representations of the words; finally, a BiLSTM-CRF model extracts the keywords automatically. [Results] In experiments on text collected from the China National Knowledge Infrastructure, the part-of-speech and BiLSTM-CRF network improves accuracy on Simple Word by 3% compared with the original BiLSTM model; on Complex Word, accuracy improves by 12%. [Limitations] The journal keyword extraction model still cannot extract complex keywords accurately. Future work needs to further improve the model's performance on complex keywords. [Conclusions] Compared with traditional methods, the BiLSTM-CRF model with part-of-speech integration achieves higher recognition accuracy and is an effective keyword extraction method.
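
To make the sequence-labeling framing concrete, the following is a minimal sketch of how segmented, part-of-speech-tagged text might be turned into per-token labels and back into keywords. The B/I/O label scheme, the function names, the sample tokens, and the sample part-of-speech tags are all assumptions made for illustration; the abstract does not state which tagging scheme or feature encoding the authors used, and the BiLSTM-CRF network itself is not reproduced here.

```python
from typing import List, Tuple


def bio_encode(tokens: List[str], keywords: List[List[str]]) -> List[str]:
    """Label each token B (keyword start), I (keyword continuation) or O (other).

    The B/I/O scheme is an assumption; the source only says that keyword
    extraction is treated as sequence labeling.
    """
    labels = ["O"] * len(tokens)
    # Match longer keywords first so a multi-token keyword is not
    # pre-empted by a shorter keyword contained in it.
    for kw in sorted(keywords, key=len, reverse=True):
        n = len(kw)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_free = all(lab == "O" for lab in labels[i:i + n])
            if tokens[i:i + n] == kw and span_free:
                labels[i] = "B"
                for j in range(i + 1, i + n):
                    labels[j] = "I"
    return labels


def bio_decode(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Recover keyword token spans from a predicted label sequence."""
    keywords: List[List[str]] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B":
            if current:
                keywords.append(current)
            current = [tok]
        elif lab == "I" and current:
            current.append(tok)
        else:
            # "O", or a stray "I" with no preceding "B", closes any open span.
            if current:
                keywords.append(current)
            current = []
    if current:
        keywords.append(current)
    return keywords


def with_pos(tokens: List[str], pos_tags: List[str]) -> List[Tuple[str, str]]:
    """Pair each token with its part-of-speech tag as model input features."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return list(zip(tokens, pos_tags))


# Hypothetical example sentence, already segmented and POS-tagged.
tokens = ["keyword", "extraction", "uses", "the", "BiLSTM-CRF", "model"]
pos_tags = ["n", "n", "v", "u", "n", "n"]
gold_keywords = [["keyword", "extraction"], ["BiLSTM-CRF"]]

features = with_pos(tokens, pos_tags)
labels = bio_encode(tokens, gold_keywords)
recovered = bio_decode(tokens, labels)
```

In a setup like this, the `(token, POS)` pairs would be embedded and fed to the tagger during training, with `labels` as the target sequence; at inference time the predicted label sequence would be passed through `bio_decode` to obtain the extracted keywords.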

Received: 06 December 2019
Published: 11 November 2020

Corresponding Author: Cheng Bin
E-mail: 1842729609@qq.com