Keyword Extraction for Journals Based on Part-of-speech and BiLSTM-CRF Combined Model
Cheng Bin, Shi Shuicai, Du YunCheng, Xiao Shibin
(Computer School, Beijing Information Science and Technology University, Beijing 100185, China)
(Beijing TRS Information Technology Co., Ltd., Beijing 100101, China)
[Objective] To extract journal keywords automatically by incorporating part-of-speech information and a CRF layer into a BiLSTM network, exploiting the strength of the CRF model on sequence labeling problems.
[Methods] Keyword extraction is treated as a sequence labeling problem. The journal text is first pre-processed with word segmentation and part-of-speech tagging; the pre-processed text is then vectorized with a word2vec model to obtain word embeddings; finally, a BiLSTM-CRF model extracts the keywords automatically.
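To make the sequence labeling framing concrete, the following is a minimal sketch of how a tokenized, part-of-speech-tagged sentence could be turned into per-token labels and how keywords could be read back from predicted labels. The B/I/O tag set, the function names `bio_tags` and `decode_keywords`, and the English example sentence are assumptions made for illustration; the abstract does not state the tag scheme actually used.

```python
from typing import List

def bio_tags(tokens: List[str], keywords: List[List[str]]) -> List[str]:
    """Label each token B (keyword start), I (inside keyword), or O (other).

    Assumed B/I/O scheme; hypothetical helper, not the paper's code.
    """
    tags = ["O"] * len(tokens)
    # Match longer keywords first so a multi-word keyword is not split
    # by a shorter keyword that it contains.
    for kw in sorted(keywords, key=len, reverse=True):
        n = len(kw)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[i:i + n])
            if tokens[i:i + n] == kw and window_free:
                tags[i] = "B"
                tags[i + 1:i + n] = ["I"] * (n - 1)
    return tags

def decode_keywords(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Recover keyword spans from a B/I/O tag sequence."""
    spans: List[List[str]] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(current)
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans

# Illustrative input: tokens with part-of-speech tags (both invented here).
tokens = ["keyword", "extraction", "uses", "a", "neural", "network"]
pos = ["NN", "NN", "VBZ", "DT", "JJ", "NN"]
keywords = [["keyword", "extraction"], ["neural", "network"]]

# Each token is paired with its part of speech; a model would embed both.
features = list(zip(tokens, pos))
tags = bio_tags(tokens, keywords)
```

In such a setup, the `(token, part-of-speech)` pairs would be the model input and the tag sequence the training target; at prediction time, `decode_keywords` turns the predicted tags back into keyword spans.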
[Results] In experiments on text collected from the China National Knowledge Infrastructure, the BiLSTM-CRF network with part-of-speech information improves accuracy over the original BiLSTM model by 3% on SW and by 12% on CW.
[Limitations] The journal keyword extraction model cannot accurately extract complex keywords. Future work needs to further improve the model's performance on complex keywords.
[Conclusions] Compared with the traditional method, the BiLSTM-CRF model with integrated part-of-speech information achieves higher recognition accuracy and is an effective keyword extraction method.
Cheng Bin, Shi Shuicai, Du YunCheng, Xiao Shibin. Keyword Extraction for Journals Based on Part-of-speech and BiLSTM-CRF Combined Model [J]. Data Analysis and Knowledge Discovery, DOI: 10.11925/infotech.2096-3467.2019.1306.