[Objective] This study extracts keywords by combining the internal structure of each individual document with word vectors trained on the whole corpus. [Methods] First, we used Word2vec to represent every word in the document corpus as a vector and calculated the similarities between words. Second, we modified the TextRank algorithm to weight candidate keywords according to their similarities and adjacency relations. Finally, we built a transition probability matrix for the iterative calculation over the lexical graph model and extracted the keywords. [Results] Integrating Word2vec with TextRank extracted keywords effectively. [Limitations] The proposed method requires substantial training on the corpus to build the word vectors and the relation matrix. [Conclusions] Relationships among words across the document set can be used to adjust the word relationships within a single document, which improves the accuracy of keyword extraction from individual documents.
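The three steps in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the sketch assumes an edge weight equal to the cosine similarity of two words, accumulated each time they co-occur within a small window, with negative similarities clipped to zero. The function names, the `window` and `damping` parameters, and the hand-written `toy_vectors` (standing in for trained Word2vec vectors) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_transition_matrix(tokens, vectors, window=2):
    """Return (vocab, P), where P[i][j] is the probability of moving from word i to word j.

    Assumed weighting: similarity of adjacent words, summed over co-occurrences.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    weights = [[0.0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for other in tokens[pos + 1 : pos + window + 1]:
            if other == word:
                continue
            sim = max(cosine(vectors[word], vectors[other]), 0.0)
            i, j = index[word], index[other]
            weights[i][j] += sim  # undirected graph: add the weight both ways
            weights[j][i] += sim
    matrix = []
    for row in weights:
        total = sum(row)
        # Normalize each row; an isolated word jumps uniformly to any word.
        matrix.append([w / total for w in row] if total else [1.0 / n] * n)
    return vocab, matrix


def rank_words(matrix, damping=0.85, tol=1e-8, max_iter=200):
    """Iterate the damped random walk until the scores stop changing."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def extract_keywords(tokens, vectors, top_k=3, window=2):
    """Rank the words of one document and return the top_k candidates."""
    vocab, matrix = build_transition_matrix(tokens, vectors, window)
    scores = rank_words(matrix)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]


# Hypothetical 2-dimensional vectors standing in for trained Word2vec embeddings.
toy_vectors = {
    "keyword": [0.9, 0.1],
    "extraction": [0.8, 0.3],
    "graph": [0.2, 0.9],
    "model": [0.3, 0.8],
}
toy_tokens = ["keyword", "extraction", "graph", "model", "keyword", "graph"]
print(extract_keywords(toy_tokens, toy_vectors, top_k=2))
```

In practice the vectors would come from a Word2vec model trained on the full corpus, which is where the training cost noted under [Limitations] arises; the per-document graph and iteration are cheap by comparison.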
Ning Jianfei, Liu Jiangzhen. Using Word2vec with TextRank to Extract Keywords[J]. New Technology of Library and Information Service, 2016, 32(6): 20-27. DOI: 10.11925/infotech.1003-3513.2016.06.03.