New Technology of Library and Information Service  2016, Vol. 32 Issue (6): 20-27    DOI: 10.11925/infotech.1003-3513.2016.06.03
Using Word2vec with TextRank to Extract Keywords
Ning Jianfei, Liu Jiangzhen
Department of Electronic Information, Luoding Polytechnic, Luoding 527200, China
[Objective] This study extracts keywords by combining the internal structure of each single document with word vectors learned from the corpus. [Methods] First, we used Word2vec to represent every word in the document corpus as a vector and calculated the similarities between words. Second, we modified the TextRank algorithm to assign weights to candidate keywords according to their similarities and adjacency relations. Finally, we built a probability transition matrix for the iterative calculation over the lexical graph model and extracted the keywords. [Results] Integrating Word2vec with TextRank extracted keywords effectively. [Limitations] The proposed method requires extensive training on the corpus to build the word vectors and the relation matrix. [Conclusions] Relationships among words across the document set can be used to adjust the word relationships within a single document, which improves the accuracy of keyword extraction from individual documents.
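
The pipeline described under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the edge weight used here (co-occurrence count multiplied by cosine similarity) is an assumption, as are the function names `cooccurrence` and `weighted_textrank`, the window size, and the toy 2-D vectors that stand in for embeddings trained with Word2vec on a real corpus. The damping factor 0.85 is the value conventionally used with TextRank.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cooccurrence(tokens, window=2):
    """Count how often two distinct words appear within `window` positions."""
    counts = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                counts[(w, tokens[j])] += 1.0
                counts[(tokens[j], w)] += 1.0
    return counts


def weighted_textrank(tokens, vectors, window=2, damping=0.85, iters=50):
    """Rank words by a TextRank-style iteration over similarity-weighted edges."""
    words = sorted(set(tokens))
    counts = cooccurrence(tokens, window)
    # Assumed edge weight: adjacency count scaled by vector similarity,
    # clipped at zero so transition probabilities stay non-negative.
    weight = {
        (a, b): c * max(cosine(vectors[a], vectors[b]), 0.0)
        for (a, b), c in counts.items()
    }
    out_sum = defaultdict(float)
    for (a, _), w in weight.items():
        out_sum[a] += w
    # Transition probability from a to b: row-normalized edge weight.
    trans = {
        (a, b): w / out_sum[a] for (a, b), w in weight.items() if out_sum[a] > 0
    }
    n = len(words)
    score = {w: 1.0 / n for w in words}
    for _ in range(iters):
        new = {w: (1.0 - damping) / n for w in words}
        for (a, b), p in trans.items():
            new[b] += damping * score[a] * p
        score = new
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Toy document and hypothetical 2-D vectors (placeholders for Word2vec output).
tokens = ["keyword", "extraction", "uses", "graph", "ranking",
          "keyword", "graph", "ranking", "keyword"]
vectors = {
    "keyword": [0.9, 0.3],
    "extraction": [0.8, 0.4],
    "uses": [0.1, 0.9],
    "graph": [0.7, 0.5],
    "ranking": [0.6, 0.6],
}
ranked = weighted_textrank(tokens, vectors)
for word, value in ranked[:3]:
    print(f"{word}\t{value:.4f}")
```

In this sketch the corpus-level information enters only through `vectors`, while the adjacency counts come from the single document being processed, which mirrors the division of roles the abstract describes.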

Key words: Keyword extraction; Word2vec; TextRank; Graphical model; Word vector
Received: 01 March 2016      Published: 18 July 2016

Ning Jianfei, Liu Jiangzhen. Using Word2vec with TextRank to Extract Keywords. New Technology of Library and Information Service, 2016, 32(6): 20-27.

