Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China
School of Information Resource Management, Renmin University of China, Beijing 100872, China
[Objective] This study aims to improve single-document keyword extraction by adding world knowledge, in the form of word vectors trained on Wikipedia, to the TextRank model. [Methods] First, we trained a word embedding model with Word2Vec on Chinese Wikipedia data. Second, we clustered the nodes of the TextRank word graph and adjusted the voting importance of each cluster. Third, we calculated the random-walk transition probabilities with coverage and location as additional factors. Finally, we obtained the node scores by iterating over the transition matrix and selected the Top N words as keywords. [Results] The new TextRank model performed much better than the other methods when N was less than or equal to 7. When only three keywords were extracted, the F-measure reached its maximum, which was 3.374% higher than the best existing result. When N was larger than 7, the results were similar to those of the traditional TextRank method. [Limitations] The cluster analysis increased the computation cost. [Conclusions] The new weighted TextRank model can extract keywords effectively.
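The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the way the three factors are combined here is an assumption. Coverage is approximated by co-occurrence counts within a sliding window, location by how early a word first appears, and the cluster importance is passed in as a hypothetical `cluster_weight` dictionary that would, in the paper's setting, come from clustering Word2Vec vectors. The function name `weighted_textrank` and all parameter names are illustrative.

```python
from collections import defaultdict


def weighted_textrank(words, cluster_weight=None, window=2,
                      damping=0.85, iterations=50, top_n=3):
    """Return the top_n (word, score) pairs from a weighted TextRank walk."""
    cluster_weight = cluster_weight or {}  # hypothetical per-word importance
    n = len(words)

    # Build an undirected co-occurrence graph with a sliding window and
    # record the position at which each word first appears.
    cooc = defaultdict(lambda: defaultdict(float))
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)
        for j in range(i + 1, min(i + window + 1, n)):
            if words[j] != w:
                cooc[w][words[j]] += 1.0
                cooc[words[j]][w] += 1.0

    # Assumed weighting: co-occurrence count (coverage) times a location
    # bonus for early words times the cluster weight of the target word.
    # Each row is normalised so it forms a probability distribution.
    transition = {}
    for u, neighbours in cooc.items():
        raw = {}
        for v, count in neighbours.items():
            location = 1.0 - first_pos[v] / n
            raw[v] = count * (1.0 + location) * cluster_weight.get(v, 1.0)
        total = sum(raw.values())
        transition[u] = {v: x / total for v, x in raw.items()}

    # Iterate the damped random walk for a fixed number of steps.
    vocab = list(first_pos)
    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        nxt = {w: (1.0 - damping) / len(vocab) for w in vocab}
        for u, row in transition.items():
            for v, p in row.items():
                nxt[v] += damping * score[u] * p
        score = nxt

    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


tokens = ["keyword", "extraction", "graph", "keyword",
          "ranking", "graph", "model", "keyword"]
print(weighted_textrank(tokens, top_n=3))
```

Passing, for example, `cluster_weight={"model": 100.0}` raises the probability of walking into "model" from each of its neighbours, which is one simple way to let cluster importance adjust the votes.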