Extracting Keywords with TextRank and Weighted Word Positions
Liu Zhuchen¹, Chen Hao², Yu Yanhua¹, Li Jie¹
¹ School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
² China Shipbuilding Industry System Engineering Research Institute, Beijing 100094, China
[Objective] This study integrates the position and distance attributes of words into the TextRank model, aiming to extract keywords from a single document more effectively. [Methods] First, we constructed a word graph of candidate words based on the TextRank method. Then, we incorporated the position information of the words and calculated the transition probability matrix. Finally, we obtained the scores of the candidate words by iterative calculation and selected the K highest-scoring candidates as keywords. [Results] The weighted TextRank method yielded better results than the traditional algorithms. For K = 3, 5, 7 and 10, the F-value increased by 1.29%, 3.14%, 5.43% and 5.88%, respectively. [Limitations] This study did not incorporate a knowledge base and did not fully exploit external lexical relationship information. [Conclusions] The position and distribution of words help extract keywords more effectively.
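The three steps in the Methods summary can be illustrated with a minimal Python sketch. The abstract does not state the paper's weighting formula, so the position weight used here (the sum of 1 / (index + 1) over a word's occurrences) is an assumption chosen only for illustration. The function name `position_weighted_textrank` and its parameters `window`, `damping`, `iters` and `top_k` are likewise hypothetical, not the authors' implementation.

```python
from collections import defaultdict


def position_weighted_textrank(tokens, window=3, damping=0.85, iters=50, top_k=3):
    """Rank candidate words with a position-biased TextRank (illustrative only)."""
    # Step 1: word graph. Words that fall inside the same sliding window of
    # `window` consecutive tokens are linked by an undirected, counted edge.
    cooc = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                cooc[w][tokens[j]] += 1.0
                cooc[tokens[j]][w] += 1.0

    # Step 2a: position weight. ASSUMPTION (not from the paper): each occurrence
    # contributes 1 / (index + 1), so early and frequent words weigh more.
    pos_weight = defaultdict(float)
    for i, w in enumerate(tokens):
        pos_weight[w] += 1.0 / (i + 1)

    # Step 2b: transition probabilities. The edge u -> v is scaled by the
    # position weight of v and normalised over all neighbours of u.
    words = sorted(cooc)
    trans = {}
    for u in words:
        raw = {v: count * pos_weight[v] for v, count in cooc[u].items()}
        total = sum(raw.values())
        trans[u] = {v: r / total for v, r in raw.items()}

    # Step 3: iterate the damped random walk for a fixed number of rounds.
    scores = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        updated = {}
        for v in words:
            incoming = sum(scores[u] * trans[u].get(v, 0.0) for u in cooc[v])
            updated[v] = (1 - damping) / len(words) + damping * incoming
        scores = updated

    # Return the top_k words by score (ties broken alphabetically) and all scores.
    ranked = sorted(words, key=lambda w: (-scores[w], w))
    return ranked[:top_k], scores
```

Calling `position_weighted_textrank(tokens, top_k=5)` on a tokenised, stop-word-filtered document returns the five highest-scoring words together with the full score table. Replacing the assumed `pos_weight` with a different position or distribution measure changes only step 2a.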
Liu Zhuchen, Chen Hao, Yu Yanhua, Li Jie. Extracting Keywords with TextRank and Weighted Word Positions[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 74-79.