[Objective] This study integrates the position and distance attributes of words into the TextRank model in order to extract keywords from a single document more effectively. [Methods] First, we constructed a word graph of the candidate words based on the TextRank method. Then, we incorporated the position information of the words and calculated the transition probability matrix. Finally, we obtained the scores of the candidate words by iterative calculation and selected the top K words with the highest scores as keywords. [Results] The weighted TextRank method yielded better results than the traditional algorithms. For K values of 3, 5, 7 and 10, the F value increased by 1.29%, 3.14%, 5.43% and 5.88%, respectively. [Limitations] This study did not incorporate a knowledge base and did not fully utilize external lexical relationship information. [Conclusions] The position and distribution of words help extract keywords more effectively.
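The three steps in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the position weight used here (the sum of 1/position over a word's occurrences, which favors early words) is an assumption, as are the function name `weighted_textrank` and the parameters `window`, `damping` and `iterations`.

```python
from collections import defaultdict


def weighted_textrank(words, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank candidate words with a position-biased TextRank (illustrative only)."""
    # Step 1: word graph. Two different words that occur within `window`
    # tokens of each other share an undirected, count-weighted edge.
    cooc = defaultdict(float)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                cooc[(w, words[j])] += 1.0
                cooc[(words[j], w)] += 1.0

    vocab = sorted(set(words))

    # Assumed position weight (hypothetical, not taken from the paper):
    # each occurrence at 1-based position p contributes 1/p.
    pos_weight = defaultdict(float)
    for i, w in enumerate(words):
        pos_weight[w] += 1.0 / (i + 1)

    # Step 2: transition probabilities. The chance of moving from u to v is
    # proportional to their co-occurrence count times v's position weight.
    transition = {u: {} for u in vocab}
    for (u, v), count in cooc.items():
        transition[u][v] = count * pos_weight[v]
    for u in vocab:
        total = sum(transition[u].values())
        for v in transition[u]:
            transition[u][v] /= total

    # Step 3: iterate the damped random walk, then keep the top K words.
    scores = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        updated = {w: (1.0 - damping) / len(vocab) for w in vocab}
        for u in vocab:
            for v, p in transition[u].items():
                updated[v] += damping * scores[u] * p
        scores = updated

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_k]


tokens = ["keyword", "extraction", "graph", "keyword", "ranking", "graph", "keyword"]
print(weighted_textrank(tokens, top_k=3))
```

In this sketch the position information enters only through the transition probabilities, so setting every position weight to 1 gives an ordinary co-occurrence-weighted TextRank for comparison.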
Liu Zhuchen, Chen Hao, Yu Yanhua, Li Jie. Extracting Keywords with TextRank and Weighted Word Positions[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 74-79.