Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (9): 74-79    DOI: 10.11925/infotech.2096-3467.2018.0271
Extracting Keywords with TextRank and Weighted Word Positions
Liu Zhuchen1, Chen Hao2, Yu Yanhua1, Li Jie1
1School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
2China Shipbuilding Industry System Engineering Research Institute, Beijing 100094, China
Abstract  

[Objective] This study integrates the position and distance attributes of words into the TextRank model to extract keywords from a single document more effectively. [Methods] First, we constructed a word graph of the candidate words based on the TextRank method. Then, we incorporated the position information of the words and calculated the transition probability matrix. Finally, we obtained the scores of the candidate words by iterative calculation and selected the top K words with the highest scores as keywords. [Results] The weighted TextRank method yielded better results than the traditional algorithms. When K was 3, 5, 7 and 10, the F value increased by 1.29%, 3.14%, 5.43% and 5.88%, respectively. [Limitations] This study did not use a knowledge base and did not fully utilize external lexical relationship information. [Conclusions] The position and distribution of words help extract keywords more effectively.
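
The steps listed under [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how position information enters the transition matrix, so the sketch assumes that the probability of moving from word i to word j is the co-occurrence count scaled by a per-word position weight. The names `build_graph`, `weighted_textrank`, `top_k` and `position_weight`, the window size, and the damping factor 0.85 are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_graph(words, window=3):
    """Undirected co-occurrence counts; each word links to the next window-1 words."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            u = words[j]
            if u != w:
                graph[w][u] += 1.0
                graph[u][w] += 1.0
    return graph


def weighted_textrank(words, position_weight, window=3, d=0.85, iters=50):
    """Score candidate words; position_weight maps word -> weight (default 1.0)."""
    graph = build_graph(words, window)
    nodes = sorted(graph)
    # Assumed merging rule: transition i -> j is the co-occurrence count
    # times the position weight of j, normalised over the neighbours of i.
    trans = {}
    for i in nodes:
        raw = {j: c * position_weight.get(j, 1.0) for j, c in graph[i].items()}
        total = sum(raw.values())
        trans[i] = {j: v / total for j, v in raw.items()}
    score = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iters):
        new = {w: (1.0 - d) / len(nodes) for w in nodes}
        for i in nodes:
            for j, p in trans[i].items():
                new[j] += d * score[i] * p
        score = new
    return score


def top_k(score, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    return sorted(score, key=lambda w: (-score[w], w))[:k]
```

Calling `top_k(weighted_textrank(tokens, weights), 5)` on a tokenised, filtered document would return five candidate keywords. With an empty `weights` dictionary the sketch reduces to a plain co-occurrence TextRank.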

Key words: Keyword Extraction; TextRank; Word Location Distribution; Word Distance
Received: 12 March 2018      Published: 25 October 2018
ZTFLH (classification number): G353 TP391

Cite this article:

Liu Zhuchen, Chen Hao, Yu Yanhua, Li Jie. Extracting Keywords with TextRank and Weighted Word Positions. Data Analysis and Knowledge Discovery, 2018, 2(9): 74-79.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0271     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I9/74

Position (rows: first occurrence of the word; columns: last occurrence of the word)
        Q       Z       H
Q       1.06    1.15    1.6
Z       1.15    1.04    1.10
H       1.6     1.10    1.02
N
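
One way such a table could be used is as a lookup keyed by where a word first and last appears. The page does not define Q, Z and H, so the sketch below assumes they label three consecutive segments of the document and, as a further assumption, splits the text into equal thirds. Only the nine weights come from the table; `segment` and `position_weights` are hypothetical helper names.

```python
# Weights copied from the table above.
# Key: (segment of first occurrence, segment of last occurrence).
WEIGHTS = {
    ("Q", "Q"): 1.06, ("Q", "Z"): 1.15, ("Q", "H"): 1.6,
    ("Z", "Q"): 1.15, ("Z", "Z"): 1.04, ("Z", "H"): 1.10,
    ("H", "Q"): 1.6,  ("H", "Z"): 1.10, ("H", "H"): 1.02,
}


def segment(index, length):
    """Assumed rule: split the document into three equal parts Q, Z, H."""
    if 3 * index < length:
        return "Q"
    if 3 * index < 2 * length:
        return "Z"
    return "H"


def position_weights(words):
    """Map each word to the table weight for its first and last occurrence."""
    first, last = {}, {}
    for i, w in enumerate(words):
        first.setdefault(w, i)  # keep only the earliest index
        last[w] = i             # overwrite until the latest index remains
    n = len(words)
    return {w: WEIGHTS[(segment(first[w], n), segment(last[w], n))] for w in first}
```

Under these assumptions, a word that appears in both the opening and closing segments receives the largest weight in the table (1.6), while a word confined to one segment stays close to 1.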
Method                Precision (P)                   Recall (R)                      F value
                      K=3    K=5    K=7    K=10       K=3    K=5    K=7    K=10       K=3    K=5    K=7    K=10
TF_IDF                0.235  0.172  0.135  0.104      0.198  0.241  0.265  0.291      0.215  0.201  0.179  0.153
TextRank              0.300  0.220  0.177  0.137      0.253  0.309  0.347  0.384      0.274  0.257  0.234  0.202
PositionRank          0.334  0.247  0.195  0.150      0.281  0.348  0.383  0.420      0.305  0.289  0.258  0.221
NingJianfei           0.048  0.048  0.046  0.042      0.040  0.068  0.091  0.117      0.044  0.056  0.061  0.062
ClusterRank           0.293  0.217  0.174  0.136      0.247  0.305  0.343  0.38       0.268  0.254  0.231  0.200
Cluster PositionRank  0.338  0.246  0.195  0.150      0.285  0.345  0.383  0.420      0.309  0.287  0.258  0.221
MyWPMWRank            0.342  0.253  0.205  0.159      0.288  0.356  0.403  0.446      0.313  0.296  0.272  0.234
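
The arithmetic behind these figures can be checked with a short Python sketch. It assumes the F value is the harmonic mean of P and R, which agrees with the table up to rounding of the published P and R. It also assumes the increments quoted in the abstract are relative gains of MyWPMWRank over the Cluster PositionRank row. The page does not state that baseline, but the numbers reproduce under that reading. The function names are illustrative.

```python
K_VALUES = [3, 5, 7, 10]
F_BASELINE = [0.309, 0.287, 0.258, 0.221]  # Cluster PositionRank, F value row
F_PROPOSED = [0.313, 0.296, 0.272, 0.234]  # MyWPMWRank, F value row


def f_value(p, r):
    """Harmonic mean of precision and recall (assumed definition of F)."""
    return 2 * p * r / (p + r)


def relative_gain(new, base):
    """Relative improvement of new over base, in percent."""
    return (new - base) / base * 100


def relative_gains():
    """Percentage gain of MyWPMWRank over the assumed baseline for each K."""
    return [round(relative_gain(n, b), 2) for n, b in zip(F_PROPOSED, F_BASELINE)]
```

`relative_gains()` yields 1.29, 3.14, 5.43 and 5.88 for K = 3, 5, 7 and 10, matching the abstract. Recomputing F from the rounded P and R can differ from the table in the last digit for some rows.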