Extracting Keywords with TextRank and Weighted Word Positions
Liu Zhuchen1, Chen Hao2, Yu Yanhua1, Li Jie1
1 School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 China Shipbuilding Industry System Engineering Research Institute, Beijing 100094, China
Abstract [Objective] This study integrates the position and distance attributes of words into the TextRank model, aiming to extract keywords from a single document more effectively. [Methods] First, we constructed a word graph of candidate words based on the TextRank method. Then, we incorporated the position information of the words and calculated the transition probability matrix. Finally, we obtained the scores of the candidate words by iterative calculation and selected the top K keywords with the highest scores. [Results] The weighted TextRank method yielded better results than the traditional algorithms. When K was 3, 5, 7 and 10, the F value increased by 1.29%, 3.14%, 5.43% and 5.88%, respectively. [Limitations] This study did not incorporate a knowledge base and did not fully utilize external lexical relationship information. [Conclusions] The position and distribution of words can help extract keywords more effectively.
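The three steps in [Methods] can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the position weight used here (words whose first occurrence is earlier receive a larger weight) is an assumption made only for illustration, as are the function name `position_weighted_textrank`, the co-occurrence window, and the damping factor of 0.85. The distance attribute mentioned in [Objective] is not modeled.

```python
from collections import defaultdict


def position_weighted_textrank(tokens, window=3, damping=0.85,
                               iterations=50, top_k=3):
    """Rank candidate words with a position-biased TextRank (illustrative only)."""
    if not tokens:
        return [], {}
    vocab = sorted(set(tokens))

    # Step 1: word graph. Two words are linked if they co-occur within `window`.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Step 2: position weight (ASSUMED form, not taken from the paper):
    # the earlier a word first appears, the larger its weight.
    n = len(tokens)
    first_pos = {}
    for i, word in enumerate(tokens):
        first_pos.setdefault(word, i)
    weight = {w: 1.0 + (n - first_pos[w]) / n for w in vocab}

    # Transition probabilities: a step from u to v is proportional to weight[v].
    trans = {}
    for u in vocab:
        total = sum(weight[v] for v in neighbors[u])
        trans[u] = {v: weight[v] / total for v in neighbors[u]} if total else {}

    # Step 3: iterate the scores, then keep the top K words.
    scores = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        scores = {
            v: (1 - damping) / len(vocab)
            + damping * sum(scores[u] * trans[u].get(v, 0.0) for u in neighbors[v])
            for v in vocab
        }
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:top_k], scores


tokens = ["keyword", "extraction", "graph", "keyword", "ranking",
          "graph", "model", "position", "keyword"]
top_words, scores = position_weighted_textrank(tokens, top_k=3)
print(top_words)
```

Setting every position weight to 1.0 reduces the transition step to a uniform choice among neighbors, which corresponds to an unweighted TextRank baseline on the same graph.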
Received: 12 March 2018
Published: 25 October 2018