[Objective] This paper integrates topic information into the TextRank model to improve the precision and recall of automatic keyword extraction. [Methods] First, we used LDA to model document topics and obtained the topic distribution of the candidate keywords. Second, we calculated the node weights from the topic-word probability distribution. Third, we combined the document-topic and topic-word probability distributions, with weights, to form each node’s random jump probability. Finally, we constructed a new transition matrix for the word graph iteration, which yields the improved TextRank model. [Results] We evaluated the proposed model on 1,559 news articles from the website of Southern Weekly. When three keywords were extracted per article, the model’s precision was 4.7% and 6.5% higher than that of the original TextRank and TF-IDF algorithms, respectively. [Limitations] Fusing the two models increased computational complexity. [Conclusions] The proposed algorithm extracts keywords more effectively than the original TextRank and TF-IDF algorithms.
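The steps in [Methods] can be sketched as a topic-biased TextRank. The sketch below is illustrative only and is not the authors' implementation: the co-occurrence window, the damping factor of 0.85, the function names, and the toy document-topic and topic-word probabilities (which in practice would come from a trained LDA model) are all assumptions, and the jump vector here is simply the normalized sum over topics of P(topic | document) × P(word | topic).

```python
import numpy as np


def cooccurrence_graph(tokens, vocab, window=2):
    """Symmetric co-occurrence counts for words at most `window` positions apart."""
    index = {w: i for i, w in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            u = tokens[j]
            if u != w:
                adj[index[w], index[u]] += 1.0
                adj[index[u], index[w]] += 1.0
    return adj


def topic_jump_vector(doc_topic, topic_word):
    """Assumed jump probability per word: sum_k P(k|d) * P(w|k), renormalized."""
    scores = doc_topic @ topic_word
    return scores / scores.sum()


def topic_biased_textrank(adj, jump, damping=0.85, tol=1e-8, max_iter=200):
    """Power iteration with a column-stochastic transition matrix.

    Instead of jumping uniformly, the random surfer jumps to word i with
    probability jump[i]. Words with no edges also follow `jump`.
    """
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    has_edges = col_sums > 0
    trans = np.where(has_edges, adj / np.where(has_edges, col_sums, 1.0), jump[:, None])
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * (trans @ rank) + (1.0 - damping) * jump
        delta = np.abs(new_rank - rank).sum()
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy input; every value below is hypothetical.
tokens = ["topic", "model", "keyword", "graph", "keyword", "topic", "graph", "model"]
vocab = sorted(set(tokens))            # ['graph', 'keyword', 'model', 'topic']
doc_topic = np.array([0.7, 0.3])       # P(topic | document)
topic_word = np.array([                # P(word | topic), columns follow `vocab`
    [0.1, 0.5, 0.1, 0.3],
    [0.4, 0.1, 0.4, 0.1],
])

adj = cooccurrence_graph(tokens, vocab)
jump = topic_jump_vector(doc_topic, topic_word)
scores = topic_biased_textrank(adj, jump)
top3 = [vocab[i] for i in np.argsort(-scores)[:3]]
print(top3)
```

With a uniform `jump` vector this reduces to ordinary TextRank on the same graph, so the only change to the ranking comes from the topic-derived jump probabilities.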
Mingzhu Sun, Jing Ma, Lingfei Qian. Extracting Keywords Based on Topic Structure and Word Diagram Iteration[J]. Data Analysis and Knowledge Discovery, 2019, 3(8): 68-76. DOI:10.11925/infotech.2096-3467.2018.0765.