[Objective] This paper integrates topic information into the TextRank model to improve the precision and recall of automatic keyword extraction. [Methods] First, we used LDA to model the document topics and obtained the topic distribution of the candidate keywords. Second, we calculated the node weights from the topic-word probability distribution. Third, we combined the document-topic and topic-word probability distributions, with weights, into each node’s random jump probability. Finally, we constructed a new transition matrix for the word graph iteration, which yields the improved TextRank model. [Results] We evaluated the proposed model on 1559 news articles from the website of Southern Weekly. When three keywords were extracted per article, the model’s precision was 4.7% and 6.5% higher than that of the original TextRank and TF-IDF algorithms, respectively. [Limitations] The fusion algorithm increases computational complexity. [Conclusions] The proposed algorithm extracts keywords more effectively than the original TextRank and TF-IDF algorithms.
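The abstract does not give the formulas, so the following Python sketch only illustrates the general idea of a topic-biased TextRank and is not the authors' implementation. It assumes the LDA outputs are already available as a document-topic list `doc_topic` and a list of per-topic word-probability dictionaries `topic_word` (both hypothetical names). It uses plain co-occurrence counts for the transition weights and replaces the uniform random jump with a jump probability proportional to the sum over topics of P(topic | document) × P(word | topic). The function names, the window size, and the damping factor of 0.85 are likewise assumptions made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected co-occurrence counts for words within a sliding window."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def topic_jump_probabilities(words, doc_topic, topic_word):
    """Jump probability per word: sum_k P(k | d) * P(w | k), normalized over words.

    Assumption: doc_topic[k] is P(topic k | document) and topic_word[k] maps
    a word to P(word | topic k), e.g. as exported from a trained LDA model.
    """
    raw = {
        w: sum(p_k * topic_word[k].get(w, 0.0) for k, p_k in enumerate(doc_topic))
        for w in words
    }
    total = sum(raw.values())
    if total == 0.0:
        # No topic evidence for any word: fall back to the uniform jump.
        return {w: 1.0 / len(words) for w in words}
    return {w: s / total for w, s in raw.items()}


def topic_biased_textrank(tokens, doc_topic, topic_word,
                          damping=0.85, window=2, iterations=50):
    """Rank words by TextRank with a topic-dependent random jump."""
    graph = build_cooccurrence_graph(tokens, window)
    words = sorted(graph)
    if not words:
        return {}
    jump = topic_jump_probabilities(words, doc_topic, topic_word)
    out_weight = {w: sum(graph[w].values()) for w in words}
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new_score = {}
        for w in words:
            # Score flowing into w from each neighbour v, split by edge weight.
            incoming = sum(score[v] * graph[v][w] / out_weight[v] for v in graph[w])
            new_score[w] = (1.0 - damping) * jump[w] + damping * incoming
        score = new_score
    return score
```

Sorting the returned dictionary by score and keeping the top three entries would correspond to the three-keyword setting reported in the results. A fuller version would also fold the topic-based node weights into the transition matrix, which this sketch leaves out.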
Mingzhu Sun, Jing Ma, Lingfei Qian. Extracting Keywords Based on Topic Structure and Word Diagram Iteration[J]. Data Analysis and Knowledge Discovery, 2019, 3(8): 68-76.