Extracting Keywords Based on Topic Structure and Word Diagram Iteration
Mingzhu Sun, Jing Ma, Lingfei Qian
School of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Abstract [Objective] This paper integrates topic information into the TextRank model to improve the precision and recall of automatic keyword extraction. [Methods] First, we used LDA to model document topics and obtained the topic distribution of each candidate keyword. Second, we calculated node weights from the topic-word probability distribution. Third, we combined the document-topic and topic-word probability distributions, with weights, into each node's random-jump probability. Finally, we constructed a new transition matrix for word-graph iteration, yielding an improved TextRank model. [Results] We evaluated the proposed model on 1,559 news articles from the Southern Weekly website. With three extracted keywords, its precision was 4.7% and 6.5% higher than that of the original TextRank and TF-IDF algorithms, respectively. [Limitations] The fusion algorithm increases computational complexity. [Conclusions] The proposed algorithm can extract keywords more effectively.
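The [Methods] steps above can be read as a topic-biased TextRank, that is, a personalized PageRank run over a word co-occurrence graph. The sketch below is illustrative only and is not the authors' exact formulation. It assumes LDA has already produced a document-topic vector `theta` and per-word topic-word probabilities `phi`. The following are all assumptions made for the example:

- the function name `topic_textrank`;
- the co-occurrence window of 2;
- the damping factor d = 0.85;
- using each word's largest p(w|z) as its node weight;
- using p(w|d) = Σ_k θ_k · p(w|z_k) as the random-jump vector.

```python
from collections import defaultdict


def topic_textrank(tokens, theta, phi, window=2, d=0.85, tol=1e-10, max_iter=500):
    """Topic-biased TextRank over a word co-occurrence graph (illustrative sketch).

    tokens: candidate words in document order (stop words already removed).
    theta:  document-topic distribution from LDA, a list of K probabilities.
    phi:    dict mapping each candidate word to its K topic-word probabilities
            p(w | z_k); assumed positive, as with a smoothed LDA model.
    """
    vocab = sorted(set(tokens))

    # Undirected co-occurrence counts: each word links to the next window-1 words.
    adj = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if u != v:
                adj[u][v] += 1.0
                adj[v][u] += 1.0

    # Assumed node weight (hypothetical choice): the word's largest p(w | z_k).
    node_w = {w: max(phi[w]) for w in vocab}

    # Random-jump vector: p(w | d) = sum_k theta_k * p(w | z_k), normalised to 1.
    jump = {w: sum(t * p for t, p in zip(theta, phi[w])) for w in vocab}
    norm = sum(jump.values())
    jump = {w: s / norm for w, s in jump.items()}

    # Transition u -> v proportional to co-occurrence count times v's node weight.
    trans = {}
    for u in vocab:
        out = {v: c * node_w[v] for v, c in adj[u].items()}
        total = sum(out.values())
        trans[u] = {v: s / total for v, s in out.items()} if total > 0 else {}

    # Power iteration:
    # score[v] = (1 - d) * jump[v] + d * sum_u score[u] * P(u -> v)
    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(max_iter):
        new = {w: (1 - d) * jump[w] for w in vocab}
        for u in vocab:
            # A node with no edges spreads its mass via the jump vector instead.
            targets = trans[u] or jump
            for v, p in targets.items():
                new[v] += d * score[u] * p
        delta = sum(abs(new[w] - score[w]) for w in vocab)
        score = new
        if delta < tol:
            break

    ranked = sorted(vocab, key=lambda w: (-score[w], w))
    return ranked, score


# Toy input with K = 2 topics; all values are made up for illustration.
tokens = ["topic", "model", "keyword", "extraction", "topic", "model", "graph"]
theta = [0.7, 0.3]
phi = {
    "topic": [0.30, 0.05],
    "model": [0.25, 0.05],
    "keyword": [0.20, 0.10],
    "extraction": [0.15, 0.10],
    "graph": [0.02, 0.40],
}
ranked, score = topic_textrank(tokens, theta, phi)
print(ranked[:3])  # the top three words would be the extracted keywords
```

Setting every node weight to 1 and the jump vector to uniform reduces this to plain (count-weighted) TextRank. That reduced version is a convenient baseline for isolating the effect of the topic features.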
Received: 15 July 2018
Published: 29 September 2019

Corresponding Author: Jing Ma
E-mail: majing5525@126.com