New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 30-34    DOI: 10.11925/infotech.1003-3513.2013.09.05
Study on Keyword Extraction Using Word Position Weighted TextRank
Xia Tian (1, 2)
(1) Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China
(2) School of Information Resource Management, Renmin University of China, Beijing 100872, China
Abstract: Keyword extraction is treated as a word importance ranking problem. In this paper, a candidate keyword graph is constructed based on TextRank, and the influences of word coverage, word position, and word frequency are used to compute the probability transition matrix. Word scores are then computed iteratively, and the top N candidate keywords are selected as the final results. Experimental results show that the proposed word position weighted TextRank method outperforms both the traditional TextRank method and the LDA topic model method.
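The pipeline summarized in the abstract (graph construction, a transition matrix that mixes coverage, position, and frequency, iterative scoring, top-N selection) can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's exact formulation: candidate words are assumed to arrive as pre-filtered token lists, an edge links two candidates that appear next to each other in a sentence, the transition probability P(u→v) is assumed to be a weighted mix α·coverage + β·position + γ·frequency with α + β + γ = 1, and "position" is reduced to a hypothetical bonus for words that occur in the title. The weights 0.4/0.3/0.3, the `title_bonus` of 3.0, the damping factor 0.85, and all function names are illustrative choices.

```python
from collections import Counter, defaultdict


def build_graph(sentences, window=2):
    """Link candidate words that co-occur within `window` consecutive tokens."""
    neighbors = defaultdict(set)
    for tokens in sentences:
        for i, u in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                v = tokens[j]
                if v != u:
                    neighbors[u].add(v)
                    neighbors[v].add(u)
    return neighbors


def transition_probs(neighbors, freq, pos_weight, alpha, beta, gamma):
    """Assumed form: P(u -> v) is a convex mix of three influences."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    probs = {}
    for u, outs in neighbors.items():
        total_pos = sum(pos_weight[v] for v in outs)
        total_freq = sum(freq[v] for v in outs)
        for v in outs:
            probs[(u, v)] = (alpha / len(outs)                   # coverage: uniform split
                             + beta * pos_weight[v] / total_pos  # position importance
                             + gamma * freq[v] / total_freq)     # word frequency
    return probs


def rank(neighbors, probs, d=0.85, max_iter=200, tol=1e-10):
    """Damped power iteration: S(v) = (1-d)/n + d * sum_u P(u->v) * S(u)."""
    nodes = list(neighbors)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            for v in neighbors[u]:
                new[v] += d * probs[(u, v)] * score[u]
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


def extract_keywords(title, body, top_n=5, title_bonus=3.0,
                     alpha=0.4, beta=0.3, gamma=0.3):
    """Return the top_n candidates; `title` and `body` hold pre-filtered tokens."""
    sentences = [title] + body
    neighbors = build_graph(sentences)
    freq = Counter(w for s in sentences for w in s)
    # Hypothetical position scheme: words that occur in the title get a bonus.
    pos_weight = {w: (title_bonus if w in title else 1.0) for w in freq}
    probs = transition_probs(neighbors, freq, pos_weight, alpha, beta, gamma)
    score = rank(neighbors, probs)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(score, key=lambda w: (-score[w], w))[:top_n]
```

On a toy input where "keyword" co-occurs with every other candidate, `extract_keywords(["keyword", "extraction"], [["textrank", "keyword"], ["keyword", "graph"], ["position", "keyword"]], top_n=2)` returns `['keyword', 'extraction']`: the hub collects all the probability mass its neighbours pass on, and the title bonus lifts "extraction" above the other leaves. Setting `alpha=1.0, beta=0.0, gamma=0.0` reduces the transitions to a uniform split over neighbours, which corresponds to the unweighted TextRank baseline the abstract compares against.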
Key words: Keyword extraction; Word rank; TextRank; Graph model; LDA
Received: 01 July 2013      Published: 27 September 2013
CLC Number: G350

Cite this article:

Xia Tian. Study on Keyword Extraction Using Word Position Weighted TextRank. New Technology of Library and Information Service, 2013, 29(9): 30-34.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.09.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I9/30
