[Objective] This study modifies the TextRank algorithm by removing word nodes, aiming to improve keyword extraction from Chinese documents. [Methods] We propose the RemoveRank algorithm, which extracts Chinese keywords by alternating between a sorting step and a removal step. Drawing on the complex-network structure of the word graph, the algorithm uses the removal queue as the ranking of word nodes, from which keywords are extracted. [Results] We evaluated the method on a dataset of Southern Weekend articles with annotated keywords. The new algorithm outperformed the traditional methods: when the number of extracted keywords was 3, 5, and 7, its F values were 4%, 6%, and 5% higher, respectively, than those of TextRank. [Limitations] Our word graph does not include edge weights. [Conclusions] RemoveRank can effectively extract keywords from Chinese documents when an appropriate sliding-window value is chosen.
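The alternating sort-and-remove loop described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how nodes are scored or which node is removed in each round, so the sketch assumes PageRank-style scoring (as in TextRank) on an unweighted co-occurrence graph and removes the top-scoring node each round. The names `build_word_graph`, `pagerank`, and `remove_rank`, the default parameter values, and the pre-segmented token list are all hypothetical.

```python
from collections import defaultdict


def build_word_graph(words, window=3):
    """Unweighted, undirected co-occurrence graph: two distinct words are
    linked if they fall inside the same sliding window of `window` tokens."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        graph[w]  # touch the key so isolated words still become nodes
        for v in words[i + 1:i + window]:
            if v != w:
                graph[w].add(v)
                graph[v].add(w)
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph (assumed scorer)."""
    n = len(graph)
    score = {w: 1.0 / n for w in graph}
    for _ in range(iterations):
        new = {}
        for w in graph:
            incoming = sum(score[v] / len(graph[v]) for v in graph[w])
            new[w] = (1 - damping) / n + damping * incoming
        score = new
    return score


def remove_rank(words, window=3, top_k=5):
    """Alternate sorting and removing; return the removal queue as the ranking."""
    graph = build_word_graph(words, window)
    removal_queue = []
    while graph and len(removal_queue) < top_k:
        score = pagerank(graph)  # sorting step on the current graph
        # ASSUMPTION: the highest-scoring node is removed each round.
        # Iterating over sorted(graph) breaks score ties alphabetically.
        best = max(sorted(graph), key=score.get)
        removal_queue.append(best)
        for v in graph.pop(best):  # removing step: drop node and its edges
            graph[v].discard(best)
    return removal_queue
```

Calling `remove_rank(tokens, window=3, top_k=5)` on a list of already segmented and filtered tokens returns up to five words in removal order. Because scores are recomputed after every removal, a word that ranked highly only through its links to an already removed word loses that support in later rounds, which is the behaviour the removal queue is meant to capture under these assumptions.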
An Wang, Yijun Gu, Kunming Li, Wenzheng Li. Extracting Keywords Based on Removed Network Word Nodes[J]. Data Analysis and Knowledge Discovery, 2019, 3(11): 35-44.