Extracting Keywords Based on Removed Network Word Nodes

An Wang, Yijun Gu, Kunming Li, Wenzheng Li
College of Information Technology and Cyber Security, People’s Public Security University of China, Beijing 102600, China

Abstract [Objective] This study modifies the TextRank algorithm by removing word nodes, with the aim of improving keyword extraction from Chinese documents. [Methods] We proposed a modified algorithm, RemoveRank, which extracts Chinese keywords by alternating between a ranking step and a removal step. Based on the structural characteristics of the word graph as a complex network, the order in which word nodes are removed (the removal queue) serves as their ranking, from which the keywords are extracted. [Results] We evaluated the proposed method on a dataset of Southern Weekend articles with annotated keywords. The new algorithm outperformed the traditional methods: when 3, 5, and 7 keywords were extracted, its F-values were 4%, 6%, and 5% higher, respectively, than those of TextRank. [Limitations] Our word graph does not include edge weights. [Conclusions] Given an appropriate sliding window size, RemoveRank can effectively extract keywords from Chinese documents.
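The abstract describes RemoveRank only at a high level, so the following is a minimal sketch of the alternating rank-and-remove idea under stated assumptions, not the authors' implementation. It assumes the text has already been segmented into words, builds an undirected, unweighted co-occurrence graph over a sliding window (consistent with the stated limitation that edges carry no weights), and in each round re-scores the remaining nodes with a plain TextRank/PageRank iteration before removing the top node. The scoring criterion, the damping factor 0.85, the iteration count, and all function names (`build_word_graph`, `textrank_scores`, `remove_rank`, `f_measure`) are hypothetical choices made for illustration.

```python
"""Hypothetical sketch of a RemoveRank-style keyword ranker (not the authors' code)."""


def build_word_graph(tokens, window=3):
    """Undirected, unweighted co-occurrence graph: words that appear together
    within a sliding window of `window` consecutive tokens are linked."""
    graph = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def textrank_scores(graph, damping=0.85, iterations=50):
    """Plain TextRank/PageRank power iteration on an unweighted graph
    (assumed scoring criterion for the ranking step)."""
    nodes = list(graph)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        score = {
            v: (1 - damping) / n
            + damping * sum(score[u] / len(graph[u]) for u in graph[v])
            for v in nodes
        }
    return score


def remove_rank(graph, top_k):
    """Alternate ranking and removal: score the residual graph, remove the
    top-scoring word, and repeat. The removal order is the keyword ranking."""
    residual = {v: set(nbrs) for v, nbrs in graph.items()}  # leave input intact
    removal_queue = []
    while residual and len(removal_queue) < top_k:
        scores = textrank_scores(residual)
        # Highest score first; ties broken alphabetically for determinism.
        best = min(scores, key=lambda v: (-scores[v], v))
        removal_queue.append(best)
        # Delete the node together with every edge incident to it.
        for neighbour in residual.pop(best):
            residual[neighbour].discard(best)
    return removal_queue


def f_measure(extracted, gold):
    """F1 of an extracted keyword list against the annotated keywords."""
    hits = len(set(extracted) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["a", "hub", "b", "hub", "c", "hub", "d"]
    graph = build_word_graph(tokens, window=2)
    print(remove_rank(graph, top_k=2))  # ['hub', 'a']
```

Because the residual graph is re-scored after every removal, a word whose score depended mainly on neighbours that have already been removed falls in the ranking. Swapping `textrank_scores` for another centrality measure (for example, degree) changes only that one function, which makes it easy to try other assumed criteria.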
Received: 31 January 2019
Published: 18 December 2019

Corresponding Author:
Yijun Gu
E-mail: guyijun@ppsuc.edu.cn