Extracting Key-phrases from Chinese Scholarly Papers
Xia Tian
Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China
|
|
Abstract [Objective] This paper proposes a new method for extracting key phrases from Chinese scholarly articles, aiming to provide phrase-level concept representation for academic text mining. [Methods] First, we introduced the concepts of cohesion and freedom to measure, respectively, the internal tightness of a phrase and the free collocation ability of its boundary words, and used them to compute the authority of bi-word phrases. Then, we merged the resulting list with the phrases extracted by a position-weighted method. Finally, the top-N elements were retrieved as the final key phrases. [Results] We evaluated the proposed PhraseRank method on Chinese academic papers and found that its precision, recall and R-MAP values were significantly higher than those of the traditional WordRank algorithm. The R-MAP value in particular increased by more than 128%. [Limitations] The method cannot identify key phrases of three or more words. [Conclusions] The key phrases extracted by PhraseRank are more consistent with manually labeled results than keywords are, and they effectively describe the characteristics of Chinese scholarly papers.
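The abstract does not give formulas for cohesion and freedom, so the following is only a minimal sketch under common assumptions: cohesion is approximated by pointwise mutual information between the two words, and freedom by the smaller of the left and right neighbor entropies of the bi-word candidate. The function name `bigram_scores`, the input format (pre-segmented sentences) and both definitions are illustrative assumptions, not the definitions used by PhraseRank. The step that combines the two measures into an authority score is not shown, because the abstract does not describe it.

```python
import math
from collections import Counter, defaultdict


def bigram_scores(sentences):
    """Return {(w1, w2): (cohesion, freedom)} for adjacent word pairs.

    Assumed definitions (not taken from the paper):
      cohesion = PMI of the two words
      freedom  = min(left neighbor entropy, right neighbor entropy)
    """
    unigrams = Counter()
    bigrams = Counter()
    left_ctx = defaultdict(Counter)   # words seen immediately before a pair
    right_ctx = defaultdict(Counter)  # words seen immediately after a pair

    for words in sentences:
        unigrams.update(words)
        for i in range(len(words) - 1):
            pair = (words[i], words[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left_ctx[pair][words[i - 1]] += 1
            if i + 2 < len(words):
                right_ctx[pair][words[i + 2]] += 1

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def entropy(counter):
        # Shannon entropy (natural log) of a neighbor distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    scores = {}
    for pair, count in bigrams.items():
        p_pair = count / n_bi
        p_first = unigrams[pair[0]] / n_uni
        p_second = unigrams[pair[1]] / n_uni
        cohesion = math.log(p_pair / (p_first * p_second))
        freedom = min(entropy(left_ctx[pair]), entropy(right_ctx[pair]))
        scores[pair] = (cohesion, freedom)
    return scores


# Toy input: each sentence is already segmented into words.
toy = [["a", "x", "y", "b"], ["c", "x", "y", "d"]]
toy_scores = bigram_scores(toy)
```

In the toy input the pair `("x", "y")` occurs in two different contexts on each side, so its freedom is positive, while a pair at a sentence boundary such as `("a", "x")` has no left neighbors and gets a freedom of zero. Real use would need a Chinese word segmenter and a much larger corpus.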
|
Received: 01 February 2020
Published: 25 July 2020
|
|
Corresponding Author:
Xia Tian
E-mail: xiat@ruc.edu.cn
|