Extracting Key-phrases from Chinese Scholarly Papers
Xia Tian
Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China
[Objective] This paper proposes a new method for extracting key phrases from Chinese scholarly articles, aiming to provide phrase-level concept representations for academic text mining. [Methods] First, we introduced the concepts of cohesion and freedom to measure the internal tightness of a phrase and the free collocation ability of its boundary words, and used them to compute the authority of two-word phrases. Then, we merged the resulting list with phrases extracted by a position-weighted method. Finally, the top N elements were retrieved as the final key phrases. [Results] We evaluated the proposed PhraseRank method on Chinese academic papers and found that its precision, recall, and R-MAP values were significantly higher than those of the traditional WordRank algorithm; the R-MAP value in particular increased by more than 128%. [Limitations] The method cannot identify key phrases of three or more words. [Conclusions] The key phrases extracted by PhraseRank are more consistent with manually labeled results than keywords are, and they effectively describe the characteristics of Chinese scholarly papers.
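The abstract does not give formulas for cohesion or freedom, so the following is only an illustrative sketch under stated assumptions. It takes cohesion to be the pointwise mutual information of a two-word phrase and freedom to be the smaller of the entropies of the words seen immediately to its left and right. Both definitions are common in phrase discovery but are assumptions here, not the paper's confirmed formulation. The function names `entropy` and `score_bigrams` are hypothetical, and the step that combines the two scores into an authority value is not attempted.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in nats) of a Counter of neighbor words."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def score_bigrams(sentences):
    """Score adjacent word pairs by (cohesion, freedom).

    ASSUMPTION: cohesion = PMI of the pair; freedom = min(left, right)
    neighbor entropy. The source abstract does not define either measure.
    `sentences` is a list of already-segmented sentences (lists of words).
    """
    unigrams = Counter()
    bigrams = Counter()
    left = defaultdict(Counter)   # words seen immediately before the pair
    right = defaultdict(Counter)  # words seen immediately after the pair

    for words in sentences:
        unigrams.update(words)
        for i in range(len(words) - 1):
            pair = (words[i], words[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left[pair][words[i - 1]] += 1
            if i + 2 < len(words):
                right[pair][words[i + 2]] += 1

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_pair = count / n_bi
        p_first = unigrams[pair[0]] / n_uni
        p_second = unigrams[pair[1]] / n_uni
        # Internal tightness: how much more often the pair co-occurs than chance.
        cohesion = math.log(p_pair / (p_first * p_second))
        # Boundary flexibility: limited by the less varied side.
        freedom = min(entropy(left[pair]), entropy(right[pair]))
        scores[pair] = (cohesion, freedom)
    return scores


# Toy, pre-segmented input; a real pipeline would first run a Chinese word segmenter.
toy = [
    ["x", "key", "phrase", "y"],
    ["z", "key", "phrase", "w"],
    ["x", "key", "word", "y"],
]
toy_scores = score_bigrams(toy)
```

In this toy input, the pair ("key", "phrase") appears with varied neighbors on both sides and so receives a nonzero freedom score, whereas ("x", "key") always starts a sentence and scores zero, which marks it as a poor phrase boundary under these assumed definitions.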