Extracting Key-phrases from Chinese Scholarly Papers
Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China
[Objective] This paper proposes a new method for extracting keyphrases from Chinese scholarly articles, aiming to provide phrase-level concept representations for academic text mining. [Methods] First, we introduced the concepts of cohesion and freedom to measure the internal tightness of a phrase and the free collocation ability of its boundary words, and used them to compute the authority of bi-word phrases. Then, we merged the resulting list with phrases extracted by a position-weighted method. Finally, the top-N elements were retrieved as the final keyphrases. [Results] We evaluated the proposed PhraseRank method on Chinese academic papers and found that its precision, recall and R-MAP values were significantly higher than those of the traditional WordRank algorithm; the R-MAP value increased by more than 128%. [Limitations] Our method cannot identify keyphrases of three or more words. [Conclusions] The keyphrases extracted by PhraseRank are more consistent with manually labeled results than keywords are, and effectively describe the characteristics of Chinese scholarly papers.
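The abstract does not give formulas for cohesion and freedom, so the following is only a minimal sketch under stated assumptions: cohesion is taken to be the pointwise mutual information of the two words, and freedom is taken to be the smaller of the left-neighbor and right-neighbor entropies of the pair, as is common in Chinese new-word discovery. The function name `score_biword_phrases` and the output layout are hypothetical, and the input is assumed to be already word-segmented. How PhraseRank combines these scores into an authority value and merges them with position-weighted phrases is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def score_biword_phrases(sentences):
    """Score adjacent word pairs by cohesion and freedom.

    ASSUMPTIONS (not taken from the paper): cohesion is pointwise mutual
    information, freedom is min(left entropy, right entropy).
    `sentences` is a list of word-segmented sentences (lists of strings).
    """
    unigrams = Counter()
    bigrams = Counter()
    left_neighbors = defaultdict(Counter)   # words seen just before a pair
    right_neighbors = defaultdict(Counter)  # words seen just after a pair

    for words in sentences:
        unigrams.update(words)
        for i in range(len(words) - 1):
            pair = (words[i], words[i + 1])
            bigrams[pair] += 1
            if i > 0:
                left_neighbors[pair][words[i - 1]] += 1
            if i + 2 < len(words):
                right_neighbors[pair][words[i + 2]] += 1

    total_words = sum(unigrams.values())
    total_pairs = sum(bigrams.values())

    def entropy(counter):
        # Shannon entropy (natural log) of a neighbor distribution.
        n = sum(counter.values())
        if n == 0:
            return 0.0
        return sum((c / n) * math.log(n / c) for c in counter.values())

    scores = {}
    for pair, count in bigrams.items():
        p_pair = count / total_pairs
        p_first = unigrams[pair[0]] / total_words
        p_second = unigrams[pair[1]] / total_words
        # Cohesion: how much more often the pair occurs than chance predicts.
        cohesion = math.log(p_pair / (p_first * p_second))
        # Freedom: how varied the contexts on both boundaries are.
        freedom = min(entropy(left_neighbors[pair]),
                      entropy(right_neighbors[pair]))
        scores[pair] = {"cohesion": cohesion, "freedom": freedom}
    return scores
```

Under these assumptions, a pair with high cohesion and high freedom behaves like a self-contained phrase: its two words co-occur tightly, yet the pair appears in many different contexts. A pair that always has the same neighbor on one side gets a freedom of zero, which suggests it is a fragment of a longer unit.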