[Objective] This paper reviews the methods, features and evaluation procedures used in keyword extraction research, aiming to provide a reference for future studies. [Coverage] We searched Web of Science, DBLP, Engineering Index, Google Scholar, CNKI and Wanfang Data with the queries “Keyword Extraction”, “Keyword Generation”, “Keyphrase Extraction” and “Keyphrase Generation”, among others, and retrieved a total of 89 representative publications. [Methods] First, we analyzed the development of keyword extraction techniques. Then, we summarized related studies from the perspectives of research methods, features and evaluation procedures. [Results] Driven by advances in machine learning, keyword extraction methods have gradually shifted from feature-driven models to data-driven models, but they still face problems with data labeling and evaluation criteria. [Limitations] This review focuses mainly on mainstream keyword extraction methods. [Conclusions] This paper summarizes the development trends of keyword extraction methods, as well as the disadvantages of existing evaluation mechanisms.
[3]
Zhao Jingsheng, Zhu Qiaoming, Zhou Guodong, et al. Review of Research in Automatic Keyword Extraction[J]. Journal of Software, 2017,28(9):2431-2449. (in Chinese)
[4]
Liu Z, Huang W, Zheng Y, et al. Automatic Keyphrase Extraction via Topic Decomposition[C]// Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Massachusetts, USA. Association for Computational Linguistics, 2010: 366-376.
[5]
Hassaïne A, Mecheter S, Jaoua A. Text Categorization Using Hyper Rectangular Keyword Extraction: Application to News Articles Classification[C]// Proceedings of the 15th International Conference on Relational and Algebraic Methods in Computer Science, Braga, Portugal. Springer, 2015,9348:312-325.
[6]
Luhn H P. A Statistical Approach to Mechanized Encoding and Searching of Literary Information[J]. IBM Journal of Research and Development, 1957,1(4):309-317.
[7]
Merrouni Z A, Frikh B, Ouhbi B. Automatic Keyphrase Extraction: A Survey and Trends[J]. Journal of Intelligent Information Systems, 2020,54(2):391-424.
[8]
Chang Yaocheng, Zhang Yuxiang, Wan Huaiyu, et al. Features Oriented Survey of State-of-the-Art Keyphrase Extraction Algorithms[J]. Journal of Software, 2018,29(7):2046-2070. (in Chinese)
[9]
Papagiannopoulou E, Tsoumakas G. A Review of Keyphrase Extraction[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2020,10(2):e1339.
[10]
Meng R, Zhao S, Han S, et al. Deep Keyphrase Generation[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada. Association for Computational Linguistics, 2017: 582-592.
[11]
Cohen J D. Highlights: Language- and Domain-Independent Automatic Indexing Terms for Abstracting[J]. Journal of the American Society for Information Science, 1995,46(3):162-174.
[12]
Salton G, Yang C S, Yu C T. A Theory of Term Importance in Automatic Text Analysis[J]. Journal of the American Society for Information Science, 1975,26(1):33-44.
[13]
Matsuo Y, Ishizuka M. Keyword Extraction from a Single Document Using Word Co-occurrence Statistical Information[J]. International Journal on Artificial Intelligence Tools, 2004,13(1):157-169.
[14]
Barker K, Cornacchia N. Using Noun Phrase Heads to Extract Document Keyphrases[C]// Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence, Quebec, Canada. Springer, 2000:40-52.
[15]
Edmundson H P. New Methods in Automatic Extracting[J]. Journal of the ACM, 1969,16(2):264-285.
[16]
Campos R, Mangaravite V, Pasquali A, et al. YAKE! Collection-independent Automatic Keyword Extractor[C]// Proceedings of the 40th European Conference on IR Research, Grenoble, France. Springer, 2018:806-810.
[17]
Mihalcea R, Tarau P. TextRank: Bringing Order into Text[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain. Association for Computational Linguistics, 2004: 404-411.
[18]
Wan X, Xiao J. Single Document Keyphrase Extraction Using Neighborhood Knowledge[C]// Proceedings of the 23rd AAAI Conference on Artificial Intelligence, Illinois, USA. AAAI Press, 2008: 855-860.
[19]
Danesh S, Sumner T, Martin J H. SGRank: Combining Statistical and Graphical Methods to Improve the State of the Art in Unsupervised Keyphrase Extraction[C]// Proceedings of the 4th Joint Conference on Lexical and Computational Semantics, Colorado, USA. 2015: 117-126.
[20]
Florescu C, Caragea C. PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada. Association for Computational Linguistics, 2017: 1105-1115.
[21]
Gollapalli S D, Caragea C. Extracting Keyphrases from Research Papers Using Citation Networks[C]// Proceedings of the 28th AAAI Conference on Artificial Intelligence, Quebec, Canada. AAAI Press, 2014: 1629-1635.
[22]
Liu Z, Li P, Zheng Y, et al. Clustering to Find Exemplar Terms for Keyphrase Extraction[C]// Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Suntec, Singapore. ACL, 2009: 257-266.
[23]
Bougouin A, Boudin F, Daille B. TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction[C]// Proceedings of the 6th International Joint Conference on Natural Language Processing, Nagoya, Japan. ACL, 2013: 543-551.
[24]
Boudin F. Unsupervised Keyphrase Extraction with Multipartite Graphs[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Louisiana, USA. Association for Computational Linguistics, 2018: 667-672.
[25]
Sterckx L, Demeester T, Deleu J, et al. Topical Word Importance for Fast Keyphrase Extraction[C]// Proceedings of the 24th International Conference on World Wide Web, Florence, Italy. ACM, 2015: 121-122.
[26]
Teneva N, Cheng W. Salience Rank: Efficient Keyphrase Extraction with Topic Modeling[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada. Association for Computational Linguistics, 2017: 530-535.
[27]
Collobert R, Weston J, Bottou L, et al. Natural Language Processing (Almost) from Scratch[J]. Journal of Machine Learning Research, 2011,12:2493-2537.
[28]
Wang R, Liu W, McDonald C. Corpus-independent Generic Keyphrase Extraction Using Word Embedding Vectors[C]. Software Engineering Research Conference, 2014,39:1-8.
[29]
Wang R, Liu W, McDonald C. Using Word Embeddings to Enhance Keyword Identification for Scientific Publications[C]// Proceedings of the 26th Australasian Database Conference, Melbourne, Australia. Springer, 2015: 257-268.
[30]
Mahata D, Kuriakose J, Shah R R, et al. Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles Using Phrase Embeddings[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, USA. Association for Computational Linguistics, 2018: 634-639.
[31]
Shi W, Zheng W, Yu J X, et al. Keyphrase Extraction Using Knowledge Graphs[J]. Data Science and Engineering, 2017,2(4):275-288.
[32]
Yu Y, Ng V. WikiRank: Improving Keyphrase Extraction Based on Background Knowledge[C]// Proceedings of the 11th Edition of the Language Resources and Evaluation Conference, Miyazaki, Japan. European Language Resources Association, 2018: 3723-3727.
[33]
Tomokiyo T, Hurst M. A Language Model Approach to Keyphrase Extraction[C]// Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan. Association for Computational Linguistics, 2003,18:33-40.
[34]
Frank E, Paynter G W, Witten I H, et al. Domain-Specific Keyphrase Extraction[C]// Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm, Sweden. Morgan Kaufmann, 1999: 668-673.
[35]
Wang J, Peng H. Keyphrases Extraction from Web Document by the Least Squares Support Vector Machine[C]// Proceedings of the 2005 IEEE / WIC / ACM International Conference on Web Intelligence, Compiegne, France. IEEE Computer Society, 2005: 293-296.
[36]
Zhang C, Wang H, Liu Y, et al. Automatic Keyword Extraction from Documents Using Conditional Random Fields[J]. Journal of Computer Information Systems, 2008,4(3):1169-1180.
[37]
Ding Z, Zhang Q, Huang X. Keyphrase Extraction from Online News Using Binary Integer Programming[C]// Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand. Association for Computational Linguistics, 2011: 165-173.
[38]
Haddoud M, Mokhtari A, Lecroq T, et al. Accurate Keyphrase Extraction from Scientific Papers by Mining Linguistic Information[C]// Proceedings of the 1st Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics Co-located with 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey. 2015: 12-17.
[39]
Turney P D. Coherent Keyphrase Extraction via Web Mining[C]// Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico. Morgan Kaufmann, 2003: 434-442.
[40]
Nguyen T D, Kan M Y. Keyphrase Extraction in Scientific Publications[C]// Proceedings of the 10th International Conference on Asian Digital Libraries, Hanoi, Vietnam. Springer, 2007: 317-326.
[41]
Medelyan O, Frank E, Witten I H. Human-competitive Tagging Using Automatic Keyphrase Extraction[C]// Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Suntec, Singapore. Association for Computational Linguistics, 2009: 1318-1327.
[42]
Haddoud M, Abdeddaïm S. Accurate Keyphrase Extraction by Discriminating Overlapping Phrases[J]. Journal of Information Science, 2014,40(4):488-500.
[43]
Caragea C, Bulgarov F A, Godea A, et al. Citation-Enhanced Keyphrase Extraction from Research Papers: A Supervised Approach[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar. Association for Computational Linguistics, 2014: 1435-1446.
[44]
Zhang K, Xu H, Tang J, et al. Keyword Extraction Using Support Vector Machine[C]// Proceedings of the 7th International Conference on Web-Age Information Management, Hong Kong, China. Springer, 2006: 85-96.
[45]
Zhang Chengzhi. Research on Automatic Indexing Method Based on Ensemble Learning[J]. Journal of the China Society for Scientific and Technical Information, 2010,29(1):3-8. (in Chinese)
[46]
Hulth A. Improved Automatic Keyword Extraction Given More Linguistic Knowledge[C]// Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Sapporo, Japan. Association for Computational Linguistics, 2003: 216-223.
[47]
Ercan G, Cicekli I. Using Lexical Chains for Keyword Extraction[J]. Information Processing & Management, 2007,43(6):1705-1714.
[48]
Sterckx L, Caragea C, Demeester T, et al. Supervised Keyphrase Extraction as Positive Unlabeled Learning[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, USA. Association for Computational Linguistics, 2016: 1924-1929.
[49]
Krapivin M, Autayeu A, Marchese M, et al. Keyphrases Extraction from Scientific Documents: Improving Machine Learning Approaches with Natural Language Processing[C]// Proceedings of the 12th International Conference on Asia-Pacific Digital Libraries. Springer, 2010: 102-111.
[50]
Sarkar K, Nasipuri M, Ghose S. Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks[J]. Journal of Information Processing Systems, 2012,8(4):693-712.
[51]
Aquino G O, Lanzarini L C. Keyword Identification in Spanish Documents Using Neural Networks[J]. Journal of Computer Science and Technology, 2015,15(2):55-60.
[52]
Zhang Q, Wang Y, Gong Y, et al. Keyphrase Extraction Using Deep Recurrent Neural Networks on Twitter[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, USA. Association for Computational Linguistics, 2016: 836-845.
[53]
Basaldella M, Antolli E, Serra G, et al. Bidirectional LSTM Recurrent Neural Network for Keyphrase Extraction[C]// Proceedings of the 14th Italian Research Conference on Digital Libraries, Udine, Italy. Springer, 2018: 180-187.
[54]
Alzaidy R, Caragea C, Giles C L. Bi-LSTM-CRF Sequence Labeling for Keyphrase Extraction from Scholarly Documents[C]// Proceedings of the 2019 World Wide Web Conference. ACM, 2019: 2551-2557.
[55]
Bhaskar P, Nongmeikapam K, Bandyopadhyay S. Keyphrase Extraction in Scientific Articles: A Supervised Approach[C]// Proceedings of the 24th International Conference on Computational Linguistics, Mumbai, India. Indian Institute of Technology Bombay, 2012: 17-24.
[56]
Gollapalli S D, Li X L, Yang P. Incorporating Expert Knowledge into Keyphrase Extraction[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, USA. AAAI Press, 2017: 3180-3187.
[57]
Liu Z, Chen X, Zheng Y, et al. Automatic Keyphrase Extraction by Bridging Vocabulary Gap[C]// Proceedings of the 15th Conference on Computational Natural Language Learning, Portland, USA. ACL, 2011: 135-144.
[58]
Koehn P. Statistical Machine Translation[M]. Cambridge, UK: Cambridge University Press, 2010.
[59]
Brown P F, Pietra S D, Pietra V J D, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation[J]. Computational Linguistics, 1993,19(2):263-311.
[60]
Cho K, van Merrienboer B, Gülçehre Ç, et al. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar. ACL, 2014: 1724-1734.
[61]
Chen J, Zhang X, Wu Y, et al. Keyphrase Generation with Correlation Constraints[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium. Association for Computational Linguistics, 2018: 4057-4066.
[62]
Zhang Y, Xiao W. Keyphrase Generation Based on Deep Seq2Seq Model[J]. IEEE Access, 2018,6:46047-46057.
[63]
Chen W, Gao Y, Zhang J, et al. Title-Guided Encoding for Keyphrase Generation[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA. AAAI Press, 2019: 6268-6275.
[64]
Chen W, Chan H P, Li P, et al. Exclusive Hierarchical Decoding for Deep Keyphrase Generation[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020: 1095-1105.
[65]
Chen W, Chan H P, Li P, et al. An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, USA. Association for Computational Linguistics, 2019: 2846-2856.
[66]
Wang Y, Li J, Chan H P, et al. Topic-Aware Neural Keyphrase Generation for Social Media Language[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Italy. Association for Computational Linguistics, 2019: 2516-2526.
[67]
Chan H P, Chen W, Wang L, et al. Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards[C]// Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Italy. Association for Computational Linguistics, 2019: 2163-2174.
[68]
Ye H, Wang L. Semi-Supervised Learning for Neural Keyphrase Generation[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium. Association for Computational Linguistics, 2018: 4142-4153.
[69]
Wang Y, Liu Q, Qin C, et al. Exploiting Topic-Based Adversarial Neural Network for Cross-Domain Keyphrase Extraction[C]// Proceedings of the 2018 IEEE International Conference on Data Mining, Sentosa, Singapore. IEEE Computer Society, 2018: 597-606.
[70]
Jones K S. A Statistical Interpretation of Term Specificity and Its Application in Retrieval[J]. Journal of Documentation, 1972,28(1):11-21.
[71]
Salton G, Buckley C. Term-Weighting Approaches in Automatic Text Retrieval[J]. Information Processing & Management, 1988,24(5):513-523.
[72]
Zhang W, Feng W, Wang J. Integrating Semantic Relatedness and Words’ Intrinsic Features for Keyword Extraction[C]// Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China. IJCAI, 2013: 2225-2231.
[73]
Nguyen T D, Luong M T. WINGNUS: Keyphrase Extraction Utilizing Document Logical Structure[C]// Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden. Association for Computational Linguistics, 2010: 166-169.
[74]
Marujo L, Gershman A, Carbonell J G, et al. Supervised Topical Key Phrase Extraction of News Stories Using Crowdsourcing, Light Filtering and Co-reference Normalization[C]// Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey. European Language Resources Association, 2012: 399-403.
[75]
Boudin F. A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction[C]// Proceedings of the 6th International Joint Conference on Natural Language Processing, Nagoya, Japan. ACL, 2013: 834-838.
[76]
Eichler K, Neumann G. DFKI KeyWE: Ranking Keyphrases Extracted from Scientific Articles[C]// Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden. Association for Computational Linguistics, 2010: 150-153.
[77]
Berend G. Exploiting Extra-textual and Linguistic Information in Keyphrase Extraction[J]. Natural Language Engineering, 2016,22(1):73-95.
[78]
Zhang Y, Zhang C. Using Human Attention to Extract Keyphrase from Microblog Post[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy. Association for Computational Linguistics, 2019: 5867-5872.
[79]
Zhang Y, Zhang C. Enhancing Keyphrase Extraction from Microblogs Using Human Reading Time[J]. Journal of the Association for Information Science and Technology, 2020.
[80]
Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[C]// Proceedings of the 1st International Conference on Learning Representations, Scottsdale, USA. 2013: 1-12.
[81]
Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar. Association for Computational Linguistics, 2014: 1532-1543.
[82]
Zhang Y, Zhang C, Li J. Joint Modeling of Characters, Words, and Conversation Contexts for Microblog Keyphrase Extraction[J]. Journal of the Association for Information Science and Technology, 2020,71(5):553-567.
[83]
Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval[M]. Cambridge, UK: Cambridge University Press, 2008.
[84]
Voorhees E M. The TREC-8 Question Answering Track Report[C]// Proceedings of the 8th Text Retrieval Conference, Gaithersburg, USA. National Institute of Standards and Technology (NIST), 1999 (NIST Special Publication 500-246).
[85]
Liu L, Özsu M T. Encyclopedia of Database Systems[M]. New York, USA: Springer US, 2009.
[86]
Ristad E S, Yianilos P N. Learning String-edit Distance[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998,20(5):522-532.
[87]
Dagan I, Pereira F C N, Lee L. Similarity-Based Estimation of Word Cooccurrence Probabilities[C]// Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, USA. ACL, 1994: 272-278.
[88]
Zhang Chengzhi, Zhou Dongmin. General Evaluation Model for Automatic Indexing[J]. Journal of the China Society for Scientific and Technical Information, 2009,28(1):40-47. (in Chinese)
[89]
Chen P I, Lin S J. Automatic Keyword Prediction Using Google Similarity Distance[J]. Expert Systems with Applications, 2010,37(3):1928-1938.