Review of Automatic Citation Classification Based on Machine Learning
Zhou Zhichao
Health Science Library, Peking University, Beijing 100191, China

Abstract [Objective] This paper summarizes the application of natural language processing and machine learning techniques to automatic citation classification. [Coverage] We searched the Scopus database for “citation classification”, “citation polarity”, “citation function” and “feature selection”, and retrieved a total of 46 representative publications. [Methods] We reviewed this research from the perspectives of the citation classification process, tasks and methods, and then discussed future development trends and challenges. [Results] Citation classification research is shifting from multi-class to binary classification. Deep learning models can classify the sentiment and the function of citations simultaneously. The challenges facing automatic citation classification include single-discipline corpora, the controversial definition of citation context, and unbalanced classification data. [Limitations] This review does not discuss the many classification systems used in industry. [Conclusions] Evaluation methods need to be developed for the reuse of research outputs such as code, data and corpora, which could help build open science. Combining citation classification with citation counts could establish a multi-dimensional evaluation model. Based on a user’s search results, a system could recommend documents that support or oppose the related research for further reading.
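
The [Results] and [Conclusions] above name three concrete operations: collapsing fine-grained citation labels into a binary scheme, coping with unbalanced class distributions, and combining classification labels with raw citation counts into a multi-dimensional profile. The Python sketch below illustrates all three under stated assumptions; it is not the method of any study covered by the review. The `Citation` record, the polarity labels and the influential/incidental mapping of function labels are hypothetical, and the class weights use the common inverse-frequency heuristic n / (k · n_c), which is only one of several ways to handle unbalanced data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label scheme; the review does not fix a particular taxonomy.
POLARITIES = ("positive", "neutral", "negative")
# Assumed mapping: function labels treated as "influential" in a binary scheme.
INFLUENTIAL_FUNCTIONS = {"uses", "extends"}


@dataclass
class Citation:
    cited_id: str  # identifier of the cited paper
    polarity: str  # one of POLARITIES, e.g. predicted by a sentiment classifier
    function: str  # fine-grained function label, e.g. "uses", "background"


def to_binary_function(function: str) -> str:
    """Collapse a multi-class function label into influential vs. incidental."""
    return "influential" if function in INFLUENTIAL_FUNCTIONS else "incidental"


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights n / (k * n_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def citation_profile(citations: list[Citation], cited_id: str) -> dict[str, int]:
    """Combine the plain citation count with per-polarity and per-influence counts."""
    own = [c for c in citations if c.cited_id == cited_id]
    polarity_counts = Counter(c.polarity for c in own)
    influence_counts = Counter(to_binary_function(c.function) for c in own)
    profile = {"total": len(own)}
    for p in POLARITIES:
        profile[p] = polarity_counts.get(p, 0)
    profile["influential"] = influence_counts.get("influential", 0)
    profile["incidental"] = influence_counts.get("incidental", 0)
    return profile


# Toy data for illustration only.
citations = [
    Citation("A", "positive", "uses"),
    Citation("A", "neutral", "background"),
    Citation("A", "negative", "compares"),
    Citation("A", "neutral", "background"),
    Citation("B", "neutral", "extends"),
]

print(citation_profile(citations, "A"))
# {'total': 4, 'positive': 1, 'neutral': 2, 'negative': 1, 'influential': 1, 'incidental': 3}
print(balanced_class_weights([c.polarity for c in citations]))
```

Replacing a single citation total with a vector like this is one possible reading of the multi-dimensional evaluation model mentioned in the conclusions; in practice the class weights would be passed to a classifier's loss function or sampling step when training on unbalanced annotations.
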

Received: 20 June 2021
Published: 20 January 2022

Fund: CALIS National Information Center in Medicine (CALIS-2020-01-003)
Corresponding author:
Zhou Zhichao, ORCID: 0000-0003-2498-6532
E-mail: zhouzc1987@bjmu.edu.cn