Research Progress on Citation Analysis of Scientific Papers
Wang Lu, Le Xiaoqiu
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper reviews recent research progress in citation content analysis and identifies its research directions and technology development trends. [Coverage] We searched HowNet, Scopus, Semantic Scholar, and other platforms with keywords such as “citation full text”, “citation context”, and “citation content”, and then screened the results manually. [Methods] We summarize and compare research on citation content analysis from four aspects: the distinction between related concepts, main research directions, key technologies, and analysis tools and platforms. We then discuss existing problems and future research directions. [Results] New ideas and methods are emerging in research directions such as citation motivation, citation evaluation, knowledge flow, and paper recommendation. Key common technologies for citation content analysis have progressed in citation extraction, citation location identification, citation sentiment analysis, and knowledge point identification. [Limitations] This review summarizes and analyzes the relevant research mainly at the macro level and does not examine every aspect in depth. [Conclusions] Citation content analysis has unique advantages over traditional citation analysis. With the rapid iteration of natural language processing technology, it has broad prospects for development.
Wang Lu, Le Xiaoqiu. Research Progress on Citation Analysis of Scientific Papers[J]. Data Analysis and Knowledge Discovery, 2022, 6(4): 1-15.