[Objective] This paper aims to accurately extract scientific citations and their context data, in order to improve the results of citation analysis. [Methods] We divided the citation extraction task into three subtasks: citation sentence extraction, citation context identification, and citation metadata extraction. We then proposed a coreference resolution-based method to identify and extract the context of scientific citations. [Results] We tested the method on Chinese periodicals that use the sequential (numeric) citation system, and it extracted the citation sentences and references correctly. The F1 value for identifying the citation context ranged from 0.780 to 0.849. [Limitations] Because Chinese scientific citation corpora are limited and the experimental data set is small, the proposed method might not work effectively in other fields. [Conclusions] Our study streamlines the steps of citation content analysis and broadens the scope of its data. It provides support for researchers working on citation content analysis.
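The first subtask, citation sentence extraction, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their extraction rules, so the marker pattern, the sentence splitter, and the function names `expand_marker` and `extract_citation_sentences` are all assumptions made for illustration. It assumes citations appear as bracketed sequential numbers such as [3], [1,2], or [4-6], and it does not attempt the coreference-based context identification step.

```python
import re

# Assumed marker format for sequential numbering: [3], [1,2], [4-6], [1,3-5].
MARKER = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")

# Naive splitter (an assumption): break right after Chinese sentence-final
# punctuation, or after Western punctuation that is followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")


def expand_marker(body):
    """Turn the inside of a marker, e.g. '1,3-5', into [1, 3, 4, 5]."""
    numbers = []
    for part in body.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers


def extract_citation_sentences(text):
    """Return (sentence, cited reference numbers) for each sentence with a marker."""
    sentences = [s.strip() for s in SENTENCE_BREAK.split(text) if s.strip()]
    results = []
    for sentence in sentences:
        cited = []
        for match in MARKER.finditer(sentence):
            cited.extend(expand_marker(match.group(1)))
        if cited:
            results.append((sentence, sorted(set(cited))))
    return results


if __name__ == "__main__":
    sample = (
        "Small studied co-citation [1]. "
        "Later work extended it [2,4-5]. "
        "This sentence has no marker."
    )
    for sentence, refs in extract_citation_sentences(sample):
        print(refs, sentence)
```

The returned reference numbers can then be matched against a numbered reference list to attach metadata to each citation sentence. Sentences that discuss a cited work without a marker are missed by this sketch; identifying them is the part the paper addresses with coreference resolution.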