[Objective] This paper explores a method for automatically identifying citation texts and compares the content of different types of citation sentences.
[Methods] This paper proposed an unsupervised method for identifying citation texts, which determines implicit citation sentences by comparing the similarity of a candidate sentence with the citing paper against its similarity with the cited paper. To calculate text similarity precisely, two document vector models were proposed by combining the vector space model and the word embedding model.
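The comparison described in [Methods] can be illustrated with a minimal sketch. It is not the paper's implementation: the TF-IDF-weighted average of word embeddings is only one plausible way to combine a vector space model with word embeddings, the decision rule (flag a sentence when it is closer to the cited paper than to the citing paper) is an assumption, and the names `doc_vector`, `is_implicit_citation`, `EMBEDDINGS`, `CITED` and `CITING`, together with the toy two-dimensional vectors, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(documents):
    """Smoothed inverse document frequency over a small document collection."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def doc_vector(text, embeddings, idf):
    """Assumed combination: TF-IDF-weighted average of word embeddings."""
    counts = Counter(t for t in tokenize(text) if t in embeddings)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for word, tf in counts.items():
        weight = tf * idf.get(word, 1.0)  # unseen words get a neutral weight
        total += weight
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return [v / total for v in vec] if total else vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def is_implicit_citation(sentence, citing_text, cited_text, embeddings, idf):
    """Assumed rule: the sentence is closer to the cited paper than to the citing paper."""
    s = doc_vector(sentence, embeddings, idf)
    sim_cited = cosine(s, doc_vector(cited_text, embeddings, idf))
    sim_citing = cosine(s, doc_vector(citing_text, embeddings, idf))
    return sim_cited > sim_citing


# Toy data (hypothetical): 2-dimensional "embeddings" and one-line "papers".
EMBEDDINGS = {
    "parser": [1.0, 0.0],
    "grammar": [0.9, 0.1],
    "sentiment": [0.0, 1.0],
    "reviews": [0.1, 0.9],
}
CITED = "parser grammar parser"
CITING = "sentiment reviews sentiment"
IDF = idf_table([CITED, CITING])
```

With this toy data, a candidate sentence such as "the grammar parser is fast" lies closer to `CITED` than to `CITING` and would be flagged, whereas "sentiment of reviews" would not. In practice the embeddings would come from a trained model and the IDF weights from a larger corpus.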
[Results] When identifying the implicit citation sentences of two highly cited papers, each from over 200 citing papers, the proposed unsupervised method achieved an F-value above 92%. A comparison of the content of explicit and implicit citation sentences showed significant differences in citation function and citation sentiment between the two types: the proportion of implicit citation sentences expressing research background and technical basis is higher than that of explicit citation sentences, while the proportion of implicit citation sentences expressing research basis and research comparison is lower than that of explicit citation sentences; 45.3% of explicit citation sentences were positive references, while 78.8% of implicit citation sentences were neutral references.
[Limitations] This paper identifies citation texts only at the sentence level. Clause-level and phrase-level identification should be explored further.
[Conclusions] It is necessary to include implicit citation sentences when identifying citation texts. The proposed similarity-based method is effective.
Kim Hyonil, Ou Shiyan. The Unsupervised Identification and Analysis of Citation Texts [J]. Data Analysis and Knowledge Discovery. DOI: 10.11925/infotech.2096-3467.2020.0548.