%A Jiang Chuntao
%T Automatic Annotation of Bibliographical References in Chinese Patent Documents
%0 Journal Article
%D 2015
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.1003-3513.2015.10.11
%P 81-87
%V 31
%N 10
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4146.shtml}
%8 2015-10-25
%X

[Objective] This paper aims to automatically annotate four types of bibliographical references in Chinese patent documents: patents, standards, papers, and other public documents such as monographs. [Methods] A pattern matching approach is used to annotate references to patents, standards, and public documents. A two-phase machine learning approach is used to annotate paper references: the first phase automatically detects the sentences that contain citation information, and the second phase extracts six categories of bibliographic features from those sentences. [Results] Ten-fold cross validation shows that the accuracy for annotating patents is 100%, the precision and recall for annotating standards are 92% and 94% respectively, and the precision and recall for annotating public documents are 80% and 71% respectively. For paper references, the precision and recall are 95.7% and 96.0% in phase one and 95.3% and 94.9% in phase two. [Limitations] The pattern matching approach requires manually analyzing a large number of patent documents, and the training model used by the proposed machine learning approach is relatively small. [Conclusions] Pattern matching annotates patents and standards with precision and recall of at least 92%, and the machine learning approach annotates papers with precision and recall of about 95%.