[Objective] This paper aims to automatically identify scientific references in patents (SRPs) and then extract their titles to support in-depth data mining. [Methods] First, we used Doc2Vec to generate vectors for the patent citations. Second, we identified the SRPs with a support vector machine (SVM). Third, we created vectors for the metadata fields of each SRP (such as the title) and extracted the titles with an SVM. [Results] We evaluated the proposed method on patent citations from the genetics field. The accuracy of SRP recognition and title extraction reached 99.27% and 92.59%, respectively. The latter was 5.96% higher than that of traditional methods. [Limitations] Manually tagging the training set was very time-consuming, and the experimental data must meet certain format requirements. [Conclusions] The proposed method can effectively identify SRPs among patent citations and extract their titles.
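The first classification step (deciding whether a patent citation is an SRP) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: bag-of-words counts stand in for the Doc2Vec vectors, and a small hand-written linear SVM without an intercept (hinge loss, batch subgradient descent) stands in for a library SVM. All function names, hyperparameters, and citation strings below are invented for illustration.

```python
import re
from collections import Counter

def featurize(text):
    """Bag-of-words counts over lowercase alphabetic tokens.
    Assumption: a simple stand-in for the Doc2Vec vectors used in the paper."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def score(weights, feats):
    """Linear decision value w . x (no intercept term)."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())

def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit a linear SVM by batch subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    weights = {}
    n = len(samples)
    for _ in range(epochs):
        # Accumulate the hinge-loss subgradient over margin violators.
        grad = Counter()
        for feats, y in zip(samples, labels):
            if y * score(weights, feats) < 1.0:
                for tok, cnt in feats.items():
                    grad[tok] -= y * cnt / n
        # L2 shrinkage of existing weights, then a step against the subgradient.
        for tok in weights:
            weights[tok] *= 1.0 - lr * lam
        for tok, g in grad.items():
            weights[tok] = weights.get(tok, 0.0) - lr * g
    return weights

def is_scientific_reference(weights, citation):
    """True if the citation is classified as an SRP."""
    return score(weights, featurize(citation)) > 0

# Invented toy citations: +1 = scientific reference, -1 = patent document.
TRAIN = [
    ("Doe J, Roe R. Gene expression in yeast cells. Journal of Toy Genetics, 1999, 12(3): 45-67.", 1),
    ("Poe E, et al. Protein folding in vitro. Proceedings of the Toy Biology Conference, 2001: 10-20.", 1),
    ("Roe R. Sequencing of the mouse genome. Toy Letters, 2005, 2(1): 1-9.", 1),
    ("US 5000001 A, granted 1993, assignee Acme Corp", -1),
    ("WO 9800002 A1, published 1998, assignee Sample Inc", -1),
    ("EP 0000003 B1, granted 1994, applicant Demo GmbH", -1),
]

weights = train_linear_svm([featurize(t) for t, _ in TRAIN], [y for _, y in TRAIN])

print(is_scientific_reference(weights, "Poe E. Gene expression in mouse cells. Toy Letters, 2010, 4(2): 3-8."))
print(is_scientific_reference(weights, "US 7000004 B2, granted 2006, assignee Other LLC"))
```

The title extraction step could plausibly reuse the same two functions on individual segments of an SRP (labelled title or non-title) instead of whole citations, although how the paper segments a reference is not stated in this abstract.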
Jinzhu Zhang, Yiming Hu. Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning[J]. Data Analysis and Knowledge Discovery, 2019, 3(5): 68-76. DOI: 10.11925/infotech.2096-3467.2018.0659.