Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (5): 68-76    DOI: 10.11925/infotech.2096-3467.2018.0659
Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning
Jinzhu Zhang, Yiming Hu
School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
[Objective] This paper aims to automatically identify scientific references in patents (SRPs) and to extract their titles to support in-depth data mining. [Methods] First, we used the Doc2Vec method to generate vectors for the patent citations. Second, we identified the SRPs with a support vector machine (SVM). Third, we created vectors for the metadata fields (such as titles) of each SRP and extracted the titles with an SVM. [Results] We evaluated the proposed method on patent citations from the genetic field. The accuracy of SRP recognition and title extraction reached 99.27% and 92.59%, respectively. The latter was 5.96% higher than that of traditional methods. [Limitations] Manually tagging the training set was very time-consuming, and the experimental data must meet certain format requirements. [Conclusions] The proposed method can effectively identify SRPs and extract their titles.
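
The two-stage pipeline described under [Methods] can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: a sparse bag-of-words vector stands in for Doc2Vec, a small linear SVM fitted by subgradient descent stands in for the SVM they used, and the title is taken as the highest-scoring period-delimited segment. All function names, hyperparameters, and sample citations below are hypothetical and exist only to show the flow from citation to SRP decision to title.

```python
import re

def featurize(text):
    """Assumed stand-in for Doc2Vec: a sparse binary bag-of-words vector."""
    return {tok: 1.0 for tok in re.findall(r"[a-z0-9]+", text.lower())}

def score(model, vec):
    """Signed distance-like score of a vector under a (weights, bias) model."""
    weights, bias = model
    return sum(weights.get(t, 0.0) * v for t, v in vec.items()) + bias

def train_linear_svm(texts, labels, epochs=50, lr=0.1, lam=0.001):
    """Linear SVM (hinge loss + L2 penalty) fitted by plain subgradient descent."""
    weights, bias = {}, 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for vec, y in data:
            violated = y * score((weights, bias), vec) < 1.0
            for t in weights:                 # L2 shrinkage on every step
                weights[t] *= 1.0 - lr * lam
            if violated:                      # hinge-loss subgradient step
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias

def is_srp(model, citation):
    """Stage 1: decide whether a patent citation is a scientific reference."""
    return score(model, featurize(citation)) > 0

def extract_title(model, citation):
    """Stage 2: return the period-delimited segment that scores highest as a title."""
    parts = re.split(r"\.\s+", citation)
    segments = [p.strip(" .") for p in parts if p.strip(" .")]
    return max(segments, key=lambda s: score(model, featurize(s)))

# Hypothetical hand-labelled training data (+1 / -1).
citations = [
    "Smith J et al. Cloning of a human gene. Nature 1995",
    "Lee K. Expression of recombinant protein in yeast. Science 2001",
    "US 5123456 B2",
    "EP 0123456 B1",
]
citation_labels = [1, 1, -1, -1]

segments = [
    "Cloning of a human gene",
    "Expression of recombinant protein in yeast",
    "Smith J et al", "Nature 1995", "Lee K", "Science 2001",
]
segment_labels = [1, 1, -1, -1, -1, -1]

srp_model = train_linear_svm(citations, citation_labels)
title_model = train_linear_svm(segments, segment_labels)

new_citation = "Wang L. Cloning of a yeast gene. Nature 1999"
if is_srp(srp_model, new_citation):
    print(extract_title(title_model, new_citation))
```

In practice the feature step would be replaced by learned document vectors and the classifier by a library SVM; the sketch only shows how the SRP classifier gates the title classifier.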

Key words: Scientific References in Patent; Metadata Extraction; Machine Learning; Representation Learning
Received: 20 June 2018      Published: 03 July 2019

Jinzhu Zhang, Yiming Hu. Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning. Data Analysis and Knowledge Discovery, 2019, 3(5): 68-76.

