Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 47-54    DOI: 10.11925/infotech.2096-3467.2017.01.06
Automatically Detecting and Tagging Foreign Language Citation Metadata
Lin Jiang1,2, Dongbo Wang3
1School of Information Management, Nanjing University, Nanjing 210023, China
2Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
3College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
[Objective] This paper proposes a method for automatically extracting bibliographic metadata with the help of semantic knowledge and machine learning. [Methods] We used a neural network model to train word vectors on manually segmented citation data and found that metadata of the same type clusters in particular regions of the vector space. Based on this observation, we propose an SVM-based classification algorithm that classifies and annotates bibliographic metadata automatically. [Results] The proposed method achieved high recall and precision on citation data, especially for citations in multiple languages and citations containing abbreviations. [Limitations] Fine-grained extraction of time-related content could be improved. [Conclusions] The proposed method can effectively detect and tag bibliographic metadata, and it improves the system's compatibility and fault tolerance.
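The classification step described under [Methods] can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each citation segment has already been split out by hand, and it replaces the learned word vectors with two hand-crafted features (share of digit characters, share of initial-like tokens), since the abstract does not specify the vector model. The classifier is a linear SVM trained by subgradient descent on the regularized hinge loss, applied one-vs-rest. All names (`features`, `train_linear_svm`, `TRAIN`, `predict`) and the training examples are hypothetical.

```python
# Toy sketch: tag citation segments with a one-vs-rest linear SVM.
# ASSUMPTION: two hand-crafted features stand in for the word vectors
# that the paper learns with a neural network.

def features(segment):
    """Map a citation segment to [digit share, initial share, bias]."""
    tokens = segment.replace(",", " ").replace(".", " ").split()
    chars = [c for c in segment if not c.isspace()]
    digit_frac = sum(c.isdigit() for c in chars) / max(len(chars), 1)
    # Tokens such as "L" or "DB" look like author initials.
    initial_frac = sum(t.isupper() and len(t) <= 2 for t in tokens) / max(len(tokens), 1)
    return [digit_frac, initial_frac, 1.0]  # trailing 1.0 acts as the bias input


def train_linear_svm(X, y, epochs=500, lr=0.1, lam=0.01):
    """Minimize lam/2 * ||w||^2 + hinge loss by per-sample subgradient steps."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * sum(a * b for a, b in zip(w, xi)) < 1
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
    return w


# Hypothetical, manually split training segments.
TRAIN = [
    ("Jiang L, Wang D", "author"),
    ("Smith J A", "author"),
    ("Müller K, Li X", "author"),
    ("2017", "year"),
    ("1998", "year"),
    ("2005.", "year"),
    ("Automatic extraction of bibliographic metadata", "title"),
    ("Word vectors for citation parsing", "title"),
    ("Tagging foreign language citations", "title"),
]


def train_one_vs_rest(data):
    """Train one binary SVM per label (that label = +1, all others = -1)."""
    X = [features(text) for text, _ in data]
    labels = sorted({label for _, label in data})
    return {
        label: train_linear_svm(X, [1 if l == label else -1 for _, l in data])
        for label in labels
    }


def predict(models, segment):
    """Return the label whose SVM gives the highest decision score."""
    x = features(segment)
    return max(models, key=lambda lab: sum(a * b for a, b in zip(models[lab], x)))


models = train_one_vs_rest(TRAIN)
for seg in ["Wang D B, Jiang L", "2016", "Metadata extraction with neural networks"]:
    print(seg, "->", predict(models, seg))
```

With learned word vectors in place of the two toy features, the same one-vs-rest setup would operate on averaged token vectors per segment; whether the paper averages vectors or uses another pooling scheme is not stated in the abstract.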

Key words: Bibliographic Metadata; Metadata Extraction; Machine Learning; Neural Network
Received: 18 August 2016      Published: 22 February 2017

Lin Jiang, Dongbo Wang. Automatically Detecting and Tagging Foreign Language Citation Metadata. Data Analysis and Knowledge Discovery, 2017, 1(1): 47-54.

