New Technology of Library and Information Service  2016, Vol. 32 Issue (6): 28-36    DOI: 10.11925/infotech.1003-3513.2016.06.04
Extracting Chinese Metallurgy Patent Terms with Conditional Random Fields
Wang Miping, Wang Hao, Deng Sanhong, Wu Zhixiang
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper proposes a model for effectively extracting Chinese metallurgy patent terms. [Methods] We built a model that automatically identifies Chinese metallurgy patent terminology using conditional random fields (CRFs), and tested it on an incomplete core corpus. We describe the development process and compare the impact of various CRFs factors on this character-role-labeling model. [Results] The new model combines the character sequences, level features, areal features and temperature features of the patent terms. Its precision was 94.26%, its recall was 94.37%, and its F1 value was 94.5%, obtained with a proximity window of length 3 and with the parameters c and f both set to 1. [Limitations] Some term labels were not accurate enough because the core corpus was incomplete. We did not compare the model with other methods to assess the reliability of the CRFs. [Conclusions] The CRFs model can effectively identify Chinese metallurgy patent terms under appropriate working conditions.
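As a rough illustration of character-role labeling with a proximity window, the following minimal Python sketch tags each character of a sentence and collects the characters around a given position. It is not the authors' implementation: the B/I/O label set, the function names `label_characters` and `window_features`, the sample sentence and terms, and the reading of "window length 3" as one character on each side of the current one are all assumptions made for this example. The level, areal and temperature features mentioned in the abstract are not modeled here.

```python
def label_characters(sentence, terms):
    """Tag each character as B (term start), I (inside a term) or O (outside).

    The B/I/O scheme is an assumption; the abstract does not state the label set.
    """
    labels = ["O"] * len(sentence)
    # Process longer terms first so a shorter nested term cannot overwrite them.
    for term in sorted(terms, key=len, reverse=True):
        start = sentence.find(term)
        while start != -1:
            end = start + len(term)
            # Only label spans that are still completely unlabeled.
            if all(label == "O" for label in labels[start:end]):
                labels[start] = "B"
                for i in range(start + 1, end):
                    labels[i] = "I"
            start = sentence.find(term, start + 1)
    return labels


def window_features(sentence, i, half_width=1):
    """Collect the characters in a window centred on position i.

    half_width=1 gives a window of length 3 (an assumed interpretation).
    Positions outside the sentence are filled with a padding symbol.
    """
    feats = {}
    for offset in range(-half_width, half_width + 1):
        j = i + offset
        feats[f"char[{offset}]"] = sentence[j] if 0 <= j < len(sentence) else "<PAD>"
    return feats


# Hypothetical example sentence and term list, for illustration only.
sentence = "采用高炉炼铁工艺"
terms = ["高炉", "炼铁工艺"]
labels = label_characters(sentence, terms)
# labels == ["O", "O", "B", "I", "B", "I", "I", "I"]
features = [window_features(sentence, i) for i in range(len(sentence))]
```

In a setup like this, each character's window features and its label would form one training row for a CRF toolkit; how those rows are serialized depends on the toolkit and is left out of the sketch.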

Key words: Chinese patent terminology; CRFs; Terminology extraction; Sequence labeling
Received: 01 March 2016      Published: 18 July 2016

Cite this article:

Wang Miping, Wang Hao, Deng Sanhong, Wu Zhixiang. Extracting Chinese Metallurgy Patent Terms with Conditional Random Fields. New Technology of Library and Information Service, 2016, 32(6): 28-36.

