Extracting Chinese Metallurgy Patent Terms with Conditional Random Fields
Wang Miping, Wang Hao, Deng Sanhong, Wu Zhixiang
School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper proposes a model for effectively extracting Chinese metallurgy patent terms. [Methods] We built a model based on conditional random fields (CRFs) that automatically identifies Chinese metallurgy patent terms by labeling the role of each character. The model was tested with an incomplete core corpus. We describe the development process and compare the impact of various CRFs factors on this character-role-labeling model. [Results] The final model combines the character sequence with the level, areal, and temperature features of the patent terms. With a proximity window length of 3 and the parameters c and f both set to 1, it achieved a precision of 94.26%, a recall of 94.37%, and an F1 value of 94.5%. [Limitations] Some term labels were inaccurate because the core corpus was incomplete. We did not compare the model with other methods to assess the reliability of CRFs. [Conclusions] Under appropriate settings, the CRFs model can effectively identify Chinese metallurgy patent terms.
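The character-role labeling and proximity window mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The B/I/E/S/O role scheme, the function names `label_characters` and `window_features`, the `<PAD>` marker, and the reading of "window length 3" as one character on each side are all assumptions made for illustration; the sample sentence and term list are invented.

```python
from typing import Dict, List, Tuple


def label_characters(sentence: str, terms: List[str]) -> List[Tuple[str, str]]:
    """Assign a role label to every character of `sentence`.

    Assumed scheme (not taken from the paper): B = first character of a
    term, I = inside a term, E = last character, S = single-character
    term, O = outside any term.
    """
    labels = ["O"] * len(sentence)
    # Match longer terms first so nested shorter terms cannot overwrite them.
    for term in sorted(terms, key=len, reverse=True):
        if not term:
            continue  # ignore empty strings in the term list
        start = sentence.find(term)
        while start != -1:
            end = start + len(term)
            # Only label spans that are still completely unlabeled.
            if all(label == "O" for label in labels[start:end]):
                if len(term) == 1:
                    labels[start] = "S"
                else:
                    labels[start] = "B"
                    for i in range(start + 1, end - 1):
                        labels[i] = "I"
                    labels[end - 1] = "E"
            start = sentence.find(term, start + 1)
    return list(zip(sentence, labels))


def window_features(chars: List[str], index: int, window: int = 3) -> Dict[str, str]:
    """Collect the characters in a window centred on `index`.

    Assumption: window=3 covers offsets -1, 0 and +1. Positions outside
    the sentence are filled with the hypothetical marker "<PAD>".
    """
    half = window // 2
    feats: Dict[str, str] = {}
    for offset in range(-half, half + 1):
        pos = index + offset
        inside = 0 <= pos < len(chars)
        feats[f"char[{offset}]"] = chars[pos] if inside else "<PAD>"
    return feats


if __name__ == "__main__":
    # Invented example sentence and term list, for illustration only.
    sentence = "采用真空感应炉熔炼合金"
    tagged = label_characters(sentence, ["真空感应炉", "合金"])
    for i, (char, role) in enumerate(tagged):
        print(char, role, window_features(list(sentence), i))
```

Each printed row pairs a character with its role label and its window context, which is the kind of per-character training instance a CRF sequence labeler consumes. Additional columns, such as the level, areal, and temperature features named in the abstract, would be appended per character in the same way.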
Wang Miping, Wang Hao, Deng Sanhong, Wu Zhixiang. Extracting Chinese Metallurgy Patent Terms with Conditional Random Fields[J]. New Technology of Library and Information Service, 2016, 32(6): 28-36.