Data Analysis and Knowledge Discovery  2016, Vol. 32 Issue (12): 36-43    DOI: 10.11925/infotech.1003-3513.2016.12.05
Recognizing Chinese Organization Names Based on Deep Learning: A Recurrent Network Model
Danhao Zhu1,2, Lei Yang3, Dongbo Wang4
1Library of Jiangsu Police Institute, Nanjing 210031, China
2Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
3Department of High Education, College of Nanjing Traffic Technician, Nanjing 210049, China
4College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
[Objective] Chinese organization names are difficult for computers to recognize because of their complex structures and use of rare words. Recognizing these names successfully plays a significant role in information extraction and retrieval, knowledge mining, and institutional research evaluation. [Methods] First, we redefined the input and output of organization name recognition based on the recurrent neural network method and the nature of Chinese words and phrases. Second, we proposed a new model at the word level. [Results] Compared with recurrent network models at the phrase level, the proposed method significantly improved precision, recall, and F value. Of these, the F value increased by 1.54%. For organization names with rare words, the F value increased by 11.05%. [Limitations] We adopted a greedy strategy that finds locally optimal values. A conditional random field method would yield better results from a global perspective. [Conclusions] The proposed method, which uses Chinese word-level features, is easy to implement and can generate better results than its phrase-based counterparts.
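
The limitation noted above, greedy local decisions versus a globally optimal tag sequence, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the B/I/O tag set, the toy scores, the transition penalty, and the function names are hypothetical and are not taken from the paper. Greedy decoding is modeled here as a per-position argmax, and the global alternative as Viterbi decoding, the search typically used with a conditional random field output layer.

```python
# Illustrative only: tags, scores, and penalties are made-up assumptions,
# not values or code from the paper.
TAGS = ["B", "I", "O"]  # begin / inside / outside an organization name


def greedy_decode(emissions):
    """Pick the highest-scoring tag at each position independently."""
    return [max(TAGS, key=scores.get) for scores in emissions]


def viterbi_decode(emissions, transitions):
    """Return the tag sequence with the highest total emission + transition score."""
    # best[tag] = (score of the best path ending in tag, that path)
    best = {t: (emissions[0][t], [t]) for t in TAGS}
    for scores in emissions[1:]:
        new_best = {}
        for cur in TAGS:
            # Best previous tag to arrive from, given the transition score.
            prev = max(TAGS, key=lambda p: best[p][0] + transitions[(p, cur)])
            path_score = best[prev][0] + transitions[(prev, cur)] + scores[cur]
            new_best[cur] = (path_score, best[prev][1] + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# "O" followed by "I" is not a valid BIO sequence, so it gets a large penalty.
transitions = {(p, c): 0.0 for p in TAGS for c in TAGS}
transitions[("O", "I")] = -10.0

# Toy per-position tag scores for a three-token input.
emissions = [
    {"B": 0.4, "I": 0.1, "O": 0.5},
    {"B": 0.1, "I": 0.8, "O": 0.1},
    {"B": 0.1, "I": 0.7, "O": 0.2},
]

print(greedy_decode(emissions))                # locally best tags
print(viterbi_decode(emissions, transitions))  # globally best sequence
```

On this toy input the greedy decoder starts with "O" because it scores slightly higher at the first position, which leaves the following "I" tags without a valid beginning. The global search accepts a slightly lower score at the first position in exchange for a consistent sequence.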

Key words: Organization recognition; Recurrent Neural Network; Deep learning
Received: 01 August 2016      Published: 22 January 2017

Danhao Zhu, Lei Yang, Dongbo Wang. Recognizing Chinese Organization Names Based on Deep Learning: A Recurrent Network Model. Data Analysis and Knowledge Discovery, 2016, 32(12): 36-43.

