Recognizing Chinese Organization Names Based on Deep Learning: A Recurrent Network Model
Danhao Zhu1,2, Lei Yang3, Dongbo Wang4
1 Library of Jiangsu Police Institute, Nanjing 210031, China
2 Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
3 Department of Higher Education, College of Nanjing Traffic Technician, Nanjing 210049, China
4 College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
[Objective] Computers have difficulty recognizing Chinese organization names because these names have complex structures and often contain rare words. Accurate recognition of organization names matters for information extraction and retrieval, knowledge mining, and the evaluation of institutional research. [Methods] First, we redefined the input and output of organization name recognition for a recurrent neural network, based on the nature of Chinese characters and words. Second, we proposed a new model that works at the character level. [Results] Compared with recurrent network models that work at the word level, the proposed method significantly improved precision, recall, and F-value. The F-value increased by 1.54% overall and by 11.05% for organization names containing rare words. [Limitations] We adopted a greedy strategy, which finds only locally optimal labels. A conditional random field, which optimizes the label sequence globally, may yield better results. [Conclusions] The proposed method, which uses Chinese character-level features, is easy to implement and outperforms its word-based counterparts.
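The abstract does not specify the label set or the evaluation code. The following is therefore a minimal sketch in Python that assumes a BIO character-tagging scheme (B-ORG, I-ORG, O). `TAGS`, `to_char_tags`, `greedy_decode`, `extract_spans`, and `prf` are hypothetical names. The per-character scores are invented numbers that stand in for a recurrent network's output. The sketch shows three things: one output label per input character, the greedy per-character decoding named under [Limitations], and entity-level precision, recall, and F-value, where a predicted name counts only if both of its boundaries match.

```python
from typing import List, Sequence, Set, Tuple

# Assumed BIO label set; the paper's actual labels may differ.
TAGS = ["O", "B-ORG", "I-ORG"]

Span = Tuple[int, int]  # [start, end) character offsets of an organization name


def to_char_tags(sentence: str, spans: Sequence[Span]) -> List[str]:
    """Give every character of `sentence` one BIO tag (one output per input character)."""
    tags = ["O"] * len(sentence)
    for start, end in spans:
        tags[start] = "B-ORG"
        for i in range(start + 1, end):
            tags[i] = "I-ORG"
    return tags


def greedy_decode(scores: Sequence[Sequence[float]]) -> List[str]:
    """Choose the highest-scoring tag independently at each character (a local, greedy choice)."""
    return [TAGS[max(range(len(TAGS)), key=lambda k: row[k])] for row in scores]


def extract_spans(tags: Sequence[str]) -> Set[Span]:
    """Recover [start, end) spans; a stray I-ORG with no open span starts a new one."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-ORG" or (tag == "I-ORG" and start is None):
            if start is not None:  # close the span that was still open
                spans.add((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:  # span runs to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F-value (exact boundary match)."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


sentence = "我在南京大学读书"  # "I study at Nanjing University"
gold_tags = to_char_tags(sentence, [(2, 6)])  # 南京大学 occupies characters 2..5

# Invented per-character scores for (O, B-ORG, I-ORG), standing in for RNN outputs.
scores = [
    [0.9, 0.05, 0.05],  # 我
    [0.8, 0.1, 0.1],    # 在
    [0.1, 0.8, 0.1],    # 南
    [0.2, 0.1, 0.7],    # 京
    [0.1, 0.2, 0.7],    # 大
    [0.2, 0.1, 0.7],    # 学
    [0.9, 0.05, 0.05],  # 读
    [0.9, 0.05, 0.05],  # 书
]
pred_tags = greedy_decode(scores)
print(pred_tags)
print(prf(extract_spans(gold_tags), extract_spans(pred_tags)))
```

`greedy_decode` picks each tag without looking at its neighbors, so nothing prevents it from emitting an invalid sequence such as O followed by I-ORG. A conditional random field output layer instead scores whole tag sequences, which is the global alternative mentioned under [Limitations].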
Danhao Zhu, Lei Yang, Dongbo Wang. Recognizing Chinese Organization Names Based on Deep Learning: A Recurrent Network Model. Data Analysis and Knowledge Discovery, 2016, 32(12): 36-43.