[Objective] This paper proposes a model for extracting the names of Chinese historical events, with the aim of reorganizing knowledge from texts and constructing an ontology of these events. [Methods] We built the model with conditional random fields (CRFs) and automatic tagging, using historical texts of the Wei, Jin, and Northern and Southern Dynasties. We then examined how different Chinese characters and features influence the recognition of event names. [Results] The best model combined character features with surname features and reached an F1 value of 98.74%. It was also tested in two open scenarios and achieved good results. [Limitations] The training corpus needs to be expanded, and more research is needed to compare tagging by single Chinese characters with tagging by phrases. [Conclusions] The CRFs model can effectively identify the names of Chinese historical events under appropriate working conditions.
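The character-based tagging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I/O tag set, the feature names, the small surname set, and the example sentence are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows how each character could receive a tag and a feature dictionary (including a surname flag) of the kind a CRF toolkit typically takes as input.

```python
# Hypothetical surname set; the paper's actual surname resource is not given here.
SURNAMES = {"王", "李", "张", "刘", "桓"}


def bio_tags(text, spans):
    """Assign one B/I/O tag per character, given (start, end) event-name spans.

    The B/I/O scheme is an assumption; `end` is exclusive.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def char_features(text, i):
    """Features for character i: the character, its neighbours, a surname flag."""
    return {
        "char": text[i],
        "prev": text[i - 1] if i > 0 else "<BOS>",
        "next": text[i + 1] if i < len(text) - 1 else "<EOS>",
        "is_surname": text[i] in SURNAMES,
    }


text = "王敦之乱后"  # illustrative sentence, not taken from the paper's corpus
spans = [(0, 4)]     # "王敦之乱" marked as an event name
tags = bio_tags(text, spans)
features = [char_features(text, i) for i in range(len(text))]

for ch, tag in zip(text, tags):
    print(ch, tag)
```

Each pair of `features` and `tags` for a sentence would form one training sequence; which features actually help is the kind of question the experiments in the paper address.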
唐慧慧, 王昊, 张紫玄, 王雪颖. 基于汉字标注的中文历史事件名抽取研究*[J]. 数据分析与知识发现, 2018, 2(7): 89-100.
Tang Huihui, Wang Hao, Zhang Zixuan, Wang Xueying. Extracting Names of Historical Events Based on Chinese Character Tags[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 89-100.