Extracting Names of Historical Events Based on Chinese Character Tags
Tang Huihui, Wang Hao, Zhang Zixuan, Wang Xueying
School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper proposes a model for extracting the names of Chinese historical events, with the aim of reorganizing knowledge from texts and constructing an ontology of these events. [Methods] We built the model with conditional random fields (CRFs) and automatic tagging, using historical texts on the Wei, Jin, and Northern and Southern Dynasties. We then examined how different Chinese characters and features affect the recognition of event names. [Results] The best model combined character features with surname features and reached an F1 value of 98.74%. It also achieved good results in two open test scenarios. [Limitations] The training corpus needs to be expanded, and more research is needed to compare tagging by single Chinese characters with tagging by phrases. [Conclusions] The CRFs model can effectively identify the names of Chinese historical events under appropriate conditions.
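As a rough illustration of character-level tagging for a CRF, the sketch below turns a sentence with a known event name into one row per character, with a surname indicator as an extra feature column. The B/I/E/S/O tag set, the tiny surname list, the function name `tag_characters`, and the example sentence are all assumptions made for illustration; the abstract does not specify the tag scheme or feature encoding actually used.

```python
# Minimal sketch, not the authors' implementation.
# Assumed: a B/I/E/S/O tag set and a small illustrative surname list.
SURNAMES = {"王", "李", "张", "刘", "陈", "曹", "孙"}


def tag_characters(sentence, event_names):
    """Return one (character, surname_flag, tag) row per character."""
    tags = ["O"] * len(sentence)
    for name in event_names:
        start = sentence.find(name)
        while start != -1:
            end = start + len(name)
            if len(name) == 1:
                tags[start] = "S"          # single-character name
            else:
                tags[start] = "B"          # first character of the name
                for i in range(start + 1, end - 1):
                    tags[i] = "I"          # inside the name
                tags[end - 1] = "E"        # last character of the name
            start = sentence.find(name, end)
    return [
        (ch, "Y" if ch in SURNAMES else "N", tag)
        for ch, tag in zip(sentence, tags)
    ]


# "八王之乱" (War of the Eight Princes) is the annotated event name.
rows = tag_characters("八王之乱削弱了西晋", ["八王之乱"])
for row in rows:
    print("\t".join(row))
```

Each printed row has the tab-separated column layout that many CRF toolkits accept as training input, with the tag in the last column. A real feature set would likely add context windows and further character features.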
Tang Huihui, Wang Hao, Zhang Zixuan, Wang Xueying. Extracting Names of Historical Events Based on Chinese Character Tags[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 89-100.