Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (7): 89-100    DOI: 10.11925/infotech.2096-3467.2018.0057
Extracting Names of Historical Events Based on Chinese Character Tags
Huihui Tang, Hao Wang, Zixuan Zhang, Xueying Wang
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China

[Objective] This paper proposes a model for extracting the names of Chinese historical events, with the aim of reorganizing knowledge from texts and constructing an ontology of these events. [Methods] We built the model using conditional random fields (CRFs) and automatic tagging, based on historical texts of the Wei, Jin, and Northern and Southern Dynasties. We then examined how different Chinese characters and features affect the recognition of event names. [Results] The best model was built on character features combined with surname features and reached an F1 value of 98.74%. Tested in two open scenarios, the model also achieved good results. [Limitations] The training corpus needs to be expanded, and more research is needed to compare the results of tagging single Chinese characters with those of tagging phrases. [Conclusions] The CRFs model can effectively identify the names of Chinese historical events under appropriate working conditions.
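The character-level tagging described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I/E/O tag set, the one-character context window, the tiny `SURNAMES` set, and the function names `tag_characters` and `char_features` are all assumptions made for illustration, since the abstract does not specify them. The sketch only prepares tagged characters and feature dictionaries of the kind a CRF toolkit would take as input; it does not train a model.

```python
# Hypothetical mini surname list (assumption); a real system would load
# a full surname resource instead.
SURNAMES = {"谢", "王", "桓", "刘"}


def tag_characters(sentence, event_names):
    """Label each character B (begin), I (inside), E (end), or O (outside)
    by matching known event names. The B/I/E/O scheme is an assumption."""
    tags = ["O"] * len(sentence)
    for name in event_names:
        if len(name) < 2:
            continue  # this sketch only handles names of two or more characters
        start = sentence.find(name)
        while start != -1:
            end = start + len(name)
            tags[start] = "B"
            for i in range(start + 1, end - 1):
                tags[i] = "I"
            tags[end - 1] = "E"
            start = sentence.find(name, end)
    return tags


def char_features(sentence, i):
    """Build a feature dictionary for the character at position i:
    the character itself, its neighbours, and a surname flag."""
    return {
        "char": sentence[i],
        "prev": sentence[i - 1] if i > 0 else "<BOS>",
        "next": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
        "is_surname": sentence[i] in SURNAMES,
    }


if __name__ == "__main__":
    sentence = "谢安指挥淝水之战"
    tags = tag_characters(sentence, ["淝水之战"])
    for i, tag in enumerate(tags):
        print(char_features(sentence, i), tag)
```

Each (feature dictionary, tag) pair corresponds to one training row for a sequence labeler. Keeping the surname flag as a separate feature makes it easy to train with and without it and compare the resulting F1 values.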

Keywords: Historical Event Name; Conditional Random Fields; Chinese Character Role Labeling; Named Entity Recognition; Ontology Learning
Received: 15 January 2018      Published: 15 August 2018

Cite this article:

Huihui Tang, Hao Wang, Zixuan Zhang, Xueying Wang. Extracting Names of Historical Events Based on Chinese Character Tags. Data Analysis and Knowledge Discovery, 2018, 2(7): 89-100.

