Abstract: From the perspective of text mining and analysis in the digital humanities, this study automatically recognizes personal names in ancient Chinese texts with a Conditional Random Field (CRF) machine learning model trained on a Pre-Qin corpus. In experiments on a Pre-Qin corpus of 187,901 words, the trained model reaches an F-score of 91.52% on the cross-validation corpus, the best performance for ancient Chinese name recognition obtained in these experiments. The research not only helps extract named entities from Pre-Qin literature but also supports the study of relationships among historical figures, and of their backgrounds, in other humanities and social science fields.
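As a rough illustration of how an F-score like the one reported above is commonly computed for name recognition, the sketch below treats recognition as character-level sequence labeling and scores exact-match name spans. The abstract does not state the paper's tag set or scoring rules, so the BIO scheme (tags `B-PER`, `I-PER`, `O`), the exact-match criterion, and the helper names `bio_to_spans` and `prf` are all hypothetical assumptions. Precision is correct names over predicted names, recall is correct names over gold names, and F = 2PR / (P + R).

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect name spans from a character-level BIO tag sequence.

    Assumed (hypothetical) tag set: "B-PER" begins a name, "I-PER" continues
    it, "O" is outside any name. An "I-PER" with no open name starts a new span.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B-PER" or (tag == "I-PER" and start is None):
            if start is not None:  # close the name that was still open
                spans.add((start, i))
            start = i
        elif tag != "I-PER":  # "O" ends any open name
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:  # name running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-score summed over all sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)  # a name counts only if both ends match
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Toy sentence 子路问于孔子, one tag per character (hypothetical labels).
    gold = [["B-PER", "I-PER", "O", "O", "B-PER", "I-PER"]]
    pred = [["B-PER", "I-PER", "O", "O", "B-PER", "O"]]  # second name truncated
    print(prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

Under exact matching, the truncated second name counts as both a false positive and a false negative, which is why a single boundary error halves both precision and recall here. Averaging such scores over folds would give a cross-validation figure, though the abstract does not say how its 91.52% was aggregated.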
Tang Yafen. Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics[J]. New Technology of Library and Information Service, 2013, 29(7/8): 63-68. (汤亚芬. 先秦古汉语典籍中的人名自动识别研究[J]. 现代图书情报技术, 2013, 29(7/8): 63-68.)