New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 63-68    DOI: 10.11925/infotech.1003-3513.2013.07-08.09
Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics
Tang Yafen
College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
Abstract  From the perspective of text mining and analysis in the digital humanities, this paper uses a Conditional Random Field (CRF) machine learning model to automatically recognize ancient Chinese personal names in a Pre-Qin corpus. The trained model that reaches an F-score of 91.52% in cross-validation is identified as the best-performing model for ancient Chinese name recognition, and it is experimentally verified on a Pre-Qin corpus containing 187,901 words. The research not only helps extract named entities from Pre-Qin literature but also supports the exploration of relationships among people, and of their backgrounds, in other humanities and social sciences.
Key words: Conditional Random Field; Ancient Chinese name; Feature template; Pre-Qin corpus
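The F-score reported in the abstract is the harmonic mean of precision and recall. The abstract does not give the paper's tag set or evaluation protocol, so the following is only a minimal sketch under stated assumptions: character-level `B`/`I`/`O` tags and exact-match scoring of whole name spans. The function names `extract_spans` and `prf` are hypothetical and are not taken from the paper or from CRF++.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Collect (start, end) name spans, end exclusive, from B/I/O tags.

    Assumption: "B" opens a name, "I" continues it, "O" is outside any name.
    """
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # close a name that runs into a new one
                spans.add((start, i))
            start = i
        elif tag == "I":
            if start is None:          # tolerate an "I" with no preceding "B"
                start = i
        else:                          # "O" closes any open name
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:              # name that runs to the end of the sequence
        spans.add((start, len(tags)))
    return spans


def prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) over exactly matching name spans."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy tag sequences, invented for illustration only (not data from the paper).
gold = ["B", "I", "O", "B", "I", "O"]
pred = ["B", "I", "O", "B", "O", "O"]
precision, recall, f1 = prf(gold, pred)
```

In this toy case one of the two gold names is recovered exactly and the other is truncated, so only one predicted span counts as correct. Scoring whole spans rather than individual characters is a design choice of this sketch; a character-level score would give partial credit to the truncated name.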
Received: 13 June 2013      Published: 02 September 2013
CLC number: TP391; G353.1

Cite this article:

Tang Yafen. Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics. New Technology of Library and Information Service, 2013, 29(7/8): 63-68.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.07-08.09     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V29/I7/8/63
