New Technology of Library and Information Service, 2013, Vol. 29, Issue (7/8): 63-68     https://doi.org/10.11925/infotech.1003-3513.2013.07-08.09
Knowledge Organization and Knowledge Management
Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics
Tang Yafen
College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
Abstract: Taking text mining and analysis in the digital humanities as its starting point, this study automatically recognizes ancient Chinese personal names in a Pre-Qin corpus using a Conditional Random Field (CRF) machine learning model. On a Pre-Qin corpus of 187,901 words, the model trained on the cross-validation corpus that reached an F-score of 91.52% was selected as the optimal model for ancient Chinese name recognition and was then verified experimentally. The research not only supports named entity extraction from Pre-Qin ancient literature but can also help other humanities disciplines investigate the relationships among, and backgrounds of, Pre-Qin historical figures.
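The abstract reports model quality as an F-score (the harmonic mean of precision and recall) and selects the best cross-validation model by that score. As a minimal sketch, assuming names are annotated with a hypothetical BIO scheme (`B-PER`, `I-PER`, `O`) and scored at the entity level with exact span matching, the computation could look like the following. The tag names, the exact-match criterion and the helper names are illustrative assumptions, not the paper's stated evaluation protocol.

```python
from typing import List, Sequence, Set, Tuple

# A name span is (start, end) in token offsets, end exclusive.
Span = Tuple[int, int]


def extract_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect person-name spans from a BIO sequence (assumed tags: B-PER, I-PER, O)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-PER":
            if start is not None:  # a new name begins right after another one
                spans.add((start, i))
            start = i
        elif tag == "I-PER" and start is not None:
            continue  # the current name continues
        else:
            # "O", or a stray I-PER with no open name, closes any open span
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:  # a name running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def precision_recall_f(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F-score (harmonic mean), exact span matching."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def best_fold(fold_f_scores: List[float]) -> int:
    """Index of the cross-validation fold whose model scored the highest F."""
    return max(range(len(fold_f_scores)), key=lambda k: fold_f_scores[k])


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-PER", "O"]]
    pred = [["B-PER", "I-PER", "O", "O", "B-PER"]]
    print(precision_recall_f(gold, pred))  # (0.5, 0.5, 0.5)
```

In the toy example one of the two gold names is found and one of the two predicted names is wrong, so precision, recall and F are all 0.5. Selecting the final model then amounts to taking the fold with the highest F-score, which is what `best_fold` returns.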
Key words: Conditional Random Field; Ancient Chinese name; Feature template; Pre-Qin corpus
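Feature templates are the usual way of declaring CRF features in toolkits such as CRF++, where a macro `%x[row,col]` picks the value in column `col` of the token `row` positions away from the current one. The paper's actual template set is not given on this page, so the sketch below uses a hypothetical set of word and part-of-speech templates and its own boundary placeholders (`<BOS-1>`, `<EOS+1>`). It only shows how templates expand into concrete features for one token; it does not train a model.

```python
import re
from typing import List, Sequence

# Hypothetical unigram templates in CRF++-style %x[row,col] notation:
# row = offset from the current token, col = column (0 = word, 1 = POS tag).
TEMPLATES = [
    "U00:%x[-1,0]",          # previous word
    "U01:%x[0,0]",           # current word
    "U02:%x[1,0]",           # next word
    "U03:%x[-1,0]/%x[0,0]",  # previous + current word
    "U04:%x[0,1]",           # current POS tag
]

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(templates: Sequence[str], rows: Sequence[Sequence[str]], i: int) -> List[str]:
    """Instantiate every template for token i of one sentence."""

    def lookup(m: re.Match) -> str:
        offset, col = int(m.group(1)), int(m.group(2))
        j = i + offset
        if j < 0:  # before the sentence start (placeholder chosen for this sketch)
            return f"<BOS{j}>"
        if j >= len(rows):  # past the sentence end
            return f"<EOS+{j - len(rows) + 1}>"
        return rows[j][col]

    return [MACRO.sub(lookup, t) for t in templates]


# Illustrative sentence from the Analects, 颜渊问仁; the POS labels are hypothetical.
EXAMPLE_ROWS = [["颜渊", "nr"], ["问", "v"], ["仁", "n"]]

if __name__ == "__main__":
    for i in range(len(EXAMPLE_ROWS)):
        print(EXAMPLE_ROWS[i][0], expand(TEMPLATES, EXAMPLE_ROWS, i))
```

Each expanded string becomes one feature observed together with the current token's label. Context templates such as `U00` and `U02` let the model use neighbouring words as evidence, for instance the verb 问 directly after the name 颜渊.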
Received: 2013-06-13      Published: 2013-09-02
CLC numbers: TP391; G353.1
Corresponding author: Tang Yafen     E-mail: tyf@njau.edu.cn
Cite this article:
Tang Yafen. Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics. New Technology of Library and Information Service, 2013, 29(7/8): 63-68.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.07-08.09      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2013/V29/I7/8/63