Research on Complex Time Information Extraction Based on CRF Model
Lu Wanhui1,2, Ma Jianxia1
1. Lanzhou Branch of National Science Library, Chinese Academy of Sciences, Lanzhou 730000, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Because network information is time-sequenced and polymorphic, this paper presents a model for extracting complex time information based on Conditional Random Fields (CRF). An experiment verifies the feasibility of the model and compares the results obtained with word (context) features alone against those obtained with word plus part-of-speech (POS) features. The experiment shows that adding the POS feature markedly improves the results.
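To make the two feature settings compared in the abstract concrete, the following is a minimal sketch of how word-only and word-plus-POS context features might be built for a CRF sequence labeler. The window size, the feature names (`w[k]`, `pos[k]`), the `<PAD>` symbol, the BIO label names, and the toy sentence with its POS tags are all illustrative assumptions; the abstract does not specify the paper's actual feature templates.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set here is hypothetical


def token_features(sent: List[Token], i: int, use_pos: bool,
                   window: int = 2) -> Dict[str, str]:
    """Context-window features for the token at position i.

    Assumption: a symmetric window of +/- `window` tokens; positions
    outside the sentence are filled with a padding symbol.
    """
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(sent):
            word, pos = sent[j]
        else:
            word, pos = "<PAD>", "<PAD>"
        feats[f"w[{off}]"] = word          # word (context) feature
        if use_pos:
            feats[f"pos[{off}]"] = pos     # optional POS feature
    return feats


def sentence_features(sent: List[Token], use_pos: bool,
                      window: int = 2) -> List[Dict[str, str]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(sent, i, use_pos, window) for i in range(len(sent))]


# Toy segmented and POS-tagged sentence (illustrative only).
sentence: List[Token] = [("2011", "m"), ("年", "q"), ("10", "m"),
                         ("月", "q"), ("发表", "v")]
# Hypothetical BIO labels marking the time expression.
labels = ["B-TIME", "I-TIME", "I-TIME", "I-TIME", "O"]

word_only = sentence_features(sentence, use_pos=False)  # setting 1
word_pos = sentence_features(sentence, use_pos=True)    # setting 2
```

Each list of feature dictionaries, paired with `labels`, is the kind of per-token input a CRF toolkit consumes; training one model on `word_only` and another on `word_pos` would mirror the comparison the abstract describes.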
Lu Wanhui, Ma Jianxia. Research on Complex Time Information Extraction Based on CRF Model (基于条件随机场模型的复杂时间信息抽取研究) [J]. New Technology of Library and Information Service (现代图书情报技术), 2011, 27(10): 29-33.