Information Extraction and Its Functions in the Digital Library
Zhang Zhixiong
(Library of Chinese Academy of Sciences, Beijing 100080, China)
Abstract: Information extraction (IE) aims to automatically extract pre-specified kinds of information (knowledge) from natural language text, and it offers one way to pull user-relevant information out of vast accumulations of text. This paper analyzes the basic concepts of information extraction, the main research activities in the field, the types of information extraction, and the general architecture of information extraction systems. The author argues that, in building digital libraries, information extraction can play an important role in automatic indexing and annotation of digital content, metadata acquisition, data mining, information research and analysis, construction of large knowledge bases and numeric databases from free text, and reference services.
Key words:
Information extraction (IE)
Message Understanding Conference (MUC)
Digital library
Natural language processing (NLP)
Received: 2004-03-08
Published: 2004-06-25
Corresponding author: Zhang Zhixiong
E-mail: zhangzhx@mail.las.ac.cn
About the author: Zhang Zhixiong