Analysis of State-of-the-Art Knowledge Extraction Technologies*

doi:10.11925/infotech.1003-3513.2008.08.01

New Technology of Library and Information Service (现代图书情报技术)

2008, Vol. 24, Issue 8: 2-11

Special Topic

Zhang Zhixiong¹, Wu Zhenxin¹, Liu Jianhua¹,², Xu Jian¹,²,³, Hong Na¹,², Zhao Qi¹,²

¹ National Science Library, Chinese Academy of Sciences, Beijing 100190, China
² Graduate University of the Chinese Academy of Sciences, Beijing 100049, China
³ Department of Information Management, Sun Yat-Sen University, Guangzhou 510275, China


Abstract

This paper analyzes the techniques used by systems with knowledge extraction capabilities, including MnM, KIM, Text2Onto, Amilcare, and Melita. It argues that, in current knowledge extraction technology, the two main approaches, machine learning and natural language analysis, have each developed considerably and have benefited from merging with and borrowing from each other. In machine-learning-based knowledge extraction, new approaches such as Adaptive Information Extraction (Adaptive IE) and Open Information Extraction (Open IE) have emerged, with a trend toward automatic Ontology Learning. In knowledge extraction based on natural language analysis, pattern-based annotation and semantic annotation methods have received wide attention and further refinement, with a trend toward Ontology-Based Information Extraction (OBIE). In addition, Controlled Language Information Extraction (CLIE), which aims to reduce the cost of ontology construction by letting people build ontologies in simple natural language, has also attracted some attention.

Keywords: knowledge extraction, machine learning, natural language analysis, ontology

Received: 2008-06-16; Published: 2008-08-25

G250.73

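Funding:

*This work is one of the research outcomes of the National Social Science Fund of China project "Research on the Theory and Methods of Knowledge Extraction from Digital Information Resources" (Grant No. 05BTQ006).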

Corresponding author: Zhang Zhixiong, E-mail: zhangzhx@mail.las.ac.cn

About the authors: Zhang Zhixiong, Wu Zhenxin, Liu Jianhua, Xu Jian, Hong Na, Zhao Qi

Cite this article:

Zhang Zhixiong, Wu Zhenxin, Liu Jianhua, Xu Jian, Hong Na, Zhao Qi. Analysis of State-of-the-Art Knowledge Extraction Technologies[J]. New Technology of Library and Information Service, 2008, 24(8): 2-11.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2008.08.01 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2008/V24/I8/2

