New Technology of Library and Information Service  2008, Vol. 24 Issue (8): 2-11    DOI: 10.11925/infotech.1003-3513.2008.08.01
Analysis of State-of-the-Art Knowledge Extraction Technologies
Zhang Zhixiong   Wu Zhenxin 1   Liu Jianhua 1,2   Xu Jian 1,2,3   Hong Na 1,2   Zhao Qi 1,2
1(National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2(Graduate University of the Chinese Academy of Sciences, Beijing 100049, China)
3(Department of Information Management, Sun Yat-Sen University, Guangzhou 510275, China)
Abstract  

Based on an analysis of several state-of-the-art knowledge extraction systems, namely MnM, KIM, Text2Onto, Amilcare and Melita, this paper argues that two kinds of technologies, machine learning and natural language analysis, have developed separately while benefiting from each other. On the machine learning side, new methods such as Adaptive Information Extraction and Open Information Extraction have been put forward, with a trend toward Ontology Learning. On the natural language analysis side, Pattern-Based Annotation and Semantic Annotation are receiving more attention than ever, with a trend toward Ontology-Based Information Extraction. In addition, the Controlled Language Information Extraction method has been introduced to reduce the cost of ontology construction and to allow non-specialists to create or edit ontological data using simple natural language.

Key words: Knowledge extraction; Machine learning; Natural language analysis; Ontology
Received: 16 June 2008      Published: 25 August 2008
CLC number: G250.73
Corresponding Author: Zhang Zhixiong   E-mail: zhangzhx@mail.las.ac.cn
About authors: Zhang Zhixiong, Wu Zhenxin, Liu Jianhua, Xu Jian, Hong Na, Zhao Qi

Cite this article:

Zhang Zhixiong, Wu Zhenxin, Liu Jianhua, Xu Jian, Hong Na, Zhao Qi. Analysis of State-of-the-Art Knowledge Extraction Technologies. New Technology of Library and Information Service, 2008, 24(8): 2-11.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.08.01     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I8/2
