New Technology of Library and Information Service  2008, Vol. 24 Issue (8): 2-11    DOI: 10.11925/infotech.1003-3513.2008.08.01
Analysis of State-of-the-Art Knowledge Extraction Technologies
Zhang Zhixiong, Wu Zhenxin 1, Liu Jianhua 1,2, Xu Jian 1,2,3, Hong Na 1,2, Zhao Qi 1,2
1(National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2(Graduate University of the Chinese Academy of Sciences, Beijing 100049, China)
3(Department of Information Management,Sun Yat-Sen University, Guangzhou 510275, China)
Abstract  

Based on an analysis of several state-of-the-art knowledge extraction systems, namely MnM, KIM, Text2Onto, Amilcare and Melita, this paper argues that two kinds of technologies, machine learning and natural language analysis, have developed in parallel while borrowing from each other. On the machine learning side, new methods such as Adaptive Information Extraction and Open Information Extraction have been proposed, and they show a trend toward Ontology Learning. On the natural language analysis side, Pattern-Based Annotation and Semantic Annotation are receiving more attention than ever, and they show a trend toward Ontology-Based Information Extraction. In addition, Controlled Language Information Extraction has been introduced to reduce the cost of ontology construction and to allow non-specialists to create or edit ontological data using simple natural language.
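To make the idea of Pattern-Based Annotation concrete, here is a minimal sketch of a single Hearst-style lexico-syntactic pattern ("X such as A, B and C") in the spirit of [46]. It is an illustration under stated assumptions, not the method of any system surveyed here: the names `NP`, `SUCH_AS`, `SEPARATOR` and `extract_hyponyms` are hypothetical, and a noun phrase is crudely approximated by a single word, where a real annotator would use a POS tagger or chunker.

```python
import re

# Hypothetical noun-phrase approximation: one word that is not "and"/"or".
# Real pattern-based annotators use a POS tagger or NP chunker instead.
NP = r"(?!(?:and|or)\b)\w+"

# Hearst-style pattern: "<hypernym> such as NP, NP (, and|or) NP".
SUCH_AS = re.compile(
    rf"(?P<hyper>\w+)\s+such\s+as\s+"
    rf"(?P<items>{NP}(?:\s*,\s*{NP})*(?:\s*,?\s*(?:and|or)\s+{NP})?)"
)

# Separators inside the enumeration: a comma (optionally followed by
# "and"/"or"), or a bare "and"/"or" between two items.
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper")
        for item in SEPARATOR.split(match.group("items")):
            if item:
                pairs.append((item, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "The survey covers systems such as MnM, KIM, and Text2Onto."
    print(extract_hyponyms(sentence))
```

Run on the sample sentence, this prints `[('MnM', 'systems'), ('KIM', 'systems'), ('Text2Onto', 'systems')]`. Each matched pair could then be annotated as an instance-of or subclass-of candidate against an ontology. The single-word hypernym is a deliberate simplification: "knowledge extraction systems" yields only "systems".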

Key words: Knowledge extraction; Machine learning; Natural language analysis; Ontology
Received: 16 June 2008      Published: 25 August 2008
CLC Number: G250.73

 
Corresponding Authors: Zhang Zhixiong     E-mail: zhangzhx@mail.las.ac.cn
About authors: Zhang Zhixiong, Wu Zhenxin, Liu Jianhua, Xu Jian, Hong Na, Zhao Qi

Cite this article:

Zhang Zhixiong, Wu Zhenxin, Liu Jianhua, Xu Jian, Hong Na, Zhao Qi. Analysis of State-of-the-Art Knowledge Extraction Technologies. New Technology of Library and Information Service, 2008, 24(8): 2-11.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.08.01     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I8/2

[1] Special Session on Signal Processing Techniques for Knowledge Extraction and Information Fusion in Frame of KES2006 [EB/OL].[2008-06-01]. http://www.bsp.brain.riken.jp/kes2006session/.
[2] X-Media Project [EB/OL].[2008-06-01]. http://www.x-media-project.org.
[3] K-space Project, Knowledge Space of Semantic Inference of Automatic Annotation and Retrieval of Multimedia Content [EB/OL].[2008-06-01]. http://kspace.qmul.net:8080/kspace/index.jsp.
[4] Webb G I. Discovering Significant Patterns [J]. Machine Learning, 2007, 68(1): 1-33.
[5] Alani H, Kim S, Millard D E, Weal M J, Lewis P H, Hall W, Shadbolt N R. Automatic Extraction of Knowledge from Web Documents [C]. In: 2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, October 20-23, 2003, Sanibel Island, Florida, USA.
[6] Rajman M, Besançon R. Text Mining - Knowledge Extraction from Unstructured Textual Data [C/OL]. [2008-06-01]. http://liawww.epfl.ch/Publications/Archive/RajmanBesancon98a.pdf.
[7] AKT Project[EB/OL]. [2008-06-01]. http://www.aktors.org/akt/.
[8] CLEF:Clinical e-Science Framework [EB/OL].[2008-06-01]. http://www.clinical-escience.org/.
[9] SEKT Project [EB/OL].[2008-06-01]. http://www.sekt-project.com/.
[10] Dot.Kom Project [EB/OL].[2008-06-01].http://nlp.shef.ac.uk/dot.kom/.
[11] DELOS Project [EB/OL].[2008-06-01]. http://www.delos.info/.
[12] OpenKnowledge [EB/OL].[2008-06-01]. http://openk.org/.
[13] KnowItAll [EB/OL].[2008-06-01]. http://www.cs.washington.edu/research/knowitall/.
[14] Project HALO [EB/OL].[2008-06-01]. http://www.projecthalo.com/.
[15] Rapid Knowledge Formation Project [EB/OL].[2008-06-01]. http://projects.teknowledge.com/RKF/.
[16] Knowledge Extraction from Document Collections [EB/OL].[2008-06-01]. http://www.parc.com/research/projects/knowledge_extraction/.
[17] Vargas-Vera M, Motta E, Domingue J, Lanzoni M, Stutt A, Ciravegna F. MnM:Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup [C]. In:The 13th International Conference on Knowledge Engineering and Management (EKAW 2002). Springer Verlag Heidelberg, 2002.
[18] KIM Platform [EB/OL].[2008-06-01]. http://www.ontotext.com/kim/index.html.
[19] ArtEquAKT[EB/OL].[2008-06-01]. http://www.artequakt.ecs.soton.ac.uk/.
[20] Text2Onto[EB/OL].[2008-06-01]. http://onto-ware.org/projects/text2onto/.
[21] PowerMagpie [EB/OL].[2008-06-01]. http://powermagpie.open.ac.uk/.
[22] Amilcare [EB/OL].[2008-06-01]. http://nlp.shef.ac.uk/amilcare/.
[23] Ciravegna F. Adaptive Information Extraction from Text by Rule Induction and Generalisation [C]. In:Proceedings of 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle. 2001.
[24] Melita[EB/OL].[2008-06-01]. http://nlp.shef.ac.uk/melita/.
[25] Ciravegna F, Dingli A, Petrelli D. Active Document Enrichment Using Adaptive Information Extraction from Text [C]. In: 1st International Semantic Web Conference (ISWC 2002), June 9-12, 2002, Sardinia, Italy.
[26] Handschuh S, Staab S, Ciravegna F. S-CREAM - Semi-automatic CREAtion of Metadata [C]. In: Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW02), Springer, 2002.
[27] Banko M, Cafarella M J,  Soderland S,  Broadhead M, Etzioni O.  Open Information Extraction from the Web [C].In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007).
[28] TextRunner [EB/OL].[2008-06-01]. http://www.cs.washington.edu/research/textrunner/.
[29] Yates A. Information Extraction from the Web:Techniques and Applications[D/OL]. [2008-06-01]. http://turing.cs.washington.edu/papers/yates_dissertation.pdf.
[30] Klein D, Manning C D. Accurate Unlexicalized Parsing [C/OL]. In: Proceedings of the ACL, 2003. [2008-06-01]. http://www.cs.berkeley.edu/~klein/papers/unlexicalized-parsing.pdf.
[31] The OpenNLP Home [EB/OL].[2008-06-01].http://opennlp.sourceforge.net/projects.html
[32] Gómez-Pérez A, Manzano-Macho D. A Survey of Ontology Learning Methods and Techniques [R]. OntoWeb Deliverable D1.5, 2003,6.
[33] Cimiano P. Ontology Learning and Population from Text:Algorithms, Evaluation and Applications[M]. Springer US,2006:19-34.
[34] Buitelaar P, Cimiano P,  Grobelnik M. Ontology Learning from Text [C]. In:the ECML/PKDD 2005 Workshop on:Knowledge Discovery and Ontologies, 2005.
[35] Buitelaar P,  Cimiano P, Magnini B. Ontology Learning from Text: An Overview[M]. IOS Press, 2003.
[36] KAON - The KArlsruhe ONtology and Semantic Web tool suite [EB/OL].[2008-06-01]. http://kaon.semanticweb.org/.
[37] Cimiano P, Vlker J. Text2Onto-A Framework for Ontology Learning and Date-driven Change Discovery[C]. In:Proceedings of NLDB05,June 2005.
[38] OntoLT [EB/OL].[2008-06-01].http://olp.dfki.de/OntoLT/OntoLT.htm.
[39] OntoBuilder [EB/OL].[2008-06-01]. http://iew3.technion.ac.il/OntoBuilder/.
[40] Reeve L, Han H. Survey of Semantic Annotation Platforms [C]. In: Proceedings of the 2005 ACM Symposium on Applied Computing, New York: ACM Press, 2005: 1634-1638.
[41] Brin S. Extracting Patterns and Relations from the World Wide Web [C]. In: WebDB Workshop at 6th International Conference on Extending Database Technology, 1998.
[42] Armadillo [EB/OL].[2008-06-01]. http://www.dcs.shef.ac.uk/~sam/armadillo.html.
[43] Ciravegna F,  Chapman S, Dingli A, Wilks Y. Learning to Harvest Information for the Semantic Web[C]. In:Proceedings of the 1st European Semantic Web Symposium, Heraklion, Greece, May 10-12, 2004.
[44] Cimiano P, Handschuh S, Staab S. Towards the Self-Annotating Web [C]. In:Proceedings of the 13th WWW Conference,  ACM, New York, 2004:462-471.
[45] Cimiano P, Ladwig G, Staab S. Gimme’ the Context:Context-driven Automatic Semantic Annotation with C-PANKOW [C]. In:Proceedings of the 14th International Conference on World Wide Web, New York:ACM Press, 2005:332-341.
[46] Hearst M A. Automatic Acquisition of Hyponyms from Large Text Corpora [C/OL]. In: Proceedings of the 14th International Conference on Computational Linguistics. [2008-06-01]. http://acl.ldc.upenn.edu/C/C92/C92-2082.pdf.
[47] OntoMat-Annotizer [EB/OL]. [2008-06-01]. http://annotation.semanticweb.org/ontomat/index.html.
[48] Dzbor M, Motta E. Study on Integrating Semantic Applications with Magpie[C]. In:15th Conf. on AI Methodology, Systems & Applications (AIMSA), Varna, Bulgaria. 2006.
[49] Laclavik M, Seleng M, Babik M. OnTeA:Semi-automatic Ontology Based Text Annotation Method[C].ITAT 2006, NAZOU Workshop, 26.9 - 1.10. 2006, Chata Kosodrevina, Bystrá dolina, Nízke Tatry, 2006.
[50] Laclavik M, Gatial E, Balogh Z, Habala O, Nguyen G, Hluchy L. Semantic Annotation Based on Regular Expressions[C]. ITAT 2005, 20-25 September 2005, Hotel Akademik, Rackova dolina, In:Proceedings of ITAT 2005 Information Technologies - Applications and Theory, Peter Vojtas (Ed.), Prirodovedecka Fakulta Univerzity Pavla Jozefa Safarika v Kosiciach. Slovakia, September 2005:305-306.
[51] Kiryakov A, Popov B, Terziev I,  Manov D, Ognyanoff D. Semantic Annotation, Indexing, and Retrieval [C].Elsevier’s Journal of Web Semantics, 2003:484-499.
[52] Staab S,  Maedche A, Handschuh S. An Annotation Framework for the Semantic Web [C]. In:Proceedings of the First Workshop on Multimedia Annotation, Tokyo, Japan, January 30-31, 2001.
[53] Popov B, Kiryakov A, Ognyanoff D, Manov D, Kirilov A, Goranov M. Towards Semantic Web Information Extraction[C]. In:Human Language Technologies Workshop at the 2nd International Semantic Web Conference (ISWC2003),  October 20, 2003, Florida, USA.
[54] Manov D, Popov B. Massive Automatic Annotation[EB/OL]. [2008-06-01]. http://www.sekt-project.org/rd/deliverables/wp02/sekt-d-2-6-1-Massive%20Automatic%20Annotation%20V1.pdf.
[55] Alani H, Kim S, Millard D E, et al. Automatic Ontology-based Knowledge Extraction from Web Documents [J]. IEEE Intelligent Systems, 2003, 18(1): 14-21.
[56] Embley D W, Campbell D M, Smith R D, Liddle S W. Ontology-Based Extraction and Structuring of Information from Data-Rich Unstructured Documents[EB/OL].[2008-06-01]. http://pages.cs.wisc.edu/~smithr/pubs/cikm98.pdf.
[57] Saggion H, Funk A, Maynard D, Bontcheva K. Ontology-based Information Extraction for Business Intelligence [EB/OL]. [2008-06-01]. http://iswc2007.semanticweb.org/papers/837.pdf.
[58] GATE. General Architecture for Text Engineering[EB/OL].[2008-06-01]. http://gate.ac.uk/.
[59] Dill S, Eiron N, Gibson D, et al. SemTag and Seeker:Bootstrapping the Semantic Web Via Automated Semantic Annotation [C]. In:Proc. of the 12th Intl. WWW Conf. 2003. Hungary:ACM Press.
[60] McDowell L K, Cafarella M. Ontology-driven Information Extraction with OntoSyphon [EB/OL]. The 5th International Semantic Web Conference (2006). [2008-06-01]. http://turing.cs.washington.edu/papers/iswc2006McDowell-final.pdf.
[61] Yildiz B, Miksch S. OntoX - A Method for Ontology-Driven Information Extraction[C]. Computational Science and Its Applications - ICCSA 2007, Springer-Verlag, LNCS 4707.
[62] Funk A,  Davis B,  Tablan V,  Bontcheva K,  Cunningham H. Controlled Language IE Components Version 2. SEKT Project Deliverable D2.2.2. January 2007 [EB/OL].[2008-06-01]. http://gate.ac.uk/projects/sekt/deliv2-2-2.pdf.
[63] Líon Project[EB/OL].[2008-06-11]. http://www.deri.ie/research/projects/.
