New Technology of Library and Information Service  2014, Vol. 30 Issue (1): 43-50    DOI: 10.11925/infotech.1003-3513.2014.01.07
KNOWLEDGE ORGANIZATION AND KNOWLEDGE MANAGEMENT
Research on Domain Ontology Term Extraction
Tang Qing1, Lv Xueqiang1, 2, Li Zhuo1, Shi Shuicai1, 2
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; 2 Beijing TRS Information Technology Co. Ltd., Beijing 100101, China
Abstract  [Objective] To extract as many Ontology terms as possible in order to improve the quality of Ontology construction. [Methods] This paper proposes an Ontology term extraction method based on term component extension. Using the polymerization characteristics and part-of-speech (POS) features of terms, it extracts term components by a word frequency comparison approach. Taking into account term length, term POS, and the internal associative strength of the character strings within a term, it designs extension rules that expand the components into candidate terms. Ontology terms are then filtered from the candidate terms using relational and contextual information. [Results] Experimental results show an accuracy rate of 83.5% and a recall rate of 87%; the accuracy rate is 2.5 percentage points higher than the baseline. [Limitations] The method needs a balanced corpus to extract term components, and the extraction effect is affected by the quality of the terms. [Conclusions] The method is effective and is of positive significance for Ontology learning and Ontology construction.
Key words: Ontology term; Term extraction; Term component; Component extension
Received: 14 February 2014      Published: 14 February 2014
CLC Number: TP391.1
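
The abstract states that term components are extracted by comparing word frequencies, and that this requires a balanced corpus. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: a word is treated as a term component when its smoothed relative frequency in the domain corpus is at least `min_ratio` times its relative frequency in the balanced corpus. The function name, the add-one smoothing, and both thresholds are hypothetical choices for illustration, not the authors' implementation, and the POS filtering mentioned in the abstract is left out.

```python
from collections import Counter


def extract_term_components(domain_tokens, balanced_tokens,
                            min_ratio=2.0, min_count=2):
    """Pick candidate term components by word frequency comparison.

    Assumption (not from the paper): a word qualifies when its smoothed
    relative frequency in the domain corpus is at least `min_ratio` times
    its smoothed relative frequency in the balanced corpus.
    """
    domain_counts = Counter(domain_tokens)
    balanced_counts = Counter(balanced_tokens)
    domain_total = sum(domain_counts.values())
    balanced_total = sum(balanced_counts.values())
    # Shared vocabulary size, used for add-one smoothing so that words
    # absent from the balanced corpus do not cause division by zero.
    vocab_size = len(set(domain_counts) | set(balanced_counts))

    components = {}
    for word, count in domain_counts.items():
        if count < min_count:
            continue  # too rare in the domain corpus to be reliable
        p_domain = (count + 1) / (domain_total + vocab_size)
        p_balanced = (balanced_counts[word] + 1) / (balanced_total + vocab_size)
        ratio = p_domain / p_balanced
        if ratio >= min_ratio:
            components[word] = ratio
    return components
```

Both inputs are assumed to be lists of already segmented words. The returned dictionary maps each selected component to its frequency ratio, so later steps such as component extension can rank or further filter the components.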

Cite this article:

Tang Qing, Lv Xueqiang, Li Zhuo, Shi Shuicai. Research on Domain Ontology Term Extraction. New Technology of Library and Information Service, 2014, 30(1): 43-50.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.01.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I1/43
