New Technology of Library and Information Service  2014, Vol. 30 Issue (1): 43-50    DOI: 10.11925/infotech.1003-3513.2014.01.07
Knowledge Organization and Knowledge Management
Research on Domain Ontology Term Extraction
Tang Qing1, Lv Xueqiang1,2, Li Zhuo1, Shi Shuicai1,2
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; 2 Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
Abstract: [Objective] To extract as many multi-word ontology terms as possible, so as to ensure the quality of ontology construction. [Methods] This paper proposes an ontology term extraction method based on term component extension. Exploiting the domain aggregation and part-of-speech (POS) features of term components, it extracts components by comparing word frequencies across domains. Extension rules that take into account term length, the POS composition of terms, and the internal association strength of terms are then applied to the components to form candidate terms. Finally, ontology terms are selected from the candidate set using relational (contextual association) information and contextual information. [Results] On an experimental dataset from the IT domain, the method achieves a precision of 83.5% and a recall of 87%; the precision is 2.5 percentage points higher than that of the baseline method. [Limitations] Component extraction relies on a balanced corpus, and the quality of the components directly affects term extraction performance. [Conclusions] The experimental results show that the method is effective and is of positive significance for ontology learning and ontology construction.
Key words: Ontology term; Term extraction; Term component; Component extension
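The component-extraction step named in the abstract (comparing word frequencies between a domain corpus and a balanced corpus, restricted by POS) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the ratio threshold `min_ratio`, the smoothing `floor`, the allowed POS tags, and the toy English tokens are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def relative_freq(tokens):
    """Map each word to its relative frequency in a tokenized corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def extract_components(domain_tokens, balanced_tokens, pos_tags,
                       allowed_pos=("n", "v"), min_ratio=2.0, floor=1e-6):
    """Return (word, ratio) pairs for words that look domain-specific.

    A word is kept when its POS tag is allowed and its relative frequency
    in the domain corpus is at least `min_ratio` times its relative
    frequency in the balanced corpus. `floor` avoids division by zero for
    words absent from the balanced corpus. All thresholds are assumptions.
    """
    domain = relative_freq(domain_tokens)
    balanced = relative_freq(balanced_tokens)
    components = []
    for word, freq in domain.items():
        if pos_tags.get(word) not in allowed_pos:
            continue  # POS filter: skip function words and untagged words
        ratio = freq / max(balanced.get(word, 0.0), floor)
        if ratio >= min_ratio:
            components.append((word, ratio))
    # Most domain-specific words first
    return sorted(components, key=lambda item: -item[1])


# Toy data (hypothetical): a tiny "IT" corpus and a tiny balanced corpus
domain_tokens = ["network", "protocol", "network", "server",
                 "the", "of", "server", "network"]
balanced_tokens = ["the", "of", "the", "network",
                   "day", "of", "the", "people"]
pos_tags = {"network": "n", "protocol": "n", "server": "n",
            "the": "u", "of": "p", "day": "n", "people": "n"}

components = extract_components(domain_tokens, balanced_tokens, pos_tags)
for word, ratio in components:
    print(word, round(ratio, 2))
```

In this sketch, words that never occur in the balanced corpus receive very large ratios because of the floor. A real system would need a more careful smoothing scheme, plus the extension and filtering stages that the abstract describes but does not specify in detail.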
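Received: 2014-02-14
CLC number: TP391.1
Funding: This work is one of the research outcomes of the National Natural Science Foundation of China project "Ontology-based Automatic Patent Indexing" (No. 61271304) and of the Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation "Domain-oriented Precise Search Methods for Multimodal Internet Information" (No. KZ201311232037).
Corresponding author: Tang Qing, E-mail: tangqing20062008@126.com
Author contributions: Tang Qing proposed the research idea, designed the research plan, carried out the experiments, and drafted and wrote the paper; Lv Xueqiang and Li Zhuo designed the framework of the paper and revised it; Shi Shuicai proposed the research topic and was responsible for the final revision.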
Cite this article:
Tang Qing, Lv Xueqiang, Li Zhuo, Shi Shuicai. Research on Domain Ontology Term Extraction[J]. New Technology of Library and Information Service, 2014, 30(1): 43-50. DOI: 10.11925/infotech.1003-3513.2014.01.07.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.01.07