New Technology of Library and Information Service (现代图书情报技术), 2013, Vol. 29, Issue 10: 73-78    DOI: 10.11925/infotech.1003-3513.2013.10.12
Section: Application Practice
Study on Ontology Relation Extraction in Chinese Patent Documents
(Original title: 中文专利中本体关系获取研究)
Gu Jun (谷俊)1, Xu Xin (许鑫)2
1. Baoshan Iron and Steel Co., Ltd., Shanghai 201900, China;
2. Department of Informatics, Business School, East China Normal University, Shanghai 200241, China
Abstract: This paper proposes a method for extracting non-taxonomic ontology relations from the abstracts of Chinese patent documents. First, it analyzes the syntactic patterns of the abstract texts and builds sub-sentence extraction rules for four pattern types: domain sentences, feature sentences, component and process sentences, and effect sentences. Second, the terms in the extracted sub-sentences are manually annotated with the labels B, I, E, and O to form a training corpus. Third, conditional random fields (CRFs) are trained on this corpus and then used to extract terms from new text. Finally, an application example is presented and analyzed to verify the effectiveness of the method.
Key words: Rule matching; CRFs; Ontology learning; Non-taxonomic relation
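The annotation step named in the abstract (marking terms with B, I, E, and O) can be illustrated with a minimal sketch. It is not the authors' code: the function names `bieo_labels` and `to_crf_rows`, the example sentence, and its segmentation are all made up for illustration. It also assumes that a single-token term is labeled B, since the abstract does not say how such terms are handled, and that training rows use a token-per-line, tab-separated layout of the kind CRF toolkits commonly read.

```python
from typing import List, Tuple


def bieo_labels(tokens: List[str], term_spans: List[Tuple[int, int]]) -> List[str]:
    """Assign B/I/E/O labels to tokens.

    term_spans holds (start, end) token indices with end exclusive.
    Assumption: a one-token term is labeled "B".
    """
    labels = ["O"] * len(tokens)  # tokens outside any term stay "O"
    for start, end in term_spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end})")
        labels[start] = "B"  # first token of the term
        for i in range(start + 1, end - 1):
            labels[i] = "I"  # tokens inside the term
        if end - start > 1:
            labels[end - 1] = "E"  # last token of a multi-token term
    return labels


def to_crf_rows(tokens: List[str], labels: List[str]) -> str:
    """Render one sentence as tab-separated token/label rows, one token per line."""
    return "\n".join(f"{tok}\t{lab}" for tok, lab in zip(tokens, labels)) + "\n"


# Hypothetical segmented sub-sentence; the term covers tokens 5..7.
tokens = ["本", "发明", "涉及", "一", "种", "连续", "退火", "炉"]
labels = bieo_labels(tokens, [(5, 8)])
print(to_crf_rows(tokens, labels))
```

Rows in this shape, with sentences separated by a blank line, could serve as training input for a CRF sequence labeler; the feature templates and any extra columns (for example part-of-speech tags) would depend on the toolkit and are not specified in the abstract.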
Received: 2013-07-19
CLC number: TP391
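Funding: This work is one of the research outcomes of the Shanghai Science and Technology Development Fund soft science research project "Research on Ontology Construction and Application Methods Based on Patent Documents" (Project No. 13692107000).
Corresponding author: Gu Jun     E-mail: frjcygu@163.com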
Cite this article:
Gu Jun, Xu Xin. Study on Ontology Relation Extraction in Chinese Patent Documents[J]. New Technology of Library and Information Service, 2013, 29(10): 73-78. DOI: 10.11925/infotech.1003-3513.2013.10.12.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.10.12
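Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn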