New Technology of Library and Information Service, 2015, Vol. 31, Issue 5: 15-23. DOI: 10.11925/infotech.1003-3513.2015.05.03
Research Paper
Research on Recognition of Concept Attribute Instances in Innovation Sentences of Scientific Research Paper
Zhang Fan (1, 2), Le Xiaoqiu (1)
(1) National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
(2) University of Chinese Academy of Sciences, Beijing 100049, China
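Abstract

[Objective] To recognize concept attribute instances in innovation sentences and thereby further mine the knowledge relationships these sentences contain. [Methods] Using semantic role labeling and dependency parsing, together with the terms listed under the attribute classes of a domain ontology, the method identifies the core concepts and the attribute instances of innovation sentences from their dependency trees. To cope with characteristics of dependency parsing output, a compound-term recognition module and a conjunction-relation detection module are designed to improve recognition. [Results] Core concept recognition in innovation sentences reaches an F value of 77.94%, and the average F value of attribute instance recognition is around 90%. [Limitations] Parsing errors introduced by applying the Stanford dependency parser to texts from the oncology domain degrade recognition, and the attribute classes of the NCIt ontology still need further filtering and standardization. [Conclusions] Experimental results show that the method recognizes concept attribute instances in domain innovation sentences effectively.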
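The abstract describes the pipeline only at a high level. As a hedged illustration of two of its ideas, merging compound terms and following conjunction relations in a dependency tree, the Python sketch below pairs an attribute term with the concept(s) it governs. It is not the authors' implementation: the `Token` structure, the function names, the Stanford-style labels (`nn`, `prep_of`, `conj_and`), the hand-written parse and the one-entry attribute vocabulary are all assumptions, and the semantic role labeling step that selects the core concept is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int    # 1-based position of the word in the sentence
    word: str
    head: int   # idx of the governing word; 0 marks the root
    rel: str    # dependency label (Stanford-style labels assumed)


# Assumed label sets; adjust to whatever the parser actually emits.
COMPOUND_RELS = {"nn", "compound"}   # noun-compound modifiers
OF_RELS = {"prep_of", "nmod:of"}     # "<attribute> of <concept>"


def compound_term(tokens, head_idx):
    """Merge a head noun with its noun-compound dependents, in word order."""
    parts = [t for t in tokens
             if t.idx == head_idx
             or (t.head == head_idx and t.rel in COMPOUND_RELS)]
    return " ".join(t.word for t in sorted(parts, key=lambda t: t.idx))


def with_conjuncts(tokens, idx):
    """Return idx plus the indices of words coordinated with it (conj*)."""
    return [idx] + [t.idx for t in tokens
                    if t.head == idx and t.rel.startswith("conj")]


def attribute_instances(tokens, attribute_terms):
    """Pair each attribute term found in the sentence with its concept(s)."""
    pairs = []
    for tok in tokens:
        if tok.rel in COMPOUND_RELS:
            continue  # a modifier is handled as part of its head's term
        attr = compound_term(tokens, tok.idx)
        if attr.lower() not in attribute_terms:
            continue
        for dep in tokens:
            if dep.head == tok.idx and dep.rel in OF_RELS:
                # Conjunction expansion: coordinated concepts share the attribute.
                for c in with_conjuncts(tokens, dep.idx):
                    pairs.append((compound_term(tokens, c), attr))
    return pairs


# Hand-written, hypothetical collapsed parse of
# "Tumor cells show high expression level of p53 and BRCA1."
# ("of" and "and" are folded into the labels, so their indices are absent.)
SENTENCE = [
    Token(1, "Tumor", 2, "nn"),
    Token(2, "cells", 3, "nsubj"),
    Token(3, "show", 0, "root"),
    Token(4, "high", 6, "amod"),
    Token(5, "expression", 6, "nn"),
    Token(6, "level", 3, "dobj"),
    Token(8, "p53", 6, "prep_of"),
    Token(10, "BRCA1", 8, "conj_and"),
]
ATTRIBUTE_TERMS = {"expression level"}  # hypothetical ontology attribute terms

print(attribute_instances(SENTENCE, ATTRIBUTE_TERMS))
# [('p53', 'expression level'), ('BRCA1', 'expression level')]
```

The compound step turns `expression` + `level` into one attribute term, and the conjunction step is what lets `BRCA1` inherit the attribute attached to `p53`. If the parser were configured to propagate dependencies across conjuncts, it would likely emit a second `prep_of` edge itself, and this expansion would then yield a duplicate pair that needs de-duplicating.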

Keywords: Domain ontology; Semantic role labeling; Dependency parsing; Attribute instances
Received: 2014-11-07
CLC number: TP393
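Funding:

This work is one of the outcomes of "Research and Demonstration of Domain Academic Relationships Based on Literature Knowledge Networks" (Project No. 2011BAH10B06-04), a sub-project of a key project of the National Key Technology R&D Program during the 12th Five-Year Plan period.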

Corresponding author: Zhang Fan, ORCID: 0000-0001-5929-5198, E-mail: zhangf@mail.las.ac.cn
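Author contributions: Zhang Fan designed and implemented the technical approach and workflow, collected and cleaned the data, carried out the experimental analysis and validation, and drafted the paper; Le Xiaoqiu proposed the research direction and ideas, and revised the paper and its final version.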
Cite this article:
Zhang Fan, Le Xiaoqiu. Research on Recognition of Concept Attribute Instances in Innovation Sentences of Scientific Research Paper[J]. New Technology of Library and Information Service, 2015, 31(5): 15-23. DOI: 10.11925/infotech.1003-3513.2015.05.03.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.05.03


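Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing, Postcode 100190
Tel/Fax: (010) 82626611-6626, 82624938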
E-mail:jishu@mail.las.ac.cn