New Technology of Library and Information Service, 2014, Vol. 30, Issue 9: 15-21     https://doi.org/10.11925/infotech.1003-3513.2014.09.03
Column: Digital Library
Research on Innovation Points Extraction from Scientific Research Paper Based on Field Thesaurus
(Original title: 面向领域科技文献的句子级创新点抽取研究)
Zhang Fan1,2, Le Xiaoqiu1
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract

[Objective] To extract sentence-level innovation points from scientific research papers in a specific domain. [Methods] Recognition rules are built from a field thesaurus and the relationships in an ontology and applied to the sentences of each paper; a redundancy measure based on keyword (subject-term) overlap then filters the candidate set of innovation points. [Results] In an experiment on a neoplasm data set, the extraction reached an accuracy rate of 89.42% and a recall rate of 60.14%. [Limitations] The rules need further refinement to improve recall. [Conclusions] A field thesaurus and the relationships in an ontology can be used effectively to extract sentence-level innovation points from scientific research papers.
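
The two steps named in the Methods (rule-based candidate recognition, then overlap-based redundancy filtering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term list `THESAURUS`, the cue-phrase rule `CUE_PATTERN`, the overlap coefficient, and the `threshold` value are all hypothetical stand-ins, since the abstract does not give the actual rules, thesaurus content, or redundancy formula.

```python
import re

# Hypothetical stand-ins: the paper's real thesaurus, rules, and threshold
# are not stated in the abstract.
THESAURUS = {"neoplasm", "carcinoma", "chemotherapy", "apoptosis", "metastasis"}
CUE_PATTERN = re.compile(
    r"\b(we (propose|found|show|demonstrate)|for the first time|novel)\b",
    re.IGNORECASE,
)


def domain_terms(sentence):
    """Return the thesaurus terms that occur as words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return words & THESAURUS


def is_candidate(sentence):
    """Toy rule: the sentence has a cue phrase and at least one domain term."""
    return bool(CUE_PATTERN.search(sentence)) and bool(domain_terms(sentence))


def overlap(a, b):
    """Overlap coefficient of two term sets: |a & b| / min(|a|, |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def extract(sentences, threshold=0.8):
    """Keep candidates whose term overlap with every kept sentence is below threshold."""
    kept = []
    for sentence in sentences:
        if not is_candidate(sentence):
            continue
        terms = domain_terms(sentence)
        if all(overlap(terms, domain_terms(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept
```

In this sketch a later candidate is dropped when its domain terms largely repeat those of a sentence already kept, so the order of the input sentences decides which of two redundant candidates survives. A stricter or looser `threshold` trades recall against redundancy.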

Key words: Scientific research paper; Linguistic feature; Structured abstract; Innovation point extraction; Overlap computing
Received: 2014-05-14      Published: 2014-10-20
CLC number: TP393
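Funding:

This work is one of the research outputs of the National Key Technology R&D Program sub-project "Research and Demonstration of Domain Academic Relationships Based on Literature Knowledge Networks" (Project No. 2011BAH10B06-04).

Corresponding author: Zhang Fan     E-mail: zhangf@mail.las.ac.cn
Author contributions: Zhang Fan designed and implemented the technical scheme and technical route, collected and cleaned the data, analyzed and validated the experiments, and drafted, wrote, and revised the final version of the paper. Le Xiaoqiu proposed the research direction and main research ideas, guided the design of the research plan and technical route, and revised parts of the article.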
Cite this article:
Zhang Fan, Le Xiaoqiu. Research on Innovation Points Extraction from Scientific Research Paper Based on Field Thesaurus. New Technology of Library and Information Service, 2014, 30(9): 15-21.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.09.03      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I9/15

