New Technology of Library and Information Service  2014, Vol. 30 Issue (9): 15-21    DOI: 10.11925/infotech.1003-3513.2014.09.03
Research on Innovation Points Extraction from Scientific Research Paper Based on Field Thesaurus
Zhang Fan1,2, Le Xiaoqiu1
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This article aims to extract sentence-level innovation points from scientific research papers in a specific domain. [Methods] A field thesaurus and the relationships in an ontology are used to construct rules that extract innovation points from sentences in research papers, and a redundancy measure based on keyword overlap is used to filter out redundant innovation points. [Results] An experiment on a Neoplasm data set yields an accuracy rate of 89.42% and a recall rate of 60.14%. [Limitations] The extraction rules need further refinement, and the recall rate remains to be improved. [Conclusions] Using a field thesaurus together with the relationships in an ontology is effective for extracting innovation points from scientific research papers.
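The abstract does not give the exact redundancy formula, so the following is only a minimal sketch of how a keyword-overlap filter of this kind might look. It assumes a Jaccard-style overlap between keyword sets and a hypothetical threshold of 0.5; the function names, the threshold, and the sample keywords are illustrative and are not taken from the paper.

```python
def keyword_overlap(a, b):
    """Jaccard-style overlap between two keyword collections (an assumed measure)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_redundant(candidates, threshold=0.5):
    """Keep a candidate only if its keywords overlap every kept one below the threshold.

    `candidates` is a list of (sentence, keywords) pairs in extraction order.
    The threshold value is a hypothetical choice, not the paper's setting.
    """
    kept = []
    for sentence, keywords in candidates:
        if all(keyword_overlap(keywords, k) < threshold for _, k in kept):
            kept.append((sentence, keywords))
    return kept


# Illustrative input: the second candidate shares 3 of 4 keywords with the first.
candidates = [
    ("s1", {"tumor", "p53", "mutation"}),
    ("s2", {"tumor", "p53", "mutation", "cell"}),
    ("s3", {"kinase", "inhibitor"}),
]
kept_ids = [sentence for sentence, _ in filter_redundant(candidates)]
```

With these inputs the second candidate is dropped because its overlap with the first (0.75) meets the assumed threshold, while the third, which shares no keywords, is kept. A greedy first-come filter like this is order dependent; the paper may rank or compare candidates differently.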

Key words: Scientific research paper; Linguistic feature; Structured abstract; Innovation point extraction; Overlap computing
Received: 14 May 2014      Published: 20 October 2014
:  TP393  

Cite this article:

Zhang Fan, Le Xiaoqiu. Research on Innovation Points Extraction from Scientific Research Paper Based on Field Thesaurus. New Technology of Library and Information Service, 2014, 30(9): 15-21.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.09.03     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I9/15

