Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (10): 12-18    DOI: 10.11925/infotech.2096-3467.2019.0055
Extracting Sentences of Research Originality from Full Text Academic Articles
Chengzhi Zhang1,3, Zheng Li2,3
1School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
2School of Information Management, Nanjing University, Nanjing 210023, China
3Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract  

[Objective] This paper analyzes the full texts of academic articles to extract sentences of research originality and explore their characteristics. [Methods] We used full-text journal papers in the field of library, information and archival science as experimental data. We then selected mark words and created extraction rules for sentences of research originality. Finally, we analyzed the distribution of these sentences by mark word, type, and location. [Results] The extracted sentences fell mainly into six categories, and most of them appeared in the first 24.8% of each article. [Limitations] The proposed sentence extraction method needs further optimization. [Conclusions] Sentences of research originality in the field of library, information and archival science focus on concepts and theories. The categories and distributions of these sentences vary across journals.

Key words: Sentences of Originality Research Evaluation; Information Extraction; Academic Evaluation; Full Text Analysis of Academic Articles
Received: 14 January 2019      Published: 25 November 2019
ZTFLH:  TP391 G35  
Corresponding Authors: Chengzhi Zhang     E-mail: zhangcz@njust.edu.cn

Cite this article:

Chengzhi Zhang,Zheng Li. Extracting Sentences of Research Originality from Full Text Academic Articles. Data Analysis and Knowledge Discovery, 2019, 3(10): 12-18.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0055     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I10/12

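Extraction target | Meaning
Named entity recognition | Identifies the required named entities in a large body of information; the most basic task in information extraction
Multilingual entity recognition | An extension of named entity recognition that can be applied to multiple languages
Template element extraction | Extracts entities together with their attributes to form entity objects
Coreference extraction | Links mentions of the same entity that occur in different places
Template relation extraction | Refines template element extraction by adding the relations between elements
Scenario template extraction | Links times, organizations, persons, or other entities to form a complete event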
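Journal | Total (articles)
Journal of Academic Libraries (大学图书馆学报) | 259
Archives Science Bulletin (档案学通讯) | 360
Archives Science Study (档案学研究) | 461
Journal of the National Library of China (国家图书馆学刊) | 18
Information Science (情报科学) | 948
Information Studies: Theory & Application (情报理论与实践) | 1,182
Journal of Intelligence (情报杂志) | 1,244
Information and Documentation Services (情报资料工作) | 105
Library (图书馆) | 973
Library Work and Study (图书馆工作与研究) | 1,262
Library Development (图书馆建设) | 843
Library Tribune (图书馆论坛) | 948
Research on Library Science (图书馆学研究) | 1,626
Library Journal (图书馆杂志) | 805
Library and Information Service (图书情报工作) | 2,266
Documentation, Information & Knowledge (图书情报知识) | 345
Library & Information (图书与情报) | 527
New Technology of Library and Information Service (现代图书情报技术) | 459
Journal of Library Science in China (中国图书馆学报) | 298
Total | 14,929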
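Institution | Evaluation sentence | Source | Cited reference
Peking University | "QuEChERS is a fast and convenient sample pretreatment method, first proposed by Anastassiades et al. in 2003. It consists of two main steps, extraction and cleanup, and is mainly used to detect pesticides in fruits and vegetables." | 木合他拜尔, 严华, 徐姗, 冯楠, 郝杰, 朱尘琪, 郭爽, 张朝晖, 韩南银. Chinese Journal of Chromatography (色谱), 2015, 33(11): 1199-1204. | Anastassiades M, Lehotay S J, Stajinbaher D, et al. J. AOAC Int, 2003, 86(2): 412.
Beijing University of Technology | "Hao Yingli et al. of Southeast University were the first to point out, in 2005, that at the critical state of frost formation the size of the ice particles and their distribution on the cold surface have fractal characteristics." | 刘耀民, 刘中良, 黄玲艳, 孙俊芳. Scientia Sinica Technologica (中国科学: 技术科学), 2009, 39(11): 1864-1869. | Hao Y L, Jose I, Yong X T. Experimental study of initial state of frost formation on flat surface. J Southeast Uni, 2005, 35(1): 149-153.
East China University of Science and Technology | "In addition, in 2009 the Shi group achieved for the first time the NiCl2(PCy3)2-catalyzed Suzuki-Miyaura coupling of aryl cyanides with the less reactive aryl boronic esters or alkenyl boronic esters." | 寇学振, 范佳骏, 童晓峰, 沈增明. Chinese Journal of Organic Chemistry (有机化学), 2013, 33(7): 1407-1422. | Yu D G, Yu M, Guan B T, Li B J, Zheng Y, Wu Z H, Shi Z. Carbon-carbon formation via Ni-catalyzed Suzuki-Miyaura coupling through C-CN bond cleavage of aryl nitrile[J]. J. Org. Lett. 2009, 11(48): 3374-3377.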
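Mark word | Frequency (share) | Example sentence
首次······提出 (for the first time ... proposed) | 588 (17.4%) | "Citation speed, as an effective bibliometric tool, was proposed for the first time by A. Schubert et al. in 1986"
最早······提出 (earliest ... proposed) | 485 (14.4%) | "The concept of synergy was first proposed by the German physicist Hermann Haken, who developed it into a systematic theory"
开创 (pioneered) | 163 (4.8%) | "Shannon introduced the concept of entropy from the perspective of communication, pioneering information measurement and opening up research on it"
早在······提出 (as early as ... proposed) | 125 (3.7%) | "T. Berners-Lee proposed the concept of linked data as early as 2006"
创始人 (founder) | 117 (3.5%) | "Peter J. Scott thus became the recognized founder of the series system and a pioneer of records continuum theory"
首先······提出 (first ... proposed) | 113 (3.3%) | "Information literacy was first proposed in 1974 by Paul Zurkowski, president of the Information Industry Association of the United States"
始于 (began with) | 100 (3.0%) | "Research abroad on government information resource planning began with the staged theory of information resource management put forward by Marchand and Horton in the early 1980s"
源于 (originated from) | 87 (2.6%) | "The concept of social classification originated from Barth"
追溯到 (traced back to) | 87 (2.6%) | "Research on the consumer decision process can be traced back to the black-box theory of consumer purchase decisions proposed by P. Kotler in 1967"
最早······研究 (earliest ... study) | 61 (1.8%) | "The earliest empirical study of the author bibliographic coupling method was conducted by Zhao Dangzhi"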
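The mark words in the table above lend themselves to simple pattern matching. The following is a minimal sketch, not the authors' implementation: it assumes that "······" in a mark word stands for arbitrary intervening text within one sentence, and the names `MARK_WORD_PATTERNS`, `split_sentences`, and `find_candidates` are hypothetical. Matches should be treated as candidates only, because the same words also occur in ordinary sentences.

```python
import re

# Hypothetical rule set built from the mark words in the table above.
# Assumption: "······" means "any text in between, within one sentence".
MARK_WORD_PATTERNS = {
    "首次······提出": re.compile(r"首次.*?提出"),
    "最早······提出": re.compile(r"最早.*?提出"),
    "开创": re.compile(r"开创"),
    "早在······提出": re.compile(r"早在.*?提出"),
    "创始人": re.compile(r"创始人"),
    "首先······提出": re.compile(r"首先.*?提出"),
    "始于": re.compile(r"始于"),
    "源于": re.compile(r"源于"),
    "追溯到": re.compile(r"追溯到"),
    "最早······研究": re.compile(r"最早.*?研究"),
}


def split_sentences(text):
    """Split text after sentence-final punctuation (a simplified splitter)."""
    parts = re.split(r"(?<=[。!?!?])", text)
    return [part.strip() for part in parts if part.strip()]


def find_candidates(text):
    """Return (sentence, matched mark words) for every sentence with a hit."""
    candidates = []
    for sentence in split_sentences(text):
        hits = [
            mark_word
            for mark_word, pattern in MARK_WORD_PATTERNS.items()
            if pattern.search(sentence)
        ]
        if hits:
            candidates.append((sentence, hits))
    return candidates


if __name__ == "__main__":
    sample = (
        "T. Berners-Lee早在2006年便提出了关联数据的概念。"
        "本文分析了数据。"
        "社会分类概念源于Barth。"
    )
    for sentence, hits in find_candidates(sample):
        print(hits, sentence)
```

Keeping one compiled pattern per mark word makes it straightforward to count hits per mark word afterwards, which is the kind of frequency breakdown the table reports.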
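Mark word | Frequency (share) | Example sentence
源于 (originated from) | 3,965 (24.1%) | "Fragmented information mostly originates from micro-media"
第一个 (the first) | 1,210 (7.4%) | "Selecting keywords to build the co-word matrix is the first key step in co-word analysis"
始于 (began with) | 1,081 (6.6%) | "The construction of the smart library in Foshan began in 2011"
开创 (pioneered) | 816 (5.0%) | "Mobile e-commerce has pioneered new models of products and services"
追溯到 (traced back to) | 297 (1.8%) | "Genealogy compilers are often inclined to trace their ancestry back to some famous person"
最早······出现 (earliest ... appeared) | 225 (1.4%) | "As the earliest microblog to appear, Twitter is relatively mature and is the main research object of academic microblog researchers"
首创 (first created) | 214 (1.3%) | "Statistics show that 78% of technological innovations in the United States were first created there"
首先······分析 (first ... analyze) | 213 (1.3%) | "This study first investigates and analyzes the characteristics of the three activity types, covering two aspects: popularity and value"
创始人 (founder) | 202 (1.2%) | "Evan Williams, one of the founders of Twitter, once said that the real value of microblogging lies not in the number of followers but in the number of retweets"
首次······出现 (for the first time ... appeared) | 112 (0.7%) | "The position where important content first appears is usually the title"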
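Type of evaluation sentence | Classification basis | Example sentence
Concept and theory | A concept or theory named or defined by researchers | "The value chain was first proposed by Professor Michael E. Porter in 1985 in his book Competitive Advantage"
Viewpoint and finding | An idea or finding put forward by scholars through theoretical or practical research, generally of some length | "Citation analysis also has shortcomings; as early as 1987, King J wrote about the deficiencies of the co-citation method"
Model and method | A method used in the research process | "The TAM model was first proposed by Davis on the basis of the Theory of Reasoned Action (TRA)"
School and field | Founding a school of thought or being the first to conduct research in a field | "The book Bibliometrics established Professor Qiu Junping's academic standing as one of the founders of bibliometrics in China"
System and software | A research result in the form of a developed system or piece of software | "汉构 is one of the earliest medium-sized Chinese grammar systems in the world that is based on HPSG theory and oriented to deep language processing"
Practice and application | A result that has to be obtained through hands-on practice | "Professors at the University of Manchester were the first to isolate graphene···"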
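The abstract also reports where these sentences occur, with most falling in the first 24.8% of an article. A location measure of that kind can be sketched as below. This is an illustration under stated assumptions rather than the paper's definition: position is taken to be the sentence's rank divided by the number of sentences in the article, and the function names `relative_position` and `share_within` are hypothetical.

```python
def relative_position(sentence_index, total_sentences):
    """Position of a sentence as a fraction of the article, in (0, 1].

    Assumption: position is measured by sentence rank, with
    sentence_index counted from 0.
    """
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    if not 0 <= sentence_index < total_sentences:
        raise ValueError("sentence_index out of range")
    return (sentence_index + 1) / total_sentences


def share_within(positions, cutoff):
    """Fraction of the given positions that fall at or before the cutoff."""
    if not positions:
        return 0.0
    return sum(1 for p in positions if p <= cutoff) / len(positions)


if __name__ == "__main__":
    # Illustrative positions only, not data from the paper.
    positions = [0.1, 0.2, 0.5, 0.9]
    print(share_within(positions, 0.248))
```

Measuring by characters or by section instead of by sentence rank would be an equally plausible choice; only the numerator and denominator of `relative_position` would change.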