Extracting Sentences of Research Originality from Full Text Academic Articles
Chengzhi Zhang1,3, Zheng Li2,3
1 School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094, China
2 School of Information Management, Nanjing University, Nanjing 210023, China
3 Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] This paper analyzes the full texts of academic articles in order to extract sentences of research originality and explore their characteristics. [Methods] We used full-text journal papers in library, information and archival science as experimental data. We then selected marker words and created extraction rules for sentences of research originality. Finally, we analyzed the distribution of these sentences by marker word, type, and location. [Results] The extracted sentences fell mainly into six categories, and most of them appeared in the first 24.8% of each article. [Limitations] The proposed sentence extraction method needs further optimization. [Conclusions] Sentences of research originality in library, information and archival science focus mainly on concepts and theories. The categories and distributions of these sentences vary across journals.
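The method pairs marker words with extraction rules. A minimal Python sketch of one way to do this, assuming each marker is either a single word (such as 源于) or an ordered pair written with an ellipsis (such as 首次……提出) whose two parts must occur in that order inside one sentence. The marker subset, the punctuation-based sentence splitter, and the names `compile_marker`, `split_sentences` and `extract_candidates` are illustrative assumptions, not the authors' published rules.

```python
from __future__ import annotations

import re

# Illustrative subset of marker words (assumed); "……" separates the two
# parts of a discontinuous marker such as 首次……提出.
MARKERS = ["首次……提出", "最早……提出", "开创", "早在……提出", "创始人", "源于"]

# Sentence-final punctuation (Chinese and ASCII) used by the toy splitter.
END = "。！？!?"


def compile_marker(marker: str) -> re.Pattern:
    """Compile 首次……提出 into a regex: 首次, then any text, then 提出."""
    parts = [re.escape(p) for p in marker.split("……")]
    gap = f"[^{END}]*?"  # non-greedy gap that cannot cross a sentence end
    return re.compile(gap.join(parts))


PATTERNS = [(m, compile_marker(m)) for m in MARKERS]


def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation, keeping the punctuation."""
    return [s.strip() for s in re.findall(f"[^{END}]+[{END}]?", text) if s.strip()]


def extract_candidates(text: str) -> list[tuple[str, str]]:
    """Return (marker, sentence) pairs; each sentence keeps its first match."""
    hits = []
    for sentence in split_sentences(text):
        for marker, pattern in PATTERNS:
            if pattern.search(sentence):
                hits.append((marker, sentence))
                break
    return hits


if __name__ == "__main__":
    sample = "价值链由Porter于1985年首次提出。碎片化信息大多源于微媒体。本文讨论数据。"
    for marker, sentence in extract_candidates(sample):
        print(marker, sentence)
```

On the sample text the sketch returns two candidates. The second, 碎片化信息大多源于微媒体, is one of the paper's own non-originality examples, so marker hits like these are only candidates that still need filtering or manual review.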
Chengzhi Zhang, Zheng Li. Extracting Sentences of Research Originality from Full Text Academic Articles. Data Analysis and Knowledge Discovery, 2019, 3(10): 12-18.
Yu D G, Yu M, Guan B T, Li B J, Zheng Y, Wu Z H, Shi Z. Carbon-carbon formation via Ni-catalyzed Suzuki-Miyaura coupling through C-CN bond cleavage of aryl nitrile[J]. J. Org. Lett. 2009, 11(48): 3374-3377.
Sample academic evaluation sentence
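Marker word | Frequency (share) | Example sentence
首次……提出 (for the first time … proposed) | 588 (17.4%) | "Citation speed, as an effective bibliometric tool, was first proposed by A. Schubert et al. in 1986."
最早……提出 (earliest … proposed) | 485 (14.4%) | "The concept of synergy was first proposed by the German physicist Hermann Haken, who developed it into a systematic theory."
开创 (pioneered) | 163 (4.8%) | "By introducing the concept of entropy from the perspective of communication, Shannon (Shannon) pioneered information measurement and opened the field of information measurement research."
早在……提出 (as early as … proposed) | 125 (3.7%) | "T. Berners-Lee proposed the concept of linked data as early as 2006."
创始人 (founder) | 117 (3.5%) | "Peter J. Scott thus became the recognized founder of the records series system and a pioneer of records continuum theory."
首先……提出 (first … proposed) | 113 (3.3%) | "Information literacy was first proposed in 1974 by Paul Zurkowski, president of the Information Industry Association of the United States."
始于 (began with) | 100 (3.0%) | "Research abroad on government information resource planning began in the early 1980s with the staged information resource management theories of Marchand and Horton."
源于 (originates from) | 87 (2.6%) | "The concept of social categorization originates from Barth."
追溯到 (traced back to) | 87 (2.6%) | "Research on the consumer decision process can be traced back to the black-box theory of consumer purchase decisions proposed by P. Kotler in 1967."
最早……研究 (earliest … studied) | 61 (1.8%) | "The first empirical study of the author bibliographic coupling method was conducted by Zhao Dangzhi."
Marker words of originality sentences (top 10 by frequency)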
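Each row above reports a count and that marker's share of all extracted sentences. A minimal sketch of the tabulation, assuming the extraction step yields (marker, sentence) pairs; the function name `marker_frequencies` and the toy input are hypothetical.

```python
from collections import Counter


def marker_frequencies(hits, top_n=10):
    """Return (marker, count, percent) rows, most frequent first.

    `hits` is a list of (marker, sentence) pairs; percent is the marker's
    share of all hits, rounded to one decimal place as in the table.
    """
    counts = Counter(marker for marker, _ in hits)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common keeps first-seen order among markers with equal counts.
    return [(m, c, round(100 * c / total, 1)) for m, c in counts.most_common(top_n)]


# Hypothetical toy input; real input would be the extractor's output.
toy_hits = [("源于", "s1"), ("源于", "s2"), ("开创", "s3"), ("首次……提出", "s4")]
for marker, count, pct in marker_frequencies(toy_hits):
    print(f"{marker}\t{count} ({pct}%)")
```

With the toy input, 源于 accounts for 2 of 4 hits (50.0%) and each other marker for 25.0%.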
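Marker word | Frequency (share) | Example sentence
源于 (originates from) | 3,965 (24.1%) | "Fragmented information mostly originates from micro-media."
第一个 (the first) | 1,210 (7.4%) | "Selecting keywords to build a co-word matrix is the first key step in co-word analysis."
始于 (began in) | 1,081 (6.6%) | "The construction of smart libraries in Foshan began in 2011."
开创 (created) | 816 (5.0%) | "Mobile e-commerce has created new models for products and services."
追溯到 (traced back to) | 297 (1.8%) | "Genealogy compilers are often inclined to trace their ancestors back to some famous person."
最早……出现 (earliest … appeared) | 225 (1.4%) | "Twitter, as the earliest microblog to appear, is relatively mature and is the main research object for microblog researchers in academia."
首创 (first created) | 214 (1.3%) | "Statistics show that 78% of technological innovations in the United States were first created there."
首先……分析 (first … analyze) | 213 (1.3%) | "This study first investigates and analyzes the characteristics of three activity types in two respects: popularity and value."
创始人 (founder) | 202 (1.2%) | "Evan Williams, one of the founders of Twitter, once said that the real value of microblogging lies not in the number of followers but in the number of reposts."
首次……出现 (for the first time … appears) | 112 (0.7%) | "Important content usually first appears in the title."
Marker words of non-originality sentences (top 10 by frequency)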
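Five markers (源于, 始于, 开创, 追溯到 and 创始人) appear in both top-10 lists, so a marker match alone does not identify an originality sentence. Using only the counts reported in the two tables, the share of a marker's matches that were originality sentences is originality / (originality + non-originality). This is an illustrative calculation, not a metric reported by the authors, and it assumes both counts come from the same corpus and extraction run.

```python
# Counts copied from the two marker-word tables:
# (originality sentences, non-originality sentences).
COUNTS = {
    "源于": (87, 3965),
    "始于": (100, 1081),
    "开创": (163, 816),
    "追溯到": (87, 297),
    "创始人": (117, 202),
}


def marker_precision(orig: int, non_orig: int) -> float:
    """Share of a marker's matches that were originality sentences."""
    return orig / (orig + non_orig)


# Print markers from least to most reliable.
for marker, (o, n) in sorted(COUNTS.items(), key=lambda kv: marker_precision(*kv[1])):
    print(f"{marker}\t{marker_precision(o, n):.1%}")
```

The output ranks 源于 lowest at about 2.1% and 创始人 highest at about 36.7%, which is consistent with the need for rules beyond the bare marker word.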
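Originality sentence type | Classification criterion | Example sentence
Concept/theory | A concept or theory named or defined by researchers | "The value chain was first proposed by Professor Michael E. Porter in 1985 in his book Competitive Advantage."
Viewpoint/finding | An idea or finding put forward by scholars through theoretical or practical research, usually of some length | "Citation analysis also has shortcomings: as early as 1987, King J wrote an article pointing out the weaknesses of the co-citation method."
Model/method | A method used in the research process | "The TAM model was first proposed by Davis on the basis of the Theory of Reasoned Action (TRA)."
School/field | Founding a school of thought, or being the first to conduct research in a field | "The book Bibliometrics established Professor Qiu Junping's academic standing as one of the founders of bibliometrics in China."
System/software | The research output is a developed system or software | "Hangou (汉构) is one of the world's earliest medium-sized Chinese grammar systems based on HPSG theory and oriented toward deep language processing."
Practice/application | Obtained through hands-on practice | "Professors at the University of Manchester first isolated graphene…"
Classification criteria for originality sentence types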
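The table defines the six types by criteria rather than by keywords. As a hedged illustration of how such a scheme could be encoded as rules, the sketch below checks hypothetical cue words per type in a fixed priority order and falls back to viewpoint/finding. The cue lists, their order and the name `classify` are assumptions, not the authors' criteria.

```python
# Hypothetical cue words per type (assumed, not from the paper). Types are
# checked in this order, so a sentence mentioning both 模型 and 理论 is
# filed under model/method.
TYPE_CUES = [
    ("system/software", ("系统", "软件", "平台")),
    ("model/method", ("模型", "方法", "算法")),
    ("concept/theory", ("概念", "理论", "定义")),
    ("school/field", ("学派", "领域", "奠基人", "先驱")),
    ("practice/application", ("提取出", "研制", "制造")),
]
# Fallback for sentences that match no cue word.
DEFAULT_TYPE = "viewpoint/finding"


def classify(sentence: str) -> str:
    """Return the first type whose cue word occurs in the sentence."""
    for label, cues in TYPE_CUES:
        if any(cue in sentence for cue in cues):
            return label
    return DEFAULT_TYPE


if __name__ == "__main__":
    for s in ["TAM模型最早是由Davis在理性行为理论的基础上提出",
              "曼彻斯特大学的教授们首次提取出石墨烯"]:
        print(classify(s), s)
```

Keyword cues like these misfile sentences that name a concept without any cue word: the value-chain example falls through to viewpoint/finding. Rule output of this kind would still need manual checking against the criteria in the table.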
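Proportion of each originality sentence type
Distribution of originality sentence types by journal
Location distribution of originality sentences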
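The location distribution and the finding that most originality sentences appear in the first 24.8% of an article both rest on a relative position for each sentence. This sketch assumes position = sentence index / number of sentences in the article, bins positions into deciles, and computes the share falling before a cutoff. The position measure, the function names and the sample indices are hypothetical.

```python
def relative_position(sent_index: int, n_sentences: int) -> float:
    """Relative position in [0, 1); 0.0 is the article's first sentence."""
    if not 0 <= sent_index < n_sentences:
        raise ValueError("sent_index must be in [0, n_sentences)")
    return sent_index / n_sentences


def position_histogram(positions, n_bins=10):
    """Count positions in equal-width bins over [0, 1)."""
    bins = [0] * n_bins
    for p in positions:
        bins[min(int(p * n_bins), n_bins - 1)] += 1
    return bins


def share_before(positions, cutoff):
    """Fraction of positions strictly below `cutoff` (e.g. 0.248)."""
    if not positions:
        return 0.0
    return sum(p < cutoff for p in positions) / len(positions)


# Hypothetical indices of originality sentences in a 100-sentence article.
positions = [relative_position(i, 100) for i in (3, 12, 15, 22, 64)]
print(position_histogram(positions))
print(share_before(positions, 0.248))
```

For this toy article the histogram is [1, 2, 1, 0, 0, 0, 1, 0, 0, 0] and 0.8 of the sentences fall before the 0.248 cutoff. Aggregating such positions across articles, or grouping them by journal, would give distributions like those in the figures.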