Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (10): 29-36    DOI: 10.11925/infotech.2096-3467.2019.0069
System Analysis and Design for Methodological Entities Extraction in Full Text of Academic Literature
Hao Xu1, Xuefang Zhu2, Chengzhi Zhang3, Chuan Jiang4
1School of Economics & Management, Nanjing Institute of Technology, Nanjing 211167, China
2School of Information Management, Nanjing University, Nanjing 210023, China
3School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China
4College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This paper proposes a system for extracting methodological entities from the full text of academic literature, in order to identify how these entities are indexed and used. [Methods] First, we extracted feature sentences and methodological entities using dictionaries, rules, and manual annotation. Then, we implemented a methodology knowledge extraction module using Microsoft Visual Studio 2012 and SQL Server 2012. [Results] The precision of extracting methodological feature sentences was 76%, and the recall was above 42%. Each feature sentence contained 1.42 methodological entities on average. The formal indexing ratio was below 27% for methodological entities and below 35% for feature sentences. Subject-specific methodological entities also had a low formal indexing rate. [Limitations] The system's recall and precision were not fully satisfactory. Entity extraction required intensive manual work, and the system did not use semantic features. [Conclusions] The proposed method is applicable across disciplines and helps trace the dissemination routes of interdisciplinary knowledge.

Key words: Full Text of Academic Literature; Methodological Entities; Entity Extraction System; Entity Use Feature
Received: 15 January 2019      Published: 25 November 2019
CLC Number: G250
Corresponding Authors: Xuefang Zhu     E-mail: xfzhu@nju.edu.cn

Cite this article:

Hao Xu,Xuefang Zhu,Chengzhi Zhang,Chuan Jiang. System Analysis and Design for Methodological Entities Extraction in Full Text of Academic Literature. Data Analysis and Knowledge Discovery, 2019, 3(10): 29-36.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0069     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I10/29

No.  Pattern
1    use<>software
2    perform use<>
3    be perform use<>
4    analysis be perform use<>
5    analyze use<>
6    analysis be perform with<>
7    <>statistical software
8    <> software
9    quantify use<>
10   be calculate use<>
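
The patterns above read as lemma sequences, with `<>` taken here to mark the position of a candidate entity. As a rough illustration only, the Python sketch below matches them against a sentence that is assumed to be already lowercased and lemmatized. The function names, the `SLOT` rule (an entity is one to three tokens, shortest match first), and the sample sentence are assumptions made for this example; the original module was built with Microsoft Visual Studio 2012 and SQL Server 2012 and also relied on dictionaries and manual annotation, none of which is reproduced here.

```python
import re

# Patterns copied from the table above; "<>" is read as the entity slot.
PATTERNS = [
    "use<>software",
    "perform use<>",
    "be perform use<>",
    "analysis be perform use<>",
    "analyze use<>",
    "analysis be perform with<>",
    "<>statistical software",
    "<> software",
    "quantify use<>",
    "be calculate use<>",
]

# Assumption: an entity fills the slot with one to three tokens,
# and the shortest possible match is preferred.
SLOT = r"(?P<entity>\S+(?:\s+\S+){0,2}?)"


def compile_pattern(pattern):
    """Turn a lemma pattern with a '<>' slot into a token-level regex."""
    left, right = pattern.split("<>")
    pieces = (
        [re.escape(word) for word in left.split()]
        + [SLOT]
        + [re.escape(word) for word in right.split()]
    )
    # The lookarounds keep the match aligned to whole tokens.
    return re.compile(r"(?<!\S)" + r"\s+".join(pieces) + r"(?!\S)")


COMPILED = [compile_pattern(p) for p in PATTERNS]


def extract_entities(lemmatized_sentence):
    """Return the sorted candidate entities found by any pattern."""
    found = set()
    for regex in COMPILED:
        match = regex.search(lemmatized_sentence)
        if match:
            found.add(match.group("entity"))
    return sorted(found)


# Hypothetical lemmatized sentence, not taken from the paper's corpus.
print(extract_entities("statistical analysis be perform use spss"))  # ['spss']
```

Broad patterns such as `<> software` can capture neighbouring words along with the entity in this sketch, so candidates produced this way would still need dictionary filtering or manual review.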
Text file  Feature sentences / precision (%)  Entities  Entity indexing count / percentage (%)  Feature-sentence indexing count  Entity indexing as share of feature-sentence indexing (%)
S_0.txt    575/76.36                          812       213/26.23                               269                              79.18
S_1.txt    602/76.30                          829       209/25.21                               257                              81.32
S_2.txt    572/75.66                          816       206/25.25                               261                              78.92
S_3.txt    595/75.32                          843       215/25.50                               266                              80.83
S_4.txt    556/74.73                          794       196/24.69                               241                              81.33
S_5.txt    626/77.28                          892       219/24.55                               268                              81.72
S_6.txt    610/76.44                          883       221/25.03                               278                              74.16
S_7.txt    595/76.38                          869       214/24.63                               276                              77.54
S_8.txt    600/76.43                          800       194/24.25                               249                              78.22
S_9.txt    618/76.67                          916       223/24.34                               299                              74.58
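
The page does not spell out how the derived figures relate to the raw counts. The short Python check below assumes that the entity indexing percentage is the indexing count divided by the entity count, that the last column is the entity indexing count divided by the feature-sentence indexing count, and that the abstract's average is total entities divided by total feature sentences. These readings are assumptions, but they reproduce the S_0.txt values and the 1.42 entities per feature sentence reported in the abstract.

```python
# (text file, feature sentences, entities), copied from the table above.
ROWS = [
    ("S_0.txt", 575, 812),
    ("S_1.txt", 602, 829),
    ("S_2.txt", 572, 816),
    ("S_3.txt", 595, 843),
    ("S_4.txt", 556, 794),
    ("S_5.txt", 626, 892),
    ("S_6.txt", 610, 883),
    ("S_7.txt", 595, 869),
    ("S_8.txt", 600, 800),
    ("S_9.txt", 618, 916),
]


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


total_sentences = sum(sentences for _, sentences, _ in ROWS)
total_entities = sum(entities for _, _, entities in ROWS)
entities_per_sentence = round(total_entities / total_sentences, 2)

# S_0.txt: entity indexing count 213, feature-sentence indexing count 269.
s0_entity_share = percent(213, 812)
s0_entity_vs_sentence = percent(213, 269)

print(entities_per_sentence)   # 1.42
print(s0_entity_share)         # 26.23
print(s0_entity_vs_sentence)   # 79.18
```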
No.  Entity              Mentions  Formal citations / citation rate (%)  Valid formal citations / validity rate (%)
1    SPSS                376       7/1.86                                1/14.29
2    Image J             269       38/14.13                              29/76.32
3    GraphPad Prism      247       0/0.00                                0/0.00
4    ANOVA               209       5/2.39                                2/40.00
5    R                   178       70/39.33                              15/21.43
6    Student's t-test    147       3/2.04                                2/66.67
7    SAS                 142       9/6.34                                2/22.22
8    Stata               113       14/12.39                              2/14.29
9    MATLAB              105       25/23.81                              18/72.00
10   FlowJo              91        4/4.40                                4/100.00
11   BLAST               79        24/30.38                              24/100.00
12   Primer              73        15/20.55                              10/66.67
13   GraphPad software   56        0/0.00                                0/0.00
14   EXCEL               56        25/44.64                              1/4.00
15   MEGA                55        28/50.91                              27/96.43