Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (10): 29-36    DOI: 10.11925/infotech.2096-3467.2019.0069
System Analysis and Design for Methodological Entities Extraction in Full Text of Academic Literature
Hao Xu1, Xuefang Zhu2, Chengzhi Zhang3, Chuan Jiang4
1School of Economics & Management, Nanjing Institute of Technology, Nanjing 211167, China
2School of Information Management, Nanjing University, Nanjing 210023, China
3School of Economics & Management, Nanjing University of Science and Technology, Nanjing 210094, China
4College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This paper proposes a system for extracting methodological entities from the full text of academic literature, with the aim of identifying how these entities are indexed and used. [Methods] First, we extracted feature sentences and methodological entities using dictionaries, rules, and manual annotation. We then implemented a methodological knowledge extraction module with Microsoft Visual Studio 2012 and SQL Server 2012. [Results] The precision of extracting methodological feature sentences was 76%, and recall was greater than 42%. Each feature sentence contained 1.42 methodological entities on average. The formal indexing ratio was below 27% for methodological entities and below 35% for feature sentences. Subject-specific methodological entities also showed a low formal indexing rate. [Limitations] The system’s recall and precision were not fully satisfactory. Entity extraction required intensive manual work and did not take semantic features into account. [Conclusions] The proposed method applies across disciplines and helps explore the dissemination routes of interdisciplinary knowledge.

Key words: Full Text of Academic Literature; Methodological Entities; Entity Extraction System; Entity Use Feature
Received: 15 January 2019      Published: 25 November 2019
ZTFLH:  G250  
Corresponding Authors: Xuefang Zhu     E-mail: xfzhu@nju.edu.cn

Cite this article:

Hao Xu, Xuefang Zhu, Chengzhi Zhang, Chuan Jiang. System Analysis and Design for Methodological Entities Extraction in Full Text of Academic Literature. Data Analysis and Knowledge Discovery, 2019, 3(10): 29-36.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0069     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I10/29

No.  Pattern                      No.  Pattern
1    use<>software                6    analysis be perform with<>
2    perform use<>                7    <>statistical software
3    be perform use<>             8    <> software
4    analysis be perform use<>    9    quantify use<>
5    analyze use<>                10   be calculate use<>
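
The patterns listed above can be read as lemma sequences in which `<>` marks the position of the methodological entity. The following is only a minimal sketch of that reading, not the authors' implementation (which the abstract says was built with Microsoft Visual Studio 2012 and SQL Server 2012). It assumes the input sentence is already lemmatized, and it uses a hypothetical heuristic for the slot: a run of one to three tokens that each begin with an uppercase letter. The names `compile_pattern` and `extract_entities` are illustrative.

```python
import re
from typing import List

# Patterns copied from the table above; "<>" is read as the entity slot.
PATTERNS = [
    "use<>software",
    "perform use<>",
    "be perform use<>",
    "analysis be perform use<>",
    "analyze use<>",
    "analysis be perform with<>",
    "<>statistical software",
    "<> software",
    "quantify use<>",
    "be calculate use<>",
]

# Assumption (not from the source): an entity is one to three consecutive
# tokens that each start with an uppercase letter, e.g. "GraphPad Prism".
SLOT = r"(?P<entity>[A-Z]\w*(?: [A-Z]\w*){0,2})"


def compile_pattern(pattern: str) -> "re.Pattern":
    """Turn a lemma pattern containing one "<>" slot into a compiled regex."""
    left, right = (re.escape(part.strip()) for part in pattern.split("<>"))
    # Join the non-empty pieces with single spaces: left context, slot, right context.
    body = " ".join(piece for piece in (left, SLOT, right) if piece)
    return re.compile(r"\b" + body + r"\b")


COMPILED = [compile_pattern(p) for p in PATTERNS]


def extract_entities(lemmatized_sentence: str) -> List[str]:
    """Return the distinct entities captured by any pattern, in order of discovery."""
    found: List[str] = []
    for regex in COMPILED:
        for match in regex.finditer(lemmatized_sentence):
            entity = match.group("entity")
            if entity not in found:
                found.append(entity)
    return found


if __name__ == "__main__":
    print(extract_entities("statistical analysis be perform use SPSS"))
    print(extract_entities("datum be analyze use GraphPad Prism software"))
```

Because several patterns overlap (pattern 2 is a suffix of patterns 3 and 4), the sketch deduplicates captured entities rather than counting one hit per pattern. A dictionary lookup, as described in the abstract, would be a more reliable way to bound the slot than the capitalization heuristic used here.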
Text file  Feature sentences/precision (%)  Entities  Indexing count/percentage (%)  Feature-sentence indexing count  Entity/feature-sentence indexing ratio (%)
S_0.txt    575/76.36                        812       213/26.23                      269                              79.18
S_1.txt    602/76.30                        829       209/25.21                      257                              81.32
S_2.txt    572/75.66                        816       206/25.25                      261                              78.92
S_3.txt    595/75.32                        843       215/25.50                      266                              80.83
S_4.txt    556/74.73                        794       196/24.69                      241                              81.33
S_5.txt    626/77.28                        892       219/24.55                      268                              81.72
S_6.txt    610/76.44                        883       221/25.03                      278                              74.16
S_7.txt    595/76.38                        869       214/24.63                      276                              77.54
S_8.txt    600/76.43                        800       194/24.25                      249                              78.22
S_9.txt    618/76.67                        916       223/24.34                      299                              74.58
No.  Entity name        Mentions  Formal citations/citation rate (%)  Valid formal citations/validity rate (%)
1    SPSS               376       7/1.86                              1/14.29
2    Image J            269       38/14.13                            29/76.32
3    GraphPad Prism     247       0/0.00                              0/0.00
4    ANOVA              209       5/2.39                              2/40.00
5    R                  178       70/39.33                            15/21.43
6    Student's t-test   147       3/2.04                              2/66.67
7    SAS                142       9/6.34                              2/22.22
8    Stata              113       14/12.39                            2/14.29
9    MATLAB             105       25/23.81                            18/72.00
10   FlowJo             91        4/4.40                              4/100.00
11   BLAST              79        24/30.38                            24/100.00
12   Primer             73        15/20.55                            10/66.67
13   GraphPad software  56        0/0.00                              0/0.00
14   EXCEL              56        25/44.64                            1/4.00
15   MEGA               55        28/50.91                            27/96.43
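
The two percentage columns in the entity table above appear to follow from the counts: the citation rate matches formal citations divided by mentions, and the validity rate matches valid formal citations divided by formal citations. The sketch below reproduces a few rows under that reading. The function name `citation_rates` is illustrative, and returning 0.00 when the denominator is zero is an assumption made to match the table's `0/0.00` entries.

```python
from typing import Tuple


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage, rounded to two decimals.

    Assumption: a zero denominator yields 0.0, mirroring the "0/0.00" cells.
    """
    return round(100 * part / whole, 2) if whole else 0.0


def citation_rates(mentions: int, formal: int, valid: int) -> Tuple[float, float]:
    """Return (citation rate, validity rate) for one entity row."""
    return percentage(formal, mentions), percentage(valid, formal)


# (entity, mentions, formal citations, valid formal citations) from the table above.
ROWS = [
    ("SPSS", 376, 7, 1),
    ("GraphPad Prism", 247, 0, 0),
    ("R", 178, 70, 15),
    ("MATLAB", 105, 25, 18),
]

if __name__ == "__main__":
    for name, mentions, formal, valid in ROWS:
        cite_rate, valid_rate = citation_rates(mentions, formal, valid)
        print(f"{name}: citation rate {cite_rate:.2f}%, validity rate {valid_rate:.2f}%")
```

Read this way, SPSS is mentioned most often (376 times) but formally cited in under 2% of mentions, whereas R is formally cited in roughly 39% of its mentions.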