Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 26-42    DOI: 10.11925/infotech.2096-3467.2020.0364
Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network
Qin Chenglei, Zhang Chengzhi
School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract  

[Objective] This paper proposes a new method based on a hierarchical attention network (HAN) to effectively recognize the structure functions of scholarly articles. [Methods] First, we constructed a network model with hierarchical attention at different granularities to automatically identify the functions of text structures. Then, we examined the performance of our method on four PLoS datasets. The same tests were also applied to traditional machine learning models with text feature vectors and to the BERT model, and we modified the proposed model according to the test results. Finally, we evaluated the new model on articles from Atmospheric Chemistry and Physics (ACP) to assess its adaptability to other domains. [Results] At the sentence level, our model (with Bi-LSTM+Attention as the encoder) outperformed the others (Macro-F1: 0.8661). However, the model did not perform well in unrelated fields (minimum Macro-F1: 0.4554). [Limitations] The model cannot recognize the functions of texts with mixed structures, nor the logical relationships within such structures. [Conclusions] The proposed model can effectively recognize structure functions at the sentence level, which extends research on full-text scholarly literature.

Key words: Function Recognition of Academic Text Structure; Hierarchical Attention Network; IMRaD; Domain Adaptability Analysis
Received: 27 April 2020      Published: 04 December 2020
CLC number: TP393
Corresponding Authors: Zhang Chengzhi     E-mail: zhangcz@njust.edu.cn

Cite this article:

Qin Chenglei,Zhang Chengzhi. Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network. Data Analysis and Knowledge Discovery, 2020, 4(11): 26-42.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0364     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I11/26

Section | Article 1 | Article 2 | Article 3 | Article 4 | Article 5
Section 1 | Introduction and background | Introduction | Introduction | Introduction | Introduction
Section 2 | Data | In-situ measurements | Model description and initialization | Theory-The cloud center of gravity | Experimental methods
Section 3 | Analysis methods | Methods | Results | Case study: aerosol effects on a warm convective cloud | Results and discussion
Section 4 | Characterisation of data sets | Mixing time-scale in the subtropical tropopause layer | Discussion and conclusions | Summary | Conclusions
Section 5 | An example of “Hector”: 10 February 2006 | TS mixing above the subtropical tropopause layer | None | None | None
Section 6 | Results | Mixing time-scale at the tropical tropopause | None | None | None
Section 7 | Conclusions | Conclusions | None | None | None
Examples of Section Titles from Several ACP Papers
The Framework of Academic Text Structure Function Recognition
Journal | Articles | Period | Introduction (count) | Methods (count) | Results (count) | Discussions (count) | Others (count) | Others (share)
PLoS Biology | 2,976 | 2003-2019 | 2,951 | 2,915 | 2,625 | 2,618 | 286 | 2.57%
PLoS Medicine | 1,740 | 2004-2019 | 1,728 | 1,721 | 1,714 | 1,711 | 137 | 1.99%
PLoS Genetics | 7,268 | 2005-2019 | 7,268 | 7,241 | 6,700 | 6,698 | 639 | 2.29%
PLoS Computational Biology | 5,851 | 2005-2019 | 5,851 | 5,443 | 5,229 | 5,107 | 452 | 2.09%
ACP | 7,279 | 2001-2016 | 7,067 | 7,243 | 2,743 | 7,397 | 7,688 | 31.44%
Overview of the PLoS and ACP Data
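The Others shares in this table appear to be computed against the total number of IMRaD sections rather than against all sections: for PLoS Biology, 286 / (2,951 + 2,915 + 2,625 + 2,618) ≈ 2.57%, whereas 286 / 11,395 (Others included) ≈ 2.51%. The short check below reproduces every reported percentage under that reading. The denominator is our inference and is not stated in the source; the names `SECTION_COUNTS` and `others_share` are hypothetical.

```python
# Section counts transcribed from the data table:
# (Introduction, Methods, Results, Discussions, Others)
SECTION_COUNTS = {
    "PLoS Biology": (2951, 2915, 2625, 2618, 286),
    "PLoS Medicine": (1728, 1721, 1714, 1711, 137),
    "PLoS Genetics": (7268, 7241, 6700, 6698, 639),
    "PLoS Computational Biology": (5851, 5443, 5229, 5107, 452),
    "ACP": (7067, 7243, 2743, 7397, 7688),
}


def others_share(counts):
    """Others divided by the sum of the four IMRaD counts (assumed denominator)."""
    *imrad, others = counts
    return others / sum(imrad)


# Percentages rounded to two decimals, as in the table.
shares = {journal: round(100 * others_share(c), 2) for journal, c in SECTION_COUNTS.items()}
```

The ACP share stands out (31.44% against roughly 2% for every PLoS journal), which matches the irregular ACP section titles shown in the examples table.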
The Distribution of Section Titles in ACP
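Structure | Function | Content
Introduction | Explains why the study was conducted | Background, rationale, research objectives, summary and review of related research
Materials & Methods | Describes which methods and materials were used | Study design, research materials, impact assessment
Results | Presents the research findings | Analysis; detailed reporting of the findings in text, figures and tables
Discussion | Discusses the significance of the results | Restatement of the main findings, strengths and limitations, implications for other research, open questions and future research, and a closing summary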
Description of the Functions of the IMRaD Sections [5]
Structure function | Section-title feature words
Introduction | introduction, motivation, background, overview, review of literature
Materials & Methods | system, theory, method, methods, methodology, model, models, framework, approach, approaches, methodologies, experimental, experiment, experiments, data, data and methods
Results | result, results, analysis
Discussion | discussion, discussions, conclusion, conclusions, summary, concluding, summary and conclusions
The Feature Words of IMRaD in this Paper
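A minimal sketch of how the feature words above could map a section title to an IMRaD function. The matching rule is an assumption, since the source lists only the words: titles are lower-cased and tokenised, a feature word or phrase counts as a match when its tokens appear contiguously in the title, and a title that matches no function, or more than one, is labelled Others. The function name `label_section` is hypothetical.

```python
import re

# Feature words per structure function, copied from the feature-word table.
FEATURE_WORDS = {
    "Introduction": ["introduction", "motivation", "background", "overview",
                     "review of literature"],
    "Materials & Methods": ["system", "theory", "method", "methods", "methodology",
                            "model", "models", "framework", "approach", "approaches",
                            "methodologies", "experimental", "experiment",
                            "experiments", "data", "data and methods"],
    "Results": ["result", "results", "analysis"],
    "Discussion": ["discussion", "discussions", "conclusion", "conclusions",
                   "summary", "concluding", "summary and conclusions"],
}


def tokens(text):
    """Lower-case alphabetic tokens; 'In-situ' becomes ['in', 'situ']."""
    return re.findall(r"[a-z]+", text.lower())


def contains_phrase(title_toks, phrase_toks):
    """True if phrase_toks occurs as a contiguous run inside title_toks."""
    n = len(phrase_toks)
    return any(title_toks[i:i + n] == phrase_toks
               for i in range(len(title_toks) - n + 1))


def label_section(title):
    """Return the single matching function, or 'Others' (assumed tie rule)."""
    title_toks = tokens(title)
    matched = {func for func, words in FEATURE_WORDS.items()
               if any(contains_phrase(title_toks, tokens(w)) for w in words)}
    return matched.pop() if len(matched) == 1 else "Others"
```

Under this rule a mixed title such as “Results and discussion” (Article 5 in the ACP examples) is labelled Others, while “Experimental methods” maps to Materials & Methods.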
The Schematic Diagram of HAN (Encoder: Bi-LSTM+CNN+Attention)
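The attention pooling at the core of a HAN can be sketched as follows. It follows the formulation of Yang et al. [29] (u_t = tanh(W h_t + b), alpha_t = softmax(u_t^T u_ctx), s = sum_t alpha_t h_t) and applies it twice: from word states to sentence vectors, then from sentence vectors to a document vector. The Bi-LSTM/CNN encoders are omitted, so the input vectors stand in for encoder outputs. All parameter values are hypothetical toy numbers, and the sketch does not reproduce how the paper configures its word-, sentence- and paragraph-level variants.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, W, b, context):
    """u_t = tanh(W h_t + b); alpha = softmax(u_t . context); return (sum alpha_t h_t, alpha)."""
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alpha = softmax(scores)
    pooled = [sum(a * h[d] for a, h in zip(alpha, vectors))
              for d in range(len(vectors[0]))]
    return pooled, alpha


def han_document_vector(sentences, word_params, sentence_params):
    """Two-level pooling: word states -> sentence vectors -> document vector."""
    sentence_vectors = [attention_pool(words, *word_params)[0] for words in sentences]
    return attention_pool(sentence_vectors, *sentence_params)


# Hypothetical toy parameters (W, b, context) for 2-dimensional states; not trained weights.
WORD_PARAMS = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, -1.0])
SENT_PARAMS = ([[0.5, 0.5], [0.5, -0.5]], [0.1, 0.0], [1.0, 1.0])

doc = [
    [[0.5, 0.1], [0.2, 0.9]],  # sentence 1: two word states
    [[0.3, 0.3]],              # sentence 2: one word state
]
doc_vector, sentence_weights = han_document_vector(doc, WORD_PARAMS, SENT_PARAMS)
```

In a trained HAN the document (or text-unit) vector would feed a softmax classifier over the IMRaD labels; separate parameters per level, as here, mirror the word and sentence attention layers in [29].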
Model | Feature vector | PLoS Biology (Macro-P / Macro-R / Macro-F1) | PLoS Medicine (Macro-P / Macro-R / Macro-F1) | PLoS Genetics (Macro-P / Macro-R / Macro-F1) | PLoS Computational Biology (Macro-P / Macro-R / Macro-F1)
SVM | TF-IDF | 0.6525 / 0.5087 / 0.4406 | 0.6026 / 0.5862 / 0.5639 | 0.6650 / 0.5361 / 0.4621 | 0.4448 / 0.4612 / 0.3657
SVM | CHI | 0.4923 / 0.3875 / 0.3941 | 0.4860 / 0.3949 / 0.3464 | 0.4428 / 0.3811 / 0.3768 | 0.6610 / 0.6128 / 0.5532
SVM | IG | 0.3888 / 0.3836 / 0.3681 | 0.6489 / 0.5862 / 0.5255 | 0.5268 / 0.5062 / 0.4643 | 0.4615 / 0.4590 / 0.3809
LR | TF-IDF | 0.5035 / 0.4247 / 0.3524 | 0.5881 / 0.5682 / 0.5628 | 0.5148 / 0.4998 / 0.4595 | 0.3197 / 0.3738 / 0.3017
LR | CHI | 0.4940 / 0.4718 / 0.4178 | 0.4575 / 0.4570 / 0.4277 | 0.4663 / 0.4542 / 0.3937 | 0.5892 / 0.4989 / 0.4209
LR | IG | 0.3862 / 0.3838 / 0.3750 | 0.6103 / 0.5916 / 0.5602 | 0.4557 / 0.4392 / 0.4240 | 0.3347 / 0.4357 / 0.3596
NB | TF-IDF | 0.4967 / 0.4885 / 0.4345 | 0.5167 / 0.5151 / 0.4831 | 0.5554 / 0.5207 / 0.4774 | 0.4215 / 0.4316 / 0.3657
NB | CHI | 0.6544 / 0.6177 / 0.6057 | 0.5729 / 0.5311 / 0.4862 | 0.5364 / 0.5552 / 0.4881 | 0.7264 / 0.6796 / 0.6333
NB | IG | 0.5144 / 0.4634 / 0.4607 | 0.6208 / 0.5939 / 0.5875 | 0.4367 / 0.3646 / 0.3067 | 0.4186 / 0.3159 / 0.2360
KNN | TF-IDF | 0.5620 / 0.5780 / 0.5643 | 0.6238 / 0.6253 / 0.6230 | 0.6358 / 0.6443 / 0.6357 | 0.4421 / 0.4597 / 0.4409
KNN | CHI | 0.6259 / 0.6261 / 0.6246 | 0.5468 / 0.5444 / 0.5446 | 0.6346 / 0.6367 / 0.6327 | 0.7046 / 0.7113 / 0.7074
KNN | IG | 0.6070 / 0.6099 / 0.6071 | 0.6944 / 0.6947 / 0.6943 | 0.5857 / 0.5888 / 0.5855 | 0.4885 / 0.4928 / 0.4875
XGB | TF-IDF | 0.7128 / 0.7155 / 0.7114 | 0.6941 / 0.6955 / 0.6944 | 0.7441 / 0.7441 / 0.7427 | 0.5393 / 0.5506 / 0.5330
XGB | CHI | 0.7439 / 0.7447 / 0.7437 | 0.6232 / 0.6153 / 0.6145 | 0.7397 / 0.7405 / 0.7390 | 0.7660 / 0.7713 / 0.7635
XGB | IG | 0.7461 / 0.7465 / 0.7457 | 0.8182 / 0.8179 / 0.8178 | 0.7728 / 0.7719 / 0.7702 | 0.5985 / 0.6009 / 0.5971
The Results of the Academic Text Structure Function Recognition of PLoS Using Traditional Machine Learning Models
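The traditional baselines use TF-IDF, CHI and IG feature vectors [38]-[40]. The sketch below illustrates the first two under common textbook definitions, which are assumptions here because the source does not give its exact formulas: raw term frequency times log(N / df) for TF-IDF, and the 2x2 contingency statistic chi^2 = N(AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D)) for scoring a term against a class, where A counts documents with the term in the class, B with the term outside it, C without the term in the class, and D neither. The toy documents and labels are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists -> list of {term: tf * log(N / df)} dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def chi_square(term, label, docs, labels):
    """Chi-square score of `term` for class `label` from the 2x2 contingency table."""
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        has_term = term in doc
        if has_term and y == label:
            a += 1
        elif has_term:
            b += 1
        elif y == label:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


# Hypothetical toy corpus: tokenised text units labelled M (Methods) or R (Results).
docs = [["we", "use", "data"], ["we", "show", "results"],
        ["we", "use", "model"], ["results", "show"]]
labels = ["M", "R", "M", "R"]
vectors = tfidf(docs)
```

In a CHI-based setup, terms would typically be ranked by this score and the top-ranked ones kept as features; here “use” scores higher for class M than the ubiquitous “we”.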
Model | PLoS Biology (Macro-P / Macro-R / Macro-F1) | PLoS Medicine (Macro-P / Macro-R / Macro-F1) | PLoS Genetics (Macro-P / Macro-R / Macro-F1) | PLoS Computational Biology (Macro-P / Macro-R / Macro-F1)
BERT | 0.9198 / 0.9194 / 0.9194 | 0.9732 / 0.9731 / 0.9731 | 0.9292 / 0.9292 / 0.9291 | 0.8378 / 0.8386 / 0.8381
The Results of the Academic Text Structure Function Recognition of PLoS Using BERT
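The PLoS tables report Macro-P, Macro-R and Macro-F1. Several rows show a Macro-F1 below both Macro-P and Macro-R (for example, SVM with TF-IDF on PLoS Biology: 0.6525 / 0.5087 / 0.4406). That cannot happen if Macro-F1 is the harmonic mean of Macro-P and Macro-R, since a harmonic mean never falls below the smaller of its inputs; it is consistent with averaging per-class F1 scores instead. The sketch below computes macro scores that way. Treating an undefined precision or recall as 0 is an assumption, similar to scikit-learn's default handling; the toy labels are hypothetical.

```python
from collections import Counter


def macro_prf(y_true, y_pred, labels):
    """Unweighted means of per-class precision, recall and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical predictions over the four IMRaD labels.
macro_p, macro_r, macro_f1 = macro_prf(
    ["I", "M", "R", "D", "M", "R"],
    ["I", "M", "M", "D", "M", "D"],
    ["I", "M", "R", "D"],
)
```

Here Macro-F1 (37/60 ≈ 0.617) is lower than the harmonic mean of Macro-P (13/24) and Macro-R (3/4), about 0.629, because the never-predicted class R contributes an F1 of 0.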
Encoder | PLoS Biology (Macro-P / Macro-R / Macro-F1) | PLoS Medicine (Macro-P / Macro-R / Macro-F1) | PLoS Genetics (Macro-P / Macro-R / Macro-F1) | PLoS Computational Biology (Macro-P / Macro-R / Macro-F1)
CNN | 0.9686 / 0.9689 / 0.9687 | 0.9725 / 0.9726 / 0.9724 | 0.9865 / 0.9864 / 0.9864 | 0.9354 / 0.9314 / 0.9321
CNN_Multi Filters | 0.9761 / 0.9755 / 0.9756 | 0.9711 / 0.9709 / 0.9710 | 0.9869 / 0.9871 / 0.9869 | 0.9473 / 0.9461 / 0.9465
LSTM | 0.8698 / 0.8695 / 0.8682 | 0.8920 / 0.8936 / 0.8923 | 0.9859 / 0.9865 / 0.9862 | 0.9449 / 0.9452 / 0.9441
LSTM+CNN | 0.9734 / 0.9737 / 0.9735 | 0.9641 / 0.9638 / 0.9639 | 0.9866 / 0.9870 / 0.9868 | 0.9509 / 0.9513 / 0.9510
LSTM+Attention | 0.9700 / 0.9708 / 0.9702 | 0.9759 / 0.9759 / 0.9758 | 0.9920 / 0.9919 / 0.9919 | 0.9407 / 0.9413 / 0.9408
LSTM+CNN+Attention | 0.9804 / 0.9806 / 0.9804 | 0.9828 / 0.9827 / 0.9827 | 0.9904 / 0.9903 / 0.9903 | 0.9555 / 0.9568 / 0.9559
Bi-LSTM | 0.8285 / 0.8557 / 0.8182 | 0.8917 / 0.8942 / 0.8909 | 0.9766 / 0.9764 / 0.9761 | 0.9199 / 0.9271 / 0.9208
Bi-LSTM+CNN | 0.9743 / 0.9753 / 0.9747 | 0.9761 / 0.9755 / 0.9757 | 0.9868 / 0.9872 / 0.9869 | 0.9485 / 0.9515 / 0.9491
Bi-LSTM+Attention | 0.9709 / 0.9711 / 0.9707 | 0.9736 / 0.9740 / 0.9737 | 0.9880 / 0.9880 / 0.9880 | 0.9506 / 0.9501 / 0.9503
Bi-LSTM+CNN+Attention | 0.9813 / 0.9804 / 0.9807 | 0.9826 / 0.9828 / 0.9826 | 0.9915 / 0.9920 / 0.9917 | 0.9621 / 0.9617 / 0.9618
The Results of the Academic Text Structure Function Recognition of PLoS Using Word-Level HAN
Encoder | PLoS Biology (Macro-P / Macro-R / Macro-F1) | PLoS Medicine (Macro-P / Macro-R / Macro-F1) | PLoS Genetics (Macro-P / Macro-R / Macro-F1) | PLoS Computational Biology (Macro-P / Macro-R / Macro-F1)
LSTM | 0.9617 / 0.9619 / 0.9617 | 0.9542 / 0.9495 / 0.9490 | 0.9859 / 0.9858 / 0.9858 | 0.9255 / 0.9257 / 0.9251
LSTM+CNN | 0.9758 / 0.9771 / 0.9764 | 0.9822 / 0.9817 / 0.9819 | 0.9895 / 0.9897 / 0.9896 | 0.9352 / 0.9352 / 0.9352
LSTM+Attention | 0.9809 / 0.9809 / 0.9809 | 0.9862 / 0.9861 / 0.9862 | 0.9892 / 0.9894 / 0.9893 | 0.9362 / 0.9353 / 0.9356
LSTM+CNN+Attention | 0.9687 / 0.9680 / 0.9682 | 0.9834 / 0.9833 / 0.9833 | 0.9833 / 0.9834 / 0.9833 | 0.9315 / 0.9304 / 0.9304
Bi-LSTM | 0.9716 / 0.9708 / 0.9710 | 0.9606 / 0.9574 / 0.9570 | 0.9767 / 0.9744 / 0.9750 | 0.9387 / 0.9392 / 0.9385
Bi-LSTM+CNN | 0.9795 / 0.9798 / 0.9796 | 0.9887 / 0.9883 / 0.9887 | 0.9941 / 0.9940 / 0.9940 | 0.9124 / 0.9031 / 0.9039
Bi-LSTM+Attention | 0.9779 / 0.9775 / 0.9777 | 0.9543 / 0.9545 / 0.9528 | 0.9857 / 0.9854 / 0.9854 | 0.9566 / 0.9565 / 0.9565
Bi-LSTM+CNN+Attention | 0.9815 / 0.9820 / 0.9816 | 0.9889 / 0.9881 / 0.9883 | 0.9899 / 0.9902 / 0.9900 | 0.9527 / 0.9523 / 0.9525
The Results of the Academic Text Structure Function Recognition of PLoS Using Sentence-Level HAN
Encoder | PLoS Biology (Macro-P / Macro-R / Macro-F1) | PLoS Medicine (Macro-P / Macro-R / Macro-F1) | PLoS Genetics (Macro-P / Macro-R / Macro-F1) | PLoS Computational Biology (Macro-P / Macro-R / Macro-F1)
LSTM+CNN | 0.9597 / 0.9595 / 0.9595 | 0.9645 / 0.9643 / 0.9643 | 0.9785 / 0.9786 / 0.9785 | 0.8995 / 0.8925 / 0.8936
LSTM+Attention | 0.9520 / 0.9497 / 0.9504 | 0.9625 / 0.9623 / 0.9623 | 0.9823 / 0.9824 / 0.9823 | 0.8964 / 0.8960 / 0.8961
LSTM+CNN+Attention | 0.9455 / 0.9443 / 0.9437 | 0.9541 / 0.9512 / 0.9516 | 0.9803 / 0.9795 / 0.9798 | 0.8954 / 0.8962 / 0.8956
Bi-LSTM+CNN | 0.9650 / 0.9648 / 0.9649 | 0.9608 / 0.9597 / 0.9599 | 0.9812 / 0.9810 / 0.9811 | 0.8858 / 0.8637 / 0.8599
Bi-LSTM+Attention | 0.9615 / 0.9614 / 0.9614 | 0.9732 / 0.9727 / 0.9728 | 0.9833 / 0.9833 / 0.9833 | 0.9181 / 0.9176 / 0.9178
Bi-LSTM+CNN+Attention | 0.9719 / 0.9724 / 0.9721 | 0.9615 / 0.9612 / 0.9608 | 0.9834 / 0.9828 / 0.9831 | 0.9153 / 0.9136 / 0.9141
The Results of the Academic Text Structure Function Recognition of PLoS Using Paragraph-Level HAN
The Performance of Five Models
Model | Encoder / features | Macro-P | Macro-R | Macro-F1 | Accuracy-I | Accuracy-M | Accuracy-R | Accuracy-D
Baseline-1 (XGB) | TF-IDF | 0.5289 | 0.5319 | 0.5234 | 0.5409 | 0.3083 | 0.6605 | 0.6180
Baseline-1 (XGB) | CHI | 0.7880 | 0.7869 | 0.7859 | 0.9203 | 0.7303 | 0.6563 | 0.8407
Baseline-1 (XGB) | IG | 0.6046 | 0.6082 | 0.6048 | 0.7317 | 0.4817 | 0.5859 | 0.6334
Baseline-2 (BERT) | - | 0.8480 | 0.8491 | 0.8482 | 0.9368 | 0.7684 | 0.8125 | 0.8787
Word-level HAN | Bi-LSTM+CNN | 0.8260 | 0.8294 | 0.8266 | 0.8580 | 0.7988 | 0.7663 | 0.8809
Word-level HAN | Bi-LSTM+Attention | 0.8513 | 0.8525 | 0.8511 | 0.9177 | 0.7800 | 0.8685 | 0.8390
Word-level HAN | Bi-LSTM+CNN+Attention | 0.8362 | 0.8365 | 0.8362 | 0.8802 | 0.8129 | 0.8081 | 0.8438
Sentence-level HAN | Bi-LSTM+CNN | 0.8137 | 0.7985 | 0.7885 | 0.9614 | 0.8367 | 0.5068 | 0.8891
Sentence-level HAN | Bi-LSTM+Attention | 0.8668 | 0.8666 | 0.8661 | 0.9672 | 0.8186 | 0.8004 | 0.8801
Sentence-level HAN | Bi-LSTM+CNN+Attention | 0.8533 | 0.8498 | 0.8488 | 0.9537 | 0.8745 | 0.7117 | 0.8592
Paragraph-level HAN | Bi-LSTM+CNN | 0.8408 | 0.8398 | 0.8396 | 0.9196 | 0.8253 | 0.8113 | 0.8032
Paragraph-level HAN | Bi-LSTM+Attention | 0.8616 | 0.8601 | 0.8604 | 0.9123 | 0.8246 | 0.8519 | 0.8515
Paragraph-level HAN | Bi-LSTM+CNN+Attention | 0.7866 | 0.7472 | 0.7417 | 0.6561 | 0.5293 | 0.8842 | 0.9194
Note: I, M, R and D denote Introduction, Materials & Methods, Results and Discussion.
Training Results of the Best Models on ACP Articles with Clear Section Functions
The Distribution of Irregular Data Title Numbers and Prediction Labels in ACP
The Distribution of References and the Cue Word “use” in IMRaD (e.g. PLoS Biology)
Item | DTW (I / M / R / D) | Cosine distance (I / M / R / D) | Euclidean distance (I / M / R / D) | K-S test p-value (I / M / R / D)
References | 0.0001 / 0.0001 / 0.0001 / 0.0001 | 0.0082 / 0.0128 / 0.0122 / 0.0092 | 0.0047 / 0.0022 / 0.0065 / 0.0043 | 0.6766 / 0.7942 / 0.2606 / 0.4431
Action cue word “use” | 0.0001 / 0.0002 / 0.0001 / 0.0001 | 0.0287 / 0.0092 / 0.0159 / 0.0300 | 0.0038 / 0.0088 / 0.0063 / 0.0041 | 0.1400 / 0.5560 / 0.1400 / 0.8828
The Distribution Similarity of References and the Cue Word “use”, with K-S Test Results
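The similarity tables compare distributions with dynamic time warping (DTW) [30], cosine distance, Euclidean distance and a two-sample Kolmogorov-Smirnov (K-S) test [31]. The sketch below gives plain-Python versions of these measures. How the paper turns reference or cue-word positions into sequences, and whether it normalises DTW (the tabulated DTW values are all below 0.001), is not described here, so the inputs are hypothetical and absolute values should not be expected to match the tables.

```python
import math
from bisect import bisect_right


def dtw_distance(a, b):
    """Classic DTW with absolute-difference cost, no window and no normalisation."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ks_statistic(sample1, sample2):
    """Two-sample K-S statistic D = max |F1(x) - F2(x)| over the pooled sample points."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / len(s1)
        f2 = bisect_right(s2, x) / len(s2)
        d = max(d, abs(f1 - f2))
    return d
```

DTW tolerates shifts in where peaks occur (it aligns [0, 1] with [0, 0, 1] at zero cost), whereas cosine and Euclidean distance compare bins position by position; the K-S statistic compares the empirical cumulative distributions instead.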
The Distribution of References in Train Set and Test Set
Measure | I | M | R | D
DTW | 0.0003 | 0.0002 | 0.0002 | 0.0004
Cosine distance | 0.0648 | 0.0332 | 0.0463 | 0.1456
Euclidean distance | 0.0122 | 0.0101 | 0.0082 | 0.0124
K-S test (p-value) | 0.1400 | 0.0470 | 0.4431 | 0.0994
The Distribution Similarity of References in the Two Data Sets and the Result of K-S Test
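The K-S rows report p-values. One common way to turn a two-sample statistic D into an approximate p-value is the asymptotic Kolmogorov series Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lambda^2), with lambda = (sqrt(n_e) + 0.12 + 0.11 / sqrt(n_e)) * D and n_e = n1 * n2 / (n1 + n2), as popularised by Numerical Recipes. This is an assumption about method: the source does not say how its p-values were computed, and sample sizes are not given here, so the inputs below are hypothetical. The function name `ks_pvalue` is also hypothetical.

```python
import math


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-sample K-S p-value for statistic d and sample sizes n1, n2."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The series converges slowly here and Q(lambda) is indistinguishable from 1.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += 2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
    return min(max(total, 0.0), 1.0)  # clamp rounding noise into [0, 1]
```

Read at a conventional 0.05 level, only the Materials & Methods p-value (0.0470) in the table above falls below the threshold; the source does not state which significance level it applies. For exact small-sample p-values, SciPy's `scipy.stats.ks_2samp` is an alternative.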
The Distribution Similarity of References after Replacing the Results Section with the New Materials & Methods and Discussion
Compared with extra_Materials&Methods_train | DTW | Cosine distance | Euclidean distance | K-S test (p-value)
Materials & Methods_train | 0.0002 | 0.0305 | 0.0079 | 0.6766
extra_Materials & Methods_predict | 0.0002 | 0.0362 | 0.0087 | 0.2606
The Distribution Similarity with extra_Materials & Methods and the Result of K-S Test
Compared with extra_Discussion_train | DTW | Cosine distance | Euclidean distance | K-S test (p-value)
Discussion_train | 0.0001 | 0.0934 | 0.0052 | 0.0994
extra_Discussion_predict | 0.0002 | 0.1089 | 0.0078 | 0.4431
The Distribution Similarity with extra_Discussion and the Result of K-S Test
The Distribution of Cue Words in IMRaD in Train Set and Test Set
Word | DTW (I / M / R / D) | Cosine distance (I / M / R / D) | Euclidean distance (I / M / R / D) | K-S test p-value (I / M / R / D)
show | 0.0002 / 0.0002 / 0.0005 / 0.0002 | 0.1035 / 0.0544 / 0.0340 / 0.0934 | 0.0071 / 0.0090 / 0.0191 / 0.0077 | 0.3439 / 0.1930 / 0.0131 / 0.5358
use | 0.0002 / 0.0002 / 0.0002 / 0.0002 | 0.0478 / 0.0178 / 0.0376 / 0.1015 | 0.0074 / 0.0074 / 0.0100 / 0.0088 | 0.1400 / 0.5560 / 0.4431 / 0.0934
perform | 0.0004 / 0.0006 / 0.0004 / 0.0003 | 0.3494 / 0.1242 / 0.1067 / 0.2739 | 0.0197 / 0.0259 / 0.0166 / 0.0153 | 0.0131 / 0.0314 / 0.0131 / 0.0082
follow | 0.0005 / 0.0005 / 0.0004 / 0.0003 | 0.2326 / 0.1254 / 0.1210 / 0.2928 | 0.0172 / 0.0202 / 0.0187 / 0.0116 | 0.0030 / 0.0030 / 0.0691 / 0.0030
find | 0.0004 / 0.0004 / 0.0005 / 0.0003 | 0.1863 / 0.1534 / 0.1268 / 0.1717 | 0.0153 / 0.0145 / 0.0215 / 0.0140 | 0.2606 / 0.0050 / 0.2606 / 0.1930
report | 0.0005 / 0.0004 / 0.0007 / 0.0005 | 0.3019 / 0.2593 / 0.2428 / 0.2680 | 0.0243 / 0.0184 / 0.0305 / 0.0213 | 0.0002 / 0.0131 / 0.0205 / 0.0003
suggest | 0.0005 / 0.0003 / 0.0006 / 0.0005 | 0.2976 / 0.3843 / 0.1722 / 0.2396 | 0.0223 / 0.0146 / 0.0268 / 0.0207 | 0.0691 / 0.0000 / 0.0470 / 0.0006
include | 0.0006 / 0.0005 / 0.0004 / 0.0003 | 0.2984 / 0.1613 / 0.1196 / 0.2755 | 0.0224 / 0.0250 / 0.0165 / 0.0127 | 0.0314 / 0.0082 / 0.0994 / 0.0140
The Distribution Similarity of Cue Words and the Result of K-S Test
Dataset | PLoS Biology (Macro-P / Macro-R / Macro-F1) | PLoS Medicine (Macro-P / Macro-R / Macro-F1) | PLoS Genetics (Macro-P / Macro-R / Macro-F1) | PLoS Comp. Biol. (Macro-P / Macro-R / Macro-F1) | ACP (Macro-P / Macro-R / Macro-F1)
PLoS Biology | - | 0.9640 / 0.9629 / 0.9628 | 0.9838 / 0.9833 / 0.9835 | 0.8925 / 0.8833 / 0.8831 | 0.6606 / 0.5041 / 0.4594
PLoS Medicine | 0.9432 / 0.9408 / 0.9412 | - | 0.9451 / 0.9415 / 0.9420 | 0.8316 / 0.7657 / 0.7630 | 0.5203 / 0.5005 / 0.4554
PLoS Genetics | 0.9794 / 0.9802 / 0.9796 | 0.9576 / 0.9566 / 0.9564 | - | 0.9270 / 0.9277 / 0.9269 | 0.5605 / 0.5653 / 0.5264
PLoS Comp. Biol. | 0.9772 / 0.9767 / 0.9769 | 0.9679 / 0.9676 / 0.9676 | 0.9848 / 0.9840 / 0.9843 | - | 0.6915 / 0.5844 / 0.5264
ACP | 0.7144 / 0.6436 / 0.5817 | 0.7008 / 0.5855 / 0.5533 | 0.7120 / 0.6278 / 0.5593 | 0.6907 / 0.6077 / 0.5588 | -
Domain Adaptability Analysis of HAN (e.g. Bi-LSTM+CNN+Attention)
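To make the cross-domain gap in the adaptability table concrete, the sketch below averages its Macro-F1 values by dataset group. The numbers are transcribed from the table. Which of the row or column dataset was used for training is not stated in the table itself, so the groups are described only as row/column pairs; the names `MACRO_F1` and `summarize` are hypothetical.

```python
from statistics import mean

PLOS = ["PLoS Biology", "PLoS Medicine", "PLoS Genetics", "PLoS Comp. Biol."]

# Macro-F1 transcribed from the adaptability table: row dataset -> column dataset.
MACRO_F1 = {
    "PLoS Biology": {"PLoS Medicine": 0.9628, "PLoS Genetics": 0.9835,
                     "PLoS Comp. Biol.": 0.8831, "ACP": 0.4594},
    "PLoS Medicine": {"PLoS Biology": 0.9412, "PLoS Genetics": 0.9420,
                      "PLoS Comp. Biol.": 0.7630, "ACP": 0.4554},
    "PLoS Genetics": {"PLoS Biology": 0.9796, "PLoS Medicine": 0.9564,
                      "PLoS Comp. Biol.": 0.9269, "ACP": 0.5264},
    "PLoS Comp. Biol.": {"PLoS Biology": 0.9769, "PLoS Medicine": 0.9676,
                         "PLoS Genetics": 0.9843, "ACP": 0.5264},
    "ACP": {"PLoS Biology": 0.5817, "PLoS Medicine": 0.5533,
            "PLoS Genetics": 0.5593, "PLoS Comp. Biol.": 0.5588},
}


def summarize(table):
    """Group means plus the single worst (Macro-F1, row, column) pair."""
    within_plos = [f1 for row, cols in table.items() if row in PLOS
                   for col, f1 in cols.items() if col in PLOS]
    plos_row_acp_col = [table[row]["ACP"] for row in PLOS]
    acp_row = list(table["ACP"].values())
    worst = min((f1, row, col) for row, cols in table.items() for col, f1 in cols.items())
    return {"within_plos": mean(within_plos), "plos_row_acp_col": mean(plos_row_acp_col),
            "acp_row": mean(acp_row), "worst": worst}


summary = summarize(MACRO_F1)
```

On these values the PLoS/PLoS pairs average about 0.94 Macro-F1, against about 0.49 for the PLoS rows in the ACP column and about 0.56 for the ACP row; the minimum, 0.4554, is the value quoted in the abstract.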
[1] Norris M, Oppenheim C, Rowland F. The Citation Advantage of Open-Access Articles[J]. Journal of the American Society for Information Science and Technology, 2008,59(12):1963-1972.
[2] Wang X M, Liu C, Mao W L, et al. The Open Access Advantage Considering Citation, Article Usage and Social Media Attention[J]. Scientometrics, 2015,103(2):555-564.
[3] Haustein S, Piwowar H A, Priem J, et al. Data From: The State of OA: A Large-Scale Analysis of the Prevalence and Impact of Open Access Articles[J]. PeerJ, 2018,6(4):e4375.
[4] Zhang R, Guo J F, Fan Y X, et al. Outline Generation: Understanding the Inherent Content Structure of Documents[OL]. arXiv Preprint, arXiv:1905.10039.
[5] Sollaci L B, Pereira M G. The Introduction, Methods, Results, and Discussion (IMRAD) Structure: A Fifty-Year Survey[J]. Journal of the Medical Library Association, 2004,92(3):364. pmid: 15243643
[6] Bertin M, Atanassova I, Gingras Y, et al. The Invariant Distribution of References in Scientific Articles[J]. Journal of the Association for Information Science & Technology, 2016,67(1):164-177.
[7] Bertin M, Atanassova I, Sugimoto C R, et al. The Linguistic Patterns and Rhetorical Structure of Citation Context: An Approach Using N-Grams[J]. Scientometrics, 2016,109(3):1417-1434.
[8] Hu Z G, Chen C M, Liu Z Y. Where are Citations Located in the Body of Scientific Articles? A Study of the Distributions of Citation Locations[J]. Journal of Informetrics, 2013,7(4):887-896. doi: 10.1016/j.joi.2013.08.005
[9] Nair P K R, Nair V D. Scientific Writing and Communication in Agriculture and Natural Resources[M]. Springer, 2014.
[10] International Committee of Medical Journal Editors. Uniform Requirements for Manuscripts Submitted to Biomedical Journals[J]. The New England Journal of Medicine, 1991,324(6):424-428. pmid: 1987468
[11] Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[12] Kando N. Text-Level Structure of Research Papers: Implications for Text-Based Information Processing Systems[C]// Proceedings of the 19th Annual BCS-IRSG Conference on Information Retrieval Research, Swindon, United Kingdom. 1997: 1-14.
[13] Posteguillo S. The Schematic Structure of Computer Science Research Articles[J]. English for Specific Purposes, 1999,18(2):139-160.
[14] Kanoksilapatham B. Rhetorical Structure of Biochemistry Research Articles[J]. English for Specific Purposes, 2005,24(3):269-292. doi: 10.1016/j.esp.2004.08.003
[15] McKnight L, Srinivasan P. Categorization of Sentence Types in Medical Abstracts[C]// Proceedings of AMIA Annual Symposium, Washington, DC, USA. 2003: 440-444.
[16] Mizuta Y, Korhonen A, Mullen T, et al. Zone Analysis in Biology Articles as a Basis for Information Extraction[J]. International Journal of Medical Informatics, 2006,75(6):468-487. doi: 10.1016/j.ijmedinf.2005.06.013 pmid: 16112609
[17] Agarwal S, Yu H. Automatically Classifying Sentences in Full-Text Biomedical Articles into Introduction, Methods, Results and Discussion[J]. Bioinformatics, 2009,25(23):3174-3180. doi: 10.1093/bioinformatics/btp548 pmid: 19783830
[18] Ribeiro S S, Yao J T, Rezende D A. Discovering IMRAD Structure with Different Classifiers[C]// Proceedings of 2018 IEEE International Conference on Big Knowledge (ICBK), Singapore. 2018: 200-204.
[19] Lu Wei, Huang Yong, Cheng Qikai. The Structure Function of Academic Text and Its Classification[J]. Journal of the China Society for Scientific and Technical Information, 2014,33(9):979-985.
[20] Huang Yong, Lu Wei, Cheng Qikai. The Structure Function Recognition of Academic Text Chapter Content Based Recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016,35(3):293-300.
[21] Huang Yong, Lu Wei, Cheng Qikai, et al. The Structure Function Recognition of Academic Text Paragraph-Based Recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016,35(5):530-538.
[22] Wang Dongbo, Gao Ruiqing, Ye Wenhao, et al. Research on the Structure Recognition of Academic Texts Under Different Characteristics[J]. Journal of the China Society for Scientific and Technical Information, 2018,37(10):997-1008.
[23] Wang Jiamin, Lu Wei, Liu Jiawei, et al. Research on Structure Function Recognition of Academic Text Based on Multi-Level Fusion[J]. Library and Information Service, 2019,63(13):95-104.
[24] Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[OL]. arXiv Preprint, arXiv:1607.01759.
[25] Pappas N, Popescu-Belis A. Multilingual Hierarchical Attention Networks for Document Classification[OL]. arXiv Preprint, arXiv:1707.00896.
[26] Zhang X, Zhao J B, LeCun Y. Character-Level Convolutional Networks for Text Classification[OL]. arXiv Preprint, arXiv:1509.01626.
[27] Lee J Y, Dernoncourt F. Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks[OL]. arXiv Preprint, arXiv:1603.03827.
[28] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[OL]. arXiv Preprint, arXiv:1706.03762.
[29] Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA. 2016: 1480-1489.
[30] Giorgino T. Computing and Visualizing Dynamic Time Warping Alignments in R: The DTW Package[J]. Journal of Statistical Software, 2009,31(7):1-25.
[31] Massey F J. The Kolmogorov-Smirnov Test for Goodness of Fit[J]. Journal of the American Statistical Association, 1951,46(253):68-78.
[32] Yang Y M. An Evaluation of Statistical Approaches to Text Categorization[J]. Information Retrieval, 1999,1(1/2):69-90. doi: 10.1023/A:1009982220290
[33] Cherkassky V, Ma Y Q. Practical Selection of SVM Parameters and Noise Estimation for SVM Regression[J]. Neural Networks, 2004,17(1):113-126. doi: 10.1016/S0893-6080(03)00169-2 pmid: 14690712
[34] Hernández-Lobato J M, Hernández-Lobato D, Suárez A. Expectation Propagation in Linear Regression Models with Spike-and-Slab Priors[J]. Machine Learning, 2015,99(3):437-487. doi: 10.1007/s10994-014-5475-7
[35] Sebe N, Lew M S, Cohen I, et al. Emotion Recognition Using a Cauchy Naive Bayes Classifier[C]// Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada. 2002: 17-20.
[36] Zhang H, Berg A C, Maire M, et al. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition[C]// Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, USA. 2006: 17-22.
[37] Chen T Q, Guestrin C. XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA. 2016: 785-794.
[38] Wu H C, Luk R W P, Wong K F, et al. Interpreting TF-IDF Term Weights as Making Relevance Decisions[J]. ACM Transactions on Information Systems, 2008,26(3):1-37.
[39] Satorra A, Bentler P M. A Scaled Difference Chi-Square Test Statistic for Moment Structure Analysis[J]. Psychometrika, 2001,66(4):507-514. doi: 10.1007/BF02296192
[40] Kent J T. Information Gain and a General Measure of Correlation[J]. Biometrika, 1983,70(1):163-173. doi: 10.1093/biomet/70.1.163
[41] Rigby A S. Statistical Methods in Epidemiology. v. Towards an Understanding of the Kappa Coefficient[J]. Disability & Rehabilitation, 2000,22(8):339-344. doi: 10.1080/096382800296575 pmid: 10896093
[42] Gannon T, Madnick S E, Moulton A, et al. Framework for the Analysis of the Adaptability, Extensibility, and Scalability of Semantic Information Integration and the Context Mediation Approach[C]// Proceedings of the 42nd Hawaii International Conference on System Sciences. 2009: 1-11.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn