Data Analysis and Knowledge Discovery (数据分析与知识发现), 2020, Vol. 4, Issue 11: 26-42     https://doi.org/10.11925/infotech.2096-3467.2020.0364
Research Article
Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network
Qin Chenglei,Zhang Chengzhi()
School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract

[Objective] To address the shortcomings of current approaches to recognizing the structure functions of sections in academic texts, this paper proposes a hierarchical attention network model to improve recognition performance. [Methods] First, we built hierarchical attention network models of different granularities that capture section structure information, and compared them with traditional machine learning models using different text feature vectors and with the BERT model on standardized datasets from four PLoS journals, in order to select the best model. Next, we used the best model to recognize the structure functions of sections in Atmospheric Chemistry and Physics (ACP, IF 5.6) whose headings are irregularly named and whose manually annotated structure functions show low inter-annotator consistency, and evaluated the results by the similarity of reference distributions and of verb cue-word distributions. Finally, we analyzed the domain adaptability of the proposed model. [Results] The sentence-level hierarchical attention network with Bi-LSTM+Attention as the encoder outperformed the other models, with a Macro-F1 of 0.866 1. Domain adaptation remains a problem: performance dropped markedly on distant domains, with a minimum Macro-F1 of 0.455 4. [Limitations] The model cannot recognize the functions of sections with mixed structures, and it does not consider the logical relationships between article structures. [Conclusions] The sentence-level hierarchical attention network recognizes section structure functions well, and introducing structure information of academic texts can enrich and extend full-text-based research on academic papers.

Abstract

[Objective] This paper proposes a new method using a hierarchical attention network, aiming to effectively recognize the structure functions of scholarly articles. [Methods] First, we constructed a network model with different-grained hierarchical attention to automatically identify the functions of text structures. Then, we examined the performance of our method on four datasets from PLoS; the same tests were also applied to traditional machine learning models with text feature vectors, as well as to the BERT model. We also modified the proposed model in accordance with the test results. Third, we evaluated the performance of the new model with articles from Atmospheric Chemistry and Physics and assessed its compatibility with other domains. [Results] At the sentence level, our model (using Bi-LSTM+Attention as the encoder) outperformed the others (Macro-F1: 0.866 1). However, the model did not perform well in unrelated fields (minimum Macro-F1: 0.455 4). [Limitations] The model cannot recognize the functions of mixed-structure texts, nor the logical relationships within these structures. [Conclusions] The proposed model effectively recognizes structure functions at the sentence level, which expands research on the full text of scholarly literature.
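The hierarchical attention idea at the core of the method (attention pooling applied first over words within a sentence, then over sentences within a section) can be illustrated with a minimal pure-Python sketch. This is a toy illustration under our own assumptions, not the authors' implementation; in the real model the context vector is learned and the inputs come from a Bi-LSTM encoder.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    """Collapse a sequence of hidden vectors into one vector.

    Each hidden vector is scored against a context vector (dot
    product), softmax turns the scores into weights, and the output
    is the weighted sum. Applied twice -- words -> sentence, then
    sentences -> section -- this is the core of a hierarchical
    attention network.
    """
    scores = [sum(h * c for h, c in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

# Toy example: three 2-d "word" vectors pooled into a sentence vector,
# with a stand-in (not learned) context vector.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence = attention_pool(words, context=[1.0, 1.0])
```

Stacking `attention_pool` twice (word vectors to sentence vectors, then sentence vectors to a section vector, each level with its own encoder and context vector) yields the section representation that is finally fed to a softmax classifier over the IMRaD labels.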

Key words: Function Recognition of Academic Text Structure; Hierarchical Attention Network; IMRaD; Domain Adaptability Analysis
Received: 2020-04-27      Published: 2020-12-04
CLC Number: TP393
Funding: This work was supported by the Open Fund of the Information Engineering Laboratory, Institute of Scientific and Technical Information of China, under the project "Analysis of Feature Elements and Key Issues of Peer Review in the Big Data Environment".
Corresponding author: Zhang Chengzhi, E-mail: zhangcz@njust.edu.cn
Cite this article:
Qin Chenglei,Zhang Chengzhi. Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network. Data Analysis and Knowledge Discovery, 2020, 4(11): 26-42.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0364 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I11/26
Section | Article 1 | Article 2 | Article 3 | Article 4 | Article 5
Section 1 | Introduction and background | Introduction | Introduction | Introduction | Introduction
Section 2 | Data | In-situ measurements | Model description and initialization | Theory-The cloud center of gravity | Experimental methods
Section 3 | Analysis methods | Methods | Results | Case study: aerosol effects on a warm convective cloud | Results and discussion
Section 4 | Characterisation of data sets | Mixing time-scale in the subtropical tropopause layer | Discussion and conclusions | Summary | Conclusions
Section 5 | An example of “Hector”: 10 February 2006 | TS mixing above the subtropical tropopause layer | None | None | None
Section 6 | Results | Mixing time-scale at the tropical tropopause | None | None | None
Section 7 | Conclusions | Conclusions | None | None | None
Table 1  Examples of section headings from selected ACP papers
Fig.1  Research framework for structure function recognition of academic texts
Journal | Papers | Period | Introduction | Methods | Results | Discussions | Others | Others share
PLoS Biology | 2 976 | 2003-2019 | 2 951 | 2 915 | 2 625 | 2 618 | 286 | 2.57%
PLoS Medicine | 1 740 | 2004-2019 | 1 728 | 1 721 | 1 714 | 1 711 | 137 | 1.99%
PLoS Genetics | 7 268 | 2005-2019 | 7 268 | 7 241 | 6 700 | 6 698 | 639 | 2.29%
PLoS Computational Biology | 5 851 | 2005-2019 | 5 851 | 5 443 | 5 229 | 5 107 | 452 | 2.09%
ACP | 7 279 | 2001-2016 | 7 067 | 7 243 | 2 743 | 7 397 | 7 688 | 31.44%
Table 2  Overview of the PLoS journals and the ACP journal data
Fig.2  Distribution of section headings in the ACP data
Structure | Function | Content
Introduction | Explains why the study was undertaken | Background, rationale, research objectives, summary and review of related work
Materials & Methods | Describes which methods and materials were used | Study design, research materials, impact assessment
Results | Presents the research findings | Analysis; detailed presentation of findings in text, figures, and tables
Discussion | Explores the meaning of the results | Restatement of main findings, strengths and limitations, implications for other research, open questions and future work, concluding summary
Table 3  Description of IMRaD structure functions[5]
Structure function | Section-heading feature words
Introduction | introduction, motivation, background, overview, review of literature
Materials & Methods | system, theory, method, methods, methodology, model, models, framework, approach, approaches, methodologies, experimental, experiment, experiments, data, data and methods
Results | result, results, analysis
Discussion | discussion, discussions, conclusion, conclusions, summary, concluding, summary and conclusions
Table 4  Structure functions and their corresponding section-heading feature words
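A heading-matching baseline built from Table 4 can be sketched as follows. This is our hypothetical illustration, abridged to single-word cues from the table; multi-word cues such as "data and methods" and irregular headings like those in Table 1 are exactly where such rules fail and the neural model is needed.

```python
# Feature words (abridged, single-word cues) from Table 4,
# mapped to IMRaD structure functions.
FEATURE_WORDS = {
    "Introduction": {"introduction", "motivation", "background", "overview"},
    "Materials & Methods": {"system", "theory", "method", "methods",
                            "methodology", "methodologies", "model", "models",
                            "framework", "approach", "approaches", "data",
                            "experimental", "experiment", "experiments"},
    "Results": {"result", "results", "analysis"},
    "Discussion": {"discussion", "discussions", "conclusion", "conclusions",
                   "summary", "concluding"},
}

def label_heading(heading):
    """Assign an IMRaD label by matching heading words against the cue
    sets; returns None for headings that match nothing, which is the
    case the neural model is needed for."""
    words = set(heading.lower().replace("-", " ").split())
    for function, cues in FEATURE_WORDS.items():
        if words & cues:
            return function
    return None

# Mixed heading: matched to Results because the "Results" cue fires first.
example = label_heading("Results and discussion")
```

Note the design weakness the paper targets: a heading like "Results and discussion" is forced into a single label, and a heading like "An example of Hector" gets no label at all.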
Fig.3  Architecture of the hierarchical attention network (with Bi-LSTM+CNN+Attention as the encoder)
Model | Feature vector | PLoS Biology | PLoS Medicine | PLoS Genetics | PLoS Computational Biology
(each journal: Macro-P  Macro-R  Macro-F1)
SVM TF-IDF 0.652 5 0.508 7 0.440 6 0.602 6 0.586 2 0.563 9 0.665 0 0.536 1 0.462 1 0.444 8 0.461 2 0.365 7
CHI 0.492 3 0.387 5 0.394 1 0.486 0 0.394 9 0.346 4 0.442 8 0.381 1 0.376 8 0.661 0 0.612 8 0.553 2
IG 0.388 8 0.383 6 0.368 1 0.648 9 0.586 2 0.525 5 0.526 8 0.506 2 0.464 3 0.461 5 0.459 0 0.380 9
LR TF-IDF 0.503 5 0.424 7 0.352 4 0.588 1 0.568 2 0.562 8 0.514 8 0.499 8 0.459 5 0.319 7 0.373 8 0.301 7
CHI 0.494 0 0.471 8 0.417 8 0.457 5 0.457 0 0.427 7 0.466 3 0.454 2 0.393 7 0.589 2 0.498 9 0.420 9
IG 0.386 2 0.383 8 0.375 0 0.610 3 0.591 6 0.560 2 0.455 7 0.439 2 0.424 0 0.334 7 0.435 7 0.359 6
NB TF-IDF 0.496 7 0.488 5 0.434 5 0.516 7 0.515 1 0.483 1 0.555 4 0.520 7 0.477 4 0.421 5 0.431 6 0.365 7
CHI 0.654 4 0.617 7 0.605 7 0.572 9 0.531 1 0.486 2 0.536 4 0.555 2 0.488 1 0.726 4 0.679 6 0.633 3
IG 0.514 4 0.463 4 0.460 7 0.620 8 0.593 9 0.587 5 0.436 7 0.364 6 0.306 7 0.418 6 0.315 9 0.236 0
KNN TF-IDF 0.562 0 0.578 0 0.564 3 0.623 8 0.625 3 0.623 0 0.635 8 0.644 3 0.635 7 0.442 1 0.459 7 0.440 9
CHI 0.625 9 0.626 1 0.624 6 0.546 8 0.544 4 0.544 6 0.634 6 0.636 7 0.632 7 0.704 6 0.711 3 0.707 4
IG 0.607 0 0.609 9 0.607 1 0.694 4 0.694 7 0.694 3 0.585 7 0.588 8 0.585 5 0.488 5 0.492 8 0.487 5
XGB TF-IDF 0.712 8 0.715 5 0.711 4 0.694 1 0.695 5 0.694 4 0.744 1 0.744 1 0.742 7 0.539 3 0.550 6 0.533 0
CHI 0.743 9 0.744 7 0.743 7 0.623 2 0.615 3 0.614 5 0.739 7 0.740 5 0.739 0 0.766 0 0.771 3 0.763 5
IG 0.746 1 0.746 5 0.745 7 0.818 2 0.817 9 0.817 8 0.772 8 0.771 9 0.770 2 0.598 5 0.600 9 0.597 1
Table 5  Structure function recognition results on the PLoS journals with traditional machine learning models
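The Macro-P, Macro-R, and Macro-F1 columns used in Tables 5-10 average the per-class precision, recall, and F1 with equal weight per class, so a rare class counts as much as a frequent one. A minimal sketch of the computation, with toy labels of our own:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1: compute each metric
    per class, then average with equal weight per class."""
    ps, rs, fs = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(labels)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

# Toy example over the four IMRaD labels.
truth = ["I", "M", "R", "D", "I", "M"]
pred = ["I", "M", "R", "I", "I", "R"]
p, r, f = macro_f1(truth, pred, labels=["I", "M", "R", "D"])
```

Because the "D" class is entirely missed in this toy run, its per-class F1 of 0 drags the macro average down, which is the behavior that makes the macro variant a stricter measure than accuracy on imbalanced section data.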
PLoS Biology PLoS Medicine PLoS Genetics PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
0.919 8 0.919 4 0.919 4 0.973 2 0.973 1 0.973 1 0.929 2 0.929 2 0.929 1 0.837 8 0.838 6 0.838 1
Table 6  Structure function recognition results with the BERT model
Encoder PLoS Biology PLoS Medicine PLoS Genetics PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
CNN 0.968 6 0.968 9 0.968 7 0.972 5 0.972 6 0.972 4 0.986 5 0.986 4 0.986 4 0.935 4 0.931 4 0.932 1
CNN_Multi Filters 0.976 1 0.975 5 0.975 6 0.971 1 0.970 9 0.971 0 0.986 9 0.987 1 0.986 9 0.947 3 0.946 1 0.946 5
LSTM 0.869 8 0.869 5 0.868 2 0.892 0 0.893 6 0.892 3 0.985 9 0.986 5 0.986 2 0.944 9 0.945 2 0.944 1
LSTM+CNN 0.973 4 0.973 7 0.973 5 0.964 1 0.963 8 0.963 9 0.986 6 0.987 0 0.986 8 0.950 9 0.951 3 0.951 0
LSTM+Attention 0.970 0 0.970 8 0.970 2 0.975 9 0.975 9 0.975 8 0.992 0 0.991 9 0.991 9 0.940 7 0.941 3 0.940 8
LSTM+CNN+Attention 0.980 4 0.980 6 0.980 4 0.982 8 0.982 7 0.982 7 0.990 4 0.990 3 0.990 3 0.955 5 0.956 8 0.955 9
Bi-LSTM 0.828 5 0.855 7 0.818 2 0.891 7 0.894 2 0.890 9 0.976 6 0.976 4 0.976 1 0.919 9 0.927 1 0.920 8
Bi-LSTM+CNN 0.974 3 0.975 3 0.974 7 0.976 1 0.975 5 0.975 7 0.986 8 0.987 2 0.986 9 0.948 5 0.951 5 0.949 1
Bi-LSTM+Attention 0.970 9 0.971 1 0.970 7 0.973 6 0.974 0 0.973 7 0.988 0 0.988 0 0.988 0 0.950 6 0.950 1 0.950 3
Bi-LSTM+CNN+Attention 0.981 3 0.980 4 0.980 7 0.982 6 0.982 8 0.982 6 0.991 5 0.992 0 0.991 7 0.962 1 0.961 7 0.961 8
Table 7  Recognition results of the word-level hierarchical attention network on the PLoS journals
Encoder PLoS Biology PLoS Medicine PLoS Genetics PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
LSTM 0.961 7 0.961 9 0.961 7 0.954 2 0.949 5 0.949 0 0.985 9 0.985 8 0.985 8 0.925 5 0.925 7 0.925 1
LSTM+CNN 0.975 8 0.977 1 0.976 4 0.982 2 0.981 7 0.981 9 0.989 5 0.989 7 0.989 6 0.935 2 0.935 2 0.935 2
LSTM+Attention 0.980 9 0.980 9 0.980 9 0.986 2 0.986 1 0.986 2 0.989 2 0.989 4 0.989 3 0.936 2 0.935 3 0.935 6
LSTM+CNN+Attention 0.968 7 0.968 0 0.968 2 0.983 4 0.983 3 0.983 3 0.983 3 0.983 4 0.983 3 0.931 5 0.930 4 0.930 4
Bi-LSTM 0.971 6 0.970 8 0.971 0 0.960 6 0.957 4 0.957 0 0.976 7 0.974 4 0.975 0 0.938 7 0.939 2 0.938 5
Bi-LSTM+CNN 0.979 5 0.979 8 0.979 6 0.988 7 0.988 3 0.988 7 0.994 1 0.994 0 0.994 0 0.912 4 0.903 1 0.903 9
Bi-LSTM+Attention 0.977 9 0.977 5 0.977 7 0.954 3 0.954 5 0.952 8 0.985 7 0.985 4 0.985 4 0.956 6 0.956 5 0.956 5
Bi-LSTM+CNN+Attention 0.981 5 0.982 0 0.981 6 0.988 9 0.988 1 0.988 3 0.989 9 0.990 2 0.990 0 0.952 7 0.952 3 0.952 5
Table 8  Recognition results of the sentence-level hierarchical attention network on the PLoS journals
Encoder PLoS Biology PLoS Medicine PLoS Genetics PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
LSTM+CNN 0.959 7 0.959 5 0.959 5 0.964 5 0.964 3 0.964 3 0.978 5 0.978 6 0.978 5 0.899 5 0.892 5 0.893 6
LSTM+Attention 0.952 0 0.949 7 0.950 4 0.962 5 0.962 3 0.962 3 0.982 3 0.982 4 0.982 3 0.896 4 0.896 0 0.896 1
LSTM+CNN+Attention 0.945 5 0.944 3 0.943 7 0.954 1 0.951 2 0.951 6 0.980 3 0.979 5 0.979 8 0.895 4 0.896 2 0.895 6
Bi-LSTM+CNN 0.965 0 0.964 8 0.964 9 0.960 8 0.959 7 0.959 9 0.981 2 0.981 0 0.981 1 0.885 8 0.863 7 0.859 9
Bi-LSTM+Attention 0.961 5 0.961 4 0.961 4 0.973 2 0.972 7 0.972 8 0.983 3 0.983 3 0.983 3 0.918 1 0.917 6 0.917 8
Bi-LSTM+CNN+Attention 0.971 9 0.972 4 0.972 1 0.961 5 0.961 2 0.960 8 0.983 4 0.982 8 0.983 1 0.915 3 0.913 6 0.914 1
Table 9  Recognition results of the paragraph-level hierarchical attention network on the PLoS journals
Fig.4  Comparison of the five section function recognition models
Model Macro-P Macro-R Macro-F1 Accuracy-I Accuracy-M Accuracy-R Accuracy-D
Baseline-1 XGB TF-IDF 0.528 9 0.531 9 0.523 4 0.540 9 0.308 3 0.660 5 0.618 0
CHI 0.788 0 0.786 9 0.785 9 0.920 3 0.730 3 0.656 3 0.840 7
IG 0.604 6 0.608 2 0.604 8 0.731 7 0.481 7 0.585 9 0.633 4
Baseline-2 BERT Model 0.848 0 0.849 1 0.848 2 0.936 8 0.768 4 0.812 5 0.878 7
Word-level HAN Bi-LSTM+CNN 0.826 0 0.829 4 0.826 6 0.858 0 0.798 8 0.766 3 0.880 9
Bi-LSTM+Attention 0.851 3 0.852 5 0.851 1 0.917 7 0.780 0 0.868 5 0.839 0
Bi-LSTM+CNN+Attention 0.836 2 0.836 5 0.836 2 0.880 2 0.812 9 0.808 1 0.843 8
Sentence-level HAN Bi-LSTM+CNN 0.813 7 0.798 5 0.788 5 0.961 4 0.836 7 0.506 8 0.889 1
Bi-LSTM+Attention 0.866 8 0.866 6 0.866 1 0.967 2 0.818 6 0.800 4 0.880 1
Bi-LSTM+CNN+Attention 0.853 3 0.849 8 0.848 8 0.953 7 0.874 5 0.711 7 0.859 2
Paragraph-level HAN Bi-LSTM+CNN 0.840 8 0.839 8 0.839 6 0.919 6 0.825 3 0.811 3 0.803 2
Bi-LSTM+Attention 0.861 6 0.860 1 0.860 4 0.912 3 0.824 6 0.851 9 0.851 5
Bi-LSTM+CNN+Attention 0.786 6 0.747 2 0.741 7 0.656 1 0.529 3 0.884 2 0.919 4
Table 10  Results of the optimal models trained on ACP data with clear section functions
Fig.5  Distribution of heading numbers and of predicted labels in the non-standard ACP data
Fig.6  Reference distribution and distribution of the action cue word “use”
Item | DTW | Cosine distance | Euclidean distance | K-S test (p-value)
 | I M R D | I M R D | I M R D | I M R D
References | 0.000 1 0.000 1 0.000 1 0.000 1 | 0.008 2 0.012 8 0.012 2 0.009 2 | 0.004 7 0.002 2 0.006 5 0.004 3 | 0.676 6 0.794 2 0.260 6 0.443 1
Action cue word “use” | 0.000 1 0.000 2 0.000 1 0.000 1 | 0.028 7 0.009 2 0.015 9 0.030 0 | 0.003 8 0.008 8 0.006 3 0.004 1 | 0.140 0 0.556 0 0.140 0 0.882 8
Table 11  Distribution similarity and K-S tests for references and the action cue word “use”
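Of the four measures used above to compare position distributions, the cosine distance, Euclidean distance, and K-S statistic have direct closed forms over two position histograms. A minimal sketch with toy histograms of our own (the paper computes DTW with the dtw package [30] and the K-S p-values with the standard test [31]; this toy shows only the closed-form quantities):

```python
import math

def cosine_distance(p, q):
    """1 - cosine similarity between two position histograms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def euclidean_distance(p, q):
    """Straight-line distance between the two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def ks_statistic(p, q):
    """Kolmogorov-Smirnov statistic: the largest gap between the
    cumulative distribution functions of the two histograms."""
    d = cp = cq = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        d = max(d, abs(cp - cq))
    return d

# Toy histograms of reference positions over ten relative-position bins,
# e.g. the training distribution vs. the distribution in predicted sections.
train = [0.20, 0.15, 0.12, 0.10, 0.09, 0.09, 0.08, 0.07, 0.05, 0.05]
pred = [0.19, 0.16, 0.12, 0.11, 0.09, 0.08, 0.08, 0.07, 0.05, 0.05]
cos_d = cosine_distance(train, pred)
euc_d = euclidean_distance(train, pred)
```

Small distances (and high K-S p-values) between the training and predicted distributions are read in the paper as evidence that the predicted labels behave like the gold ones.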
Fig.7  Distribution of references in the two datasets
DTW | Cosine distance | Euclidean distance | K-S test (p-value)
I M R D | I M R D | I M R D | I M R D
0.000 3 0.000 2 0.000 2 0.000 4 | 0.064 8 0.033 2 0.046 3 0.145 6 | 0.012 2 0.010 1 0.008 2 0.012 4 | 0.140 0 0.047 0 0.443 1 0.099 4
Table 12  Similarity and K-S tests of reference distributions in the two datasets
Fig.8  Similarity of reference distributions after replacing the Results section with new Materials & Methods and Discussion sections
Compared with extra_Materials & Methods_train | DTW | Cosine distance | Euclidean distance | K-S test (p-value)
Materials & Methods_train | 0.000 2 | 0.030 5 | 0.007 9 | 0.676 6
extra_Materials & Methods_predict | 0.000 2 | 0.036 2 | 0.008 7 | 0.260 6
Table 13  Distribution similarity and K-S tests for extra_Materials & Methods
Compared with extra_Discussion_train | DTW | Cosine distance | Euclidean distance | K-S test (p-value)
Discussion_train | 0.000 1 | 0.093 4 | 0.005 2 | 0.099 4
extra_Discussion_predict | 0.000 2 | 0.108 9 | 0.007 8 | 0.443 1
Table 14  Distribution similarity and K-S tests for extra_Discussion
Fig.9  Distribution of action cue words in the IMRaD sections of the two datasets
Word | DTW | Cosine distance | Euclidean distance | K-S test (p-value)
 | I M R D | I M R D | I M R D | I M R D
show 0.000 2 0.000 2 0.000 5 0.000 2 0.103 5 0.054 4 0.034 0 0.093 4 0.007 1 0.009 0 0.019 1 0.007 7 0.343 9 0.193 0 0.013 1 0.535 8
use 0.000 2 0.000 2 0.000 2 0.000 2 0.047 8 0.017 8 0.037 6 0.101 5 0.007 4 0.007 4 0.010 0 0.008 8 0.140 0 0.556 0 0.443 1 0.093 4
perform 0.000 4 0.000 6 0.000 4 0.000 3 0.349 4 0.124 2 0.106 7 0.273 9 0.019 7 0.025 9 0.016 6 0.015 3 0.013 1 0.031 4 0.013 1 0.008 2
follow 0.000 5 0.000 5 0.000 4 0.000 3 0.232 6 0.125 4 0.121 0 0.292 8 0.017 2 0.020 2 0.018 7 0.011 6 0.003 0 0.003 0 0.069 1 0.003 0
find 0.000 4 0.000 4 0.000 5 0.000 3 0.186 3 0.153 4 0.126 8 0.171 7 0.015 3 0.014 5 0.021 5 0.014 0 0.260 6 0.005 0 0.260 6 0.193 0
report 0.000 5 0.000 4 0.000 7 0.000 5 0.301 9 0.259 3 0.242 8 0.268 0 0.024 3 0.018 4 0.030 5 0.021 3 0.000 2 0.013 1 0.020 5 0.000 3
suggest 0.000 5 0.000 3 0.000 6 0.000 5 0.297 6 0.384 3 0.172 2 0.239 6 0.022 3 0.014 6 0.026 8 0.020 7 0.069 1 0.000 0 0.047 0 0.000 6
include 0.000 6 0.000 5 0.000 4 0.000 3 0.298 4 0.161 3 0.119 6 0.275 5 0.022 4 0.025 0 0.016 5 0.012 7 0.031 4 0.008 2 0.099 4 0.140 0
Table 15  Distribution similarity and K-S tests for action cue words
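The DTW columns above measure how well two distribution curves align after elastic shifts along the position axis; the paper computes them with the dtw package [30]. A minimal O(n·m) dynamic-programming sketch, assuming absolute difference as the local cost:

```python
def dtw_distance(s, t):
    """Classic dynamic-programming DTW: cell (i, j) holds the cheapest
    cumulative cost of aligning s[:i] with t[:j]; each step may advance
    either sequence or both, so the curves can stretch locally."""
    n, m = len(s), len(t)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stay on t[j-1]
                                     cost[i][j - 1],      # stay on s[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Identical curves give a DTW of exactly 0, and a curve shifted by one bin incurs only the cost of the unmatched endpoints, which is why DTW values in Tables 11-15 stay tiny even when the raw histograms differ bin by bin.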
Training data \ Test data | PLoS Biology | PLoS Medicine | PLoS Genetics | PLoS Comp. Biol. | ACP
(each cell: Macro-P  Macro-R  Macro-F1)
PLoS Biology - - - 0.964 0 0.962 9 0.962 8 0.983 8 0.983 3 0.983 5 0.892 5 0.883 3 0.883 1 0.660 6 0.504 1 0.459 4
PLoS Medicine 0.943 2 0.940 8 0.941 2 - - - 0.945 1 0.941 5 0.942 0 0.831 6 0.765 7 0.763 0 0.520 3 0.500 5 0.455 4
PLoS Genetics 0.979 4 0.980 2 0.979 6 0.957 6 0.956 6 0.956 4 - - - 0.927 0 0.927 7 0.926 9 0.560 5 0.565 3 0.526 4
PLoS Comp. Biol. 0.977 2 0.976 7 0.976 9 0.967 9 0.967 6 0.967 6 0.984 8 0.984 0 0.984 3 - - - 0.691 5 0.584 4 0.526 4
ACP 0.714 4 0.643 6 0.581 7 0.700 8 0.585 5 0.553 3 0.712 0 0.627 8 0.559 3 0.690 7 0.607 7 0.558 8 - - -
Table 16  Domain adaptation analysis of the hierarchical attention network (sentence-level Bi-LSTM+CNN+Attention as the example)
[1] Norris M, Oppenheim C, Rowland F. The Citation Advantage of Open-Access Articles[J]. Journal of the Association for Information Science & Technology, 2014,59(12):1963-1972.
[2] Wang X M, Liu C, Mao W L, et al. The Open Access Advantage Considering Citation, Article Usage and Social Media Attention[J]. Scientometrics, 2015,103(2):555-564.
[3] Haustein S, Piwowar H A, Priem J, et al. Data From: The State of OA: A Large-Scale Analysis of the Prevalence and Impact of Open Access Articles[J]. PeerJ, 2018,6(4):e4375.
[4] Zhang R, Guo J F, Fan Y X, et al. Outline Generation: Understanding the Inherent Content Structure of Documents[OL]. arXiv Preprint, arXiv:1905.10039.
[5] Sollaci L B, Pereira M G. The Introduction, Methods, Results, and Discussion (IMRAD) Structure: A Fifty-Year Survey[J]. Journal of the Medical Library Association, 2004,92(3):364. pmid: 15243643
[6] Bertin M, Atanassova I, Gingras Y, et al. The Invariant Distribution of References in Scientific Articles[J]. Journal of the Association for Information Science & Technology, 2016,67(1):164-177.
[7] Bertin M, Atanassova I, Sugimoto C R, et al. The Linguistic Patterns and Rhetorical Structure of Citation Context:An Approach Using N-Grams[J]. Scientometrics, 2016,109(3):1417-1434.
[8] Hu Z G, Chen C M, Liu Z Y. Where are Citations Located in the Body of Scientific Articles? A Study of the Distributions of Citation Locations[J]. Journal of Informetrics, 2013,7(4):887-896. doi: 10.1016/j.joi.2013.08.005
[9] Nair P K R, Nair V D. Scientific Writing and Communication in Agriculture and Natural Resources[M]. Springer, 2014.
[10] International Committee of Medical Journal Editors. Uniform Requirements for Manuscripts Submitted to Biomedical Journals[J]. The New England Journal of Medicine, 1991,324(6):424-428. pmid: 1987468
[11] Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[12] Kando N. Text-Level Structure of Research Papers: Implications for Text-Based Information Processing Systems[C]// Proceedings of the 19th Annual BCS-IRSG Conference on Information Retrieval Research, Swindon, United Kingdom. 1997: 1-14.
[13] Posteguillo S. The Schematic Structure of Computer Science Research Articles[J]. English for Specific Purposes, 1999,18(2):139-160.
[14] Kanoksilapatham B. Rhetorical Structure of Biochemistry Research Articles[J]. English for Specific Purposes, 2005,24(3):269-292. doi: 10.1016/j.esp.2004.08.003
[15] McKnight L, Srinivasan P. Categorization of Sentence Types in Medical Abstracts[C]// Proceedings of AMIA Annual Symposium, Washington, DC, USA. 2003: 440-444.
[16] Mizuta Y, Korhonen A, Mullen T, et al. Zone Analysis in Biology Articles as a Basis for Information Extraction[J]. International Journal of Medical Informatics, 2006,75(6):468-487. doi: 10.1016/j.ijmedinf.2005.06.013 pmid: 16112609
[17] Agarwal S, Yu H. Automatically Classifying Sentences in Full-Text Biomedical Articles into Introduction, Methods, Results and Discussion[J]. Bioinformatics, 2009,25(23):3174-3180. doi: 10.1093/bioinformatics/btp548 pmid: 19783830
[18] Ribeiro S S, Yao J T, Rezende D A. Discovering IMRAD Structure with Different Classifiers[C]// Proceedings of 2018 IEEE International Conference on Big Knowledge (ICBK), Singapore. 2018: 200-204.
[19] Lu Wei, Huang Yong, Cheng Qikai. The Structure Function of Academic Text and Its Classification[J]. Journal of the China Society for Scientific and Technical Information, 2014,33(9):979-985. (in Chinese)
[20] Huang Yong, Lu Wei, Cheng Qikai. The Structure Function Recognition of Academic Text: Chapter Content Based Recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016,35(3):293-300. (in Chinese)
[21] Huang Yong, Lu Wei, Cheng Qikai, et al. The Structure Function Recognition of Academic Text: Paragraph-Based Recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016,35(5):530-538. (in Chinese)
[22] Wang Dongbo, Gao Ruiqing, Ye Wenhao, et al. Research on the Structure Recognition of Academic Texts Under Different Characteristics[J]. Journal of the China Society for Scientific and Technical Information, 2018,37(10):997-1008. (in Chinese)
[23] Wang Jiamin, Lu Wei, Liu Jiawei, et al. Research on Structure Function Recognition of Academic Text Based on Multi-Level Fusion[J]. Library and Information Service, 2019,63(13):95-104. (in Chinese)
[24] Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[OL]. arXiv Preprint, arXiv:1607.01759.
[25] Pappas N, Popescu-Belis A. Multilingual Hierarchical Attention Networks for Document Classification[OL]. arXiv Preprint, arXiv:1707.00896.
[26] Zhang X, Zhao J B, LeCun Y. Character-Level Convolutional Networks for Text Classification[OL]. arXiv Preprint, arXiv:1509.01626.
[27] Lee J Y, Dernoncourt F. Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks[OL]. arXiv Preprint, arXiv:1603.03827.
[28] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[OL]. arXiv Preprint, arXiv:1706.03762.
[29] Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA. 2016: 1480-1489.
[30] Giorgino T. Computing and Visualizing Dynamic Time Warping Alignments in R: The DTW Package[J]. Journal of Statistical Software, 2009,31(7):1-25.
[31] Massey F J. The Kolmogorov-Smirnov Test for Goodness of Fit[J]. Publications of the American Statistical Association, 1951,46(253):68-78.
[32] Yang Y M. An Evaluation of Statistical Approaches to Text Categorization[J]. Information Retrieval, 1999,1(1/2):69-90. doi: 10.1023/A:1009982220290
[33] Cherkassky V, Ma Y Q. Practical Selection of SVM Parameters and Noise Estimation for SVM Regression[J]. Neural Networks, 2004,17(1):113-126. doi: 10.1016/S0893-6080(03)00169-2 pmid: 14690712
[34] Hernández-Lobato J M, Hernández-Lobato D, Suárez A. Expectation Propagation in Linear Regression Models with Spike-and-Slab Priors[J]. Machine Learning, 2015,99(3):437-487. doi: 10.1007/s10994-014-5475-7
[35] Sebe N, Lew M S, Cohen I, et al. Emotion Recognition Using a Cauchy Naive Bayes Classifier[C]// Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada. 2002: 17-20.
[36] Zhang H, Berg A C, Michael B, et al. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition[C]// Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,New York, USA. 2006: 17-22.
[37] Chen T Q, Guestrin C. XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA. 2016: 785-794.
[38] Wu H C, Luk K, Wong K F, et al. Interpreting TF-IDF Term Weights as Making Relevance Decisions[J]. ACM Transactions on Information System, 2008,26(3):1-37.
[39] Satorra A, Bentler P M. A Scaled Difference Chi-Square Test Statistic for Moment Structure Analysis[J]. Psychometrika, 2001,66(4):507-514. doi: 10.1007/BF02296192
[40] Kent J T. Information Gain and a General Measure of Correlation[J]. Biometrika, 1983,70(1):163-173. doi: 10.1093/biomet/70.1.163
[41] Rigby A S. Statistical Methods in Epidemiology. V. Towards an Understanding of the Kappa Coefficient[J]. Disability & Rehabilitation, 2000,22(8):339-344. doi: 10.1080/096382800296575 pmid: 10896093
[42] Gannon T, Madnick S E, Moulton A, et al. Framework for the Analysis of the Adaptability, Extensibility, and Scalability of Semantic Information Integration and the Context Mediation Approach[C]// Proceedings of the 42nd Hawaii International Conference on System Sciences. 2009: 1-11.