Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue (11): 26-42     https://doi.org/10.11925/infotech.2096-3467.2020.0364
Research Article
Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network
Qin Chenglei,Zhang Chengzhi()
School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China

Abstract

[Objective] This paper proposes using a hierarchical attention network to improve the recognition of structure functions in academic texts. [Methods] First, we constructed hierarchical attention network models of different granularities that capture section-structure information, and compared them on standardized datasets from four PLoS journals against traditional machine learning models with various text feature vectors, as well as against the Bert model, to select the best model. Then, we used the best model to recognize the structure functions of sections in Atmospheric Chemistry and Physics (ACP) whose titles lack standard naming and whose manual annotations show low consistency, and evaluated the predictions through the similarity of reference distributions and of verb cue-word distributions. Finally, we analyzed the domain adaptability of the proposed model. [Results] The sentence-level model with Bi-LSTM+Attention as the encoder outperformed the other models (Macro-F1: 0.866 1). The model does face a domain-adaptation problem: performance drops markedly in dissimilar domains (minimum Macro-F1: 0.455 4). [Limitations] The model cannot recognize the functions of sections with mixed structures, and it does not consider the logical relationships among structures. [Conclusions] The sentence-level hierarchical attention network recognizes the structure functions of sections well, and introducing structure information of academic texts can enrich and extend research based on the full text of scholarly literature.

Key words: Function Recognition of Academic Text Structure; Hierarchical Attention Network; IMRaD; Domain Adaptability Analysis
Received: 2020-04-27      Published: 2020-12-04
CLC number: TP393
Funding: *This work is one of the outcomes of the Open Fund project of the Information Engineering Laboratory, Institute of Scientific and Technical Information of China, "Research on Feature Element Analysis and Key Issues of Peer Review in the Big Data Environment".
Corresponding author: Zhang Chengzhi, E-mail: zhangcz@njust.edu.cn
Cite this article:
Qin Chenglei,Zhang Chengzhi. Recognizing Structure Functions of Academic Articles with Hierarchical Attention Network. Data Analysis and Knowledge Discovery, 2020, 4(11): 26-42.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0364      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I11/26
           Article 1 | Article 2 | Article 3 | Article 4 | Article 5
Section 1  Introduction and background | Introduction | Introduction | Introduction | Introduction
Section 2  Data | In-situ measurements | Model description and initialization | Theory-The cloud center of gravity | Experimental methods
Section 3  Analysis methods | Methods | Results | Case study: aerosol effects on a warm convective cloud | Results and discussion
Section 4  Characterisation of data sets | Mixing time-scale in the subtropical tropopause layer | Discussion and conclusions | Summary | Conclusions
Section 5  An example of “Hector”: 10 February 2006 | TS mixing above the subtropical tropopause layer | None | None | None
Section 6  Results | Mixing time-scale at the tropical tropopause | None | None | None
Section 7  Conclusions | Conclusions | None | None | None
Table 1  Examples of section titles from selected papers in the ACP journal
Fig.1  Research framework for recognizing structure functions of academic texts
Journal  No. of articles  Period  Introduction count  Methods count  Results count  Discussions count  Others count  Others share
PLoS Biology 2 976 2003-2019 2 951 2 915 2 625 2 618 286 2.57%
PLoS Medicine 1 740 2004-2019 1 728 1 721 1 714 1 711 137 1.99%
PLoS Genetics 7 268 2005-2019 7 268 7 241 6 700 6 698 639 2.29%
PLoS Computational Biology 5 851 2005-2019 5 851 5 443 5 229 5 107 452 2.09%
ACP 7 279 2001-2016 7 067 7 243 2 743 7 397 7 688 31.44%
Table 2  Overview of the PLoS journals and the ACP journal datasets
Fig.2  Distribution of section titles in the ACP data
Structure  Function  Typical content
Introduction  Explains why the study was undertaken  Background, rationale, research objectives, summary and review of related work
Materials & Methods  States which methods and materials were used  Research design, research materials, impact assessment
Results  Reports the findings  Analysis; detailed presentation of the findings in text, figures, and tables
Discussion  Explores the meaning of the results  Restatement of the main findings, strengths and limitations, implications for other research, open questions and future work, concluding summary
Table 3  Description of the IMRaD structure functions[5]
Structure function  Section-title cue words
Introduction introduction, motivation, background, overview, review of literature
Materials & Methods system, theory, method, methods, methodology, model, models, framework, approach, approaches, methodologies, experimental, experiment, experiments, data, data and methods
Results result, results, analysis
Discussion discussion, discussions, conclusion, conclusions, summary, concluding, summary and conclusions
Table 4  Structure functions and their corresponding section-title cue words
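Table 4's cue words lend themselves to a simple title-matching baseline. The sketch below is a hypothetical illustration, not the authors' exact procedure: the function name, the restriction to single-word cues, and the overlap-count tie-breaking are our assumptions.

```python
# Hypothetical cue-word baseline built from (a subset of) Table 4.
CUE_WORDS = {
    "Introduction": {"introduction", "motivation", "background", "overview"},
    "Materials & Methods": {"system", "theory", "method", "methods", "methodology",
                            "methodologies", "model", "models", "framework",
                            "approach", "approaches", "experimental", "experiment",
                            "experiments", "data"},
    "Results": {"result", "results", "analysis"},
    "Discussion": {"discussion", "discussions", "conclusion", "conclusions",
                   "summary", "concluding"},
}

def label_section(title):
    """Return the IMRaD label whose cue words overlap the title most, or None."""
    words = set(title.lower().replace("-", " ").split())
    best, best_hits = None, 0
    for label, cues in CUE_WORDS.items():
        hits = len(words & cues)
        if hits > best_hits:
            best, best_hits = label, hits
    return best
```

A title such as "Experimental methods" overlaps two Materials & Methods cues and is labeled accordingly, while a title with no cue word (e.g. "An example of Hector") falls through to None; that uncovered case is exactly the non-standard ACP data the paper targets with the hierarchical attention network.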
Fig.3  Architecture of the hierarchical attention network (with Bi-LSTM+CNN+Attention as the encoder)
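Each level of the network in Fig.3 ends in an attention-pooling step in the style of Yang et al.[29]: score each hidden state, softmax the scores, and take the weighted sum. A minimal dependency-free sketch (the toy dimensions and parameters are ours, not values from the paper):

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden, W, b, u_w):
    """Attention pooling: u_t = tanh(W h_t + b), alpha = softmax(u_t . u_w),
    output = sum_t alpha_t * h_t.  `hidden` is a list of T state vectors."""
    scores = []
    for h in hidden:
        u = [math.tanh(sum(W[i][j] * h[j] for j in range(len(h))) + b[i])
             for i in range(len(b))]
        scores.append(sum(ui * uwi for ui, uwi in zip(u, u_w)))
    alpha = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(alpha[t] * hidden[t][j] for t in range(len(hidden)))
              for j in range(dim)]
    return pooled, alpha
```

Word-level pooling turns the encoder's token states into a sentence vector; applying the same operation over sentence vectors yields the section vector that feeds the final softmax classifier.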
Model  Feature vector  PLoS Biology  PLoS Medicine  PLoS Genetics  PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
SVM TF-IDF 0.652 5 0.508 7 0.440 6 0.602 6 0.586 2 0.563 9 0.665 0 0.536 1 0.462 1 0.444 8 0.461 2 0.365 7
CHI 0.492 3 0.387 5 0.394 1 0.486 0 0.394 9 0.346 4 0.442 8 0.381 1 0.376 8 0.661 0 0.612 8 0.553 2
IG 0.388 8 0.383 6 0.368 1 0.648 9 0.586 2 0.525 5 0.526 8 0.506 2 0.464 3 0.461 5 0.459 0 0.380 9
LR TF-IDF 0.503 5 0.424 7 0.352 4 0.588 1 0.568 2 0.562 8 0.514 8 0.499 8 0.459 5 0.319 7 0.373 8 0.301 7
CHI 0.494 0 0.471 8 0.417 8 0.457 5 0.457 0 0.427 7 0.466 3 0.454 2 0.393 7 0.589 2 0.498 9 0.420 9
IG 0.386 2 0.383 8 0.375 0 0.610 3 0.591 6 0.560 2 0.455 7 0.439 2 0.424 0 0.334 7 0.435 7 0.359 6
NB TF-IDF 0.496 7 0.488 5 0.434 5 0.516 7 0.515 1 0.483 1 0.555 4 0.520 7 0.477 4 0.421 5 0.431 6 0.365 7
CHI 0.654 4 0.617 7 0.605 7 0.572 9 0.531 1 0.486 2 0.536 4 0.555 2 0.488 1 0.726 4 0.679 6 0.633 3
IG 0.514 4 0.463 4 0.460 7 0.620 8 0.593 9 0.587 5 0.436 7 0.364 6 0.306 7 0.418 6 0.315 9 0.236 0
KNN TF-IDF 0.562 0 0.578 0 0.564 3 0.623 8 0.625 3 0.623 0 0.635 8 0.644 3 0.635 7 0.442 1 0.459 7 0.440 9
CHI 0.625 9 0.626 1 0.624 6 0.546 8 0.544 4 0.544 6 0.634 6 0.636 7 0.632 7 0.704 6 0.711 3 0.707 4
IG 0.607 0 0.609 9 0.607 1 0.694 4 0.694 7 0.694 3 0.585 7 0.588 8 0.585 5 0.488 5 0.492 8 0.487 5
XGB TF-IDF 0.712 8 0.715 5 0.711 4 0.694 1 0.695 5 0.694 4 0.744 1 0.744 1 0.742 7 0.539 3 0.550 6 0.533 0
CHI 0.743 9 0.744 7 0.743 7 0.623 2 0.615 3 0.614 5 0.739 7 0.740 5 0.739 0 0.766 0 0.771 3 0.763 5
IG 0.746 1 0.746 5 0.745 7 0.818 2 0.817 9 0.817 8 0.772 8 0.771 9 0.770 2 0.598 5 0.600 9 0.597 1
Table 5  Recognition results for structure functions in PLoS journals with traditional machine learning models
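The TF-IDF feature column of Table 5 rests on the familiar term-frequency × inverse-document-frequency weighting. A minimal sketch of one common variant, tf · log(N/df); the paper does not specify its exact normalization, so treat the details below as assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    weighting each term by (term frequency) * log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

Note that a term occurring in every document (here, in every section) gets weight zero, which is why function-specific vocabulary dominates these feature vectors.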
PLoS Biology PLoS Medicine PLoS Genetics PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
0.919 8 0.919 4 0.919 4 0.973 2 0.973 1 0.973 1 0.929 2 0.929 2 0.929 1 0.837 8 0.838 6 0.838 1
Table 6  Recognition results for structure functions with the Bert model
Encoder  PLoS Biology  PLoS Medicine  PLoS Genetics  PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
CNN 0.968 6 0.968 9 0.968 7 0.972 5 0.972 6 0.972 4 0.986 5 0.986 4 0.986 4 0.935 4 0.931 4 0.932 1
CNN_Multi Filters 0.976 1 0.975 5 0.975 6 0.971 1 0.970 9 0.971 0 0.986 9 0.987 1 0.986 9 0.947 3 0.946 1 0.946 5
LSTM 0.869 8 0.869 5 0.868 2 0.892 0 0.893 6 0.892 3 0.985 9 0.986 5 0.986 2 0.944 9 0.945 2 0.944 1
LSTM+CNN 0.973 4 0.973 7 0.973 5 0.964 1 0.963 8 0.963 9 0.986 6 0.987 0 0.986 8 0.950 9 0.951 3 0.951 0
LSTM+Attention 0.970 0 0.970 8 0.970 2 0.975 9 0.975 9 0.975 8 0.992 0 0.991 9 0.991 9 0.940 7 0.941 3 0.940 8
LSTM+CNN+Attention 0.980 4 0.980 6 0.980 4 0.982 8 0.982 7 0.982 7 0.990 4 0.990 3 0.990 3 0.955 5 0.956 8 0.955 9
Bi-LSTM 0.828 5 0.855 7 0.818 2 0.891 7 0.894 2 0.890 9 0.976 6 0.976 4 0.976 1 0.919 9 0.927 1 0.920 8
Bi-LSTM+CNN 0.974 3 0.975 3 0.974 7 0.976 1 0.975 5 0.975 7 0.986 8 0.987 2 0.986 9 0.948 5 0.951 5 0.949 1
Bi-LSTM+Attention 0.970 9 0.971 1 0.970 7 0.973 6 0.974 0 0.973 7 0.988 0 0.988 0 0.988 0 0.950 6 0.950 1 0.950 3
Bi-LSTM+CNN+Attention 0.981 3 0.980 4 0.980 7 0.982 6 0.982 8 0.982 6 0.991 5 0.992 0 0.991 7 0.962 1 0.961 7 0.961 8
Table 7  Word-level hierarchical attention network results for PLoS journals
Encoder  PLoS Biology  PLoS Medicine  PLoS Genetics  PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
LSTM 0.961 7 0.961 9 0.961 7 0.954 2 0.949 5 0.949 0 0.985 9 0.985 8 0.985 8 0.925 5 0.925 7 0.925 1
LSTM+CNN 0.975 8 0.977 1 0.976 4 0.982 2 0.981 7 0.981 9 0.989 5 0.989 7 0.989 6 0.935 2 0.935 2 0.935 2
LSTM+Attention 0.980 9 0.980 9 0.980 9 0.986 2 0.986 1 0.986 2 0.989 2 0.989 4 0.989 3 0.936 2 0.935 3 0.935 6
LSTM+CNN+Attention 0.968 7 0.968 0 0.968 2 0.983 4 0.983 3 0.983 3 0.983 3 0.983 4 0.983 3 0.931 5 0.930 4 0.930 4
Bi-LSTM 0.971 6 0.970 8 0.971 0 0.960 6 0.957 4 0.957 0 0.976 7 0.974 4 0.975 0 0.938 7 0.939 2 0.938 5
Bi-LSTM+CNN 0.979 5 0.979 8 0.979 6 0.988 7 0.988 3 0.988 7 0.994 1 0.994 0 0.994 0 0.912 4 0.903 1 0.903 9
Bi-LSTM+Attention 0.977 9 0.977 5 0.977 7 0.954 3 0.954 5 0.952 8 0.985 7 0.985 4 0.985 4 0.956 6 0.956 5 0.956 5
Bi-LSTM+CNN+Attention 0.981 5 0.982 0 0.981 6 0.988 9 0.988 1 0.988 3 0.989 9 0.990 2 0.990 0 0.952 7 0.952 3 0.952 5
Table 8  Sentence-level hierarchical attention network results for PLoS journals
Encoder  PLoS Biology  PLoS Medicine  PLoS Genetics  PLoS Computational Biology
Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1 Macro-P Macro-R Macro-F1
LSTM+CNN 0.959 7 0.959 5 0.959 5 0.964 5 0.964 3 0.964 3 0.978 5 0.978 6 0.978 5 0.899 5 0.892 5 0.893 6
LSTM+Attention 0.952 0 0.949 7 0.950 4 0.962 5 0.962 3 0.962 3 0.982 3 0.982 4 0.982 3 0.896 4 0.896 0 0.896 1
LSTM+CNN+Attention 0.945 5 0.944 3 0.943 7 0.954 1 0.951 2 0.951 6 0.980 3 0.979 5 0.979 8 0.895 4 0.896 2 0.895 6
Bi-LSTM+CNN 0.965 0 0.964 8 0.964 9 0.960 8 0.959 7 0.959 9 0.981 2 0.981 0 0.981 1 0.885 8 0.863 7 0.859 9
Bi-LSTM+Attention 0.961 5 0.961 4 0.961 4 0.973 2 0.972 7 0.972 8 0.983 3 0.983 3 0.983 3 0.918 1 0.917 6 0.917 8
Bi-LSTM+CNN+Attention 0.971 9 0.972 4 0.972 1 0.961 5 0.961 2 0.960 8 0.983 4 0.982 8 0.983 1 0.915 3 0.913 6 0.914 1
Table 9  Paragraph-level hierarchical attention network results for PLoS journals
Fig.4  Comparison of the five section-function recognition models
Model  Macro-P  Macro-R  Macro-F1  Accuracy-I  Accuracy-M  Accuracy-R  Accuracy-D
Baseline-1 XGB TF-IDF 0.528 9 0.531 9 0.523 4 0.540 9 0.308 3 0.660 5 0.618 0
CHI 0.788 0 0.786 9 0.785 9 0.920 3 0.730 3 0.656 3 0.840 7
IG 0.604 6 0.608 2 0.604 8 0.731 7 0.481 7 0.585 9 0.633 4
Baseline-2 Bert Model 0.848 0 0.849 1 0.848 2 0.936 8 0.768 4 0.812 5 0.878 7
Word-level HAN  Bi-LSTM+CNN 0.826 0 0.829 4 0.826 6 0.858 0 0.798 8 0.766 3 0.880 9
Bi-LSTM+Attention 0.851 3 0.852 5 0.851 1 0.917 7 0.780 0 0.868 5 0.839 0
Bi-LSTM+CNN+Attention 0.836 2 0.836 5 0.836 2 0.880 2 0.812 9 0.808 1 0.843 8
Sentence-level HAN  Bi-LSTM+CNN 0.813 7 0.798 5 0.788 5 0.961 4 0.836 7 0.506 8 0.889 1
Bi-LSTM+Attention 0.866 8 0.866 6 0.866 1 0.967 2 0.818 6 0.800 4 0.880 1
Bi-LSTM+CNN+Attention 0.853 3 0.849 8 0.848 8 0.953 7 0.874 5 0.711 7 0.859 2
Paragraph-level HAN  Bi-LSTM+CNN 0.840 8 0.839 8 0.839 6 0.919 6 0.825 3 0.811 3 0.803 2
Bi-LSTM+Attention 0.861 6 0.860 1 0.860 4 0.912 3 0.824 6 0.851 9 0.851 5
Bi-LSTM+CNN+Attention 0.786 6 0.747 2 0.741 7 0.656 1 0.529 3 0.884 2 0.919 4
Table 10  Training results of the optimal models on ACP data with clearly defined section functions
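The Macro-P/R/F1 columns throughout Tables 5-10 are unweighted averages of per-class scores over the IMRaD labels. A minimal sketch, assuming the standard macro-averaged definition (the paper does not spell out its averaging details):

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the given labels."""
    f1_scores = []
    for lab in labels:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

Because every class contributes equally, a model that ignores a rare class (as the Accuracy-R column sometimes reveals) is penalized more heavily than plain accuracy would suggest.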
Fig.5  Distribution of section-title ordinals and of predicted labels for the non-standard ACP data
Fig.6  Distributions of references and of the action cue word “use” in these sections
Item  DTW  Cosine distance  Euclidean distance  K-S test (p-value)
I M R D I M R D I M R D I M R D
References 0.000 1 0.000 1 0.000 1 0.000 1 0.008 2 0.012 8 0.012 2 0.009 2 0.004 7 0.002 2 0.006 5 0.004 3 0.676 6 0.794 2 0.260 6 0.443 1
Action cue word “use” 0.000 1 0.000 2 0.000 1 0.000 1 0.028 7 0.009 2 0.015 9 0.030 0 0.003 8 0.008 8 0.006 3 0.004 1 0.140 0 0.556 0 0.140 0 0.882 8
Table 11  Distribution similarity of references and of the action cue word “use”, with K-S tests
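The three distance columns of Tables 11-15 compare positional distributions across sections. Minimal implementations of the standard definitions, with DTW following the classic recurrence described by Giorgino[30] (our sketch, not the authors' code):

```python
import math

def dtw(a, b):
    """Dynamic-time-warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match steps.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

DTW tolerates small positional shifts between the two curves (identical shapes at different tempos still score 0), which is why its values in the tables sit so close to zero even where the point-wise distances do not.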
Fig.7  Distribution of references in the two datasets
DTW  Cosine distance  Euclidean distance  K-S test (p-value)
I M R D I M R D I M R D I M R D
0.000 3 0.000 2 0.000 2 0.000 4 0.064 8 0.033 2 0.046 3 0.145 6 0.012 2 0.010 1 0.008 2 0.012 4 0.140 0 0.047 0 0.443 1 0.099 4
Table 12  Similarity of reference distributions across the two datasets, with K-S tests
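The K-S p-values reported in these tables derive from the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. A sketch of the statistic itself (the p-value approximation of Massey[31] is omitted here):

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    xs = sorted(set(sample_a) | set(sample_b))
    na, nb = len(sample_a), len(sample_b)
    d = 0.0
    for x in xs:
        fa = sum(v <= x for v in sample_a) / na   # empirical CDF of sample a
        fb = sum(v <= x for v in sample_b) / nb   # empirical CDF of sample b
        d = max(d, abs(fa - fb))
    return d
```

Identical samples give D = 0, disjoint samples give D = 1; large p-values in the tables (small D) mean the two positional distributions cannot be distinguished, which is the evidence the paper uses to validate the predicted labels.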
Fig.8  Similarity of reference distributions after replacing the Results sections with new Materials & Methods and Discussion sections
extra_Materials&Methods_train
DTW  Cosine distance  Euclidean distance  K-S test (p-value)
Materials & Methods_train 0.000 2 0.030 5 0.007 9 0.676 6
extra_Materials & Methods_predict 0.000 2 0.036 2 0.008 7 0.260 6
Table 13  Distribution similarity of extra_Materials & Methods, with K-S tests
extra_Discussion_train
DTW  Cosine distance  Euclidean distance  K-S test (p-value)
Discussion_train 0.000 1 0.093 4 0.005 2 0.099 4
extra_Discussion_predict 0.000 2 0.108 9 0.007 8 0.443 1
Table 14  Distribution similarity of extra_Discussion, with K-S tests
Fig.9  Distribution of action cue words across the IMRaD sections of the two datasets
Word  DTW  Cosine distance  Euclidean distance  K-S test (p-value)
I M R D I M R D I M R D I M R D
show 0.000 2 0.000 2 0.000 5 0.000 2 0.103 5 0.054 4 0.034 0 0.093 4 0.007 1 0.009 0 0.019 1 0.007 7 0.343 9 0.193 0 0.013 1 0.535 8
use 0.000 2 0.000 2 0.000 2 0.000 2 0.047 8 0.017 8 0.037 6 0.101 5 0.007 4 0.007 4 0.010 0 0.008 8 0.140 0 0.556 0 0.443 1 0.093 4
perform 0.000 4 0.000 6 0.000 4 0.000 3 0.349 4 0.124 2 0.106 7 0.273 9 0.019 7 0.025 9 0.016 6 0.015 3 0.013 1 0.031 4 0.013 1 0.008 2
follow 0.000 5 0.000 5 0.000 4 0.000 3 0.232 6 0.125 4 0.121 0 0.292 8 0.017 2 0.020 2 0.018 7 0.011 6 0.003 0 0.003 0 0.069 1 0.003 0
find 0.000 4 0.000 4 0.000 5 0.000 3 0.186 3 0.153 4 0.126 8 0.171 7 0.015 3 0.014 5 0.021 5 0.014 0 0.260 6 0.005 0 0.260 6 0.193 0
report 0.000 5 0.000 4 0.000 7 0.000 5 0.301 9 0.259 3 0.242 8 0.268 0 0.024 3 0.018 4 0.030 5 0.021 3 0.000 2 0.013 1 0.020 5 0.000 3
suggest 0.000 5 0.000 3 0.000 6 0.000 5 0.297 6 0.384 3 0.172 2 0.239 6 0.022 3 0.014 6 0.026 8 0.020 7 0.069 1 0.000 0 0.047 0 0.000 6
include 0.000 6 0.000 5 0.000 4 0.000 3 0.298 4 0.161 3 0.119 6 0.275 5 0.022 4 0.025 0 0.016 5 0.012 7 0.031 4 0.008 2 0.099 4 0.140 0
Table 15  Distribution similarity of action cue words, with K-S tests
Data  PLoS Biology  PLoS Medicine  PLoS Genetics  PLoS Comp. Biol.  ACP
Macro-P Macro-R Macro-F1  Macro-P Macro-R Macro-F1  Macro-P Macro-R Macro-F1  Macro-P Macro-R Macro-F1  Macro-P Macro-R Macro-F1
PLoS Biology - - - 0.964 0 0.962 9 0.962 8 0.983 8 0.983 3 0.983 5 0.892 5 0.883 3 0.883 1 0.660 6 0.504 1 0.459 4
PLoS Medicine 0.943 2 0.940 8 0.941 2 - - - 0.945 1 0.941 5 0.942 0 0.831 6 0.765 7 0.763 0 0.520 3 0.500 5 0.455 4
PLoS Genetics 0.979 4 0.980 2 0.979 6 0.957 6 0.956 6 0.956 4 - - - 0.927 0 0.927 7 0.926 9 0.560 5 0.565 3 0.526 4
PLoS Comp. Biol. 0.977 2 0.976 7 0.976 9 0.967 9 0.967 6 0.967 6 0.984 8 0.984 0 0.984 3 - - - 0.691 5 0.584 4 0.526 4
ACP 0.714 4 0.643 6 0.581 7 0.700 8 0.585 5 0.553 3 0.712 0 0.627 8 0.559 3 0.690 7 0.607 7 0.558 8 - - -
Table 16  Domain-adaptation analysis of the hierarchical attention network (sentence-level Bi-LSTM+CNN+Attention)
[1] Norris M, Oppenheim C, Rowland F. The Citation Advantage of Open-Access Articles[J]. Journal of the American Society for Information Science and Technology, 2008,59(12):1963-1972.
[2] Wang X M, Liu C, Mao W L, et al. The Open Access Advantage Considering Citation, Article Usage and Social Media Attention[J]. Scientometrics, 2015,103(2):555-564.
[3] Haustein S, Piwowar H A, Priem J, et al. Data From: The State of OA: A Large-Scale Analysis of the Prevalence and Impact of Open Access Articles[J]. PeerJ, 2018,6(4):e4375.
[4] Zhang R, Guo J F, Fan Y X, et al. Outline Generation: Understanding the Inherent Content Structure of Documents[OL]. arXiv Preprint, arXiv:1905.10039.
[5] Sollaci L B, Pereira M G. The Introduction, Methods, Results, and Discussion (IMRAD) Structure: A Fifty-Year Survey[J]. Journal of the Medical Library Association, 2004,92(3):364.
pmid: 15243643
[6] Bertin M, Atanassova I, Gingras Y, et al. The Invariant Distribution of References in Scientific Articles[J]. Journal of the Association for Information Science & Technology, 2016,67(1):164-177.
[7] Bertin M, Atanassova I, Sugimoto C R, et al. The Linguistic Patterns and Rhetorical Structure of Citation Context:An Approach Using N-Grams[J]. Scientometrics, 2016,109(3):1417-1434.
[8] Hu Z G, Chen C M, Liu Z Y. Where are Citations Located in the Body of Scientific Articles? A Study of the Distributions of Citation Locations[J]. Journal of Informetrics, 2013,7(4):887-896.
doi: 10.1016/j.joi.2013.08.005
[9] Nair P K R, Nair V D. Scientific Writing and Communication in Agriculture and Natural Resources[M]. Springer, 2014.
[10] International Committee of Medical Journal Editors. Uniform Requirements for Manuscripts Submitted to Biomedical Journals[J]. The New England Journal of Medicine, 1991,324(6):424-428.
pmid: 1987468
[11] Devlin J, Chang M W, Lee K, et al. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[12] Noriko K. Text-Level Structure of Research Papers: Implications for Text-Based Information Processing Systems[C]// Proceedings of the 19th Annual BCS-IRSG Conference on Information Retrieval Research, Swindon, United Kingdom. 1997: 1-14.
[13] Santiago P. The Schematic Structure of Computer Science Research Articles[J]. English for Specific Purposes, 1999,18(2):139-160.
[14] Budsaba K. Rhetorical Structure of Biochemistry Research Articles[J]. English for Specific Purposes, 2005,24(3):269-292.
doi: 10.1016/j.esp.2004.08.003
[15] McKnight L, Srinivasan P. Categorization of Sentence Types in Medical Abstracts[C]// Proceedings of AMIA Annual Symposium,Washington, DC, USA. 2003: 440-444.
[16] Mizuta Y, Korhonen A, Mullen T, et al. Zone Analysis in Biology Articles as a Basis for Information Extraction[J]. International Journal of Medical Informatics, 2006,75(6):468-487.
doi: 10.1016/j.ijmedinf.2005.06.013 pmid: 16112609
[17] Agarwal S, Yu H. Automatically Classifying Sentences in Full-Text Biomedical Articles into Introduction, Methods, Results and Discussion[J]. Bioinformatics, 2009,25(23):3174-3180.
doi: 10.1093/bioinformatics/btp548 pmid: 19783830
[18] Ribeiro S S, Yao J T, Rezende D A. Discovering IMRAD Structure with Different Classifiers[C]// Proceedings of 2018 IEEE International Conference on Big Knowledge (ICBK), Singapore. 2018: 200-204.
[19] Lu Wei, Huang Yong, Cheng Qikai. The Structure Function of Academic Text and Its Classification[J]. Journal of the China Society for Scientific and Technical Information, 2014,33(9):979-985. (in Chinese)
[20] Huang Yong, Lu Wei, Cheng Qikai. The Structure Function Recognition of Academic Text: Chapter-Content-Based Recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016,35(3):293-300. (in Chinese)
[21] Huang Yong, Lu Wei, Cheng Qikai, et al. The Structure Function Recognition of Academic Text: Paragraph-Based Recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016,35(5):530-538. (in Chinese)
[22] Wang Dongbo, Gao Ruiqing, Ye Wenhao, et al. Research on the Structure Recognition of Academic Texts Under Different Characteristics[J]. Journal of the China Society for Scientific and Technical Information, 2018,37(10):997-1008. (in Chinese)
[23] Wang Jiamin, Lu Wei, Liu Jiawei, et al. Research on Structure Function Recognition of Academic Text Based on Multi-Level Fusion[J]. Library and Information Service, 2019,63(13):95-104. (in Chinese)
[24] Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[OL]. arXiv Preprint, arXiv:1607.01759.
[25] Pappas N, Popescu-Belis A. Multilingual Hierarchical Attention Networks for Document Classification[OL]. arXiv Preprint, arXiv:1707.00896.
[26] Zhang X, Zhao J B, Lecun Y. Character-Level Convolutional Networks for Text Classification[OL]. arXiv Preprint, arXiv:1509.01626.
[27] Lee J Y, Dernoncourt F. Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks[OL]. arXiv Preprint, arXiv:1603.03827.
[28] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[OL]. arXiv Preprint, arXiv:1706.03762.
[29] Yang Z, Yang D, Dyer C, et al. Hierarchical Attention Networks for Document Classification[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, USA. 2016: 1480-1489.
[30] Giorgino T. Computing and Visualizing Dynamic Time Warping Alignments in R: The DTW Package[J]. Journal of Statistical Software, 2009,31(7):1-25.
[31] Massey F J. The Kolmogorov-Smirnov Test for Goodness of Fit[J]. Publications of the American Statistical Association, 1951,46(253):68-78.
[32] Yang Y M. An Evaluation of Statistical Approaches to Text Categorization[J]. Information Retrieval, 1999,1(1/2):69-90.
doi: 10.1023/A:1009982220290
[33] Cherkassky V, Ma Y Q. Practical Selection of SVM Parameters and Noise Estimation for SVM Regression[J]. Neural Networks, 2004,17(1):113-126.
doi: 10.1016/S0893-6080(03)00169-2 pmid: 14690712
[34] Hernández-Lobato J M, Hernández-Lobato D, Suárez A. Expectation Propagation in Linear Regression Models with Spike- and-Slab Priors[J]. Machine Learning, 2015,99(3):437-487.
doi: 10.1007/s10994-014-5475-7
[35] Sebe N, Lew M S, Cohen I, et al. Emotion Recognition Using a Cauchy Naive Bayes Classifier[C]// Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada. 2002: 17-20.
[36] Zhang H, Berg A C, Michael B, et al. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition[C]// Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,New York, USA. 2006: 17-22.
[37] Chen T Q, Guestrin C. XGBoost: A Scalable Tree Boosting System[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA. 2016: 785-794.
[38] Wu H C, Luk K, Wong K F, et al. Interpreting TF-IDF Term Weights as Making Relevance Decisions[J]. ACM Transactions on Information System, 2008,26(3):1-37.
[39] Satorra A, Bentler P M. A Scaled Difference Chi-Square Test Statistic for Moment Structure Analysis[J]. Psychometrika, 2001,66(4):507-514.
doi: 10.1007/BF02296192
[40] Kent J T. Information Gain and a General Measure of Correlation[J]. Biometrika, 1983,70(1):163-173.
doi: 10.1093/biomet/70.1.163
[41] Rigby A S. Statistical Methods in Epidemiology. v. Towards an Understanding of the Kappa Coefficient[J]. Disability & Rehabilitation, 2000,22(8):339-344.
doi: 10.1080/096382800296575 pmid: 10896093
[42] Gannon T, Madnick S E, Moulton A, et al. Framework for the Analysis of the Adaptability, Extensibility, and Scalability of Semantic Information Integration and the Context Mediation Approach[C]// Proceedings of the 42nd Hawaii International Conference on System Sciences. 2009: 1-11.