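[Objective] To automatically determine the functional type of sections in academic texts. [Methods] First, we build hierarchical attention network models at different granularities that capture section structure information, and compare traditional machine learning models using different text feature vectors, the Bert model, and the hierarchical attention network models on a standardized dataset from four Plos journals to obtain the best model for recognizing the structural function of academic text. Next, we use the best model to recognize the structural function of sections in the journal Atmospheric Chemistry and Physics (ACP, IF 5.6) whose section titles lack standardized naming and whose manually annotated structural functions show low agreement, and we propose evaluating the recognition results by the similarity of reference distributions and the similarity of verb cue word distributions. Finally, we analyze the domain adaptability of the hierarchical attention network model. [Results] The sentence-level hierarchical attention network model with Bi-Lstm+Attention as the encoder outperforms the other models, with a Macro-F1 of 0.8661. The model also has a domain adaptation problem: recognition performance drops markedly in domains that differ substantially, with a lowest Macro-F1 of 0.4554. [Limitations] The model cannot recognize the function of sections with a mixed structure, and it does not consider the logical relationships between parts of the article structure. [Conclusions] The sentence-level hierarchical attention network model recognizes the structural function of sections well, and introducing academic text structure information can enrich and extend the content and scope of research based on the full text of academic papers.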
[Objective] To automatically recognize the function of sections in academic texts. [Methods] We construct hierarchical attention network (HAN) models at different granularities, using multiple deep learning models as encoders, to automatically identify the function of academic text structure. We also analyze how traditional machine learning models with different text feature vectors and the Bert model perform on the same task. We then use the similarity of reference distributions and the similarity of cue word distributions to evaluate the model on real data. Finally, we analyze the domain adaptability of the HAN model. [Results] The sentence-level HAN model with Bi-Lstm+Attention as the encoder outperforms the other methods, with a Macro-F1 of 0.8661. Classification performance drops significantly in fields that differ greatly, with a minimum Macro-F1 of 0.4554. [Limitations] The function of sections with a mixed structure cannot be recognized, and the logical relationships between article structures are not used in the HAN model. [Conclusions] The sentence-level HAN model recognizes structural function well, and incorporating academic text structure information can enrich and expand the content and scope of research based on the full text of academic papers.
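The results above are reported as Macro-F1, which is conventionally the unweighted mean of per-class F1 scores, so that rare section types count as much as frequent ones. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `macro_f1` and the section labels in the example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_label, pred_label in zip(y_true, y_pred):
        if gold_label == pred_label:
            tp[gold_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[gold_label] += 1  # gold class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical section-function labels, for illustration only.
gold = ["Introduction", "Methods", "Methods", "Results", "Discussion", "Results"]
pred = ["Introduction", "Methods", "Results", "Results", "Discussion", "Discussion"]
score = macro_f1(gold, pred)
```

Because each class contributes equally to the average, a model that handles the dominant section types well but fails on a rare one is penalized more heavily than it would be under plain accuracy.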