数据分析与知识发现 (Data Analysis and Knowledge Discovery), 2021, Vol. 5, Issue 5: 104-114     https://doi.org/10.11925/infotech.2096-3467.2020.1109
Research Paper
Automatic Abstracting Civil Judgment Documents with Two-Stage Procedure
Wang Yizhen, Ou Shiyan, Chen Jinju
School of Information Management, Nanjing University, Nanjing 210023, China
Abstract

[Objective] This paper automatically summarizes the contents of first-instance civil judgment documents, aiming to provide their users with concise, readable, coherent and accurate summaries. [Methods] We proposed a new automatic abstracting method for judgment documents, which consists of two stages: extractive summarization followed by abstractive summarization. In the first stage, we added a dilated residual gated convolutional neural network on top of a pre-trained language model to extract the key sentences of a judgment document, yielding an extractive summary. In the second stage, we fed this extractive summary into a sequence-to-sequence model to generate the final abstract. [Results] On the experimental dataset of judgment documents, the proposed model achieved ROUGE-1, ROUGE-2 and ROUGE-L scores of 50.31, 36.60 and 48.86, which were 25.00, 23.25 and 24.66 points higher, respectively, than those of the LEAD-3 baseline. [Limitations] Because the extractive summary from the first stage serves as the input of the second-stage abstractive model, errors accumulate across the two stages, and the overall performance is bounded by the first-stage extractive model. [Conclusions] The proposed model can be applied in automatic abstracting services for judgment documents, alleviating information overload and offering users a fast new way to read judgment documents and acquire knowledge from them.
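As a rough illustration of the two-stage design described above, the sketch below wires the stages together. The function names `extract_key_sentences` and `generate_abstract` are hypothetical stand-ins for the paper's extractive model (pre-trained encoder plus dilated residual gated CNN) and abstractive model (sequence-to-sequence); neither name comes from the paper.

```python
from typing import Callable, List

def two_stage_summarize(
    document_sentences: List[str],
    extract_key_sentences: Callable[[List[str]], List[str]],
    generate_abstract: Callable[[str], str],
) -> str:
    """Minimal sketch of the paper's pipeline, under the assumptions above."""
    # Stage 1: select the key sentences of the judgment document.
    extractive_summary = extract_key_sentences(document_sentences)
    # Stage 2: the seq2seq model rewrites the extractive summary, not the
    # full document, into the final abstract.
    return generate_abstract("".join(extractive_summary))
```

The key design choice is that stage 2 sees only stage 1's output, which shortens the seq2seq input but also explains the cumulative-error limitation noted above.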

Keywords: Pre-trained Language Model; Automatic Summarization; Judgment Documents; Abstractive Summarization; Extractive Summarization
Received: 2020-11-11      Published: 2021-05-27
CLC Number: TP391
Funding: Supported by the National Social Science Foundation of China (Grant No. 17ATQ001).
Corresponding author: Ou Shiyan, E-mail: oushiyan@nju.edu.cn
Cite this article:
Wang Yizhen, Ou Shiyan, Chen Jinju. Automatic Abstracting Civil Judgment Documents with Two-Stage Procedure[J]. Data Analysis and Knowledge Discovery, 2021, 5(5): 104-114.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1109      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I5/104
Component: Function
- 案件基本信息 (Basic case information): records basic information about the case, including the document title, case number, document type, judgment date and trial procedure.
- 当事人信息 (Party information): records information about the parties, including brief profiles of the plaintiff and defendant, their types and number, and the law firms representing them.
- 审理经过 (Trial process): records the course of the trial, including the cause of action, the names of the plaintiff and defendant, and the filing date of the case.
- 原告诉称 (Plaintiff's claims): records the plaintiff's claims and the grounds for each claim.
- 被告辩称 (Defendant's defense): records the defenses the defendant raises against the plaintiff's claims and grounds.
- 法院查明 (Facts found by the court): records the results of the court's investigation of the facts and evidence; the findings include detailed evidence and the course of events.
- 本院认为 (Court's opinion): records the reasoning of the judgment, i.e., the court's reasoned evaluation of the case, including the points of dispute between the parties, the reasoning logic, and the laws and regulations cited.
- 判决结果 (Judgment result): records the detailed outcome of the case, including the plaintiff's rights and obligations.
- 其他信息 (Other information): records the judges, the court clerk, the judgment date, etc.
Table 1  Discourse structure of first-instance civil judgment documents
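In the raw text, the Table 1 components typically open with fixed cue phrases. As an illustration of how such a structure can be recovered, here is a minimal rule-based sketch; it is not from the paper, and the cue patterns are assumptions that real documents may violate.

```python
import re

# Illustrative cue phrases for some Table 1 sections; real judgment
# documents vary, so these patterns are assumptions, not the paper's method.
SECTION_CUES = {
    "审理经过": r"本院受理后|本院立案后",
    "原告诉称": r"原告.{0,30}诉称",
    "被告辩称": r"被告.{0,30}辩称",
    "法院查明": r"经审理查明|本院经审理认定",
    "本院认为": r"本院认为",
    "判决结果": r"判决如下",
}

def segment_judgment(text: str) -> dict:
    """Split a judgment document into sections at the first hit of each cue."""
    hits = []
    for name, pattern in SECTION_CUES.items():
        match = re.search(pattern, text)
        if match:
            hits.append((match.start(), name))
    hits.sort()  # order the matched sections by position in the document
    sections = {}
    for i, (start, name) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        sections[name] = text[start:end]
    return sections
```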
Fig.1  Sentence function framework for first-instance civil judgment documents
Fig.2  Two-stage automatic summarization model for judgment documents
Fig.3  Extractive summarization model for judgment documents
Fig.4  Abstractive summarization model for judgment documents
Fig.5  Annotation workflow for first-instance civil judgment document summaries
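The extractive model (Fig.3) adds a dilated residual gated convolutional network on top of the pre-trained encoder. The sketch below implements one common form of such a block in Keras (the framework listed in Table 2), y = x + Conv(x) * sigmoid(ConvGate(x)); the kernel size, feature width (256) and dilation schedule (1, 2, 4) are assumptions, not the paper's settings.

```python
from tensorflow.keras import layers, Model

def dilated_gated_conv_block(x, filters: int, dilation_rate: int):
    """One residual block: x + conv(x) * sigmoid(gate(x)), under the assumptions above."""
    h = layers.Conv1D(filters, 3, padding="same",
                      dilation_rate=dilation_rate)(x)        # content path
    g = layers.Conv1D(filters, 3, padding="same",
                      dilation_rate=dilation_rate,
                      activation="sigmoid")(x)               # gating path
    return layers.Add()([x, layers.Multiply()([h, g])])      # gated residual

# Sentence-level feature sequence from the pre-trained encoder (dim 256 assumed).
inputs = layers.Input(shape=(None, 256))
x = inputs
for rate in (1, 2, 4):   # growing dilation widens the receptive field without pooling
    x = dilated_gated_conv_block(x, 256, rate)
# Per-sentence probability of belonging to the extractive summary.
scores = layers.Dense(1, activation="sigmoid")(x)
extractor = Model(inputs, scores)
```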
OS: Ubuntu 18.04; GPU: TITAN RTX; Python: 3.6.9; CUDA: 10.0; TensorFlow-GPU: 1.14.0; ROUGE: 1.5.5; Keras: 2.3.1
Table 2  Experimental environment
Method                                       ROUGE-1   ROUGE-2   ROUGE-L
Baseline      LEAD-3                          25.31     13.35     24.20
Extractive    NeuSum                          44.36     17.79     41.96
              BERT+Classifier                 45.60     19.99     43.57
              BERT+Transformer                47.75     30.81     46.28
Abstractive   Transformer-Abstractive         44.15     28.62     43.14
              Pointer-Generator Networks      47.78     32.82     47.13
              Bottom-Up Abstractive           48.01     25.17     46.47
Ours          TSSM-Extractive                 49.69     31.94     48.85
              TSSM                            50.31     36.60     48.86
Table 3  Comparison of experimental results
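The experiments scored summaries with the ROUGE 1.5.5 toolkit (Table 2). As a convenient approximation, the sketch below uses the pip `rouge` package and scores Chinese text at the character level; character-level scoring is a common convention but an assumption about the paper's exact setup, and the example texts are toy data, not from the paper's corpus.

```python
from rouge import Rouge  # pip install rouge; approximates the ROUGE-1.5.5 toolkit

def to_char_tokens(text: str) -> str:
    """Chinese has no whitespace tokens, so insert spaces to score per character."""
    return " ".join(text.replace(" ", ""))

# Toy hypothesis/reference pair; real evaluation uses the annotated summaries.
hypothesis = to_char_tokens("原告与被告民间借贷纠纷一案,判决被告偿还借款。")
reference = to_char_tokens("原告诉被告民间借贷纠纷,法院判令被告还款。")

scores = Rouge().get_scores(hypothesis, reference)[0]
print(scores["rouge-1"]["f"], scores["rouge-2"]["f"], scores["rouge-l"]["f"])
```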
[1] Edmundson H P. New Methods in Automatic Extracting[J]. Journal of the ACM, 1969, 16(2): 264-285. DOI: 10.1145/321510.321519.
[2] Liu M, Yu Y, Qi Q, et al. Extractive Single Document Summarization via Multi-feature Combination and Sentence Compression[C]// Proceedings of the 6th CCF International Conference on Natural Language Processing and Chinese Computing. Springer, Cham, 2017: 807-817.
[3] Kupiec J, Pedersen J, Chen F. A Trainable Document Summarizer[C]// Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle. ACM, 1995: 68-73.
[4] Conroy J, O’leary D. Text Summarization via Hidden Markov Models[C]// Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, Louisiana, USA. Association for Computing Machinery, 2001: 406-407.
[5] Alguliyev R M, Aliguliyev R M, Isazade N R, et al. COSUM: Text Summarization Based on Clustering and Optimization[J]. Expert Systems, 2019, 36(1): e12340. DOI: 10.1111/exsy.v36.1.
[6] Osborne T J, Nielsen M A. Entanglement in a Simple Quantum Phase Transition[J]. Physical Review A, 2002,66(3):0321103.
[7] Svore K, Vanderwende L, Burges C. Enhancing Single-document Summarization by Combining RankNet and Third-party Sources[C]// Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic. Association for Computational Linguistics, 2007: 448-457.
[8] Liu L, Lu Y, Yang M, et al. Generative Adversarial Network for Abstractive Text Summarization[C]//Proceedings of the 2018 AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA. AAAI Press, 2018: 8109-8110.
[9] Al-Sabahi K, Zhang Z, Nadher M. A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS)[J]. IEEE Access, 2018, 6: 24205-24212. DOI: 10.1109/ACCESS.2018.2829199.
[10] Cho K, van Merrienboer B, Gulcehre C, et al. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation[OL]. arXiv Preprint, arXiv:1406.1078.
[11] Tan J, Wan X, Xiao J. Abstractive Document Summarization with a Graph-based Attentional Neural Model[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada. Association for Computational Linguistics, 2017: 1171-1181.
[12] Siddiqui T, Shamsi J A. Generating Abstractive Summaries Using Sequence to Sequence Attention[C]// Proceedings of the 2018 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan. IEEE Computer Society, 2018: 212-217.
[13] Celikyilmaz A, Bosselut A, He X D, et al. Deep Communicating Agents for Abstractive Summarization[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans. Association for Computational Linguistics, 2018,1:1662-1675.
[14] Jiang Yuehua, Ding Lei, Li Jiao'e, et al. Abstractive Summarization Model Considering Hybrid Lexical Features[J]. Journal of Hebei University of Science and Technology, 2019, 40(2): 152-158. (in Chinese)
[15] Hachey B, Grover C. Automatic Legal Text Summarization: Experiments with Summary Structuring[C]// Proceedings of the 10th International Conference on Artificial Intelligence and Law, Bologna, Italy. ACM, 2005: 75-84.
[16] Anand D, Wagh R. Effective Deep Learning Approaches for Summarization of Legal Texts[J]. Journal of King Saud University-Computer and Information Sciences, 2019.
[17] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis. Association for Computational Linguistics, 2019: 4171-4186.
[18] Li Y, Zhang X, Chen D. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake. IEEE, 2018: 1091-1100.
[19] Dong L, Yang N, Wang W, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation[J]. Advances in Neural Information Processing Systems, 2019,32:13063-13075.
[20] Gu J, Lu Z, Li H, et al. Incorporating Copying Mechanism in Sequence-to-Sequence Learning[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin. ACL, 2016: 1631-1640.
[21] Lin C. Rouge: A Package for Automatic Evaluation of Summaries[C]// Proceedings of the 2004 Workshop on Text Summarization Branches Out, Spain. 2004: 74-81.
[22] Zhou Q, Yang N, Wei F, et al. Neural Document Summarization by Jointly Learning to Score and Select Sentences[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne. ACL, 2018: 654-663.
[23] See A, Liu P J, Manning C D. Get to the Point: Summarization with Pointer-generator Networks[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada. Association for Computational Linguistics, 2017: 1073-1083.
[24] Gehrmann S, Deng Y T, Rush A. Bottom-up Abstractive Summarization[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. ACL, 2018: 4098-4109.