Identifying Multi-Type Entities in Legal Judgments with Text Representation and Feature Generation
Wang Hao, Lin Kerou, Meng Zhen, Li Xinlei
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China

Abstract
[Objective] This paper evaluates entity recognition models on legal judgments, with the aim of supporting the construction of better legal knowledge bases.
[Methods] First, we extracted the court trial process and the court opinions from criminal judgment texts to build an experimental dataset. Then, we compared the entity recognition results of the CRFs model (with manually constructed features), the IDCNN-CRFs model (with automatically generated features), and the BiLSTM-CRFs model. Both the IDCNN-CRFs and BiLSTM-CRFs models used pre-trained word vectors as their character embeddings. We also compared how well the models transferred to other types of legal judgment texts.
[Results] The ALBERT-BiLSTM-CRFs model achieved the best recognition performance, with a micro-averaged F1 score of 95.28%. However, the IDCNN-CRFs model needed only about one sixth of the training time of the ALBERT-BiLSTM-CRFs model. Both models transferred well to other types of judgments.
[Limitations] Most of the recognized entities are general (not domain-specific) types. Future studies need more domain-related entity types to increase the model's practical value.
[Conclusions] The ALBERT-BiLSTM-CRFs and IDCNN-CRFs models recognize entities in legal judgments more effectively than the CRFs model and also transfer better to other judgment types.
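The abstract contrasts a CRFs model that relies on manually constructed features with neural models that generate features automatically. As a rough illustration of what "manually constructed" can mean for character-level Chinese entity recognition, the sketch below builds one feature dictionary per character from a ±2 character window, two character bigrams, and a digit flag. The feature names, the window size, and the helper functions are hypothetical choices for this example, not the feature set used in the paper.

```python
from typing import Dict, List

BOS, EOS = "<BOS>", "<EOS>"  # padding symbols for positions outside the sentence


def char_at(sentence: str, j: int) -> str:
    """Return the character at position j, or a boundary marker if j is out of range."""
    if j < 0:
        return BOS
    if j >= len(sentence):
        return EOS
    return sentence[j]


def char_features(sentence: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical hand-crafted features for the i-th character of a sentence."""
    feats: Dict[str, str] = {"bias": "1"}
    # Unigram features: every character in a +/-window context around position i.
    for offset in range(-window, window + 1):
        feats[f"c[{offset}]"] = char_at(sentence, i + offset)
    # Bigram features joining the current character with its neighbours.
    feats["c[-1]c[0]"] = char_at(sentence, i - 1) + sentence[i]
    feats["c[0]c[1]"] = sentence[i] + char_at(sentence, i + 1)
    # Orthographic feature: digits can signal amounts, dates, or prison terms.
    feats["is_digit"] = str(sentence[i].isdigit())
    return feats


def sentence_features(sentence: str) -> List[Dict[str, str]]:
    """One feature dictionary per character of the sentence."""
    return [char_features(sentence, i) for i in range(len(sentence))]


if __name__ == "__main__":
    # Print a few features for each character of a short sample phrase.
    for ch, f in zip("被告人张某", sentence_features("被告人张某")):
        print(ch, f["c[-1]"], f["c[0]c[1]"])
```

A neural encoder such as IDCNN or BiLSTM replaces this hand-written dictionary with learned vectors computed from character embeddings, which is the "automatically generated features" side of the comparison.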
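All three compared model families (CRFs, IDCNN-CRFs, BiLSTM-CRFs) end in a CRF layer. In the simplest linear-chain form, each position receives a score for every tag (from hand-crafted features or from the neural encoder), a transition matrix scores adjacent tag pairs, and prediction picks the highest-scoring tag sequence with Viterbi decoding. The minimal pure-Python sketch below assumes the emission and transition scores are already given; how they are trained is out of scope, and the function name is illustrative.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][k]: score of tag k at position t (e.g. from an IDCNN or BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n = len(emissions)
    if n == 0:
        return [], 0.0
    num_tags = len(emissions[0])
    # score[k]: best total score of any path that ends in tag k at the current position.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, n):
        new_score: List[float] = []
        bp: List[int] = []
        for j in range(num_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Pick the best final tag, then follow the backpointers to recover the path.
    last = max(range(num_tags), key=lambda k: score[k])
    best_score = score[last]
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, best_score
```

With tags O=0, B=1, I=2, a large negative score on the O-to-I transition stops the decoder from choosing the locally best but invalid sequence O, I; it returns B, I instead. This is how the CRF layer enforces consistent entity boundaries that a per-character classifier would not.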
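The 95.28% result is a micro-averaged F1 score: true positives, false positives and false negatives are pooled over all entity types before precision and recall are computed, so frequent entity types weigh more than rare ones. The sketch below assumes BIO tags and exact matching of entity type and boundaries; the abstract does not state which tagging scheme or matching criterion the paper used, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract typed entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        if tag == "O":
            starts_new, new_type = True, None
        elif tag.startswith("B-"):
            starts_new, new_type = True, tag[2:]
        elif tag.startswith("I-"):
            # An I- tag continues the current entity only if its type matches;
            # otherwise it is treated leniently as the start of a new entity.
            starts_new, new_type = tag[2:] != etype, tag[2:]
        else:
            raise ValueError(f"not a BIO tag: {tag!r}")
        if starts_new:
            if etype is not None:
                spans.add((etype, start, i))
            start, etype = (i if new_type is not None else None), new_type
    return spans


def micro_f1(gold: Sequence[Sequence[str]],
             pred: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact (type, start, end) matches."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_spans(gold_tags), bio_spans(pred_tags)
        tp += len(g & p)  # predicted entities with the correct type and boundaries
        fp += len(p - g)  # predicted entities that are wrong or spurious
        fn += len(g - p)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if a gold sentence contains one PER and one LOC entity and the model finds only the PER entity, precision is 1.0, recall is 0.5 and F1 is about 0.67. A macro average would instead compute F1 per entity type and then average the per-type scores.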

Received: 07 December 2020
Published: 11 August 2021

Fund: National Natural Science Foundation of China (72074108); Youth Interdisciplinary Team of Liberal Arts in Nanjing University (2020300093); Jiangsu Young Talents in Social Sciences; Tang Scholar of Nanjing University
Corresponding Author:
Lin Kerou, ORCID: 0000-0003-0026-8771
E-mail: keroulin@foxmail.com