Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (7): 99-106    DOI: 10.11925/infotech.2096-3467.2022.0040
Matching Similar Cases with Legal Knowledge Fusion
Zheng Jie1, Huang Hui2, Qin Yongbin2
1Department of Information Science, Guiyang Vocational and Technical College, Guiyang 550081, China
2College of Computer Science and Technology, Guizhou University, Guiyang 550025, China

[Objective] This paper proposes a similar-case matching model that integrates legal knowledge, aiming to improve matching accuracy. [Methods] First, we concatenated legal knowledge with the case text so that the model learns legal-knowledge features and textual information simultaneously. Then, we split each text into segments and modelled the segments with an LSTM network, which increases the length of text the model can accommodate. Finally, we trained the model jointly with a triplet loss and an adversarial contrastive loss to enhance its robustness. [Results] The proposed model improved the accuracy of similar case matching by 7.07 percentage points over the baseline BERT model (74.39% versus 67.32% on the test set). [Limitations] Because the model matches on longer text sequences, it is more time-consuming than other models. [Conclusions] The proposed model has stronger matching and generalization ability and can support legal case retrieval.
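
The triplet loss named in the abstract can be illustrated with a minimal sketch. The Euclidean distance, the margin of 1.0, and the toy two-dimensional embeddings below are assumptions made for illustration only; the abstract does not state which distance or margin the authors used, and the adversarial contrastive loss is not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (an assumed value, not from the paper).
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings, invented for illustration.
anchor_case = [0.0, 0.0]
similar_case = [0.0, 1.0]      # distance 1 from the anchor
dissimilar_case = [3.0, 4.0]   # distance 5 from the anchor

# Correct ordering: the similar case is already much closer, so no loss.
loss_correct = triplet_loss(anchor_case, similar_case, dissimilar_case)

# Swapped ordering: the "positive" is farther away, so the loss is large.
loss_swapped = triplet_loss(anchor_case, dissimilar_case, similar_case)
```

In a case-matching setting, the anchor would be the query case and the positive and negative would be the more similar and less similar candidate cases, each represented by the encoder's output vector.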

Keywords: Case Matching; BERT; Legal Knowledge; Segmented Modelling; Triplet Loss
Received: 13 January 2022      Published: 01 March 2022
ZTFLH:  TP391  
Fund: National Natural Science Foundation of China (62066008)
Corresponding Author: Qin Yongbin, ORCID: 0000-0002-1960-8628

Cite this article:

Zheng Jie, Huang Hui, Qin Yongbin. Matching Similar Cases with Legal Knowledge Fusion. Data Analysis and Knowledge Discovery, 2022, 6(7): 99-106.


Architecture of the Similar Case Matching Model with Fused Legal Knowledge
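| Attribute | Values |
|---|---|
| Lender type | Legal person; natural person; other organization |
| Borrower type | Legal person; natural person; other organization |
| Loan purpose | Personal living expenses; business production and operation; marital living expenses; illegal or criminal activity; other |
| Evidence of loan agreement | Chat records (WeChat, SMS, phone, etc.); receipts; repayment commitment; loan contract or loan note; guarantee; IOU; unknown or ambiguous; other |
| Lending intent | Ordinary lending; re-lending for profit; other |
| Loan delivery form | Bank disbursement; not lent; negotiable instrument; authorized control of a designated fund account; cash; online electronic remittance; online lending platform; unknown or ambiguous; other |
| Guarantee type | Suretyship; no guarantee; mortgage; pledge |
| Agreed interest rate within the term (annualized) | 24% or below; above 24% and up to 36%; above 36%; other |
| Agreed interest calculation method | No interest; simple interest; compound interest; not clearly agreed; other |
| Repayment delivery form | Bank transfer; negotiable instrument; cash; partial repayment; online electronic remittance; not repaid; unknown or ambiguous; other |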
Attributes of Private Loan Elements
An Example of the Input Construction
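
As a rough illustration of what such an input construction could look like, the sketch below joins attribute descriptions to the case text with a separator token. The function name `build_input`, the "name: value" template, the ordering (knowledge first), and the `[SEP]` separator are all hypothetical; the page only states that legal knowledge is concatenated with the case text, and the attribute labels are illustrative English renderings of private loan elements.

```python
def build_input(case_text, attributes, sep="[SEP]"):
    """Join 'name: value' attribute descriptions and the case text.

    The template and the separator are assumptions for illustration,
    not the authors' documented format.
    """
    knowledge = "; ".join(f"{name}: {value}" for name, value in attributes.items())
    return f"{knowledge} {sep} {case_text}"


# Hypothetical attribute values extracted for one private loan case.
attributes = {
    "Guarantee type": "mortgage",
    "Agreed interest calculation method": "simple interest",
}

example_input = build_input(
    "The plaintiff lent the defendant 50,000 yuan in cash.",
    attributes,
)
```

Placing the structured attributes in the same sequence as the narrative text lets a single encoder attend to both, which matches the stated aim of learning legal-knowledge features and textual information together.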
Distribution of Text Length
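| Parameter | Value |
|---|---|
| Maximum sentence length | 512 |
| Number of segments | 2 |
| Training epochs | 4 |
| Learning rate | 2e-5 |
| Batch size per GPU | 4 |
| Gradient accumulation steps | 4 |
| Optimizer | AdamW |
| Weight decay | 0.01 |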
Parameter Setting
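
Two consequences of these settings can be worked out directly: a maximum length of 512 with 2 segments covers up to 1,024 tokens per text, and a per-GPU batch of 4 with 4 accumulation steps gives 16 samples per optimizer update on each GPU. The sketch below assumes non-overlapping segments and simple truncation of any overflow; the helper `split_into_segments` is hypothetical, and the page does not say how the authors handle overlap, special tokens, or the number of GPUs.

```python
def split_into_segments(token_ids, max_len=512, num_segments=2):
    """Split a token list into at most `num_segments` chunks of `max_len`.

    Tokens beyond max_len * num_segments are truncated (an assumption).
    """
    capped = token_ids[: max_len * num_segments]
    return [capped[i : i + max_len] for i in range(0, len(capped), max_len)]


# Stand-in for a tokenized case that is longer than two segments.
tokens = list(range(1300))
segments = split_into_segments(tokens)
segment_lengths = [len(segment) for segment in segments]

# Samples contributing to one optimizer step on a single GPU.
per_gpu_batch = 4
accumulation_steps = 4
effective_batch_per_gpu = per_gpu_batch * accumulation_steps
```

Each segment would then be encoded separately and the per-segment representations combined by a sequence model, which the abstract identifies as an LSTM.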
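| Group | Model | Validation accuracy (%) | Test accuracy (%) |
|---|---|---|---|
| Baseline[1] | CNN | 62.27 | 69.53 |
| Baseline[1] | LSTM | 62.00 | 68.00 |
| Baseline[1] | BERT | 61.93 | 67.32 |
| Team[1] | 11.2 yuan (ensemble) | 66.73 | 72.07 |
| Team[1] | backward (ensemble) | 67.73 | 71.81 |
| Team[1] | AlphaCourt (ensemble) | 70.07 | 72.66 |
| Ours | BERT (single) | 68.73 | 72.72 |
| Ours | MS-BERT (single) | 68.51 | 73.24 |
| Ours | MS-BERT (ensemble) | 70.10 | 74.39 |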
Model Performance
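
The 7.07-point improvement reported in the abstract can be checked against the test-set column. The short sketch below uses only figures from the performance table; the dictionary and the helper `gain` are illustrative names.

```python
# Test-set accuracies (%) copied from the performance table.
test_accuracy = {
    "BERT (baseline)": 67.32,
    "AlphaCourt (ensemble)": 72.66,
    "MS-BERT (single)": 73.24,
    "MS-BERT (ensemble)": 74.39,
}


def gain(model, reference):
    """Difference in test accuracy, in percentage points."""
    return round(test_accuracy[model] - test_accuracy[reference], 2)


gain_over_baseline_bert = gain("MS-BERT (ensemble)", "BERT (baseline)")
gain_over_best_team = gain("MS-BERT (ensemble)", "AlphaCourt (ensemble)")
single_model_gain = gain("MS-BERT (single)", "BERT (baseline)")
```

The ensemble is 7.07 points above the baseline BERT and 1.73 points above the strongest listed team entry, while the single MS-BERT model alone is 5.92 points above the baseline.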
| Model | Validation accuracy (%) | Test accuracy (%) |
|---|---|---|
| BERT+Triplet | 63.93 | 68.50 |
| BERT+Triplet+CL | 64.34 | 69.09 |
| BERT+Triplet+Multi | 65.95 | 70.71 |
| BERT+Triplet+Feature | 65.86 | 70.64 |
| BERT+Triplet+Feature+Multi | 68.47 | 72.07 |
| BERT+Triplet+Feature+Multi+CL | 68.73 | 72.72 |
| MS-BERT+Triplet+Feature+Multi+CL | 68.51 | 73.24 |
Results of Ablation Experiments
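
To read the ablation rows as gains over the plain BERT+Triplet configuration, the test accuracies can be differenced as in the sketch below. The table does not define its labels; reading Feature as the legal-knowledge input, Multi as segmented modelling, and CL as the contrastive loss is an assumption based on the abstract.

```python
# Test-set accuracies (%) copied from the ablation table, in table order.
ablation_test = [
    ("BERT+Triplet", 68.50),
    ("BERT+Triplet+CL", 69.09),
    ("BERT+Triplet+Multi", 70.71),
    ("BERT+Triplet+Feature", 70.64),
    ("BERT+Triplet+Feature+Multi", 72.07),
    ("BERT+Triplet+Feature+Multi+CL", 72.72),
    ("MS-BERT+Triplet+Feature+Multi+CL", 73.24),
]

reference_accuracy = ablation_test[0][1]

# Gain of each configuration over BERT+Triplet, in percentage points.
gains = {
    name: round(accuracy - reference_accuracy, 2)
    for name, accuracy in ablation_test[1:]
}

for name, points in gains.items():
    print(f"{name}: +{points:.2f}")
```

On these figures, Feature and Multi each add a little over 2 points on their own and 3.57 points together, so their contributions overlap somewhat rather than adding up fully.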
[1] Xiao C J, Zhong H X, Guo Z P, et al. CAIL2019-SCM: A Dataset of Similar Case Matching in Legal Domain[OL]. arXiv Preprint, arXiv:1911.08962.
[2] Devlin J, Chang M W, Lee K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[3] Schroff F, Kalenichenko D, Philbin J. FaceNet: A Unified Embedding for Face Recognition and Clustering[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2015: 815-823.
[4] Robertson S, Zaragoza H. The Probabilistic Relevance Framework: BM25 and Beyond[J]. Foundations and Trends® in Information Retrieval, 2009, 3(4): 333-389.
[5] Huang Mingxuan, Lu Shoudong, Xu Hui. Cross-Language Information Retrieval Based on Weighted Association Patterns and Rule Consequent Expansion[J]. Data Analysis and Knowledge Discovery, 2019, 3(9): 77-87. (in Chinese)
[6] Mikolov T, Yih W, Zweig G. Linguistic Regularities in Continuous Space Word Representations[C]// Proceedings of NAACL-HLT 2013. Association for Computational Linguistics, 2013: 746-751.
[7] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
[8] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2014: 1532-1543.
[9] Li B H, Zhou H, He J X, et al. On the Sentence Embeddings from Pre-Trained Language Models[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2020: 9119-9130.
[10] Su J L, Cao J R, Liu W J, et al. Whitening Sentence Representations for Better Semantics and Faster Retrieval[OL]. arXiv Preprint, arXiv:2103.15316.
[11] Gao T Y, Yao X C, Chen D Q. SimCSE: Simple Contrastive Learning of Sentence Embeddings[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021: 6894-6910.
[12] Shen Y L, He X D, Gao J F, et al. Learning Semantic Representations Using Convolutional Neural Networks for Web Search[C]// Proceedings of the 23rd International Conference on World Wide Web. 2014: 373-374.
[13] Shen Y L, He X D, Gao J F, et al. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval[C]// Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. 2014: 101-110.
[14] Chen Q, Zhu X D, Ling Z H, et al. Enhanced LSTM for Natural Language Inference[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2017: 1657-1668.
[15] Wang Z G, Hamza W, Florian R. Bilateral Multi-Perspective Matching for Natural Language Sentences[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017: 4144-4150.
[16] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[OL]. arXiv Preprint, arXiv:1706.03762.
[17] Chen H J, Cai D, Dai W, et al. Charge-Based Prison Term Prediction with Deep Gating Network[C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2019: 6362-6367.
[18] Tran V, Nguyen M L, Satoh K. Building Legal Case Retrieval Systems with Lexical Matching and Summarization Using a Pre-Trained Phrase Scoring Model[C]// Proceedings of the 17th International Conference on Artificial Intelligence and Law. 2019: 275-282.
[19] Li Jiamin, Liu Xingbo, Nie Xiushan, et al. Triplet Deep Hashing Learning for Judicial Case Similarity Matching Method[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1147-1153. (in Chinese)
[20] Jing L L, Tian Y L. Self-Supervised Visual Feature Learning with Deep Neural Networks: A Survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(11): 4037-4058. doi: 10.1109/TPAMI.2020.2992393
[21] Miyato T, Dai A M, Goodfellow I. Adversarial Training Methods for Semi-Supervised Text Classification[OL]. arXiv Preprint, arXiv:1605.07725.
[22] Zhong H, Zhang Z, Liu Z, et al. Open Chinese Language Pre-Trained Model Zoo[R]. 2019.