Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model
Han Huang1, Hongyu Wang2,3, Xiaoguang Wang2,3
1(School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China) 2(Center for Studies of Information Resource, Wuhan University, Wuhan 430072, China) 3(School of Information Management, Wuhan University, Wuhan 430072, China)
[Objective] This paper aims to automatically identify legal terminology in large-scale legal texts, as a step toward structuring legal big data. [Methods] We used a Conditional Random Field (CRF) model as the classifier within an Active Learning algorithm to identify legal terms. The corpus was first clustered with K-means, and stratified sampling was then used to extract the initial sample set that starts the Active Learning algorithm. Entropy served as the criterion for sample selection. Model training and sample selection were repeated iteratively until the F value (the harmonic mean of precision and recall) stabilized, yielding the legal domain entity recognition model (AL-CRF). [Results] On Chinese judgment documents, the precision and recall of the AL-CRF model both exceeded 90%, and its F value was 4.85% higher than that of a CRF model trained with an equal labeling workload. [Limitations] K-means clustering is sensitive to noise and outliers, which may affect the performance of the model. [Conclusions] Combining conditional random fields with active learning can reduce the labeling workload spent on low-quality samples while maintaining recognition quality.
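The entropy-based selection step and the stopping rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the classifier exposes a per-token probability distribution over labels (as CRF marginals would), that a sentence's uncertainty is the mean of its token entropies, and that "stabilized" means the last few changes in F fall below a small tolerance. The function names, the averaging choice, and the `tol` and `patience` parameters are all hypothetical.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(marginals: Sequence[Sequence[float]]) -> float:
    """Mean token entropy of a sentence (averaging is an assumption)."""
    if not marginals:
        return 0.0
    return sum(token_entropy(m) for m in marginals) / len(marginals)


def select_most_uncertain(pool: Sequence[Sequence[Sequence[float]]],
                          k: int) -> List[int]:
    """Indices of the k unlabeled sentences with the highest uncertainty."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: sentence_uncertainty(pool[i]),
                    reverse=True)
    return ranked[:k]


def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def has_stabilized(f_history: Sequence[float],
                   tol: float = 0.005, patience: int = 2) -> bool:
    """True when the last `patience` changes in F are all below `tol`.

    Both `tol` and `patience` are illustrative values, not from the paper.
    """
    if len(f_history) <= patience:
        return False
    recent = f_history[-(patience + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))


# Toy pool: each sentence is a list of per-token label distributions.
pool = [
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # model is confident
    [[0.34, 0.33, 0.33], [0.50, 0.30, 0.20]],  # model is uncertain
    [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05]],  # in between
]
to_label = select_most_uncertain(pool, k=1)  # sentence sent to annotators
```

In a full loop, the selected sentences would be labeled, added to the training set, and the CRF retrained, with `has_stabilized` checked on the F values recorded after each round.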
Han Huang, Hongyu Wang, Xiaoguang Wang. Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 66-74.