Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (6): 66-74    DOI: 10.11925/infotech.2096-3467.2018.1226
Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model
Han Huang1, Hongyu Wang2,3, Xiaoguang Wang2,3
1(School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China)
2(Center for Studies of Information Resource, Wuhan University, Wuhan 430072, China)
3(School of Information Management, Wuhan University, Wuhan 430072, China)
[Objective] This paper aims to identify legal terminology automatically in large-scale legal texts, as a step toward structuring legal big data. [Methods] We used a Conditional Random Field (CRF) model as the classifier within an Active Learning algorithm to identify legal terms. After clustering the corpus with K-means, we drew the initial sample set used to start the Active Learning algorithm by stratified sampling. Entropy served as the criterion for sample selection. The learning and sample selection steps of Active Learning were repeated iteratively until the F value of the model (the harmonic mean of precision and recall) stabilized. The result was a legal domain entity recognition model (AL-CRF). [Results] We ran the proposed model on Chinese judgment documents and found that the precision and recall of the AL-CRF model both exceeded 90%, and that its F value was 4.85% higher than that of a CRF model trained with the same labeling workload. [Limitations] The K-means clustering method is sensitive to noise and outliers, which may affect the performance of the model. [Conclusions] Combining conditional random fields with active learning could reduce the labeling workload spent on low-quality samples while ensuring recognition quality.
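
The entropy-based sample selection and the stopping rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how entropy is aggregated over a sentence, so the sketch assumes the mean per-token entropy of the tagger's label marginals. The function names, the input format, and the `window` and `tol` thresholds are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# One sentence = a list of per-token label distributions (label -> probability),
# e.g. the marginals a CRF tagger could return. This input format is an assumption.
TokenMarginals = Dict[str, float]


def token_entropy(dist: TokenMarginals) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def sentence_uncertainty(marginals: Sequence[TokenMarginals]) -> float:
    """Mean token entropy; averaging (an assumed choice) avoids favouring long sentences."""
    if not marginals:
        return 0.0
    return sum(token_entropy(d) for d in marginals) / len(marginals)


def select_most_uncertain(pool: Dict[str, Sequence[TokenMarginals]], k: int) -> List[str]:
    """Return the ids of the k unlabeled sentences with the highest uncertainty."""
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]


def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def has_stabilized(f_history: Sequence[float], window: int = 3, tol: float = 0.005) -> bool:
    """True once the last `window` F values differ by at most `tol` (hypothetical rule)."""
    if len(f_history) < window:
        return False
    recent = f_history[-window:]
    return max(recent) - min(recent) <= tol


# Toy pool: "s1" is tagged confidently, "s2" is ambiguous, "s3" lies in between.
pool = {
    "s1": [{"B": 0.98, "I": 0.01, "O": 0.01}, {"B": 0.01, "I": 0.01, "O": 0.98}],
    "s2": [{"B": 0.40, "I": 0.30, "O": 0.30}, {"B": 0.34, "I": 0.33, "O": 0.33}],
    "s3": [{"B": 0.70, "I": 0.20, "O": 0.10}],
}
queried = select_most_uncertain(pool, k=2)  # sentences to send to the annotator
```

In a full loop, the queried sentences would be labeled, added to the training set, and the model retrained, with `has_stabilized` checked on the F values recorded after each round.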

Key words: Legal Text; Named Entity Recognition; Active Learning; Conditional Random Field; Sample Selection
Received: 06 November 2018      Published: 15 August 2019

Han Huang, Hongyu Wang, Xiaoguang Wang. Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model. Data Analysis and Knowledge Discovery, 2019, 3(6): 66-74.

