Automatic Recognition of Legal Language Entities Based on Conditional Random Fields
Zhang Lin¹, Qin Ce², Ye Wenhao¹
¹ College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
² School of Law, Nanjing Normal University, Nanjing 210023, China
[Objective] This paper aims to automatically identify Legal Language Entities, laying the foundation for text mining of court judgments. [Methods] First, we used a web crawler to collect the data and manually annotated the corpus. Then, we used NLPIR, loaded with a legal-domain dictionary, to segment the corpus. Finally, we constructed feature templates for a conditional random field (CRF) model and used it to automatically recognize Legal Language Entities. [Results] Using both internal and external features of legal language, the CRF model automatically identified legal terms with an F-measure (the harmonic mean of precision and recall) above 90%. [Limitations] The proposed model is limited in its ability to extend to other domains. [Conclusions] Automatically extracting Legal Language Entities with conditional random fields is feasible.
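The abstract names two technical ingredients: a CRF feature template built from "internal" and "external" features, and an evaluation by the harmonic mean of precision and recall (F1 = 2PR / (P + R)). The sketch below is a minimal, stdlib-only illustration of both, not the authors' implementation. It assumes word-level B/I/O tags after segmentation, takes "internal" features to be the token itself, its length and its last character, and takes "external" features to be the neighbouring tokens in a ±2 window plus a left bigram; the paper's actual template may differ. It also assumes entity-level scoring with exact span match. The function names `token_features`, `bio_spans` and `prf` and the sample sentence are hypothetical, and the code does not train a CRF.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def token_features(tokens: Sequence[str], i: int, window: int = 2) -> Dict[str, str]:
    """Expand a hypothetical feature template for the token at position i.

    Assumes window >= 1 so that the left-neighbour bigram is defined.
    """
    tok = tokens[i]
    feats = {
        "w[0]": tok,              # internal: the token itself
        "len[0]": str(len(tok)),  # internal: token length in characters
        "last[0]": tok[-1],       # internal: final character of the token
    }
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if j < 0:
            neighbour = "<BOS>"   # pad before the start of the sentence
        elif j >= len(tokens):
            neighbour = "<EOS>"   # pad after the end of the sentence
        else:
            neighbour = tokens[j]
        feats[f"w[{off}]"] = neighbour  # external: context token at offset `off`
    # external: bigram of the left neighbour and the current token
    feats["w[-1]|w[0]"] = feats["w[-1]"] + "|" + tok
    return feats


def bio_spans(tags: Sequence[str]) -> Set[Tuple[int, int]]:
    """Collect entity spans (start, end-exclusive) from a B/I/O tag sequence."""
    spans: Set[Tuple[int, int]] = set()
    start: Optional[int] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if tag == "I" and start is not None:
            continue  # extend the currently open entity
        if start is not None:  # any other tag closes the open entity
            spans.add((start, i))
            start = None
        if tag in ("B", "I"):  # B opens an entity; an orphan I is treated like B
            start = i
    return spans


def prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 (exact span match)."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative, hand-made example (not data from the paper).
tokens = ["被告人", "犯", "故意", "伤害罪", "，", "判处", "有期徒刑", "三年"]
gold = ["B", "O", "B", "I", "O", "O", "B", "O"]
pred = ["B", "O", "B", "O", "O", "O", "B", "O"]

print(token_features(tokens, 3))
print(prf(gold, pred))  # each value is about 0.667 for this example
```

Here the prediction truncates the two-token entity 故意 / 伤害罪, so only two of the three gold spans match exactly. A list of per-token feature dicts like the one `token_features` returns is the input format accepted by CRF toolkits such as sklearn-crfsuite; template-file toolkits such as CRF++ express the same window features declaratively instead.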
Zhang Lin, Qin Ce, Ye Wenhao. Automatic Recognition of Legal Language Entities Based on Conditional Random Fields. Data Analysis and Knowledge Discovery, 2017, 1(11): 46-52.