Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (11): 46-52    DOI: 10.11925/infotech.2096-3467.2017.0442
Original Article
Automatic Recognition of Legal Language Entities Based on Conditional Random Fields
Zhang Lin1, Qin Ce2, Ye Wenhao1
1 College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
2 School of Law, Nanjing Normal University, Nanjing 210023, China
Abstract  

[Objective] This paper aims to automatically identify legal language entities, laying a foundation for text mining of judgements. [Methods] First, we used a crawler to retrieve the data and manually annotated the corpus. Then, we loaded a legal domain dictionary into NLPIR to segment the corpus. Finally, we constructed feature templates for a conditional random field (CRF) model and used it to automatically recognize legal language entities. [Results] The CRF model, using internal and external features of legal language, automatically identified legal terms with an F-measure (the harmonic mean of precision and recall) above 90%. [Limitations] The proposed model has limitations when extended to other fields. [Conclusions] Conditional random fields make it feasible to automatically extract legal language entities.

Keywords: Judgements; Conditional Random Field Model; Legal Language Entity
Received: 19 May 2017      Published: 27 November 2017
ZTFLH:  G350  

Cite this article:

Zhang Lin,Qin Ce,Ye Wenhao. Automatic Recognition of Legal Language Entities Based on Conditional Random Fields. Data Analysis and Knowledge Discovery, 2017, 1(11): 46-52.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0442     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I11/46

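| Entity length | Count | Entity length | Count | Entity length | Count | Entity length | Count |
|---|---|---|---|---|---|---|---|
| 2 | 39,803 | 7 | 1,210 | 12 | 93 | 17 | 25 |
| 3 | 23,017 | 8 | 444 | 13 | 59 | 18 | 4 |
| 4 | 26,555 | 9 | 309 | 14 | 41 | 19 | 19 |
| 5 | 6,488 | 10 | 316 | 15 | 26 | 20 | 1 |
| 6 | 1,671 | 11 | 22 | 16 | 25 | 21 | 4 |
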
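| Left boundary words: word length | Frequency | Right boundary words: word length | Frequency |
|---|---|---|---|
| 1 | 17.57% | 1 | 29.82% |
| 2 | 81.52% | 2 | 63.28% |
| 3 | 0.68% | 3 | 6.07% |
| 4 | 0.22% | 4 | 0.83% |
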
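| Word | POS | Word length | Entity word? | Left boundary? | Right boundary? | Tag |
|---|---|---|---|---|---|---|
| 作案 (commit a crime) | vi | 2 | Y | Y | Y | S |
| | ng | 1 | N | N | N | S |
| 具备 (possess) | v | 2 | N | N | N | S |
| 刑事 (criminal) | b | 2 | Y | Y | N | B |
| 责任 (responsibility) | n | 2 | Y | N | N | M |
| 能力 (capacity) | n | 2 | Y | N | Y | E |
| , | wd | 1 | N | N | N | S |
| 应予 (should be) | v | 2 | N | N | N | S |
| 严惩 (severely punished) | v | 2 | N | N | N | S |
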
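The B/M/E/S tags in the sample rows above appear to follow directly from the three Y/N flags. The sketch below is a minimal Python illustration under that assumption; the `Token` type and the `tag` function are hypothetical names, not part of the original work, and the row whose word is missing in the source is left out.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One segmented word with the flags shown in the sample rows."""
    word: str
    pos: str
    is_entity: bool
    is_left: bool   # flagged as a left boundary word
    is_right: bool  # flagged as a right boundary word


def tag(token: Token) -> str:
    """Derive a B/M/E/S tag from the flags.

    Assumed scheme, inferred only from the sample rows:
    B = first word of a multi-word entity, M = middle, E = last,
    S = everything else (non-entity words and single-word entities).
    """
    if not token.is_entity:
        return "S"
    if token.is_left and token.is_right:
        return "S"
    if token.is_left:
        return "B"
    if token.is_right:
        return "E"
    return "M"


rows = [
    Token("作案", "vi", True, True, True),
    Token("具备", "v", False, False, False),
    Token("刑事", "b", True, True, False),
    Token("责任", "n", True, False, False),
    Token("能力", "n", True, False, True),
    Token(",", "wd", False, False, False),
    Token("应予", "v", False, False, False),
    Token("严惩", "v", False, False, False),
]

tags = [tag(t) for t in rows]
for t, label in zip(rows, tags):
    print(t.word, t.pos, label)
```

Under this reading, the three-word entity 刑事 / 责任 / 能力 receives B, M, E, while the single-word entity 作案 shares the S tag with non-entity words.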
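| No. | Template | Meaning |
|---|---|---|
| 1 | %x[-2, 0] | Second word before the current word |
| 2 | %x[-1, 0] | Word immediately before the current word |
| 3 | %x[0, 0] | Current word |
| 4 | %x[1, 0] | Word immediately after the current word |
| 5 | %x[2, 0] | Second word after the current word |
| 6 | %x[-2, 0]/%x[-1, 0] | Transition probability from the second preceding word to the preceding word |
| 7 | %x[-1, 0]/%x[0, 0] | Transition probability from the preceding word to the current word |
| 8 | %x[0, 0]/%x[1, 0] | Transition probability from the current word to the following word |
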
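The `%x[row, col]` notation can be read as: take the value in column `col` of the token `row` positions away from the current one. The following Python sketch shows one way such templates could be expanded into feature strings. The `expand` function, the `_PAD` placeholder for out-of-range positions, and the two-column sample rows are assumptions made for illustration; an actual CRF toolkit may treat sequence boundaries differently.

```python
import re

# Matches macros such as %x[-2, 0] and captures the row offset and column.
MACRO = re.compile(r"%x\[\s*(-?\d+)\s*,\s*(\d+)\s*\]")
PAD = "_PAD"  # hypothetical placeholder for positions outside the sentence


def expand(template: str, rows: list[list[str]], i: int) -> str:
    """Replace every %x[row, col] macro relative to position i."""
    def lookup(match: re.Match) -> str:
        r = i + int(match.group(1))
        c = int(match.group(2))
        return rows[r][c] if 0 <= r < len(rows) else PAD
    return MACRO.sub(lookup, template)


# Column 0 is the word, column 1 the part of speech (sample rows only).
rows = [["具备", "v"], ["刑事", "b"], ["责任", "n"], ["能力", "n"], ["应予", "v"]]

templates = [
    "%x[-2, 0]", "%x[-1, 0]", "%x[0, 0]", "%x[1, 0]", "%x[2, 0]",
    "%x[-2, 0]/%x[-1, 0]", "%x[-1, 0]/%x[0, 0]", "%x[0, 0]/%x[1, 0]",
]

# Expand all eight templates at the word 责任 (index 2).
features = [expand(t, rows, 2) for t in templates]
for number, feature in enumerate(features, start=1):
    print(number, feature)
```

Templates 6 to 8 each combine two adjacent words into a single feature string, which lets the model weight word pairs rather than isolated words.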
| No. | P | R | F |
|---|---|---|---|
| 1 | 0.957209 | 0.974524 | 0.965789 |
| 2 | 0.934819 | 0.951670 | 0.943169 |
| 3 | 0.942223 | 0.959492 | 0.950779 |
| 4 | 0.934009 | 0.950114 | 0.941992 |
| 5 | 0.933376 | 0.948381 | 0.940819 |
| 6 | 0.938468 | 0.949555 | 0.943979 |
| 7 | 0.939941 | 0.949402 | 0.944647 |
| 8 | 0.942211 | 0.949419 | 0.945801 |
| 9 | 0.944823 | 0.950231 | 0.947519 |
| 10 | 0.945409 | 0.949339 | 0.947370 |
| Mean | 0.941249 | 0.953213 | 0.947186 |

| No. | P | R | F |
|---|---|---|---|
| 1 | 0.835947 | 0.883422 | 0.859029 |
| 2 | 0.885392 | 0.915164 | 0.900032 |
| 3 | 0.890849 | 0.927982 | 0.909037 |
| 4 | 0.902713 | 0.930428 | 0.916361 |
| 5 | 0.915151 | 0.934568 | 0.924758 |
| 6 | 0.921697 | 0.939949 | 0.930733 |
| 7 | 0.928558 | 0.942517 | 0.935485 |
| 8 | 0.931797 | 0.943780 | 0.937750 |
| 9 | 0.935462 | 0.945968 | 0.940686 |
| 10 | 0.937246 | 0.946705 | 0.941952 |
| Mean | 0.908481 | 0.931048 | 0.919582 |

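P, R, and F in the two tables above are presumably precision, recall, and the F-measure, which the abstract describes as a harmonic mean: F = 2PR / (P + R). The sketch below recomputes F and the column means from the rows of the second table. The function name `f_measure` and the variable names are illustrative only; the numbers are copied from the table.

```python
from statistics import mean


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r (0.0 if both are zero)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# (P, R, F) rows 1 to 10 of the second results table.
rows = [
    (0.835947, 0.883422, 0.859029),
    (0.885392, 0.915164, 0.900032),
    (0.890849, 0.927982, 0.909037),
    (0.902713, 0.930428, 0.916361),
    (0.915151, 0.934568, 0.924758),
    (0.921697, 0.939949, 0.930733),
    (0.928558, 0.942517, 0.935485),
    (0.931797, 0.943780, 0.937750),
    (0.935462, 0.945968, 0.940686),
    (0.937246, 0.946705, 0.941952),
]

# F recomputed from each row's P and R, for comparison with the reported F.
recomputed = [f_measure(p, r) for p, r, _ in rows]

# Column means, for comparison with the table's "Mean" row.
mean_p = mean(p for p, _, _ in rows)
mean_r = mean(r for _, r, _ in rows)
mean_f = mean(f for _, _, f in rows)

for (p, r, f), f_new in zip(rows, recomputed):
    print(f"P={p:.6f} R={r:.6f} reported F={f:.6f} recomputed F={f_new:.6f}")
print(f"means: P={mean_p:.6f} R={mean_r:.6f} F={mean_f:.6f}")
```

The reported mean F is the average of the per-row F values, which is not the same as applying the formula to the mean P and mean R.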