Automatic Recognition of Legal Language Entities Based on Conditional Random Fields
Zhang Lin1, Qin Ce2, Ye Wenhao1
1 College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
2 School of Law, Nanjing Normal University, Nanjing 210023, China
Abstract [Objective] This paper aims to automatically identify Legal Language Entities, laying a foundation for text mining of court judgements. [Methods] First, we used a web crawler to collect the data and manually annotated the corpus. Then, we segmented the corpus with NLPIR after loading a legal-domain dictionary. Finally, we constructed a feature template for a conditional random field (CRF) model and used it to automatically recognize Legal Language Entities. [Results] The CRF model combining internal and external features of legal language automatically identified legal terms with an F-measure (the harmonic mean of precision and recall) above 90%. [Limitations] The proposed model has limitations when extended to other fields. [Conclusions] It is feasible to automatically extract Legal Language Entities with conditional random fields.
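The abstract does not reproduce the feature template itself. As a rough illustration only, the Python sketch below assumes a CRF++-style unigram template (`%x[row,col]` macros over a word column and a POS column), treats the current word and its POS tag as "internal" features and the neighbouring words as "external" context features, and assumes a BIO tag scheme with a hypothetical `LEG` label for Legal Language Entities. It also shows how an entity-level F-measure, the harmonic mean of precision and recall reported under [Results], can be computed from gold and predicted tag sequences. The templates, tag names, and example tokens are hypothetical and are not the authors' actual configuration.

```python
import re
from typing import List, Sequence, Set, Tuple

# Hypothetical CRF++-style unigram templates. %x[row,col] refers to the token at
# relative offset `row` from the current position and feature column `col`
# (column 0 = segmented word, column 1 = POS tag).
TEMPLATES = [
    "U00:%x[-1,0]",          # previous word (external/context feature)
    "U01:%x[0,0]",           # current word (internal feature)
    "U02:%x[1,0]",           # next word (external/context feature)
    "U03:%x[0,1]",           # current POS tag (internal feature)
    "U04:%x[-1,0]/%x[0,0]",  # previous word + current word
]

_MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template: str, rows: Sequence[Sequence[str]], i: int) -> str:
    """Replace each %x[row,col] macro with the value found at position i + row."""
    def sub(m):
        j, col = i + int(m.group(1)), int(m.group(2))
        if j < 0:                      # before sentence start: CRF++-style padding
            return f"_B-{-j}"
        if j >= len(rows):             # past sentence end
            return f"_B+{j - len(rows) + 1}"
        return rows[j][col]
    return _MACRO.sub(sub, template)


def token_features(rows: Sequence[Sequence[str]], i: int) -> List[str]:
    """All feature strings for the token at position i."""
    return [expand(t, rows, i) for t in TEMPLATES]


Span = Tuple[int, int, str]  # (start, end-exclusive, label)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for k, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.add((start, k, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and not continues):
            start, label = k, tag[2:]  # an orphan I- tag opens a new span
    return spans


def prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F-measure."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical segmented sentence: "defendant / commits / crime of intentional injury"
    rows = [("被告人", "n"), ("犯", "v"), ("故意伤害罪", "n")]
    print(token_features(rows, 0))
    # -> ['U00:_B-1', 'U01:被告人', 'U02:犯', 'U03:n', 'U04:_B-1/被告人']
    gold = ["O", "O", "B-LEG", "I-LEG", "O", "B-LEG"]
    pred = ["O", "O", "B-LEG", "I-LEG", "O", "O"]
    print(prf(gold, pred))
    # -> (1.0, 0.5, 0.6666666666666666)
```

A CRF toolkit such as CRF++ performs this template expansion internally during training (`crf_learn template_file train_file model_file`); the sketch only makes the generated feature strings visible. The abstract does not say which CRF implementation was used. Exact-match span scoring is one common convention; partial-overlap scoring would give different numbers.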
Received: 19 May 2017
Published: 27 November 2017