Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 7: 10-25     https://doi.org/10.11925/infotech.2096-3467.2020.1230
Research Paper
Identifying Multi-Type Entities in Legal Judgments with Text Representation and Feature Generation
Wang Hao, Lin Kerou, Meng Zhen, Li Xinlei
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract

[Objective] This paper investigates how well different entity recognition models perform on legal judgments, laying a foundation for building legal knowledge bases. [Methods] First, we extracted the court trial process and the court opinions from criminal judgments to build the experimental dataset. Then, we compared the entity recognition results of a CRFs model with manually constructed features against those of the IDCNN-CRFs and BiLSTM-CRFs models, which generate features automatically and use pre-trained vectors as their character embeddings. We also compared the models' transfer ability on a small set of other types of legal judgments. [Results] The ALBERT-BiLSTM-CRFs model performed best, with a micro-averaged F1 of 95.28%. The IDCNN-CRFs model scored lower, but its training time was about 1/6 of that of the ALBERT-BiLSTM-CRFs model. Both models transferred well to the new texts. [Limitations] Most of the recognized entities are general-purpose ones; future studies should annotate more domain-specific entities to increase the practical value of the work. [Conclusions] For entity recognition in legal judgments, the ALBERT-BiLSTM-CRFs and IDCNN-CRFs models outperform the CRFs model and transfer better to other texts.

Key words: Legal Judgments; Feature Generation; CRFs; IDCNN-CRFs; ALBERT-BiLSTM-CRFs
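Received: 2020-12-07      Published: 2021-08-11
CLC number: TP393
Funding: General Program of the National Natural Science Foundation of China (No. 72074108); Nanjing University Special Program for Young Interdisciplinary Teams in the Liberal Arts (No. 2020300093); Jiangsu Young Social Science Talents program; Nanjing University Zhongying Young Scholars and other talent-training programs.
Corresponding author: Lin Kerou, ORCID: 0000-0003-0026-8771, E-mail: keroulin@foxmail.com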
Cite this article:
Wang Hao, Lin Kerou, Meng Zhen, Li Xinlei. Identifying Multi-Type Entities in Legal Judgments with Text Representation and Feature Generation. Data Analysis and Knowledge Discovery, 2021, 5(7): 10-25.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1230   or   https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I7/10
Fig.1  Research framework
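No. | Entity type | Label | Examples
1 | Crime (charge) | Crime | 交通肇事罪 (crime of causing a traffic accident)
2 | Person name | Per | 张××, 程某甲 (anonymized names)
3 | Place name | Loc | 临清市 (Linqing City), 永馆路 (Yongguan Road)
4 | Organization name | Org | 淮安区人民检察院 (People's Procuratorate of Huai'an District), 淮安市公安局淮安分局物证鉴定室 (Physical Evidence Identification Office of the Huai'an Branch, Huai'an Municipal Public Security Bureau)
5 | Date | Date | 2016年2月11日 (11 February 2016)
6 | Time | Time | 上午10时30分 (10:30 a.m.), 晚 (evening)
Table 1  Entity types and example entities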
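No. | Entity type | Train: tokens | Train: entities (incl. duplicates) | Train: entities (unique) | Test: tokens | Test: entities (incl. duplicates) | Test: entities (unique)
1 | Crime | 3,449 | 742 | 58 | 400 | 84 | 17
2 | Date | 15,560 | 1,815 | 977 | 1,420 | 171 | 127
3 | Loc | 9,964 | 3,019 | 1,264 | 611 | 198 | 110
4 | Org | 15,612 | 1,888 | 994 | 1,412 | 169 | 107
5 | Per | 33,708 | 12,625 | 1,348 | 2,942 | 1,137 | 194
6 | Time | 2,825 | 831 | 216 | 205 | 67 | 30
7 | O | 307,167 | n/a | n/a | 25,028 | n/a | n/a
Table 2  Numbers of entities and tokens in the training and test sets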
Fig.2  Proportion of unique (non-repeated) entities in the training and test sets
Fig.3  Proportion of new entities in the test set
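Figs. 2 and 3 summarize how repetitive the entity mentions are and how many test entities are unseen during training. The page does not define either ratio, so the sketch below assumes the unique-entity ratio is unique entities divided by all mentions (the two entity columns of Table 2) and that a "new" entity is a distinct test-set string absent from the training set. The Table 2 counts are copied from the source; the toy entity lists are hypothetical.

```python
def unique_ratio(unique_count: int, total_count: int) -> float:
    """Assumed definition: distinct entity strings / all entity mentions."""
    return unique_count / total_count


def new_entity_ratio(train_entities, test_entities) -> float:
    """Assumed definition: share of distinct test entities unseen in training."""
    train_set, test_set = set(train_entities), set(test_entities)
    return len(test_set - train_set) / len(test_set)


# (mentions incl. duplicates, unique entities) for train and test, from Table 2.
TABLE2 = {
    "Crime": ((742, 58), (84, 17)),
    "Date": ((1815, 977), (171, 127)),
    "Loc": ((3019, 1264), (198, 110)),
    "Org": ((1888, 994), (169, 107)),
    "Per": ((12625, 1348), (1137, 194)),
    "Time": ((831, 216), (67, 30)),
}

for etype, ((tr_all, tr_uniq), (te_all, te_uniq)) in TABLE2.items():
    print(f"{etype:<5} train {unique_ratio(tr_uniq, tr_all):6.1%}  "
          f"test {unique_ratio(te_uniq, te_all):6.1%}")

# Hypothetical toy mentions, not the paper's data.
train_mentions = ["临清市", "永馆路", "临清市"]
test_mentions = ["临清市", "淮安区"]
print(new_entity_ratio(train_mentions, test_mentions))  # 0.5
```

Under this assumed definition, only about 7.8% of the Crime mentions in the training set are distinct strings (58 of 742), which suggests why frequent, formulaic entity types are easier to learn.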
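No. | Tag | Meaning | Example
1 | B | First character of an entity | the characters 晚 and 豫
2 | I | Any non-initial character of an entity | 上9时30分 (the rest of the time entity 晚上9时30分, 9:30 p.m.)
3 | O | Character outside any entity | all other characters
Table 3  The BIO tagging scheme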
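No. | Tag | Meaning | Example
1 | B | First character of an entity | 晚
2 | I | Middle characters of an entity | 上9时30
3 | E | Last character of an entity | 分
4 | S | Entity consisting of a single character | 豫
5 | O | Character outside any entity | all other characters
Table 4  The BIEOS tagging scheme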
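Tables 3 and 4 differ only in how entity boundaries are marked, so a BIO sequence can be rewritten into BIEOS mechanically by looking one tag ahead. A minimal sketch, assuming type-suffixed tags such as B-Time and well-formed BIO input (every I- tag follows a B- or I- tag of the same type); labeling 豫 as Loc is purely illustrative.

```python
def bio_to_bieos(tags):
    """Convert a well-formed BIO tag sequence (e.g. 'B-Time', 'I-Time', 'O') to BIEOS."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, etype = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- of the same type.
        continues = nxt == f"I-{etype}"
        if prefix == "B":
            out.append(f"B-{etype}" if continues else f"S-{etype}")
        elif prefix == "I":
            out.append(f"I-{etype}" if continues else f"E-{etype}")
        else:
            raise ValueError(f"unexpected tag: {tag}")
    return out


chars = list("晚上9时30分") + ["，", "豫"]
bio = ["B-Time"] + ["I-Time"] * 6 + ["O", "B-Loc"]
for ch, tag in zip(chars, bio_to_bieos(bio)):
    print(ch, tag)
```

The output reproduces Table 4: 晚 becomes B-Time, 上9时30 stay I-Time, 分 becomes E-Time, and the single-character entity 豫 becomes S-Loc.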
Fig.4  Design of the CRFs model
Fig.5  Architecture of the IDCNN-CRFs model
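No. | Parameter | Default value
1 | Batch size | 32
2 | Epochs | 100
3 | dropout_keep | 0.5
4 | Character embedding dimension | 100
5 | Number of filters | 100
6 | Gradient clipping | 5
7 | Learning rate | 0.001
8 | Optimizer | Adam
Table 5  Default parameters of the IDCNN-CRFs model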
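Table 5 does not list the dilation schedule of the IDCNN layers. The sketch below illustrates why iterated dilated convolutions cover wide context cheaply, assuming width-3 filters, a dilation block of (1, 1, 2) and 4 iterations of that block; this setup is common in open-source Chinese IDCNN-CRF code but is an assumption here. It pushes a single impulse through the stack, counts the positions it reaches, and checks the closed form 1 + repeats × sum((k - 1) × d).

```python
import numpy as np


def dilated_conv1d_same(x, kernel, dilation):
    """1-D cross-correlation with dilation and zero padding that keeps the length."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2          # odd k assumed, so padding is symmetric
    xp = np.pad(x, pad)                    # zero padding on both sides
    out = np.zeros(len(x))
    for j, w in enumerate(kernel):
        # tap j reads the input shifted by (j - (k - 1) / 2) * dilation positions
        out += w * xp[j * dilation: j * dilation + len(x)]
    return out


def receptive_field(kernel_width, dilations, repeats):
    """Each layer widens the context by (kernel_width - 1) * dilation tokens."""
    return 1 + repeats * sum((kernel_width - 1) * d for d in dilations)


DILATIONS = [1, 1, 2]   # assumed dilation block (not given in Table 5)
REPEATS = 4             # assumed number of iterations of the block
KERNEL = np.ones(3)     # width-3 filter; all ones so no values cancel out

x = np.zeros(101)
x[50] = 1.0             # a single "active" character in the middle
for _ in range(REPEATS):
    for d in DILATIONS:
        x = dilated_conv1d_same(x, KERNEL, d)

print(int(np.count_nonzero(x)), receptive_field(3, DILATIONS, REPEATS))  # 33 33
```

For comparison, 12 undilated width-3 layers would reach only 1 + 12 × 2 = 25 characters; the dilated layers reach 33 with the same number of filter weights.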
Fig.6  Architectures of the BiLSTM-CRFs model with randomly initialized embeddings and of the BERT-BiLSTM-CRFs model
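No. | Parameter | Default value
1 | Batch size | 64
2 | Epochs | 100
3 | LSTM layer units | 128
4 | LSTM layer return_sequences | True
5 | Dense layer units | 64
6 | Dense layer activation | tanh
Table 6  Default parameters of the BiLSTM model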
Fig.7  Architecture of the ALBERT-BiLSTM-CRFs model
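In the models of Figs. 5 to 7, an IDCNN or BiLSTM encoder produces per-character tag scores and a CRF layer chooses the whole tag sequence jointly. A minimal NumPy sketch of standard linear-chain CRF Viterbi decoding follows, simplified by omitting start and end transition scores; the emission and transition matrices are random stand-ins, not trained values. A brute-force search over every path confirms that the decoder returns the highest-scoring one.

```python
import itertools

import numpy as np


def viterbi_decode(emissions, transitions):
    """Highest-scoring tag path of a linear-chain CRF.

    emissions:   (T, K) per-character tag scores from the encoder
    transitions: (K, K) score for moving from tag i (row) to tag j (column)
    Start/end transition scores are omitted for brevity.
    """
    T, K = emissions.shape
    score = emissions[0].copy()                  # best score of paths ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in tag i at t-1, then i -> j, plus emission of j
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                # follow back-pointers down to t = 0
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())


def path_score(path, emissions, transitions):
    """Unnormalized CRF score of one tag path (same scoring as the decoder)."""
    s = emissions[0, path[0]]
    for t in range(1, len(path)):
        s += transitions[path[t - 1], path[t]] + emissions[t, path[t]]
    return float(s)


rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))   # random stand-in emissions: 5 characters, 3 tags
A = rng.normal(size=(3, 3))   # random stand-in transition scores
best = max(itertools.product(range(3), repeat=5), key=lambda p: path_score(p, E, A))
path, s = viterbi_decode(E, A)
print(path, list(best), path == list(best))
```

Dynamic programming keeps decoding linear in sentence length (O(T·K²)) instead of enumerating all K^T paths, which is what the brute-force check does.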
No. | Entity type | BIO P | BIO R | BIO F1 | BIEOS P | BIEOS R | BIEOS F1
1 | Crime | 98.80% | 97.62% | 98.20% | 98.78% | 96.43% | 97.59%
2 | Date | 97.48% | 90.64% | 93.94% | 92.55% | 87.13% | 89.76%
3 | Loc | 81.28% | 76.77% | 78.96% | 75.89% | 68.55% | 72.03%
4 | Org | 89.13% | 72.78% | 80.13% | 87.32% | 73.37% | 79.74%
5 | Per | 96.03% | 93.58% | 94.79% | 94.75% | 92.00% | 93.35%
6 | Time | 100.00% | 89.55% | 94.49% | 98.46% | 95.52% | 96.97%
Table 7  Results under different tagging schemes
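Table 7 reports precision (P), recall (R) and F1 per entity type. A minimal sketch of how such scores are commonly computed for sequence labeling, assuming exact span matching (an entity counts as correct only if its boundaries and type both match) and F1 = 2PR / (P + R); the page does not state the matching rule, and this strict variant simply ignores stray I- tags that have no preceding B- tag.

```python
def bio_spans(tags):
    """Extract (type, start, end) spans from a BIO sequence; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes a trailing span
        if start is not None and not (tag.startswith("I-") and tag[2:] == etype):
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf_by_type(gold_tags, pred_tags):
    """Per-type precision, recall and F1 under exact span matching."""
    gold, pred = set(bio_spans(gold_tags)), set(bio_spans(pred_tags))
    result = {}
    for etype in sorted({span[0] for span in gold | pred}):
        g = {span for span in gold if span[0] == etype}
        p = {span for span in pred if span[0] == etype}
        tp = len(g & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        result[etype] = (prec, rec, f1)
    return result


gold = ["B-Per", "I-Per", "O", "B-Loc", "I-Loc", "I-Loc"]
pred = ["B-Per", "I-Per", "O", "B-Loc", "I-Loc", "O"]   # Loc span cut one character short
print(prf_by_type(gold, pred))
```

Under exact matching, a prediction that misses a single boundary character scores as both a false positive and a false negative, which is why boundary-heavy types such as Loc and Org tend to have the lowest scores.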
Fig.8  F1 scores of the CRFs model with different features added
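Fig. 8 compares the CRFs model with different added features, but the page does not list them. CRF++ reads training data as one token per line, with whitespace-separated columns whose last column is the gold tag, and a blank line between sentences; the extra columns carry features that a template can reference. A minimal sketch that writes such rows with one hypothetical feature column, the character's position in its segmented word (B/M/E/S); the segmentation is supplied by hand here, not produced by a segmenter.

```python
def crfpp_rows(words, tags):
    """Emit CRF++-style rows: char <TAB> word-position feature <TAB> tag.

    words: one sentence already split into words (hypothetical segmentation)
    tags:  one BIO tag per character of the sentence
    """
    rows = []
    for word in words:
        for k, ch in enumerate(word):
            if len(word) == 1:
                pos = "S"                  # single-character word
            elif k == 0:
                pos = "B"                  # first character of the word
            elif k == len(word) - 1:
                pos = "E"                  # last character of the word
            else:
                pos = "M"                  # middle character
            rows.append([ch, pos])
    if len(rows) != len(tags):
        raise ValueError("need exactly one tag per character")
    # Trailing newline: joining sentences with "\n" leaves the blank separator line.
    return "\n".join("\t".join(r + [t]) for r, t in zip(rows, tags)) + "\n"


words = ["被告人", "在", "临清市"]
tags = ["O", "O", "O", "O", "B-Loc", "I-Loc", "I-Loc"]
crfpp_block = crfpp_rows(words, tags)
print(crfpp_block)
```

A template line such as U01:%x[0,1] would then expose the second column (the word-position feature) of the current character as a unigram feature.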
Fig.9  F1 scores of the IDCNN-CRFs model for different batch sizes with Epoch = 1
Fig.10  F1 scores of the IDCNN-CRFs model for different batch sizes with Epoch = 10
No. | Entity type | Epoch = 1 | Epoch = 10 | Epoch = 20
1 | Crime | 48.10% | 100.00% | 97.62%
2 | Date | 85.96% | 93.29% | 96.49%
3 | Loc | 68.85% | 85.20% | 84.16%
4 | Org | 43.48% | 81.85% | 77.91%
5 | Per | 93.34% | 95.08% | 95.47%
6 | Time | 76.39% | 99.25% | 94.49%
Table 8  F1 scores of the IDCNN-CRFs model with Batch Size = 8 for different numbers of epochs
No. | Entity type | Dropout = 0.4 | Dropout = 0.5 | Dropout = 0.6 | Dropout = 0.7 | Dropout = 0.8
1 | Crime | 98.80% | 100.00% | 99.40% | 100.00% | 97.59%
2 | Date | 93.49% | 93.29% | 93.53% | 93.49% | 94.15%
3 | Loc | 84.97% | 85.20% | 86.15% | 87.11% | 85.27%
4 | Org | 78.79% | 81.85% | 81.93% | 81.08% | 79.17%
5 | Per | 94.08% | 95.08% | 95.02% | 95.78% | 96.27%
6 | Time | 97.71% | 99.25% | 96.97% | 94.49% | 92.19%
Table 9  F1 scores of the IDCNN-CRFs model with Batch Size = 8 and Epoch = 10 for different dropout rates
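Table 5 names the parameter dropout_keep, which reads as a keep probability, whereas Table 9 labels its columns as dropout values; the page does not say which convention Table 9 follows. For reference, a minimal sketch of inverted dropout parameterized by the keep probability p: each activation survives with probability p and survivors are scaled by 1/p, so the expected activation is unchanged. With p = 0.5 half the units are dropped, and both conventions coincide at that value.

```python
import numpy as np


def inverted_dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob; rescale survivors by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    mask = rng.random(x.shape) < keep_prob      # True with probability keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(42)
x = np.ones(1_000_000)
y = inverted_dropout(x, keep_prob=0.5, rng=rng)
print(sorted(np.unique(y).tolist()))   # [0.0, 2.0]
print(round(float(y.mean()), 3))       # close to 1.0
```

The rescaling is applied during training only; at inference the layer passes activations through unchanged.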
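No. | Entity type | Epoch = 1 | Epoch = 3 | Epoch = 5
1 | Crime | 81.25% | 96.89% | 97.44%
2 | Date | 86.53% | 92.04% | 94.05%
3 | Loc | 75.63% | 76.59% | 80.20%
4 | Org | 67.68% | 75.57% | 77.17%
5 | Per | 93.29% | 92.79% | 94.43%
6 | Time | 95.52% | 98.49% | 99.26%
7 | Micro avg | 88.38% | 89.78% | 91.58%
Table 10  F1 scores of the BiLSTM-CRFs model with randomly initialized embeddings for different numbers of epochs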
No. | Epoch | loss | acc | val_loss | val_acc
1 | 1 | 11.0966 | 97.56% | 209.3262 | 98.33%
2 | 2 | 1.9536 | 99.46% | 198.5077 | 98.44%
3 | 3 | 1.1717 | 99.63% | 188.8806 | 98.16%
4 | 4 | 0.8223 | 99.72% | 178.6167 | 98.44%
5 | 5 | 0.5982 | 99.78% | 170.9082 | 98.19%
Table 11  Changes in loss and accuracy during training with Epoch = 5
No. | Epoch | loss | acc | val_loss | val_acc
1 | 1 | 11.6478 | 97.39% | 202.4137 | 98.93%
2 | 2 | 1.9299 | 99.42% | 192.3672 | 99.12%
3 | 3 | 1.1491 | 99.61% | 183.2427 | 99.16%
Table 12  Changes in loss and accuracy during training with Epoch = 3
No. | Entity type | Random embedding | BERT | ALBERT
1 | Crime | 96.89% | 100.00% | 100.00%
2 | Date | 92.04% | 91.01% | 93.53%
3 | Loc | 76.59% | 88.38% | 87.28%
4 | Org | 75.57% | 80.97% | 83.54%
5 | Per | 92.79% | 99.02% | 98.48%
6 | Time | 98.49% | 91.18% | 99.25%
7 | Micro avg | 89.78% | 95.17% | 95.28%
Table 13  F1 scores of the BiLSTM-CRFs model with different character embeddings
Fig.11  Entity recognition F1 scores of the three types of models
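No. | Model | Count | Crime | Date | Loc | Org | Per | Time
1 | CRFs | TP + FP | 1 | 8 | 16 | 6 | 30 | 1
1 | CRFs | TP | 1 | 8 | 14 | 3 | 29 | 1
2 | IDCNN-CRFs | TP + FP | 3 | 8 | 15 | 8 | 32 | 1
2 | IDCNN-CRFs | TP | 3 | 8 | 14 | 4 | 31 | 1
3 | ALBERT-BiLSTM-CRFs | TP + FP | 3 | 8 | 17 | 5 | 36 | 3
3 | ALBERT-BiLSTM-CRFs | TP | 3 | 8 | 15 | 4 | 26 | 1
  | Total |  | 7 | 8 | 19 | 5 | 32 | 1
Table 14  Comparison of different models on new data (entity counts)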
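Table 14 gives, per model and entity type, the number of predicted entities (TP + FP) and of correct ones (TP). A minimal sketch that pools these counts over all types into micro-averaged precision, recall and F1, assuming the Total row is the number of annotated (gold) entities in the new data, so that recall = TP / Total; the page does not spell this out.

```python
TYPES = ["Crime", "Date", "Loc", "Org", "Per", "Time"]
GOLD = [7, 8, 19, 5, 32, 1]          # "Total" row, assumed to be gold entity counts
RESULTS = {                          # counts copied from Table 14, in TYPES order
    "CRFs": {"pred": [1, 8, 16, 6, 30, 1], "tp": [1, 8, 14, 3, 29, 1]},
    "IDCNN-CRFs": {"pred": [3, 8, 15, 8, 32, 1], "tp": [3, 8, 14, 4, 31, 1]},
    "ALBERT-BiLSTM-CRFs": {"pred": [3, 8, 17, 5, 36, 3], "tp": [3, 8, 15, 4, 26, 1]},
}


def micro_prf(pred, tp, gold):
    """Micro-averaged P/R/F1: sum the counts over all entity types first."""
    p = sum(tp) / sum(pred)          # TP / (TP + FP)
    r = sum(tp) / sum(gold)          # TP / (TP + FN), with TP + FN = gold count
    return p, r, 2 * p * r / (p + r)


for model, counts in RESULTS.items():
    p, r, f1 = micro_prf(counts["pred"], counts["tp"], GOLD)
    print(f"{model:<20} P={p:.2%} R={r:.2%} F1={f1:.2%}")
```

Because F1 = 2TP / (predicted + gold), the pooled F1 of the CRFs model, for example, is 2 × 56 / (62 + 72) ≈ 83.6%.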