Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (3): 12-24     https://doi.org/10.11925/infotech.2096-3467.2019.1031
An Integrated Platform for Food Safety Incident Entities Based on Deep Learning*
Hu Haotian1,2, Ji Jinfeng3, Wang Dongbo3,4 (corresponding author), Deng Sanhong1,2
1 School of Information Management, Nanjing University, Nanjing 210023, China
2 Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
3 School of Information Management, Nanjing Agricultural University, Nanjing 210095, China
4 Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
Abstract

[Objective] This study aims to support food safety supervision; to strengthen the prediction, early warning, and emergency response for food safety incidents; to facilitate follow-up research by specialists; and to present the development of food safety incidents to the public concisely and intuitively. [Methods] We collected news reports on food safety incidents from major authoritative news websites and built a food safety incident entity corpus through data cleaning, annotation, and organization. Using deep learning, we then compared the entity recognition performance of the Bi-LSTM, Bi-LSTM-CRF, IDCNN, IDCNN-CRF, and BERT models on this corpus. [Results] In ten-fold cross-validation, the BERT model performed best: its highest F-score reached 81.39%, and its average F-score was 5.50 and 2.58 percentage points higher than those of the IDCNN-CRF and Bi-LSTM-CRF models, respectively. We built the integrated presentation platform for food safety incident entities on the Bi-LSTM-CRF model. [Limitations] Recognition of location entities made up of compound administrative divisions still needs improvement. [Conclusions] The corpus, models, and presentation platform can provide a useful reference for policy formulation and food industry supervision.

Key words: Deep Learning    Food Safety Incident Entity    Bi-LSTM-CRF    BERT
Received: 2019-09-11      Published: 2021-04-12
CLC number:  G255
Funding: *Philosophy and Social Science Research Fund of Jiangsu Universities and the Central Universities Fund Project of Nanjing Agricultural University (2018SJA0034); Major Program of the National Social Science Fund of China (15ZDB168); Hubei Provincial "2011" Collaborative Innovation Center Project (JD20150101)
Corresponding author: Wang Dongbo     E-mail: db.wang@njau.edu.cn
Cite this article:
Hu Haotian, Ji Jinfeng, Wang Dongbo, Deng Sanhong. An Integrated Platform for Food Safety Incident Entities Based on Deep Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 12-24.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.1031      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I3/12
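Fig. 1  Example of manual annotation of food safety incident entities
Fig. 2  Main architecture of the CRF model
Fig. 3  Architecture of the Bi-LSTM-CRF model
Fig. 4  Architecture of the BERT model
Fig. 5  Experimental workflow for food safety incident entity recognition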
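Tag    Meaning
B-fd   First character of a food or cause entity
I-fd   Middle character of a food or cause entity
E-fd   Last character of a food or cause entity
B-ot   First character of a time or location entity
I-ot   Middle character of a time or location entity
E-ot   Last character of a time or location entity
O      Character outside any food safety incident entity
Table 1  Tags and their meanings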
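As an illustration of how the tag set in Table 1 applies at the character level, the following Python sketch turns entity spans into one tag per character. The span format, the function name `tag_characters`, and the sample sentence are hypothetical and are not taken from the paper's corpus. How the authors tag single-character entities is not stated on this page, so the sketch simply keeps the B- tag for them.

```python
from typing import List, Tuple

# Assumed span format: (start, end_exclusive, type), where type is "fd" or "ot".
Span = Tuple[int, int, str]


def tag_characters(text: str, spans: List[Span]) -> List[str]:
    """Assign one B/I/E/O tag per character, following the tag set in Table 1."""
    tags = ["O"] * len(text)
    for start, end, kind in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, kind)}")
        tags[start] = f"B-{kind}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{kind}"
        if end - start > 1:
            tags[end - 1] = f"E-{kind}"
        # Assumption: a one-character entity keeps only its B- tag,
        # because Table 1 defines no single-character tag.
    return tags


# Hypothetical sentence: a location entity followed later by a food entity.
sentence = "南京发现问题奶粉"
spans = [(0, 2, "ot"), (4, 8, "fd")]
tags = tag_characters(sentence, spans)
for ch, tag in zip(sentence, tags):
    print(ch, tag)
```

Character-level tagging avoids a separate word segmentation step, which is a common choice for Chinese entity recognition; whether the authors made the same choice for every model is not stated here.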
Fold     Precision  Recall   F-score
1        73.65%     77.58%   75.57%
2        74.03%     77.82%   75.88%
3        73.25%     77.50%   75.31%
4        73.75%     77.34%   75.50%
5        71.78%     77.42%   74.49%
6        71.86%     77.77%   74.70%
7        72.50%     77.44%   74.89%
8        72.88%     79.02%   75.83%
9        72.98%     78.96%   75.85%
10       71.15%     79.49%   75.09%
Average  72.78%     78.03%   75.31%
Table 2  Ten-fold cross-validation of food safety incident entity recognition with the IDCNN-CRF model
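The F values in Table 2 are consistent with the balanced F-score, the harmonic mean of precision and recall. The page does not state the formula, so the sketch below assumes that definition. It recomputes F for each fold from the reported precision and recall, then averages the reported F values. The function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) for the ten folds in Table 2, in percent.
folds = [
    (73.65, 77.58, 75.57),
    (74.03, 77.82, 75.88),
    (73.25, 77.50, 75.31),
    (73.75, 77.34, 75.50),
    (71.78, 77.42, 74.49),
    (71.86, 77.77, 74.70),
    (72.50, 77.44, 74.89),
    (72.88, 79.02, 75.83),
    (72.98, 78.96, 75.85),
    (71.15, 79.49, 75.09),
]

for i, (p, r, reported) in enumerate(folds, start=1):
    print(f"fold {i}: computed F = {f_score(p, r):.2f}, reported F = {reported:.2f}")

# Average of the reported per-fold F values.
mean_f = sum(f for _, _, f in folds) / len(folds)
print(f"mean reported F = {mean_f:.2f}")
```

Small differences in the last digit between computed and reported values are expected, because the precision and recall in the table are themselves rounded to two decimals.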
Tag   Precision  Recall   F-score
fd    75.56%     79.30%   77.38%
ot    70.56%     74.45%   72.45%
All   74.03%     77.82%   75.88%
Table 3  Entity recognition results of the best-performing IDCNN-CRF model
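Per-type scores such as those in Table 3 are often obtained by decoding tag sequences into entity spans and counting exact matches between predicted and gold spans. The page does not describe the authors' scoring procedure, so the following Python sketch shows only one plausible approach. The helper names and the toy tag sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, type)


def decode_spans(tags: List[str]) -> Set[Span]:
    """Collect well-formed B-(I-)*E- runs of a single type as entity spans."""
    spans: Set[Span] = set()
    start, kind = None, None
    for i, tag in enumerate(tags):
        prefix, _, t = tag.partition("-")
        if prefix == "B":
            start, kind = i, t
        elif prefix == "I" and start is not None and t == kind:
            continue
        elif prefix == "E" and start is not None and t == kind:
            spans.add((start, i + 1, kind))
            start, kind = None, None
        else:
            # "O" or an inconsistent tag closes any open run without a span.
            start, kind = None, None
    return spans


def per_type_scores(gold: Set[Span], pred: Set[Span]) -> Dict[str, Tuple[float, float, float]]:
    """Exact-match precision, recall, and F-score for each entity type."""
    correct = Counter(s[2] for s in gold & pred)
    n_gold = Counter(s[2] for s in gold)
    n_pred = Counter(s[2] for s in pred)
    scores = {}
    for kind in sorted(set(n_gold) | set(n_pred)):
        p = correct[kind] / n_pred[kind] if n_pred[kind] else 0.0
        r = correct[kind] / n_gold[kind] if n_gold[kind] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[kind] = (p, r, f)
    return scores


# Toy example: the predicted fd entity starts one character too late.
gold_tags = ["B-ot", "E-ot", "O", "O", "B-fd", "I-fd", "I-fd", "E-fd"]
pred_tags = ["B-ot", "E-ot", "O", "O", "O", "B-fd", "I-fd", "E-fd"]
scores = per_type_scores(decode_spans(gold_tags), decode_spans(pred_tags))
print(scores)
```

Under exact matching, a boundary error counts against both precision and recall, which is one reason longer or compound entities tend to score lower.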
Fold     Precision  Recall   F-score
1        73.92%     80.34%   77.00%
2        76.60%     81.08%   78.78%
3        74.60%     82.09%   78.17%
4        76.15%     81.24%   78.61%
5        75.98%     79.70%   77.79%
6        76.24%     79.94%   78.05%
7        74.38%     81.85%   77.94%
8        76.98%     81.57%   79.21%
9        74.82%     81.55%   78.04%
10       75.43%     82.31%   78.72%
Average  75.51%     81.17%   78.23%
Table 4  Ten-fold cross-validation of food safety incident entity recognition with the Bi-LSTM-CRF model
Tag   Precision  Recall   F-score
fd    77.02%     82.87%   79.84%
ot    76.92%     79.62%   78.25%
All   76.98%     81.57%   79.21%
Table 5  Entity recognition results of the best-performing Bi-LSTM-CRF model
Fold     Precision  Recall   F-score
1        77.46%     82.85%   80.06%
2        78.89%     82.96%   80.87%
3        77.79%     83.91%   80.74%
4        78.21%     84.30%   81.14%
5        77.80%     83.00%   80.32%
6        78.52%     83.75%   81.05%
7        77.42%     83.11%   80.16%
8        79.28%     83.61%   81.39%
9        78.18%     84.04%   81.00%
10       78.64%     84.25%   81.35%
Average  78.22%     83.58%   80.81%
Table 6  Ten-fold cross-validation of food safety incident entity recognition with the BERT model
Tag   Precision  Recall   F-score
fd    81.71%     85.23%   83.44%
ot    75.65%     81.13%   78.29%
All   79.28%     83.61%   81.39%
Table 7  Entity recognition results of the best-performing BERT model
Model        Precision  Recall   F-score
IDCNN        59.20%     74.55%   65.99%
IDCNN-CRF    72.78%     78.03%   75.31%
Bi-LSTM      54.08%     78.83%   64.15%
Bi-LSTM-CRF  75.51%     81.17%   78.23%
BERT         78.22%     83.58%   80.81%
Table 8  Comparison of overall recognition performance
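To make the comparison in Table 8 concrete, the following Python sketch computes F-score differences between models in percentage points, using only the averages reported in the table. The dictionary layout and the function name `f_gain` are illustrative choices, not part of the paper.

```python
# Average ten-fold scores from Table 8, in percent: (precision, recall, F-score).
results = {
    "IDCNN": (59.20, 74.55, 65.99),
    "IDCNN-CRF": (72.78, 78.03, 75.31),
    "Bi-LSTM": (54.08, 78.83, 64.15),
    "Bi-LSTM-CRF": (75.51, 81.17, 78.23),
    "BERT": (78.22, 83.58, 80.81),
}


def f_gain(model: str, baseline: str) -> float:
    """F-score difference in percentage points, rounded to two decimals."""
    return round(results[model][2] - results[baseline][2], 2)


# BERT against the two CRF-based models.
print("BERT vs IDCNN-CRF:  ", f_gain("BERT", "IDCNN-CRF"))
print("BERT vs Bi-LSTM-CRF:", f_gain("BERT", "Bi-LSTM-CRF"))

# Effect of adding a CRF layer to each base network.
print("IDCNN-CRF vs IDCNN:    ", f_gain("IDCNN-CRF", "IDCNN"))
print("Bi-LSTM-CRF vs Bi-LSTM:", f_gain("Bi-LSTM-CRF", "Bi-LSTM"))
```

The first two differences reproduce the 5.50 and 2.58 point gains quoted in the abstract. The last two show that, in these results, adding a CRF layer raises the average F-score of both IDCNN and Bi-LSTM, mainly through higher precision.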
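Fig. 6  Distribution of food safety incidents in China by province, 2007-2017
Fig. 7  Distribution of food safety incidents in China by year, 2007-2017
Fig. 8  Distribution of food safety incidents in China by month, 2007-2017
Fig. 9  Food and cause entities
Fig. 10  Time and location entities
Fig. 11  Screenshot of the API call interface
Fig. 12  Screenshot of the database interface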