Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (3): 54-61    DOI: 10.11925/infotech.2096-3467.2017.03.07
Original Article
Extracting Events of Food Safety Emergencies with Characteristics Knowledge
Wang Dongbo 1,2, Wu Yi 1, Ye Wenhao 1, Liu Ruilun 1
1 College of Information and Technology, Nanjing Agricultural University, Nanjing 210095, China
2 Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This paper aims to extract events from a large-scale corpus of food safety emergencies. [Methods] First, we built a food safety emergency corpus from past events, using the data acquisition, labeling, and organization methods of information science. Then, we extracted the events with a conditional random field model combined with knowledge of the distribution characteristics of food safety emergencies. [Limitations] The feature template created in this study may not transfer to other fields. [Results] We tested the proposed model on a food safety emergency corpus of 15 million Chinese words, and its F value reached 91.94%. [Conclusions] Extracting events from a food safety emergency corpus with a conditional random field model is feasible.

Key words: Characteristics Knowledge; Conditional Random Fields; Event; Food Safety Emergency
Received: 03 August 2016      Published: 20 April 2017
ZTFLH:  G350  

Cite this article:

Wang Dongbo, Wu Yi, Ye Wenhao, Liu Ruilun. Extracting Events of Food Safety Emergencies with Characteristics Knowledge. Data Analysis and Knowledge Discovery, 2017, 1(3): 54-61.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.03.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I3/54

Entity length    Count     Entity length    Count
2                48,036    13               13
3                23,499    9                9
4                6,878     10               7
1                6,594     12               5
5                1,383     14               2
6                394       11               1
7                182       15               1
8                37        20               1
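Entity                   Count    Entity                            Count
additive (添加剂)         2,243    rice (大米)                        899
milk powder (奶粉)        1,661    milk (牛奶)                        810
gutter oil (地沟油)       1,178    medicine bag (药袋)                733
soy sauce (酱油)          1,078    total bacterial count (菌落总数)    377
[missing]                1,006    nitrite (亚硝酸盐)                 352
pork (猪肉)               943      trans fatty acid (反式脂肪酸)       95
formaldehyde (甲醛)       904      benzoyl peroxide (过氧化苯甲酰)     90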
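Word                               POS   Word length   Entity word?   Left boundary?   Right boundary?   Tag
有关 (concerning)                   p     2             N              N                N                 S
反式 (trans-)                       b     2             Y              N                N                 B
脂肪酸 (fatty acid)                  n     3             Y              N                N                 E
问题 (problem)                      n     1             N              N                N                 S
,                                 wd    1             N              N                N                 S
浙江省 (Zhejiang Province)           ns    3             N              N                N                 S
金华市 (Jinhua City)                 ns    3             N              N                N                 S
公安局 (Public Security Bureau)      n     3             N              N                N                 S
江南 (Jiangnan)                     ns    2             N              N                N                 S
分局 (branch bureau)                n     2             N              N                N                 S
接到 (received)                     v     2             N              N                N                 S
群众 (the public)                   n     2             N              N                N                 S
举报 (report)                       vn    2             N              N                N                 S
[missing]                          v     1             N              N                N                 S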
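As a rough illustration of how the last column (the tag) of the feature table above could be turned back into entities, the following Python sketch joins each run of words tagged B ... E into one entity string. It rests on assumptions that the page does not state: that B marks the first word of an entity, E the last, and S everything else, and that a middle tag (here the hypothetical `M`) may appear in longer entities. The function name `extract_entities` is likewise illustrative and is not taken from the paper.

```python
from typing import List, Tuple


def extract_entities(rows: List[Tuple[str, str]]) -> List[str]:
    """Join words tagged B ... E into entity strings.

    Assumed tag scheme: B = first word of an entity, E = last word,
    M = hypothetical middle word, S = any other word.
    """
    entities: List[str] = []
    current: List[str] = []
    for word, tag in rows:
        if tag == "B":
            # Start a new entity; any unfinished span is discarded.
            current = [word]
        elif tag == "M" and current:
            current.append(word)
        elif tag == "E" and current:
            current.append(word)
            entities.append("".join(current))
            current = []
        else:
            # S (or an out-of-place tag) ends any unfinished span.
            current = []
    return entities


# (word, tag) pairs copied from the first rows of the feature table.
rows = [("有关", "S"), ("反式", "B"), ("脂肪酸", "E"), ("问题", "S"), (",", "S")]
print(extract_entities(rows))  # ['反式脂肪酸']
```

On these rows the sketch recovers 反式脂肪酸 (trans fatty acid), which also appears in the entity frequency table.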
Test No.   Precision   Recall    F value
1          89.95%      90.17%    90.06%
2          90.46%      91.01%    90.73%
3          91.89%      90.68%    91.28%
4          88.35%      91.88%    90.08%
5          90.37%      91.06%    90.71%
6          91.01%      90.07%    90.54%
7          91.43%      91.74%    91.58%
8          90.48%      91.01%    90.74%
9          92.12%      91.77%    91.94%
10         90.54%      91.65%    91.09%
Mean       90.66%      91.10%    90.88%
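The F values in the table above are consistent with the standard balanced F-measure, F = 2PR / (P + R), where P is precision and R is recall. The page does not state the formula, so treating it as the one the authors used is an assumption. A minimal Python sketch (the function name `f_value` is illustrative) that reproduces two rows:

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) from tests 1 and 9 of the table.
print(round(f_value(89.95, 90.17), 2))  # 90.06
print(round(f_value(92.12, 91.77), 2))  # 91.94
```

Test 9 gives the 91.94% quoted in the abstract as the best F value of the model.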
Test No.   Precision   Recall    F value
1          72.55%      62.50%    67.15%
2          73.72%      61.89%    67.29%
3          81.90%      65.19%    72.60%
4          84.10%      59.97%    70.01%
5          81.67%      62.49%    70.80%
6          86.52%      63.70%    73.38%
7          81.66%      65.74%    72.84%
8          72.71%      67.10%    69.79%
9          74.72%      63.37%    68.58%
10         80.88%      65.40%    72.32%
Mean       79.04%      63.74%    70.48%
        Conditional random field model          Maximum entropy model
No.     Training time (s)   Test time (ms)      Training time (s)   Test time (ms)
1       43,837.09           810                 78.01               4
2       41,660.11           1,045               67.01               5
3       43,267.72           980                 89.06               78
4       42,078.04           124                 67.35               9
5       41,863.00           450                 56.43               45
6       43,287.12           160                 67.50               7
7       45,677.87           678                 57.49               67
8       48,814.89           410                 67                  56
9       47,691.62           431                 78.50               30
10      43,827.01           910                 67.59               9
Mean    44,200.45           599.8               69.59               31