Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (2): 64-72    DOI: 10.11925/infotech.2096-3467.2017.02.09
Segmenting Chinese Words from Food Safety Emergencies
Yue Zhang1, Dongbo Wang1,2, Danhao Zhu3
1College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
2Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
3Library of Jiangsu Police Institute, Nanjing 210031, China
[Objective] This paper examines automatic word segmentation models, which play a key role in building databases for food safety administration. We used a statistical learning method based on conditional random fields to segment words in texts on food safety emergencies. [Methods] First, we analyzed the length of the target words and ran multiple experiments on the selection of word features and feature templates for automatic segmentation. Second, we identified the impact of different features and templates on the segmentation results. [Results] We found that selecting more features does not necessarily yield better results, because the features interfere with one another. About 46.62% of the phrases in the corpus of food safety emergencies contained only two or three words. Within the feature template, the first words before and after the current word have a greater effect on the results. [Conclusions] We identified the optimal features and template for automatic word segmentation; the F score reaches 92.88% with the 5Tag features.
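As an illustration of the tagging and evaluation steps mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not define the 5Tag scheme, so the tag set used here (B, B2, M, E, S) is an assumption based on a common convention, and the function names `words_to_tags`, `word_spans`, and `f_score` are hypothetical. The F score is computed in the usual way for segmentation, over exactly matching word spans.

```python
def words_to_tags(words):
    """Map segmented words to per-character tags.

    Assumed 5-tag scheme (not confirmed by the source):
    S = single-character word, B = first character, B2 = second
    character of a word with 3+ characters, M = middle, E = last.
    """
    tags = []
    for word in words:
        n = len(word)
        if n == 1:
            tags.append("S")
        elif n == 2:
            tags.extend(["B", "E"])
        else:
            tags.extend(["B", "B2"] + ["M"] * (n - 3) + ["E"])
    return tags


def word_spans(words):
    """Return the set of (start, end) character offsets of each word."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_score(gold_words, predicted_words):
    """Harmonic mean of precision and recall over exactly matching word spans."""
    gold = word_spans(gold_words)
    predicted = word_spans(predicted_words)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In a CRF setup, tag sequences such as those produced by `words_to_tags` would serve as training labels, and the feature template would determine which neighboring characters around the current position are used as features.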

Key words: Chinese Word Segmentation      Food Safety      Conditional Random Field      Feature Template      Feature Selection
Received: 22 September 2016      Published: 27 March 2017

Cite this article:

Yue Zhang, Dongbo Wang, Danhao Zhu. Segmenting Chinese Words from Food Safety Emergencies. Data Analysis and Knowledge Discovery, 2017, 1(2): 64-72.
