Segmenting Chinese Words from Food Safety Emergencies
Zhang Yue1, Wang Dongbo1,2, Zhu Danhao3
1 College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
2 Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
3 Library of Jiangsu Police Institute, Nanjing 210031, China

Abstract [Objective] This paper examines automatic word segmentation models, which play a key role in building databases for food safety administration. We used a statistical learning method based on conditional random fields (CRF) to segment words in texts on food safety emergencies. [Methods] First, we analyzed the length of target words and conducted multiple experiments on feature selection and feature templates for automatic segmentation. Second, we identified the impact of different features and templates on the segmentation results. [Results] We found that selecting more features does not necessarily yield better results, owing to interference among features. About 46.62% of the phrases in the corpus of food safety emergencies contained only two or three words. In the feature template, the words immediately before and after the current word had a greater effect on the results. [Conclusions] We identified the optimal features and template for automatic word segmentation; the F-score reaches 92.88% with the 5Tag features.
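
To make the tagging and evaluation ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that "5Tag" refers to a per-character label set of the form S, B, B2, M, E (a common convention in CRF-based segmentation; the abstract does not define it), and that the F score is the usual harmonic mean of word-level precision and recall. All function names are hypothetical.

```python
from typing import List, Set, Tuple


def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Label each character of a segmented sentence.

    ASSUMPTION: a 5-tag scheme S (single), B (first), B2 (second),
    M (other inner characters), E (last). The abstract does not spell
    out its tag set; this is only one plausible reading of "5Tag".
    """
    labeled: List[Tuple[str, str]] = []
    for word in words:
        n = len(word)
        if n == 1:
            tags = ["S"]
        elif n == 2:
            tags = ["B", "E"]
        else:
            # B, B2, then M for any remaining inner characters, then E
            tags = ["B", "B2"] + ["M"] * (n - 3) + ["E"]
        labeled.extend(zip(word, tags))
    return labeled


def tags_to_words(labeled: List[Tuple[str, str]]) -> List[str]:
    """Rebuild words from (character, tag) pairs; S and E close a word."""
    words: List[str] = []
    current = ""
    for char, tag in labeled:
        current += char
        if tag in ("S", "E"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without S or E
        words.append(current)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Turn a word list into a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_score(gold: List[str], predicted: List[str]) -> float:
    """Word-level F score: a word counts as correct only if both of its
    boundaries match the gold segmentation."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)
```

In a CRF workflow, the output of `words_to_tags` would form the training labels, and `f_score` would compare the model's decoded segmentation against a held-out gold segmentation of the same text.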
Received: 22 September 2016
Published: 27 March 2017