[Objective] This study introduces external food-domain data to improve the word vector representation of exposure foods in foodborne disease cases, and uses machine learning methods to identify foodborne disease pathogens.
[Methods] Spatial, temporal, patient, and exposure food information was extracted from foodborne disease case data as features for pathogen identification. Exposure foods were embedded using a word vector representation that integrates domain knowledge, and an XGBoost machine learning model was used to learn the correlations among features in order to identify several important foodborne disease pathogens.
[Results] Compared with a word vector model trained on a general corpus, the word vector representation that integrates domain data yields a more accurate representation of exposure foods. In pathogen identification, the method achieves 68% precision and recall on four important foodborne disease pathogens: Salmonella, Escherichia coli, Vibrio parahaemolyticus, and Norovirus, which can assist the auxiliary diagnosis and treatment of foodborne diseases.
[Limitations] Only four major foodborne disease pathogens were analyzed.
[Conclusions] The analysis results can guide the management and prevention of foodborne diseases, and pathogen identification based on these results and machine learning methods can support the clinical diagnosis and treatment of foodborne diseases.
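The feature construction described under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional food vectors, the region and season vocabularies, and the helper names (`embed_foods`, `one_hot`, `build_features`) are all hypothetical, chosen only to show how an averaged exposure food embedding might be concatenated with spatial, temporal, and patient features before being passed to a classifier such as XGBoost.

```python
from typing import Dict, List

# Hypothetical 3-dimensional food embeddings. In practice these would come
# from a word vector model trained with food-domain data.
FOOD_VECTORS: Dict[str, List[float]] = {
    "oyster": [0.9, 0.1, 0.0],
    "egg": [0.1, 0.8, 0.1],
    "salad": [0.2, 0.2, 0.6],
}
DIM = 3

# Hypothetical categorical vocabularies for spatial and temporal features.
REGIONS = ["north", "south"]
SEASONS = ["spring", "summer", "autumn", "winter"]


def embed_foods(foods: List[str]) -> List[float]:
    """Average the vectors of the known exposure foods (zeros if none are known)."""
    known = [FOOD_VECTORS[f] for f in foods if f in FOOD_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode a categorical value as a one-hot vector over the vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_features(case: Dict) -> List[float]:
    """Concatenate spatial, temporal, patient, and exposure food features."""
    return (
        one_hot(case["region"], REGIONS)
        + one_hot(case["season"], SEASONS)
        + [float(case["age"])]
        + embed_foods(case["foods"])
    )


# A made-up case record, used only to exercise the functions above.
case = {"region": "south", "season": "summer", "age": 34, "foods": ["oyster", "egg"]}
features = build_features(case)
print(len(features), features)
```

Each case becomes one fixed-length row; stacking such rows with their pathogen labels would give the training matrix for the classifier.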
Wang Hanxue, Cui Wenjuan, Zhou Yuanchun, Du Yi. A Method for Identifying Pathogens of Foodborne Diseases based on Machine Learning [J]. Data Analysis and Knowledge Discovery, DOI: 10.11925/infotech.2096-3467.2020.1105.