[Objective] This study proposes a new convolutional neural network model, aiming to process the imbalanced data of online patient reviews.[Methods] First, we established the new model with mixed sampling and transfer learning techniques. Then we used end-to-end deep learning architecture based on Word2Vector and convolutional neural network for the distributed representation, feature extraction and topic classification of online patient reviews.[Results] Compared with traditional machine learning algorithm represented by SVM and single convolutional neural network, the proposed model significantly improved the accuracy, recall and F1 values.[Limitations] The imbalanced data of this study was only from online patient reviews.[Conclusions] The proposed model could effectively improve the recognition results of imbalanced data.
向菲,谢耀谈. 基于混合采样与迁移学习的患者评论识别模型*[J]. 数据分析与知识发现, 2020, 4(2/3): 39-47.
Xiang Fei,Xie Yaotan. Recognition Model of Patient Reviews Based on Mixed Sampling and Transfer Learning. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 39-47.
Hao H, Zhang K, Wang W , et al. A Tale of Two Countries: International Comparison of Online Doctor Reviews Between China and the United States[J]. International Journal of Medical Informatics, 2017,99:37-44.
( Chen Xu, Liu Penghe, Sun Yuzhong , et al. Research on Disease Prediction Models Based on Imbalanced Medical Data Sets[J]. Chinese Journal of Computers, 2019,42(3):596-609.)
Johns B T, Mewhort D J K, Jones M N . The Role of Negative Information in Distributional Semantic Learning[J]. Cognitive Science, 2019,43(5):e12730.
Liang H, Sun X, Sun Y , et al. Text Feature Extraction Based on Deep Learning: A Review[J]. EURASIP Journal on Wireless Communications and Networking, 2017: Article No. 211.
Luque C, Luna J M, Luque M , et al. An Advanced Review on Text Mining in Medicine[J]. Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery, 2019,9(3):e1302.
Lu Y, Wu Y, Liu J , et al. Understanding Health Care Social Media Use from Different Stakeholder Perspectives: A Content Analysis of an Online Health Community[J]. Journal of Medical Internet Research, 2017,19(4):e109.
Hao H, Zhang K . The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews[J]. Journal of Medical Internet Research, 2016,18(5):e108.
Rivas R, Montazeri N, Le N X T , et al. Automatic Classification of Online Doctor Reviews: Evaluation of Text Classifier Algorithms[J]. Journal of Medical Internet Research, 2018,20(11):e11141.
( Jin Xu, Wang Lei, Sun Guozi , et al. Under-Sampling Method for Unbalanced Data Based on Centroid Space[J]. Computer Science, 2019,46(2):50-55.)
Wilson D L . Asymptotic Properties of Nearest Neighbor Rules Using Edited Data[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1972,2(3):408-421.
Kermanidis K, Maragoudakis M, Fakotakis N , et al. Learning Greek Verb Complements: Addressing the Class Imbalance [C]//Proceedings of the 20th International Conference on Computational Linguistics. 2004: 1065-1071.
( Gu Ping, Ouyang Yuanyou . Classification Research for Unbalanced Data Based on Mixed-Sampling[J]. Application Research of Computers, 2015,32(2):379-381.)
Chawla N V, Bowyer K W, Hall L O , et al. SMOTE: Synthetic Minority Over-Sampling Technique[J]. Journal of Artificial Intelligence Research, 2002,16:321-357.
Han H, Wang W Y, Mao B H . Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning [C]// Proceedings of the 2005 International Conference on Intelligent Computing. 2005: 878-887.
Perez-Ortiz M, Gutierrez P A, Hervas-Martinez C . Borderline Kernel Based Over-Sampling [C]// Proceedings of the 8th International Conference on Hybrid Artificial Intelligence Systems. 2013: 472-481.
Ling X, Dai W, Xue G R , et al. Spectral Domain-Transfer Learning [C]// Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008: 488-496.
Dai W, Chen Y, Xue G R , et al. Translated Learning: Transfer Learning Across Different Feature Spaces [C]// Proceedings of the 22nd Annual Conference on Neural Information Processing Systems. 2008: 353-360.
Pan S J, Ni X, Sun J , et al. Cross-Domain Sentiment Classification via Spectral Feature Alignment [C]// Proceedings of the 19th International Conference on World Wide Web. 2010: 751-760.
Pan S J, Kwok J T, Yang Q . Transfer Learning via Dimensionality Reduction [C]// Proceedings of the 23rd AAAI Conference on Artificial Intelligence. AAAI, 2008: 677-682.
Si S, Tao D, Geng B . Bregman Divergence-Based Regularization for Transfer Subspace Learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010,22(7):929-942.
Bonilla E V, Chai K M A, Williams C K I . Multi-Task Gaussian Process Prediction[J]. Advances in Neural Information Processing Systems, 2008,20:153-160.
Dai W Y, Yang Q, Xue G R , et al. Boosting for Transfer Learning [C]// Proceedings of the 24th International Conference on Machine Learning. 2007: 193-200.
Davis J, Domingos P . Deep Transfer via Second-Order Markov Logic [C]// Proceedings of the 26th International Conference on Machine Learning. 2009: 217-224.
Artem B, Victor L . Aggregating Deep Convolutional Features for Image Retrieval [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. 2015: 1269-1277.
Zhou B, Khosla A, Lapedriza A , et al. Object Detectors Emerge in Deep Scene CNNs[OL]. arXiv Preprint, arXiv:1412.6856.
Jaipurkar S S, Jie W, Zeng Z , et al. Automated Classification Using End-to-End Deep Learning [C]// Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2018: 706-709.
Kim Y . Convolutional Neural Networks for Sentence Classification[OL]. arXiv Preprint,arXiv:1408.5882.
Mikolov T, Chen K, Corrado G , et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
Alcala-Fdez J, Fernandez A, Luengo J , et al. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework[J]. Journal of Multiple-Valued Logic and Soft Computing, 2011,17:255-287.