Recognition Model of Patient Reviews Based on Mixed Sampling and Transfer Learning
Xiang Fei, Xie Yaotan
School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China

Abstract [Objective] This study proposes a new convolutional neural network model for handling the class imbalance found in online patient reviews. [Methods] First, we built the model by combining mixed sampling with transfer learning. We then used an end-to-end deep learning architecture based on Word2Vec and a convolutional neural network for the distributed representation, feature extraction and topic classification of online patient reviews. [Results] Compared with traditional machine learning algorithms, represented by SVM, and with a single convolutional neural network, the proposed model significantly improved accuracy, recall and F1 values. [Limitations] The imbalanced data in this study came only from online patient reviews. [Conclusions] The proposed model can effectively improve recognition results on imbalanced data.
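
The abstract does not say which sampling procedure the authors used, so the following is only a minimal sketch of one common form of mixed sampling: random undersampling of the majority class combined with SMOTE-style interpolation to oversample the minority class. The function names, the parameters `k` and `target_per_class`, and the toy two-dimensional data are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Assumes the minority class holds at least two samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def mixed_sample(X, y, minority_label, target_per_class, rng=None):
    """Binary-class sketch: randomly undersample the majority class down to
    target_per_class and oversample the minority class up to it.
    Assumes the minority class starts with fewer than target_per_class samples."""
    rng = rng or random.Random(0)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]
    kept = rng.sample(majority, min(target_per_class, len(majority)))
    n_new = max(0, target_per_class - len(minority))
    synthetic = smote_like_oversample(minority, n_new, rng=rng)
    X_out = [x for x, _ in kept] + minority + synthetic
    y_out = [label for _, label in kept] + [minority_label] * (len(minority) + n_new)
    return X_out, y_out


# Hypothetical toy data: 90 majority points (label 0), 10 minority points (label 1).
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = mixed_sample(X, y, minority_label=1, target_per_class=40)
print(Counter(y), Counter(y_bal))
```

In a text setting the feature vectors would be document embeddings rather than toy coordinates; resampling would be applied to the training split only, so that evaluation still reflects the original class distribution.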
Received: 24 May 2019
Published: 26 April 2020
Corresponding Author:
Fei Xiang
E-mail: xiangfei@hust.edu.cn