Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 39-47    DOI: 10.11925/infotech.2096-3467.2019.0549
Recognition Model of Patient Reviews Based on Mixed Sampling and Transfer Learning
Xiang Fei, Xie Yaotan
School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China

[Objective] This study proposes a new convolutional neural network (CNN) model for processing imbalanced data from online patient reviews. [Methods] First, we built the model using mixed sampling and transfer learning. We then used an end-to-end deep learning architecture based on Word2Vec and a CNN for the distributed representation, feature extraction, and topic classification of online patient reviews. [Results] Compared with a traditional machine learning algorithm (SVM) and a single CNN, the proposed model significantly improved the accuracy, recall, and F1 values. [Limitations] The imbalanced data in this study came only from online patient reviews. [Conclusions] The proposed model could effectively improve recognition results on imbalanced data.

Key words: Mixed Sampling; Transfer Learning; Imbalanced Data; Convolutional Neural Network; Patient Reviews Recognition
Received: 24 May 2019      Published: 26 April 2020
CLC Number (ZTFLH): TP393
Corresponding Author: Fei Xiang

Cite this article:

Xiang Fei,Xie Yaotan. Recognition Model of Patient Reviews Based on Mixed Sampling and Transfer Learning. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 39-47.


Figure: Recognition Framework of Multi-label Data Based on Mixed Sampling and Transfer Learning
Figure: Mixed Sampling Process
Figure: Patient Reviews Recognition Model Based on End-to-End CNN
Figure: Skip-Gram Model
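The mixed sampling figure itself is not reproduced on this page. As a rough illustration only, mixed sampling generally means undersampling the majority class while oversampling the minority class. The sketch below assumes plain random sampling in both directions; the helper `mixed_sample`, the `target_size` parameter, and the toy item names are hypothetical, and the paper's actual sampling rules may differ. The class sizes are taken from the Cost (费用) row of the data set table on this page.

```python
import random

def mixed_sample(majority, minority, target_size, seed=0):
    """Move both classes toward `target_size` examples.

    Hypothetical helper: plain random sampling stands in for whatever
    over- and under-sampling rules the paper actually uses.
    Assumes len(minority) <= target_size <= len(majority).
    """
    rng = random.Random(seed)
    # Undersampling: keep a random subset of the majority class (no repeats).
    kept_majority = rng.sample(majority, min(target_size, len(majority)))
    # Oversampling: keep every minority example, then draw extras with replacement.
    extra = max(0, target_size - len(minority))
    grown_minority = list(minority) + rng.choices(minority, k=extra)
    return kept_majority, grown_minority

# Toy data sized like the Cost topic: 1,893 negatives and 107 positives.
negatives = [f"neg_{i}" for i in range(1893)]
positives = [f"pos_{i}" for i in range(107)]
maj, mino = mixed_sample(negatives, positives, target_size=1000)
print(len(maj), len(mino))
```

Meeting in the middle avoids both extremes: discarding most majority examples, or duplicating each minority example many times over.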
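Table: Description of Experimental Data Set

| Topic | Positive examples | Negative examples | IR |
|---|---|---|---|
| Attitude (态度) | 1,313 | 687 | 1.91 |
| Competence (能力) | 515 | 1,485 | 2.88 |
| Measures (措施) | 841 | 1,159 | 1.38 |
| Effect (效果) | 596 | 1,404 | 2.36 |
| Environment (环境) | 357 | 1,643 | 4.60 |
| Cost (费用) | 107 | 1,893 | 17.69 |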
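The IR column is consistent with an imbalance ratio defined as the larger class count divided by the smaller one. The page does not spell out this definition, so treat it as an inference from the numbers. A minimal check, with the counts copied from the table and English topic names used as dictionary keys for readability:

```python
# (positive, negative) example counts per topic, copied from the data set table.
counts = {
    "Attitude": (1313, 687),     # 态度
    "Competence": (515, 1485),   # 能力
    "Measures": (841, 1159),     # 措施
    "Effect": (596, 1404),       # 效果
    "Environment": (357, 1643),  # 环境
    "Cost": (107, 1893),         # 费用
}

def imbalance_ratio(pos, neg):
    """Majority-class count divided by minority-class count (assumed definition)."""
    return max(pos, neg) / min(pos, neg)

for topic, (pos, neg) in counts.items():
    print(f"{topic}: IR = {imbalance_ratio(pos, neg):.2f}")
```

Every topic has 2,000 labeled reviews in total, so the IR differences come entirely from how the positives and negatives split.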
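Table: Co-occurrence of Topic Labels

| Topic 1 | Topic 2 | Co-occurrence count |
|---|---|---|
| Environment (环境) | Attitude (态度) | 190 |
| Environment (环境) | Competence (能力) | 54 |
| Environment (环境) | Measures (措施) | 116 |
| Environment (环境) | Effect (效果) | 72 |
| Cost (费用) | Attitude (态度) | 51 |
| Cost (费用) | Competence (能力) | 25 |
| Cost (费用) | Measures (措施) | 62 |
| Cost (费用) | Effect (效果) | 31 |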
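These counts show how often the two most imbalanced topics, Environment (环境) and Cost (费用), are labeled together with each of the four larger topics. The page does not say how the counts feed into the model. With that caveat, one simple way to read the table is to rank the partner topics for each imbalanced topic by co-occurrence; the helper name `rank_partners` is hypothetical.

```python
from collections import defaultdict

# (topic 1, topic 2, co-occurrence count), copied from the table.
ROWS = [
    ("Environment", "Attitude", 190),   # 环境, 态度
    ("Environment", "Competence", 54),  # 环境, 能力
    ("Environment", "Measures", 116),   # 环境, 措施
    ("Environment", "Effect", 72),      # 环境, 效果
    ("Cost", "Attitude", 51),           # 费用, 态度
    ("Cost", "Competence", 25),         # 费用, 能力
    ("Cost", "Measures", 62),           # 费用, 措施
    ("Cost", "Effect", 31),             # 费用, 效果
]

def rank_partners(rows):
    """Map each first-column topic to its partner topics, most frequent first."""
    grouped = defaultdict(list)
    for topic, partner, count in rows:
        grouped[topic].append((count, partner))
    return {
        topic: [partner for _, partner in sorted(pairs, reverse=True)]
        for topic, pairs in grouped.items()
    }

ranking = rank_partners(ROWS)
for topic, partners in ranking.items():
    print(topic, "->", partners)
```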
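Table: Parameters of Word2Vec Training

| Parameter | Value | Meaning |
|---|---|---|
| size | 200 | Word vector dimension |
| window | 5 | Window size: the maximum distance between the current word and the predicted word within a sentence |
| sg | 1 | Word vector training model: Skip-Gram |
| min_count | 5 | Word frequency threshold |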
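The parameter names resemble those of gensim's `Word2Vec` class, where `size` was renamed `vector_size` in gensim 4.0; the page does not name the library, so this is an inference. To show what `window` and `min_count` control, the sketch below generates the (center word, context word) pairs a Skip-Gram model would be trained on. It uses only the standard library, omits training itself and refinements such as frequent-word subsampling, and runs on a hypothetical toy corpus.

```python
from collections import Counter

def skipgram_pairs(sentences, window=5, min_count=5):
    """Yield (center, context) training pairs for a Skip-Gram model.

    Words seen fewer than `min_count` times are dropped first; each remaining
    word is then paired with every word at most `window` positions away.
    """
    freq = Counter(word for sentence in sentences for word in sentence)
    for sentence in sentences:
        kept = [word for word in sentence if freq[word] >= min_count]
        for i, center in enumerate(kept):
            lo, hi = max(0, i - window), min(len(kept), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, kept[j]

# Toy corpus (hypothetical tokens); a window of 1 keeps the pairs easy to read.
corpus = [["doctor", "very", "patient", "and", "kind"]] * 5 + [["rare"]]
pairs = list(skipgram_pairs(corpus, window=1, min_count=5))
print(pairs[:4])
print(len(pairs))
```

In the toy corpus, "rare" occurs once and is filtered out by `min_count=5`, so it never appears in a training pair.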
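Table: Parameters of CNN

| Parameter | Value | Meaning |
|---|---|---|
| filter size | [1,2,3] | Convolution kernel sizes |
| filter number | 128 | Number of convolution kernels |
| dropout rate | 0.50-0.75 | Dropout ratio |
| l2_alpha | 10 | L2 regularization coefficient |
| learning rate | 1e-4 to 1e-3 | Learning rate for stochastic gradient descent |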
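To make the filter settings concrete, the sketch below runs a forward pass of a common sentence-classification CNN layout: kernels spanning 1, 2, and 3 words slide over the word vectors, and each kernel contributes its single largest activation. Several details are assumptions not stated on this page: ReLU activation, max-over-time pooling, 128 kernels per kernel size, and random weights standing in for learned ones. The function names are hypothetical.

```python
import random

def conv_max_pool(embedded, kernel):
    """Slide one kernel over the sentence and keep the largest activation.

    embedded: list of word vectors (n words x d dims), with n >= len(kernel)
    kernel:   list of k weight vectors (k words x d dims)
    """
    k = len(kernel)
    activations = []
    for start in range(len(embedded) - k + 1):
        window = embedded[start:start + k]
        total = sum(w * x for wv, xv in zip(kernel, window) for w, x in zip(wv, xv))
        activations.append(max(0.0, total))  # ReLU (assumed activation)
    return max(activations)  # max-over-time pooling (assumed)

def sentence_features(embedded, filter_sizes=(1, 2, 3), filter_number=128, seed=0):
    """Concatenate one pooled feature per kernel across all kernel sizes."""
    rng = random.Random(seed)
    dim = len(embedded[0])
    features = []
    for k in filter_sizes:
        for _ in range(filter_number):
            # Random weights stand in for trained kernel weights.
            kernel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(k)]
            features.append(conv_max_pool(embedded, kernel))
    return features

# A toy sentence of 10 words, each a 200-dimensional vector (matching size=200).
rng = random.Random(1)
sentence = [[rng.uniform(-1, 1) for _ in range(200)] for _ in range(10)]
features = sentence_features(sentence)
print(len(features))
```

Under these assumptions the pooled feature vector has 3 x 128 = 384 entries regardless of sentence length, which is what lets a fixed-size classifier layer follow.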
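Table: Accuracy of Classification Models for Different Topic Datasets (MS: mixed sampling; TL: transfer learning)

| Algorithm | Attitude (态度) | Competence (能力) | Measures (措施) | Effect (效果) | Environment (环境) | Cost (费用) |
|---|---|---|---|---|---|---|
| SVM | 0.9377 | 0.8083 | 0.6363 | 0.7424 | 0.6363 | 0.3792 |
| CNN | 0.9628 | 0.9580 | 0.8488 | 0.8090 | 0.8186 | 0.8026 |
| CNN+MS | 0.9653 | 0.9333 | 0.8427 | 0.8501 | 0.7621 | 0.7145 |
| CNN+TL | - | - | - | - | 0.8369 | 0.8375 |
| CNN+MS+TL | - | - | - | - | 0.7483 | 0.7554 |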
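Table: Recall of Classification Models for Different Topic Datasets

| Algorithm | Attitude (态度) | Competence (能力) | Measures (措施) | Effect (效果) | Environment (环境) | Cost (费用) |
|---|---|---|---|---|---|---|
| SVM | 0.908 | 0.7648 | 0.6243 | 0.7097 | 0.6243 | 0.2336 |
| CNN | 0.8956 | 0.7998 | 0.8193 | 0.7062 | 0.6617 | 0.5337 |
| CNN+MS | 0.8957 | 0.8288 | 0.8418 | 0.7535 | 0.7339 | 0.6236 |
| CNN+TL | - | - | - | - | 0.6948 | 0.5518 |
| CNN+MS+TL | - | - | - | - | 0.8038 | 0.6818 |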
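Table: F1 Value of Classification Models for Different Topic Datasets

| Algorithm | Attitude (态度) | Competence (能力) | Measures (措施) | Effect (效果) | Environment (环境) | Cost (费用) |
|---|---|---|---|---|---|---|
| SVM | 0.9221 | 0.8190 | 0.6747 | 0.7244 | 0.6195 | 0.2850 |
| CNN | 0.9277 | 0.8678 | 0.8322 | 0.7527 | 0.7235 | 0.6319 |
| CNN+MS | 0.9289 | 0.8764 | 0.8406 | 0.7970 | 0.7433 | 0.6541 |
| CNN+TL | - | - | - | - | 0.7560 | 0.6556 |
| CNN+MS+TL | - | - | - | - | 0.7724 | 0.7124 |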
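The F1 value is conventionally the harmonic mean of precision and recall. The sketch below recomputes it for one cell, under the assumption that the "Accuracy" table reports precision; the page does not confirm this. The recomputed figure agrees with the tabulated F1 only approximately, and the page does not say how the reported values were aggregated.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# CNN+MS+TL on the Cost (费用) topic; the "Accuracy" value is treated as
# precision here, which is an assumption.
precision, recall, reported_f1 = 0.7554, 0.6818, 0.7124
recomputed = f1_score(precision, recall)
print(f"recomputed F1 = {recomputed:.4f}, reported F1 = {reported_f1:.4f}")
```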