[Objective] This paper proposes a health-question classification model based on feature expansion with keyword word vectors, aiming to improve the user experience of medical question-answering communities. [Methods] First, we extracted keywords from the questions using TF-IDF and LDA models. Then, we expanded the question features with the Word2Vec word vectors of these keywords and used the expanded features to classify health questions. [Results] The method performed best when TF-IDF was used for keyword extraction, the complete questions and answers were used as the training corpus, 600 words were retained in the dictionary, and CBOW was used as the language model. The optimal model achieved a precision (P) of 0.9872, a recall (R) of 0.9725, and an F-score (F) of 0.9798. [Limitations] We did not extract keywords from short medical texts at a deeper semantic level. [Conclusions] The proposed model performs better than existing classification models.
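The two steps in [Methods] can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation. It assumes pre-tokenized questions, uses the common tf × log(N/df) weighting, omits the LDA branch, and replaces trained Word2Vec (CBOW) embeddings with hypothetical 2-D toy vectors. The function names (`tfidf_keywords`, `expand_keywords`, `f_score`) and parameters (`top_k`, `top_n`) are illustrative. `f_score` assumes that F is the balanced F1 measure. Under that assumption, the reported P and R reproduce the reported F.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms of each tokenized document ranked by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        # tf(t, d) * log(N / df(t)); terms present in every document score 0.
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(keywords, vectors, top_n=1):
    """Append the top_n most cosine-similar vocabulary words of each keyword."""
    expanded = list(keywords)
    for kw in keywords:
        if kw not in vectors:
            continue  # out-of-vocabulary keyword: nothing to expand with
        # Exclude words already in the list (including kw itself).
        candidates = [w for w in vectors if w not in expanded]
        candidates.sort(key=lambda w: (-cosine(vectors[kw], vectors[w]), w))
        expanded.extend(candidates[:top_n])
    return expanded


def f_score(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical tokenized questions and 2-D vectors standing in for Word2Vec.
docs = [
    ["fever", "cough", "child", "medicine"],
    ["diet", "diabetes", "sugar", "medicine"],
    ["cough", "cold", "medicine"],
]
vectors = {
    "fever": [1.0, 0.1], "cough": [0.9, 0.2], "cold": [0.8, 0.3],
    "diet": [0.1, 1.0], "sugar": [0.2, 0.9],
}

print(tfidf_keywords(docs, top_k=2))
# → [['child', 'fever'], ['diabetes', 'diet'], ['cold', 'cough']]
print(expand_keywords(["fever"], vectors))  # → ['fever', 'cough']
print(round(f_score(0.9872, 0.9725), 4))    # → 0.9798
```

In a real pipeline, the `vectors` mapping would come from a Word2Vec model trained on the question/answer corpus. The expanded keyword lists would then be converted into input features for a classifier.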
Tang Xiaobo, Gao Hexuan. Classification of Health Questions Based on Vector Extension of Keywords[J]. Data Analysis and Knowledge Discovery, 2020, 4(7): 66-75.