Classification of Health Questions Based on Vector Extension of Keywords
Tang Xiaobo (1, 2), Gao Hexuan (1)
(1) School of Information Management, Wuhan University, Wuhan 430072, China
(2) Center for Studies of Information Systems, Wuhan University, Wuhan 430072, China
[Objective] This paper proposes a health question classification model based on word-vector extension of keywords, aiming to improve the user experience of medical question-answering communities.
[Methods] First, we extracted keywords from the questions with TF-IDF and LDA models. Then, we extended the keywords' features with Word2Vec word vectors and applied the extended features to health question classification.
[Results] The proposed method performed better when TF-IDF was used for keyword extraction and the complete questions and answers served as the training corpus. In the optimal configuration, the reserved dictionary contained 600 words and the language model was CBOW. The optimal model's precision (P), recall (R), and F-score (F) were 0.9872, 0.9725, and 0.9798, respectively.
[Limitations] We did not extract keywords from short medical texts at a deeper semantic level.
[Conclusions] The proposed classification model outperforms existing models.
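The method combines TF-IDF keyword extraction with Word2Vec-based feature extension. The abstract does not spell out the exact extension rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It ranks each tokenized question's terms by TF-IDF (term frequency times log(N/df)). It then appends to each keyword its nearest neighbour in an embedding space when the cosine similarity reaches a threshold. The toy questions (`TOY_DOCS`), the hand-written 3-dimensional vectors (`TOY_VECTORS`, standing in for trained CBOW vectors), the neighbour count, and the 0.5 threshold are all hypothetical, and LDA-based extraction is not shown. The helper `f_score` also checks that the reported F value (0.9798) is consistent with the harmonic mean 2PR/(P+R) of the reported P (0.9872) and R (0.9725).

```python
import math
from collections import Counter

# Hypothetical toy data: tokenized health questions (the paper works on Chinese text).
TOY_DOCS = [
    ["eczema", "itch", "skin", "cream"],
    ["fever", "cough", "child", "medicine"],
    ["skin", "rash", "itch", "allergy"],
]

# Hypothetical 3-d embeddings standing in for trained Word2Vec (CBOW) vectors.
TOY_VECTORS = {
    "eczema": [0.9, 0.1, 0.0],
    "dermatitis": [0.8, 0.2, 0.1],
    "fever": [0.0, 0.9, 0.1],
    "cough": [0.1, 0.8, 0.3],
}


def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms of each tokenized document ranked by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(keywords, vectors, n_similar=1, min_sim=0.5):
    """Append up to n_similar nearest neighbours of each keyword.

    Keywords without a vector are kept but not expanded; neighbours whose
    cosine similarity is below min_sim are skipped.
    """
    expanded = list(keywords)
    for kw in keywords:
        if kw not in vectors:
            continue
        candidates = [w for w in vectors if w != kw and w not in expanded]
        candidates.sort(key=lambda w: (-cosine(vectors[kw], vectors[w]), w))
        for w in candidates[:n_similar]:
            if cosine(vectors[kw], vectors[w]) >= min_sim:
                expanded.append(w)
    return expanded


def f_score(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    for doc, kws in zip(TOY_DOCS, tfidf_keywords(TOY_DOCS, top_k=2)):
        print(doc, "->", expand_keywords(kws, TOY_VECTORS))
    print(round(f_score(0.9872, 0.9725), 4))
```

In practice the toy dictionary would be replaced by vectors trained on the question/answer corpus, for example with gensim's `Word2Vec` class, whose `sg=0` setting selects CBOW. The expanded keyword list would then be turned into features for a downstream classifier.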
Tang Xiaobo, Gao Hexuan. Classification of Health Questions Based on Vector Extension of Keywords[J]. Data Analysis and Knowledge Discovery, 2020, 4(7): 66-75.