Classification of Health Questions Based on Vector Extension of Keywords
Tang Xiaobo (1, 2), Gao Hexuan (1)
1 School of Information Management, Wuhan University, Wuhan 430072, China
2 Center for Studies of Information Systems, Wuhan University, Wuhan 430072, China
|
Abstract [Objective] This paper proposes a classification model for health questions based on keyword vector extension, aiming to improve the user experience of medical question-answering communities. [Methods] First, we extracted keywords from the questions using TF-IDF and LDA models. Then, we extended the keyword features with Word2Vec word vectors and applied them to health question classification. [Results] The proposed method achieved its best classification results when TF-IDF was used for keyword extraction, complete questions and answers formed the training corpus, the retained dictionary contained 600 words, and the language model was CBOW. The precision (P), recall (R), and F-score (F) of the optimal model were 0.9872, 0.9725, and 0.9798, respectively. [Limitations] We did not extract keywords from short medical texts at a deeper semantic level. [Conclusions] The proposed classification model outperforms the existing models it was compared with.
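The keyword extraction and vector extension steps summarized under [Methods] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: questions are assumed to be already word-segmented, keywords are taken as the top-k TF-IDF terms of each question (the LDA variant is not shown), and "extension" is assumed to mean appending each keyword's nearest neighbours by cosine similarity in a word-vector space. The small `EMBEDDINGS` table is a hypothetical stand-in for a Word2Vec (CBOW) model trained on question-answer text, and `DOCS`, `top_k`, and `n_neighbors` are illustrative values only.

```python
import math
from collections import Counter

# Hypothetical toy data: word-segmented health questions (English glosses).
DOCS = [
    ["eczema", "itchy", "skin", "cream"],
    ["fever", "fever", "child", "medicine"],
    ["skin", "rash", "child", "cream"],
]

# Hypothetical 3-d word vectors standing in for a trained Word2Vec (CBOW) model.
EMBEDDINGS = {
    "eczema": [0.90, 0.10, 0.00],
    "dermatitis": [0.85, 0.15, 0.05],
    "itchy": [0.70, 0.30, 0.00],
    "pruritus": [0.68, 0.32, 0.02],
    "fever": [0.00, 0.90, 0.20],
    "pyrexia": [0.05, 0.88, 0.20],
    "cough": [0.10, 0.80, 0.40],
}


def tfidf_keywords(docs, top_k=2):
    """Return the top_k terms of each document ranked by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_keywords(keywords, embeddings, n_neighbors=1):
    """Append each in-vocabulary keyword's n_neighbors most similar words."""
    extended = list(keywords)
    for kw in keywords:
        if kw not in embeddings:
            continue  # out-of-vocabulary keywords are kept but not extended
        # Only words not already in the list are eligible as neighbours.
        candidates = [w for w in embeddings if w not in extended]
        candidates.sort(key=lambda w: (-cosine(embeddings[kw], embeddings[w]), w))
        extended.extend(candidates[:n_neighbors])
    return extended


if __name__ == "__main__":
    for keywords in tfidf_keywords(DOCS):
        print(keywords, "->", extend_keywords(keywords, EMBEDDINGS))
```

With these toy vectors, the first question's keywords "eczema" and "itchy" are extended with "dermatitis" and "pruritus", while "medicine", which has no vector, is left unextended. In practice the extended keyword lists would then be vectorized and passed to a classifier; the weighting or filtering applied to the added terms in the paper may differ from this sketch.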
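As a quick consistency check on the reported figures, assuming F is the balanced F1 measure (the harmonic mean of precision and recall, F = 2PR / (P + R)), the stated P and R reproduce the stated F to four decimal places. The function name `f1_score` below is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R as reported for the optimal model in the abstract.
P, R = 0.9872, 0.9725
print(round(f1_score(P, R), 4))  # 0.9798
```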
Received: 04 December 2019
Published: 25 July 2020
|
Corresponding Author:
Gao Hexuan
E-mail: gaohexuan@whu.edu.com