Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (7): 32-43    DOI: 10.11925/infotech.2096-3467.2021.1148
Original article
Recommending Medical Literature with Random Forest Model and Query Expansion
Ding Hao1,2, Hu Guangwei1,2, Qi Jianglei1, Zhuang Guangguang3
1School of Information Management, Nanjing University, Nanjing 210023, China
2Institute of Government Data Resources, Nanjing University, Nanjing 210023, China
3School of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China

[Objective] This paper aims to surface valuable content from the large body of medical literature, helping physicians make diagnoses and improving medical literature recommendation. [Methods] We propose a method based on the random forest model and keyword query expansion. First, we use the MeSH dictionary and an automatically constructed acronym dictionary to establish the relationships between keywords and the corresponding articles at three levels: sentence, paragraph, and document. Then, we calculate multiple similarities between topics and articles. For each article, we also calculate the PageRank weight and the HITS Authority weight from the citation network of the literature set. [Results] Compared with the average of the 10 highest NDCG@100 values from the TREC clinical decision support follow-up evaluation, the overall average of the proposed method differed by less than 0.9%. [Limitations] New articles and “Sleeping Beauty” articles may rank low in retrieval because they attract few citations early on, so our method cannot recommend them accurately. [Conclusions] The proposed method effectively improves medical literature recommendation.
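The citation-network weights mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It assumes citations are given as (citing, cited) pairs; the function names, the damping factor of 0.85, the iteration count, and the toy graph are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over (citing, cited) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Rank held by articles that cite nothing is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank

def hits_authority(edges, iters=50):
    """Authority scores from the HITS hub/authority iteration."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of the articles citing this one.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the articles this one cites.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth

# Toy citation graph (hypothetical article IDs): A, B and D all cite C.
citations = [("A", "C"), ("B", "C"), ("D", "C"), ("A", "B")]
pr = pagerank(citations)
auth = hits_authority(citations)
```

In this toy graph the most-cited article C receives the largest value under both measures. This is consistent with the stated limitation that newly published or rarely cited articles score low on citation-based features.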

Key words: Literature Recommendation; Clinical Decision Support; Random Forest; Keyword Query Extension
Received: 11 October 2021      Published: 31 December 2021
ZTFLH:  TP391  
Fund: National Social Science Fund of China (20&ZD154); National Natural Science Foundation of China (71573117); State Grid Jiangsu Electric Power Company Management Consulting Project (SGJSYF00YHJS2000144)
Corresponding Author: Hu Guangwei, ORCID: 0000-0003-1303-363X

Cite this article:

Ding Hao, Hu Guangwei, Qi Jianglei, Zhuang Guangguang. Recommending Medical Literature with Random Forest Model and Query Expansion. Data Analysis and Knowledge Discovery, 2022, 6(7): 32-43.


Three-Tier Keyword Extraction Model of Sentence, Paragraph and Full Text
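A rough sketch of how expanded query terms might be matched at the sentence, paragraph, and document levels is given below. The two small dictionaries are hypothetical stand-ins for the MeSH dictionary and the automatically constructed acronym dictionary, and the splitting rules (blank lines for paragraphs, end punctuation for sentences) are simplifying assumptions rather than the authors' procedure.

```python
import re

# Hypothetical stand-ins for the MeSH and acronym dictionaries.
SYNONYMS = {"myocardial infarction": ["heart attack"]}
ACRONYMS = {"ami": "acute myocardial infarction"}

def expand_query(keywords):
    """Return lower-cased keywords plus acronym expansions and synonyms."""
    expanded = set()
    for kw in keywords:
        kw = kw.lower()
        expanded.add(kw)
        if kw in ACRONYMS:
            expanded.add(ACRONYMS[kw])
        expanded.update(SYNONYMS.get(kw, []))
    return expanded

def contains(text, term):
    """Whole-word, case-insensitive match of one term in a text unit."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None

def match_levels(document, terms):
    """Count matching sentences and paragraphs, and flag a document match."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    sentences = [
        s
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]

    def hit(text):
        return any(contains(text, t) for t in terms)

    return {
        "sentence": sum(hit(s) for s in sentences),
        "paragraph": sum(hit(p) for p in paragraphs),
        "document": int(hit(document)),
    }

doc = (
    "A patient presented with chest pain. AMI was suspected.\n\n"
    "The heart attack was confirmed by ECG."
)
terms = expand_query(["AMI", "myocardial infarction"])
levels = match_levels(doc, terms)
```

Here the acronym "AMI" and the synonym "heart attack" each produce a sentence-level match that the unexpanded keyword "myocardial infarction" would have missed.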
Document ID | Document Title | Relevance Score
3258729 | Epipericardial fat necrosis - a rare cause of pleuritic chest pain: case report and review of the literature | 2
3430116 | Treatment of localized neuropathic pain after disk herniation with 5% lidocaine medicated plaster | 0
3693649 | Value of exercise tolerance testing in evaluation of diabetic patients presented with atypical chest discomfort | N
3289164 | The effect of exercise in PCOS women who exercise regularly | N
3856285 | A case of acute aortic dissection presenting with chest pain relieved by sublingual nitroglycerin | 2
3809224 | Chest pain as a presenting complaint in patients with acute myocardial infarction (AMI) | 2
3377034 | A right coronary artery aneurysm associated with chest pain: a case report | 0
3772772 | Exploring the information needs of patients with unexplained chest pain | 0
3487367 | Resource utilization reduction for evaluation of chest pain in pediatrics using a novel standardized clinical assessment and management plan (SCAMP) | 0
3345151 | Anxiety and depression symptoms in chest pain patients referred for the exercise stress test | 0
Top 10 Results Retrieved on Lucene by Q0 (Test I)
Document ID | Document Title | Relevance Score
3258729 | Epipericardial fat necrosis - a rare cause of pleuritic chest pain: case report and review of the literature | 2
3339066 | Hiatal hernia: an unusual presentation of dyspnea | 0
3809224 | Chest Pain as a presenting complaint in patients with acute myocardial infarction (AMI) | 2
3821244 | Dyspnea on exertion in patients of heart failure as a consequence of obesity: an observational study | N
3658210 | Polyostotic fibrous dysplasia of the ribs: an unusual cause of chest pain and dyspnea | 0
3189858 | Complex regional pain syndrome with associated chest wall dystonia: a case report | N
3712160 | Cytomegalovirus esophagitis presents as chest pain in a renal transplant recipient | 0
3485125 | Right sided arcus aorta as a cause of dyspnea and chronic cough | N
3481681 | Increasing serum troponin I and early prognosis in patients with chest pain or angina equivalent symptoms in the emergency department |
3133519 | Migraine with benign episodic unilateral mydriasis | N
Top 10 Results Retrieved on Lucene by Q1 (Test II)
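To make the relevance scores in the two result lists easier to compare, the sketch below computes DCG@10 and NDCG@10 from the listed labels. It rests on several assumptions that are not stated in the source: entries marked N or left blank are treated as gain 0, the discount is log2(rank + 1), and the ideal ordering is taken from the retrieved list itself rather than from the full set of judged documents for the topic.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k, ideal_gains=None):
    """NDCG@k; the ideal list defaults to the retrieved gains, re-sorted."""
    pool = ideal_gains if ideal_gains is not None else gains
    best = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(gains, k) / best if best > 0 else 0.0

def to_gain(label):
    """Map a relevance label to a numeric gain; non-numeric labels become 0."""
    return int(label) if label.isdigit() else 0

# Relevance labels in rank order, copied from the Test I and Test II lists.
test_1 = [to_gain(x) for x in ["2", "0", "N", "N", "2", "2", "0", "0", "0", "0"]]
test_2 = [to_gain(x) for x in ["2", "0", "2", "N", "0", "N", "0", "N", "", "N"]]

dcg_1, dcg_2 = dcg_at_k(test_1, 10), dcg_at_k(test_2, 10)
ndcg_1, ndcg_2 = ndcg_at_k(test_1, 10), ndcg_at_k(test_2, 10)
```

Passing the full list of judged gains for a topic as `ideal_gains` would give values closer to a standard evaluation than the self-normalised default used here.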
NDCG@TopK of Exp. I and II
MAP@TopK of Exp. I and II
Relation Between the Number of Decision Trees and Accuracy of Test
Relation Between the Value of mfeature and Accuracy of Test
NDCG@K Results in 3 Cases
Comparison of MAP@K in 3 Cases
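For readers unfamiliar with the MAP@K metric named in these figure titles, a minimal sketch follows. It assumes binary relevance flags in rank order and normalises average precision by the number of relevant documents found in the top K. Other definitions divide by the total number of relevant documents instead, and the source does not say which variant was used.

```python
def average_precision_at_k(relevant_flags, k):
    """Mean of precision values taken at each relevant rank within the top k."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevant_flags[:k], start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def map_at_k(runs, k):
    """Mean average precision over several ranked lists (one per topic)."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(run, k) for run in runs) / len(runs)

# Two hypothetical topics: relevant documents at ranks 1 and 3, and at rank 2.
example_runs = [[True, False, True], [False, True, False]]
example_map = map_at_k(example_runs, 3)
```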
Topic | TREC best NDCG | Random forest model (six features) NDCG
1 | 0.4382 | 0.4718
2 | 0.4464 | 0.6546
3 | 0.9932 | 0.5000
4 | 0.6730 | 0.1698
5 | 0.3541 | 0.3499
6 | 0.3888 | 0.3850
7 | 0.3097 | 0.7308
8 | 0.5016 | 0.4517
9 | 0.3051 | 0.4168
10 | 0.3803 | 0.5677
Comparative Experiment Based on NDCG@100
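The overall gap between the two columns can be checked with a few lines of arithmetic. The sketch below reads the table values as four-decimal NDCG scores and averages each column; the variable names are illustrative only.

```python
# Per-topic NDCG@100 values copied from the comparison table (topics 1-10).
trec_best = [0.4382, 0.4464, 0.9932, 0.6730, 0.3541,
             0.3888, 0.3097, 0.5016, 0.3051, 0.3803]
random_forest = [0.4718, 0.6546, 0.5000, 0.1698, 0.3499,
                 0.3850, 0.7308, 0.4517, 0.4168, 0.5677]

mean_trec = sum(trec_best) / len(trec_best)
mean_rf = sum(random_forest) / len(random_forest)

# Absolute gap between the column means, in NDCG units.
gap = mean_trec - mean_rf

# Number of topics on which the random forest model scores higher.
topics_won = sum(rf > best for best, rf in zip(trec_best, random_forest))

print(f"TREC mean={mean_trec:.4f}, RF mean={mean_rf:.4f}, gap={gap:.4f}")
```

The column means differ by roughly 0.009 in absolute NDCG, which appears to correspond to the figure of about 0.9% quoted in the abstract. The per-topic differences are much larger than the average gap, with the random forest model ahead on five of the ten topics and behind on the other five.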