Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (7): 32-43    DOI: 10.11925/infotech.2096-3467.2021.1148
Recommending Medical Literature with Random Forest Model and Query Expansion
Ding Hao1,2, Hu Guangwei1,2, Qi Jianglei1, Zhuang Guangguang3
1School of Information Management, Nanjing University, Nanjing 210023, China
2Institute of Government Data Resources, Nanjing University, Nanjing 210023, China
3School of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
[Objective] This paper aims to surface valuable content from the large body of medical literature, helping physicians reach diagnoses and improving medical literature recommendation. [Methods] We propose a method based on the random forest model and keyword query expansion. First, we use the MeSH dictionary and an automatically constructed acronym dictionary to establish the relationships between keywords and the corresponding articles at three levels: sentence, paragraph and document. Then, we calculate multiple similarity measures between topics and articles. For each article, we also compute the PageRank weight and the HITS Authority weight from the citation network of the literature set. [Results] Compared with the average of the 10 highest NDCG@100 values from the TREC Clinical Decision Support follow-up evaluation, the overall average of the proposed method differed by no more than 0.9%. [Limitations] New publications and “Sleeping Beauty” papers may rank lower in retrieval because they attract few citations early on, so the method cannot recommend them accurately. [Conclusions] The proposed method effectively improves medical literature recommendation.

Key words: Literature Recommendation; Clinical Decision Support; Random Forest; Keyword Query Expansion
Received: 11 October 2021      Published: 31 December 2021
CLC Number (Chinese Library Classification): TP391
Fund: National Social Science Fund of China (20&ZD154); National Natural Science Foundation of China (71573117); State Grid Jiangsu Electric Power Company Management Consulting Project (SGJSYF00YHJS2000144)
Corresponding Author: Hu Guangwei, ORCID: 0000-0003-1303-363X

Ding Hao, Hu Guangwei, Qi Jianglei, Zhuang Guangguang. Recommending Medical Literature with Random Forest Model and Query Expansion. Data Analysis and Knowledge Discovery, 2022, 6(7): 32-43.

Three-Tier Keyword Extraction Model of Sentence, Paragraph and Full Text
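The abstract only states that keywords are related to articles at the sentence, paragraph and document levels, so the following is a minimal sketch of one way such three-level matching could look, not the authors' implementation. The function name `match_levels`, the "all keywords co-occur in the unit" rule, and the sample text are all assumptions made for illustration.

```python
import re


def match_levels(document, keywords):
    """Count units that contain every keyword, at three granularities.

    Hypothetical helper: paragraphs are split on blank lines and
    sentences on end-of-sentence punctuation; both rules are assumptions.
    """
    # Lower-case everything so matching is case-insensitive.
    terms = [k.lower() for k in keywords]
    text = document.lower()
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [
        s
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]

    def units_with_all_terms(units):
        # A unit matches when every keyword occurs somewhere inside it.
        return sum(all(t in u for t in terms) for u in units)

    return {
        "sentence": units_with_all_terms(sentences),
        "paragraph": units_with_all_terms(paragraphs),
        "document": units_with_all_terms([text]),
    }


# Invented sample text, used only to exercise the function.
sample = (
    "Chest pain is common. Dyspnea may follow.\n\n"
    "The patient reported chest pain and dyspnea on exertion."
)
levels = match_levels(sample, ["chest pain", "dyspnea"])
```

In this sketch a keyword pair that co-occurs only at the paragraph or document level still registers, which is the kind of looser evidence a sentence-only match would miss.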
Document ID | Document Title | Score of Relevance
3258729 | Epipericardial fat necrosis - a rare cause of pleuritic chest pain: case report and review of the literature | 2
3430116 | Treatment of localized neuropathic pain after disk herniation with 5% lidocaine medicated plaster | 0
3693649 | Value of exercise tolerance testing in evaluation of diabetic patients presented with atypical chest discomfort | N
3289164 | The effect of exercise in PCOS women who exercise regularly | N
3856285 | A case of acute aortic dissection presenting with chest pain relieved by sublingual nitroglycerin | 2
3809224 | Chest pain as a presenting complaint in patients with acute myocardial infarction (AMI) | 2
3377034 | A right coronary artery aneurysm associated with chest pain: a case report | 0
3772772 | Exploring the information needs of patients with unexplained chest pain | 0
3487367 | Resource utilization reduction for evaluation of chest pain in pediatrics using a novel standardized clinical assessment and management plan (SCAMP) | 0
3345151 | Anxiety and depression symptoms in chest pain patients referred for the exercise stress test | 0
Top 10 Results Retrieved on Lucene by Q0 (Test I)
Document ID | Document Title | Score of Relevance
3258729 | Epipericardial fat necrosis - a rare cause of pleuritic chest pain: case report and review of the literature | 2
3339066 | Hiatal hernia: an unusual presentation of dyspnea | 0
3809224 | Chest pain as a presenting complaint in patients with acute myocardial infarction (AMI) | 2
3821244 | Dyspnea on exertion in patients of heart failure as a consequence of obesity: an observational study | N
3658210 | Polyostotic fibrous dysplasia of the ribs: an unusual cause of chest pain and dyspnea | 0
3189858 | Complex regional pain syndrome with associated chest wall dystonia: a case report | N
3712160 | Cytomegalovirus esophagitis presents as chest pain in a renal transplant recipient | 0
3485125 | Right sided arcus aorta as a cause of dyspnea and chronic cough | N
3481681 | Increasing serum troponin I and early prognosis in patients with chest pain or angina equivalent symptoms in the emergency department |
3133519 | Migraine with benign episodic unilateral mydriasis | N
Top 10 Results Retrieved on Lucene by Q1 (Test II)
NDCG@TopK of Exp. I and II
MAP@TopK of Exp. I and II
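For readers unfamiliar with the two metrics named in these captions, the sketch below computes NDCG@K and average precision at K for a single ranked list. It is an illustration under several assumptions, not the evaluation procedure used in the paper: gains are taken linearly from the relevance score, "N" entries are mapped to 0, the ideal ranking is built only from the retrieved list rather than from the full relevance judgments, and average precision is normalised by the number of relevant items found in the top K.

```python
import math


def dcg_at_k(gains, k):
    # Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    # Normalise by the DCG of the same gains sorted best-first
    # (assumption: no judged documents outside the retrieved list).
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def average_precision_at_k(gains, k):
    # Any positive gain counts as relevant; precision is averaged over relevant ranks.
    hits, total = 0, 0.0
    for rank, g in enumerate(gains[:k], start=1):
        if g > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Relevance column of the Q0 (Test I) table, with "N" mapped to 0 (assumption).
q0_gains = [2, 0, 0, 0, 2, 2, 0, 0, 0, 0]
q0_ndcg = ndcg_at_k(q0_gains, 10)
q0_ap = average_precision_at_k(q0_gains, 10)
```

MAP@K is then the mean of `average_precision_at_k` over all topics. Official TREC scoring uses the complete judgment pool, so values from this simplified version will generally differ from reported ones.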
Relation Between the Number of Decision Trees and Accuracy of Test
Relation Between the Value of mfeature and Accuracy of Test
NDCG@K Results in 3 Cases
Comparison of MAP@K in 3 Cases
Topic | trec_best_NDCG | random_forest_model_using_six_features_NDCG
1 | 0.4382 | 0.4718
2 | 0.4464 | 0.6546
3 | 0.9932 | 0.5000
4 | 0.6730 | 0.1698
5 | 0.3541 | 0.3499
6 | 0.3888 | 0.3850
7 | 0.3097 | 0.7308
8 | 0.5016 | 0.4517
9 | 0.3051 | 0.4168
10 | 0.3803 | 0.5677
Comparative Experiment Based on NDCG@100
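The per-topic values in the table above can be averaged to check the overall gap reported in the abstract. The short sketch below does that arithmetic, assuming the spaced digits in the source (for example "0.438 2") are grouped decimals meaning 0.4382; the variable names are illustrative only.

```python
# Per-topic NDCG@100 values copied from the table, topics 1 to 10,
# reading spaced digits such as "0.438 2" as 0.4382 (assumption).
trec_best = [0.4382, 0.4464, 0.9932, 0.6730, 0.3541,
             0.3888, 0.3097, 0.5016, 0.3051, 0.3803]
random_forest = [0.4718, 0.6546, 0.5000, 0.1698, 0.3499,
                 0.3850, 0.7308, 0.4517, 0.4168, 0.5677]


def mean(values):
    # Arithmetic mean of a non-empty list.
    return sum(values) / len(values)


# Difference between the two column averages (TREC best minus random forest).
mean_gap = mean(trec_best) - mean(random_forest)

# Number of topics on which the random forest column is the higher one.
wins = sum(rf > best for rf, best in zip(random_forest, trec_best))
```

Under this reading the gap between the two column means comes out at roughly 0.009, in line with the figure of about 0.9% in the abstract, while the per-topic differences are much larger in both directions.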
