Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (7): 32-43     https://doi.org/10.11925/infotech.2096-3467.2021.1148
Research Paper
Recommending Medical Literature with Random Forest Model and Query Expansion*
Ding Hao1,2, Hu Guangwei1,2, Qi Jianglei1, Zhuang Guangguang3
1 School of Information Management, Nanjing University, Nanjing 210023, China
2 Institute of Government Data Resources, Nanjing University, Nanjing 210023, China
3 School of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
Abstract

[Objective] This paper aims to identify valuable content in a large body of medical literature to help clinicians make diagnoses, and to improve the effectiveness of medical literature recommendation. [Methods] We propose a method that combines a random forest model with keyword query expansion. Using the MeSH dictionary and an automatically constructed acronym dictionary, we establish the relationships between keywords and the corresponding articles at three levels: sentence, paragraph, and document. We then calculate multiple similarity measures between topics and articles and, for each article, compute the PageRank weight and the HITS Authority weight from the citation network of the literature collection. [Results] Compared with the average of the 10 highest NDCG@100 values in the TREC Clinical Decision Support Track evaluation results, the overall average NDCG@100 of the proposed method differs by less than 0.9%. [Limitations] New articles and "Sleeping Beauty" articles receive few citations early on and may therefore rank low in retrieval; the proposed method cannot recommend such articles accurately. [Conclusions] By weighting topic-article similarities and citation relationships and re-ranking the query expansion results with a random forest, the proposed method can effectively improve medical literature recommendation.

Keywords: Literature Recommendation; Clinical Decision Support; Random Forest; Keyword Query Expansion
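The abstract mentions PageRank and HITS Authority weights computed over the citation network. The following is a minimal sketch of both by power iteration, not the authors' implementation. The toy graph `citations`, the damping factor 0.85, and the iteration count are assumptions chosen for illustration.

```python
def _nodes(graph):
    """All papers that cite or are cited; graph maps a paper to the papers it cites."""
    return sorted(set(graph) | {v for targets in graph.values() for v in targets})


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph."""
    nodes = _nodes(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers without outgoing citations is spread evenly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            targets = graph.get(u) or []
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


def hits_authority(graph, iterations=50):
    """Authority scores from the HITS algorithm, normalized to sum to 1."""
    nodes = _nodes(graph)
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority of a paper: sum of the hub scores of the papers citing it.
        auth = {node: 0.0 for node in nodes}
        for u in nodes:
            for v in graph.get(u) or []:
                auth[v] += hub[u]
        norm = sum(auth.values()) or 1.0
        auth = {node: score / norm for node, score in auth.items()}
        # Hub score of a paper: sum of the authority scores of the papers it cites.
        hub = {u: sum(auth[v] for v in graph.get(u) or []) for u in nodes}
        norm = sum(hub.values()) or 1.0
        hub = {node: score / norm for node, score in hub.items()}
    return auth


# Hypothetical citation graph: A and B cite C, and C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
pr = pagerank(citations)
auth = hits_authority(citations)
print(max(auth, key=auth.get))  # the paper with the highest authority score
```

Both scores depend only on citations, which is consistent with the stated limitation: a newly published article with few incoming citations receives low weights regardless of its content.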
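Received: 2021-10-11      Published: 2021-12-31
Chinese Library Classification (ZTFLH):  TP391
Funding: *This work is supported by the Major Program of the National Social Science Fund of China (20&ZD154), the General Program of the National Natural Science Foundation of China (71573117), and a management consulting project of State Grid Jiangsu Electric Power Company (SGJSYF00YHJS2000144).
Corresponding author: Hu Guangwei, ORCID: 0000-0003-1303-363X     E-mail: hugw@nju.edu.cn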
Cite this article:
Ding Hao, Hu Guangwei, Qi Jianglei, Zhuang Guangguang. Recommending Medical Literature with Random Forest Model and Query Expansion. Data Analysis and Knowledge Discovery, 2022, 6(7): 32-43.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.1148      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I7/32
Fig. 1  Three-level keyword extraction model: sentence, paragraph, and full text
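The three-level idea in Fig. 1 can be illustrated with a small sketch that expands the query keywords and then checks for matches per sentence, per paragraph, and over the whole document. This is an assumption-laden illustration, not the model in the figure: `SYNONYMS` and `ACRONYMS` are hypothetical stand-ins for the MeSH dictionary and the acronym dictionary, and scoring each level by the share of matching units is an invented choice.

```python
import re

# Hypothetical stand-ins for the MeSH dictionary and the acronym dictionary.
SYNONYMS = {"chest pain": ["thoracic pain"]}
ACRONYMS = {"ami": "acute myocardial infarction"}


def expand_query(keywords):
    """Return the lower-cased keywords plus their synonyms and acronym expansions."""
    expanded = []
    for keyword in keywords:
        key = keyword.lower()
        candidates = [key] + SYNONYMS.get(key, [])
        if key in ACRONYMS:
            candidates.append(ACRONYMS[key])
        for term in candidates:
            if term not in expanded:
                expanded.append(term)
    return expanded


def contains_any(text, terms):
    """True if any term occurs in the text as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)


def three_level_match(document, terms):
    """Share of sentences and paragraphs that match, plus a document-level flag."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    sentences = [
        s.strip()
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]

    def share(units):
        return sum(contains_any(u, terms) for u in units) / len(units) if units else 0.0

    return {
        "sentence": share(sentences),
        "paragraph": share(paragraphs),
        "document": 1.0 if contains_any(document, terms) else 0.0,
    }


doc = (
    "A patient presented with thoracic pain. Troponin was elevated.\n\n"
    "Acute myocardial infarction was confirmed. The patient recovered."
)
terms = expand_query(["chest pain", "AMI"])
levels = three_level_match(doc, terms)
print(terms)
print(levels)
```

Without expansion, neither "chest pain" nor "AMI" appears literally in the sample text, so the example shows why dictionary-based expansion matters before any matching is attempted.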
| Document ID | Title | Relevance Score |
| --- | --- | --- |
| 3258729 | Epipericardial fat necrosis - a rare cause of pleuritic chest pain: case report and review of the literature | 2 |
| 3430116 | Treatment of localized neuropathic pain after disk herniation with 5% lidocaine medicated plaster | 0 |
| 3693649 | Value of exercise tolerance testing in evaluation of diabetic patients presented with atypical chest discomfort | N |
| 3289164 | The effect of exercise in PCOS women who exercise regularly | N |
| 3856285 | A case of acute aortic dissection presenting with chest pain relieved by sublingual nitroglycerin | 2 |
| 3809224 | Chest pain as a presenting complaint in patients with acute myocardial infarction (AMI) | 2 |
| 3377034 | A right coronary artery aneurysm associated with chest pain: a case report | 0 |
| 3772772 | Exploring the information needs of patients with unexplained chest pain | 0 |
| 3487367 | Resource utilization reduction for evaluation of chest pain in pediatrics using a novel standardized clinical assessment and management plan (SCAMP) | 0 |
| 3345151 | Anxiety and depression symptoms in chest pain patients referred for the exercise stress test | 0 |

Table 1  Top 10 results retrieved from Lucene for Q0 (Test I)
| Document ID | Title | Relevance Score |
| --- | --- | --- |
| 3258729 | Epipericardial fat necrosis - a rare cause of pleuritic chest pain: case report and review of the literature | 2 |
| 3339066 | Hiatal hernia: an unusual presentation of dyspnea | 0 |
| 3809224 | Chest pain as a presenting complaint in patients with acute myocardial infarction (AMI) | 2 |
| 3821244 | Dyspnea on exertion in patients of heart failure as a consequence of obesity: an observational study | N |
| 3658210 | Polyostotic fibrous dysplasia of the ribs: an unusual cause of chest pain and dyspnea | 0 |
| 3189858 | Complex regional pain syndrome with associated chest wall dystonia: a case report | N |
| 3712160 | Cytomegalovirus esophagitis presents as chest pain in a renal transplant recipient | 0 |
| 3485125 | Right sided arcus aorta as a cause of dyspnea and chronic cough | N |
| 3481681 | Increasing serum troponin I and early prognosis in patients with chest pain or angina equivalent symptoms in the emergency department | 1 |
| 3133519 | Migraine with benign episodic unilateral mydriasis | N |

Table 2  Top 10 results retrieved from Lucene for Q1 (Test II)
Fig. 2  NDCG@K results for Tests I and II
Fig. 3  MAP@K results for Tests I and II
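For readers unfamiliar with the two metrics plotted in Figs. 2 and 3, here is a minimal sketch of NDCG@K and MAP@K over graded relevance lists. It makes several assumptions: gains are used linearly, the ideal ranking is built only from the retrieved list (official TREC scoring builds it from all judged documents), and average precision is normalized by the number of relevant documents found in the top K. The sample gain lists are invented.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG@K divided by the DCG@K of the same gains sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def average_precision_at_k(gains, k):
    """Mean of precision@i over the ranks i <= K that hold a relevant document."""
    hits, total = 0, 0.0
    for i, g in enumerate(gains[:k]):
        if g > 0:
            hits += 1
            total += hits / (i + 1)
    return total / hits if hits else 0.0


def map_at_k(runs, k):
    """Mean average precision over several topics (one gain list per topic)."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)


# Invented relevance grades for the top 3 results of two topics.
print(ndcg_at_k([2, 0, 1], 3))
print(map_at_k([[1, 0, 1], [0, 0, 0]], 3))
```

To apply this to the relevance columns of Tables 1 and 2, one would first have to decide how to treat entries marked N; the source does not say how they are scored.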
Fig. 4  Relationship between the number of decision trees and test accuracy
Fig. 5  Relationship between mfeature and test accuracy
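Figs. 4 and 5 relate test accuracy to the number of trees and to mfeature, the number of features sampled when a tree is split. The sketch below shows how such a sweep could be set up. It is a deliberately simplified stand-in, not the authors' model: each tree is a depth-1 stump, the six-feature data set is synthetic, and the names `fit_forest`, `n_trees`, and `m_features` are hypothetical.

```python
import random


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for threshold in thresholds:
            for polarity in (0, 1):
                errors = sum(
                    (polarity if row[f] > threshold else 1 - polarity) != y
                    for row, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, polarity)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, polarity = stump
    return polarity if row[f] > threshold else 1 - polarity


def fit_forest(rows, labels, n_trees, m_features, seed=0):
    """Bagged stumps: each tree gets a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        feats = rng.sample(range(d), m_features)     # features considered for the split
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = sum(predict_stump(stump, row) for stump in forest)
    return 1 if 2 * votes > len(forest) else 0


def accuracy(forest, rows, labels):
    return sum(predict_forest(forest, r) == y for r, y in zip(rows, labels)) / len(rows)


# Synthetic data: 40 documents, six features, label 1 = relevant (all invented).
rows = [[0.3 * (i % 2) + ((i * (3 + j)) % 11) / 11.0 for j in range(6)] for i in range(40)]
labels = [i % 2 for i in range(40)]
train_x, train_y = rows[:30], labels[:30]
test_x, test_y = rows[30:], labels[30:]

for n_trees in (1, 5, 25):
    for m_features in (1, 2, 6):
        forest = fit_forest(train_x, train_y, n_trees, m_features)
        print(n_trees, m_features, accuracy(forest, test_x, test_y))
```

Because each stump has a single split, sampling features once per tree is the same as sampling them per split. A full random forest would repeat that sampling at every node of deeper trees.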
Fig. 6  Comparison of NDCG@K results across the three experiments
Fig. 7  Comparison of MAP@K results across the three experiments
| Topic | trec_best_NDCG | random_forest_model_using_six_features_NDCG |
| --- | --- | --- |
| 1 | 0.4382 | 0.4718 |
| 2 | 0.4464 | 0.6546 |
| 3 | 0.9932 | 0.5000 |
| 4 | 0.6730 | 0.1698 |
| 5 | 0.3541 | 0.3499 |
| 6 | 0.3888 | 0.3850 |
| 7 | 0.3097 | 0.7308 |
| 8 | 0.5016 | 0.4517 |
| 9 | 0.3051 | 0.4168 |
| 10 | 0.3803 | 0.5677 |

Table 3  Comparison of algorithms on NDCG@100
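As a quick arithmetic check on Table 3, the snippet below averages each column over the ten listed topics and counts the topics on which the random forest model scores higher. The values are copied from the table. Whether the gap reported in the abstract was computed over exactly these ten rows is an assumption.

```python
# NDCG@100 values copied from Table 3, topics 1 to 10.
trec_best = [0.4382, 0.4464, 0.9932, 0.6730, 0.3541, 0.3888, 0.3097, 0.5016, 0.3051, 0.3803]
rf_six = [0.4718, 0.6546, 0.5000, 0.1698, 0.3499, 0.3850, 0.7308, 0.4517, 0.4168, 0.5677]

mean_trec = sum(trec_best) / len(trec_best)
mean_rf = sum(rf_six) / len(rf_six)
gap = mean_trec - mean_rf
wins = sum(r > t for r, t in zip(rf_six, trec_best))

print(f"mean trec_best_NDCG:       {mean_trec:.4f}")
print(f"mean random forest NDCG:   {mean_rf:.4f}")
print(f"gap:                       {gap:.4f}")
print(f"topics where the random forest scores higher: {wins} of {len(rf_six)}")
```

Over these rows the means are about 0.479 and 0.470, a gap of roughly 0.009, which is in line with the difference of about 0.9% stated in the abstract. The per-topic spread is much wider than the averages suggest: the random forest model scores higher on five topics and lower on the other five.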