Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 14-27    DOI: 10.11925/infotech.2096-3467.2020.0222
Measuring Uncertainty of Medical Knowledge: A Literature Review
Du Jian
National Institute of Health Data Science, Peking University, Beijing 100191, China
Abstract  

[Objective] This article reviews the theory, research progress, and potential applications of measuring the uncertainty of medical knowledge in scientific publications. [Coverage] We searched PubMed, Web of Science, Microsoft Academic, CNKI, and Wanfang Data for English and Chinese publications using two strategies: 1) the keywords “uncertain* AND knowledge AND *medical” in the title, and 2) the cited reference “Representing Scientific Knowledge: The Role of Uncertainty”. [Methods] First, we grouped the retrieved literature into computational linguistics studies and informetrics studies. Then, we summarized their research designs, data analysis methods, and conclusions. [Results] The notion of paradigm shift and Bayesian causal networks provide the foundation for measuring the uncertainty of medical knowledge. Recent developments include identifying uncertainty cue words in biomedical literature, extracting structured knowledge from unstructured biomedical text, and measuring the uncertainty of the Subject-Predicate-Object (SPO) triples extracted from scientific text. [Limitations] Our discussion is limited to research driven by the Data-Information-Knowledge-Wisdom (DIKW) hierarchy, such as information science, knowledge engineering, and artificial intelligence. [Conclusions] The uncertainty of scientific knowledge and its evolution over time indirectly reflect the strength of competing knowledge claims, their contribution to filling knowledge gaps, and the probability that a given knowledge claim is certain. Measuring it will promote the development of informetrics and knowmetrics and their application in emerging areas such as detecting research fronts, evaluating academic contributions, and improving the efficacy of decision support driven by computable knowledge.

Key words: Uncertainty; Medical Knowledge; Informetrics; Knowledge Metrics (Knowmetrics); Medical Text Mining
Received: 20 March 2020; Published: 06 August 2020
Chinese Library Classification (ZTFLH): G350
Corresponding author: Du Jian, E-mail: dujian@bjmu.edu.cn

Cite this article:

Du Jian. Measuring Uncertainty of Medical Knowledge: A Literature Review. Data Analysis and Knowledge Discovery, 2020, 4(10): 14-27.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0222     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/14

[Figure: The Relations Between Paradigm Shift and the Uncertainty of Scientific Knowledge]
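Authors | Representative work | Main content
Vincze et al. (2008) [15] | BioScope corpus | Annotates speculation and negation cue words in biomedical publications, together with their linguistic scopes.
Farkas et al. (2010) [16] | CoNLL-2010 Shared Task | Detects hedges and their linguistic scopes in two kinds of natural language text, Wikipedia articles and biomedical literature: first identifies the sentences that contain hedging, then identifies the exact scope of the speculative text within each sentence.
Thompson et al. (2011) [17] | Detailed "meta-knowledge" annotation of the important facts and scientific findings embedded in text, i.e., biomedical events | Meta-knowledge is information that can be derived from the context of an event. The annotation scheme has several dimensions: (1) three certainty levels (speculative, probable, certain); (2) two polarities (positive, negative); (3) six knowledge types (investigation, observation, analysis, method, fact, other).
Tawfik and Spruit (2018) [18] | Detection of textual entailment and contradiction in biomedical abstracts | Uses PubMed abstracts as the corpus. For a given clinical question, candidate answer sentences are found by sentence-embedding similarity, which turns answer extraction into a ranking problem. The answer sentences are then manually labeled as positive or negative answers (Yes/No), and a machine learning model automatically recognizes textual entailment (agreement) or contradiction (disagreement) to find contradictory answers.
Identifying Uncertain Cue Words in Biomedical Scientific Text by the Computational Linguistics Community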
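The answer-ranking step listed for Tawfik and Spruit [18] in the table above can be illustrated as ranking candidate sentences by cosine similarity to the question. This is a minimal sketch, not the authors' implementation: it assumes sentence embeddings are already available as plain float vectors, and the toy vectors and the function names `cosine` and `rank_candidates` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    question_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each (sentence, embedding) pair against the question and return the
    top_k pairs as (sentence, score), highest similarity first."""
    scored = [(text, cosine(question_vec, vec)) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for real sentence embeddings.
    question = [1.0, 0.0, 1.0]
    sentences = [
        ("Drug A reduced mortality in the trial.", [0.9, 0.1, 0.8]),
        ("Patients were recruited from two hospitals.", [0.0, 1.0, 0.1]),
        ("Drug A had no effect on mortality.", [0.8, 0.2, 0.9]),
    ]
    for text, score in rank_candidates(question, sentences, top_k=2):
        print(f"{score:.3f}  {text}")
```

With these toy vectors both drug sentences outrank the recruitment sentence, even though one supports the effect and the other denies it. Similarity alone cannot separate agreeing from contradicting answers, which is why a separate entailment/contradiction classifier follows the ranking step; that classifier is not sketched here.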
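Authors | Main findings
Mercer et al. (2004) [21] | Using the full texts of 985 BioMed Central papers as the corpus, found that the hedging cues catalogued by Hyland occur more frequently in citing sentences than in the full text as a whole.
Small (2018) [22] | Proposed the hedging rate: the proportion of a paper's citing sentences in PubMed Central full texts that contain one of the three most common hedges (may, could, might). Method papers have low hedging rates (i.e., higher certainty), whereas non-method papers show higher uncertainty.
Small et al. (2019) [23] | Overall, a paper's hedging rate in citing sentences is inversely related to its number of citing sentences (a proxy for its citation count), and early citers hedge more than later citers.
Small (2019) [24] | Comparing the words used in the citing sentences of low-hedged and high-hedged papers showed that highly certain knowledge is associated with words about applying methods and acquiring data (e.g., using, performed), whereas uncertain knowledge is associated with words about interpreting results and stating evidence-based views (e.g., suggest, evidence).
Murray et al. (2019) [5] | Measured disagreement in science from citing sentences; disagreement, or inconsistency, is one form that the uncertainty of scientific knowledge takes. Through expert evaluation, they built two cue words (contradict, conflict) and two filter words (studies, results) that represent scientific disagreement fairly accurately; a filter word must occur within a 4-word window of a cue word.
Kilicoglu et al. (2019) [25] | Performed sentiment analysis of citing sentences in clinical research literature, classifying citation sentiment as positive (agreeing with the cited clinical study's conclusions), negative (disagreeing with them), or neutral (neither explicitly agreeing nor disagreeing), and trained machine learning models to classify citations automatically. The corpus was the annotated corpus of Xu et al. [26]: 285 clinical research papers with 4,182 citations, of which 702 (16.8%) were positive, 308 (7.4%) negative, and 3,172 (75.8%) neutral.
Measuring the Textual Uncertainty of Citing Sentences by the Informetrics Community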
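Two of the citing-sentence measures in the table above are simple enough to sketch: Small's hedging rate [22] (the share of citing sentences containing may, could, or might) and the cue/filter-word rule of Murray et al. [5] (a filter word within a 4-word window of a cue word). This is an illustrative approximation, not the authors' code. Assumptions: tokenization is a naive lowercase word split; cue words are matched by prefix, so "conflicting" and "contradictory" also count; and "within a 4-word window" is read as at most 4 token positions apart on either side. The function names are hypothetical.

```python
import re
from typing import Iterable, List

HEDGES = {"may", "could", "might"}          # the three hedges used for the hedging rate
CUE_PREFIXES = ("contradict", "conflict")   # cue words; prefix matching is an assumption
FILTER_WORDS = {"studies", "results"}       # filter words
WINDOW = 4                                  # max token distance between cue and filter word


def tokenize(sentence: str) -> List[str]:
    """Naive lowercase tokenizer keeping letters, digits, apostrophes, and hyphens."""
    return re.findall(r"[a-z0-9'-]+", sentence.lower())


def hedging_rate(citing_sentences: Iterable[str]) -> float:
    """Share of citing sentences containing at least one of may/could/might."""
    sentences = list(citing_sentences)
    if not sentences:
        return 0.0
    hedged = sum(1 for s in sentences if HEDGES & set(tokenize(s)))
    return hedged / len(sentences)


def signals_disagreement(sentence: str, window: int = WINDOW) -> bool:
    """True if some cue word has a filter word within `window` tokens on either side."""
    tokens = tokenize(sentence)
    cue_positions = [i for i, t in enumerate(tokens) if t.startswith(CUE_PREFIXES)]
    filter_positions = [i for i, t in enumerate(tokens) if t in FILTER_WORDS]
    return any(abs(c - f) <= window for c in cue_positions for f in filter_positions)
```

For example, `signals_disagreement("Results from prior studies conflict with [5].")` is true because "results" sits exactly four tokens before "conflict". Lowercase matching also counts the month "May" as a hedge, so a real pipeline would need part-of-speech filtering or similar cleanup.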
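Name | Main team | Main work
Nanopublication model | Groth et al. (2010) [35] | Not specific to nanotechnology; the name borrows the sense of "nano" to denote the smallest publishable unit of information that is scientifically meaningful and machine-readable. It has three parts: (1) the scientific assertion, expressed as a subject-predicate-object triple; (2) provenance, describing the source of the assertion, including the authors, institutions, time, and place; (3) publication information, i.e., metadata about the nanopublication itself, including its creator, creation date, and version.
Micropublication model | Clark et al. (2014) [36] | Treats a scientific article as an argument whose claims are the authors' views and whose evidence consists of statements, assertions, data, methods, materials, etc.; it includes both supporting and challenging arguments.
Computable biomedical knowledge objects | Friedman and Flynn (2019) [37]; Flynn et al. (2018) [38] | Convert human-readable knowledge into machine-readable form, consisting of three parts: the knowledge payload, an interface for interacting with users, and a detailed description of the knowledge. They enable automatic learning and updating of knowledge in support of learning health systems.
Knowlet model | Mons (2019) [39] | The main idea is to combine the assertion shared by all nanopublications that make the same claim into a single "cardinal assertion", reducing redundancy.
Representation Models for Structured Knowledge Objects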
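The three-part nanopublication structure [35] and the Knowlet idea of collapsing identical assertions [39], as summarized in the table above, can be illustrated with plain Python dataclasses. This is a simplified sketch under stated assumptions: real nanopublications are RDF named graphs, not Python objects, and every class, field, and function name here (`Assertion`, `Provenance`, `PublicationInfo`, `merge_into_cardinal_assertions`) is hypothetical, chosen only to mirror the parts listed in the table.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Assertion:
    """Part 1: the scientific claim as a subject-predicate-object triple."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Provenance:
    """Part 2: where the assertion came from (authors, institution, date)."""
    authors: Tuple[str, ...]
    institution: str
    date: str


@dataclass(frozen=True)
class PublicationInfo:
    """Part 3: metadata about the nanopublication itself."""
    creator: str
    created: str
    version: str


@dataclass(frozen=True)
class Nanopublication:
    assertion: Assertion
    provenance: Provenance
    pubinfo: PublicationInfo


def merge_into_cardinal_assertions(
    nanopubs: List[Nanopublication],
) -> Dict[Assertion, List[Provenance]]:
    """Collapse nanopublications that state the same triple into one entry per
    distinct assertion, keeping every provenance record as supporting evidence."""
    merged: Dict[Assertion, List[Provenance]] = defaultdict(list)
    for nanopub in nanopubs:
        merged[nanopub.assertion].append(nanopub.provenance)
    return dict(merged)
```

Frozen dataclasses are hashable, so an `Assertion` can serve directly as the merge key. The length of each provenance list is a crude count of the sources behind a claim; whether such a count tracks the certainty of the claim is not something this sketch establishes.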
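Authors | Predicate categories | Main rules
Alamri (2016) [42] | (1) active/causal, e.g., AUGMENTS, CAUSES; (2) passive/inhibitory, e.g., DISRUPTS, PREVENTS; (3) other, e.g., ADMINISTERED_TO, OCCURS_IN | The source knowledge is considered contradictory if two or more source sentences yield any of these relations together with its negation (e.g., CAUSES and NEG-CAUSES), or an active/causal relation together with its opposite passive/inhibitory relation (e.g., CAUSES and PREVENTS).
Rosemblat et al. (2019) [43] | Pairs of opposing semantic relations for clinical research on diseases: (1) four causal opposing pairs: TREATS vs. CAUSES, PREVENTS vs. CAUSES, TREATS vs. PREDISPOSES, PREVENTS vs. PREDISPOSES; (2) four non-causal opposing pairs: TREATS, PREVENTS, CAUSES, and PREDISPOSES, each paired with its negated form | If two or more source sentences yield such an opposing pair, the result is regarded as contradictory knowledge in the clinical research literature.
Pinto et al. (2019) [44] | (1) Contradiction: 7 predicates, "Affects", "Associated-with", "Causes", "Inhibits", "Prevents", "Process-Of", "Treats"; (2) Diversity | Using the 7 predicates and their negated forms (14 in total): if two or more source sentences yield the same subject and object with opposite predicates, the result is contradictory knowledge; if they yield the same subject and object with different but non-opposite predicates, the result is diverse knowledge.
Studies on Identifying Contradictory Medical Knowledge Using SPO Triples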
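The rules in the table above all compare predications that share a subject and an object. The sketch below combines them loosely: it uses the contradiction/diversity split attributed to Pinto et al. [44], and it treats two predicates as "opposite" if one is the negation of the other or if they form one of the four causal pairs listed for Rosemblat et al. [43]. It is not any of these authors' implementation. It assumes negated predicates carry a `NEG_` prefix in the SemMedDB style, and the helper names and example triples are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The four causal opposing pairs listed for Rosemblat et al. (2019).
CAUSAL_OPPOSITES = {
    frozenset({"TREATS", "CAUSES"}),
    frozenset({"PREVENTS", "CAUSES"}),
    frozenset({"TREATS", "PREDISPOSES"}),
    frozenset({"PREVENTS", "PREDISPOSES"}),
}


def is_negation_pair(p: str, q: str) -> bool:
    """True if one predicate is the NEG_-prefixed form of the other (assumed naming)."""
    return p == "NEG_" + q or q == "NEG_" + p


def are_opposite(p: str, q: str) -> bool:
    """Opposite = negation pair or one of the causal opposing pairs."""
    return is_negation_pair(p, q) or frozenset({p, q}) in CAUSAL_OPPOSITES


def classify_pairs(triples: Iterable[Triple]) -> List[Tuple[Triple, Triple, str]]:
    """Label every pair of triples sharing subject and object as 'contradiction'
    (opposite predicates) or 'diversity' (different but non-opposite predicates)."""
    results = []
    for (s1, p1, o1), (s2, p2, o2) in combinations(list(triples), 2):
        if s1 != s2 or o1 != o2 or p1 == p2:
            continue  # only compare triples with the same subject/object and different predicates
        label = "contradiction" if are_opposite(p1, p2) else "diversity"
        results.append(((s1, p1, o1), (s2, p2, o2), label))
    return results
```

For example, `("aspirin", "PREVENTS", "stroke")` and `("aspirin", "CAUSES", "stroke")` would be labeled a contradiction, while PREVENTS versus ASSOCIATED_WITH on the same pair would be labeled diversity. As the title of [43] suggests, many such contradictions are only apparent once the sentence context is examined, and this sketch performs no context analysis.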
[Figure: Basic Data Structure of the SENTENCE Table and the PREDICATION Table in SemMedDB]
[1] Chen C M, Song M. Representing Scientific Knowledge: The Role of Uncertainty[M]. Springer International Publishing, 2017.
[2] Small H. Past as Prologue: Approaches to the Study of Confirmation in Science[J]. Quantitative Science Studies, 2020,1(3):1025-1040.
doi: 10.1162/qss_a_00063
[3] Evans J A, Foster J G. Metaknowledge[J]. Science, 2011,331(6018):721-725.
doi: 10.1126/science.1201765 pmid: 21311014
[4] Chen C, Song M, Heo G E. A Scalable and Adaptive Method for Finding Semantically Equivalent Cue Words of Uncertainty[J]. Journal of Informetrics, 2018,12(1):158-180.
doi: 10.1016/j.joi.2017.12.004
[5] Murray D, Lamers W, Boyack K, et al. Measuring Disagreement in Science[C]//Proceedings of the 17th International Conference of the International Society for Scientometrics and Informetrics. 2019: 2370-2375.
[6] Herrera-Perez D, Haslam A, Crain T, et al. A Comprehensive Review of Randomized Clinical Trials in Three Medical Journals Reveals 396 Medical Reversals[J]. eLife, 2019,8:e45183.
[7] Tatsioni A, Bonitsis N G, Ioannidis J P A. Persistence of Contradicted Claims in the Literature[J]. JAMA, 2007,298(21):2517-2526.
doi: 10.1001/jama.298.21.2517 pmid: 18056905
[8] Simpkin A L, Schwartzstein R M. Tolerating Uncertainty—The Next Medical Revolution?[J]. New England Journal of Medicine, 2016,375(18):1713-1715.
doi: 10.1056/NEJMp1606402 pmid: 27806221
[9] Kuhn T S, Hacking I. The Structure of Scientific Revolutions: 50th Anniversary Edition[M]. University of Chicago Press, 2012.
[10] Kilicoglu H. Biomedical Text Mining for Research Rigor and Integrity: Tasks, Challenges, Directions[J]. Briefings in Bioinformatics, 2018,19(6):1400-1414.
doi: 10.1093/bib/bbx057 pmid: 28633401
[11] Small H. Some Questions for Information Science Arising from the History and Philosophy of Science?[C]//Proceedings of the BIR 2020 Workshop on Bibliometric-enhanced Information Retrieval. 2020: 118-120.
[12] Hyland K. Talking to the Academy: Forms of Hedging in Science Research Articles[J]. Written Communication, 1996,13(2):251-281.
doi: 10.1177/0741088396013002004
[13] Light M, Qiu X Y, Srinivasan P. The Language of Bioscience: Facts, Speculations, and Statements in Between[C]//Proceedings of the Workshop on Linking Biological Literature, Ontologies and Databases, Boston, USA. 2004: 17-24.
[14] Zerva C. Automatic Identification of Textual Uncertainty[D]. Manchester: University of Manchester, 2019.
[15] Vincze V, Szarvas G, Farkas R, et al. The BioScope Corpus: Biomedical Texts Annotated for Uncertainty, Negation and Their Scopes[J]. BMC Bioinformatics, 2008, 9(11): Article No. S9.
[16] Farkas R, Vincze V, Móra G, et al. The CoNLL-2010 Shared Task: Learning to Detect Hedges and Their Scope in Natural Language Text[C]//Proceedings of the 14th Conference on Computational Natural Language Learning. 2010: 1-12.
[17] Thompson P, Nawaz R, McNaught J, et al. Enriching a Biomedical Event Corpus with Meta-Knowledge Annotation[J]. BMC Bioinformatics, 2011, 12(1): Article No.393.
[18] Tawfik N S, Spruit M R. Automated Contradiction Detection in Biomedical Literature[C]//Proceedings of the 14th International Conference on Machine Learning and Data Mining in Pattern Recognition. 2018: 138-148.
[19] Szarvas G, Vincze V, Farkas R, et al. Cross-Genre and Cross-Domain Detection of Semantic Uncertainty[J]. Computational Linguistics, 2012,38(2):335-367.
doi: 10.1162/COLI_a_00098
[20] Zou Bowei, Qian Zhong, Chen Zhancheng, et al. Negation and Uncertainty Information Extraction Oriented to Natural Language Text[J]. Journal of Software, 2016,27(2):309-328. (in Chinese)
[21] Mercer R E, Di Marco C, Kroon F W. The Frequency of Hedging Cues in Citation Contexts in Scientific Writing[C]//Proceedings of the 17th Conference of the Canadian Society for Computational Studies of Intelligence. 2004: 75-88.
[22] Small H. Characterizing Highly Cited Method and Non-Method Papers Using Citation Contexts: The Role of Uncertainty[J]. Journal of Informetrics, 2018,12(2):461-480.
doi: 10.1016/j.joi.2018.03.007
[23] Small H, Boyack K W, Klavans R. Citations and Certainty: A New Interpretation of Citation Counts[J]. Scientometrics, 2019,118(3):1079-1092.
doi: 10.1007/s11192-019-03016-z
[24] Small H. What Makes Some Scientific Findings More Certain Than Others? A Study of Citing Sentences for Low-Hedged Papers[C]//Proceedings of the 17th International Conference of the International Society for Scientometrics and Informetrics, Rome, Italy. 2019: 554-560.
[25] Kilicoglu H, Peng Z, Tafreshi S, et al. Confirm or Refute?: A Comparative Study on Citation Sentiment Classification in Clinical Research Publications[J]. Journal of Biomedical Informatics, 2019,91:103123.
doi: 10.1016/j.jbi.2019.103123 pmid: 30753947
[26] Xu J, Zhang Y, Wu Y, et al. Citation Sentiment Analysis in Clinical Trial Papers[J]. American Medical Informatics Association Annual Symposium, 2015: 1334-1341.
[27] Atanassova I, Rey F C, Bertin M. Studying Uncertainty in Science: A Distributional Analysis Through the IMRaD Structure[C]//Proceedings of the 7th International Workshop on Mining Scientific Publications at 11th Edition of the Language Resources and Evaluation Conference, Miyazaki, Japan. 2018: 01940294.
[28] Malhotra A, Younesi E, Gurulingappa H, et al. ‘HypothesisFinder:’ A Strategy for the Detection of Speculative Statements in Scientific Text[J]. PLoS Computational Biology, 2013,9(7):e1003117.
doi: 10.1371/journal.pcbi.1003117 pmid: 23935466
[29] Qiu Junping, Wen Tingxiao, Song Yanhui. Knowledgometrics[M]. Beijing: Science Press, 2014. (in Chinese)
[30] Zhao Hongzhou, Jiang Guohua. On the Element of Knowledge and Exponential Growth Rate[J]. Science of Science and Management of S. & T., 1984(9):41-43. (in Chinese)
[31] Suo Chuanjun, Gai Shuangshuang. The Connotation, Structure and Description Model of Knowledge Unit[J]. Journal of Library Science in China, 2018,44(4):54-72. (in Chinese)
[32] Niu Lihui, Ou Shiyan. Design and Application of a Semantic Annotation Framework for Scientific Articles[J]. Information Studies: Theory & Application, 2020,43(3):124-130. (in Chinese)
[33] Kilicoglu H, Shin D, Fiszman M, et al. SemMedDB: A PubMed-Scale Repository of Biomedical Semantic Predications[J]. Bioinformatics, 2012,28(23):3158-3160.
doi: 10.1093/bioinformatics/bts591 pmid: 23044550
[34] Kilicoglu H, Rosemblat G, Fiszman M, et al. Broad-Coverage Biomedical Relation Extraction with SemRep[J]. BMC Bioinformatics, 2020, 21(1): Article No.188.
doi: 10.1186/s12859-020-03775-0 pmid: 33092523
[35] Groth P, Gibson A, Velterop J. The Anatomy of a Nanopublication[J]. Information Services & Use, 2010,30(1):51-56.
[36] Clark T, Ciccarese P N, Goble C A. Micropublications: A Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications[J]. Journal of Biomedical Semantics, 2014, 5: Article No. 28.
doi: 10.1186/2041-1480-5-29 pmid: 25093068
[37] Friedman C P, Flynn A J. Computable Knowledge: An Imperative for Learning Health Systems[J]. Learning Health Systems, 2019,3:e10203.
doi: 10.1002/lrh2.10203 pmid: 31641690
[38] Flynn A J, Friedman C P, Boisvert P, et al. The Knowledge Object Reference Ontology (KORO): A Formalism to Support Management and Sharing of Computable Biomedical Knowledge for Learning Health Systems[J]. Learning Health Systems, 2018,2:e10054.
doi: 10.1002/lrh2.10054 pmid: 31245583
[39] Mons B. FAIR Science for Social Machines: Let’s Share Metadata Knowlets in the Internet of FAIR Data and Services[J]. Data Intelligence, 2019,1(1):22-42.
doi: 10.1162/dint_a_00002
[40] Kilicoglu H, Rosemblat G, Rindflesch T C. Assigning Factuality Values to Semantic Relations Extracted from Biomedical Research Literature[J]. PLoS ONE, 2017,12(7):e0179926.
doi: 10.1371/journal.pone.0179926 pmid: 28678823
[41] Jia S, Xiang Y, Chen X, et al. Triple Trustworthiness Measurement for Knowledge Graph[C]// Proceedings of the 2019 World Wide Web Conference. 2019.
[42] Alamri A. The Detection of Contradictory Claims in Biomedical Abstracts[D]. Sheffield: University of Sheffield, 2016.
[43] Rosemblat G, Fiszman M, Shin D, et al. Towards a Characterization of Apparent Contradictions in the Biomedical Literature Using Context Analysis[J]. Journal of Biomedical Informatics, 2019,98:103275.
doi: 10.1016/j.jbi.2019.103275 pmid: 31473364
[44] Pinto J M G, Wawrzinek J, Balke W. What Drives Research Efforts? Find Scientific Claims That Count![C]// Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries. 2019: 217-226.
[45] Du Jian. An Automated Approach for Extracting Uncertain Clinical Knowledge from Published Medical Documents[C]// Proceedings of the 2019 Tianfu International Forum on Scientometrics and Research Evaluation, Chengdu, China. 2019. (in Chinese)
[46] Debons A. The Measurement of Knowledge[C]// Proceedings of the 55th Annual Meeting on Celebrating Change: Information Management on the Move, Pittsburgh, Pennsylvania, USA. American Society for Information Science, 1992: 212-215.
[47] Ding Y, Song M, Han J, et al. Entitymetrics: Measuring the Impact of Entities[J]. PLoS ONE, 2013,8(8):e71416.
doi: 10.1371/journal.pone.0071416 pmid: 24009660
[48] Li Xiaoying, Li Junlian, Li Danya. Research on the Unified Medical Language System and Its Application to Knowledge Discovery[J]. Digital Library Forum, 2019(9):24-29. (in Chinese)
[49] Keselman A, Rosemblat G, Kilicoglu H, et al. Adapting Semantic Natural Language Processing Technology to Address Information Overload in Influenza Epidemic Management[J]. Journal of the American Society for Information Science and Technology, 2010,61(12):2531-2543.
doi: 10.1002/asi.v61.12
[50] Bakal G, Talari P, Kakani E V, et al. Exploiting Semantic Patterns over Biomedical Knowledge Graphs for Predicting Treatment and Causative Relations[J]. Journal of Biomedical Informatics, 2018,82:189-199.
doi: 10.1016/j.jbi.2018.05.003 pmid: 29763706
[51] You Suning. The Deep Insight of Clinical Treatment and the Disclosure of Medical Truth[N]. China Medical News, 2020-05-27(23). (in Chinese)