Progress and Prospects in Measuring the Uncertainty of Medical Knowledge<sup>*</sup>

doi:10.11925/infotech.2096-3467.2020.0222

Data Analysis and Knowledge Discovery

2020, Vol. 4, Issue (10): 14-27 https://doi.org/10.11925/infotech.2096-3467.2020.0222

Review

Measuring Uncertainty of Medical Knowledge: A Literature Review

Du Jian (杜建)

National Institute of Health Data Science, Peking University, Beijing 100191, China


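Abstract (Chinese version)

[Objective] This article addresses the theme of measuring the uncertainty of medical knowledge through the textual and linguistic features of knowledge claims in the scientific literature, and sets out its theoretical basis, research progress, and anticipated application scenarios. [Coverage] The search rule required keywords from all three facets together: "uncertainty", "knowledge/knowledge unit", and "medicine". Citation tracking was set up on the source work Representing Scientific Knowledge: The Role of Uncertainty. Combining keyword search and citation search, Chinese and English databases were searched and the results screened, yielding 51 publications. [Methods] The literature was classified and reviewed, and the research methods, data sources, and core viewpoints involved were summarized. [Results] The theoretical basis mainly comprises the theory of paradigm shift at the macro level and statistical theory at the micro level, such as Bayesian causal networks. Research progress concentrates on three aspects: (1) identifying cue words and sentences that express uncertainty in the medical literature; (2) representing medical knowledge objects in a fine-grained, structured form; and (3) measuring, for structured medical knowledge, the degree of uncertainty expressed in its source text. [Limitations] The discussion of knowledge units is limited to fields that take the Data-Information-Knowledge-Wisdom (DIKW) model as their basic paradigm: information science, knowledge engineering, and artificial intelligence. [Conclusions] Measuring the uncertainty of medical knowledge is a new research direction at the intersection of informetrics and medical informatics. Uncertainty and its evolution over time indirectly reflect the competitive strength of knowledge claims, the degree to which knowledge gaps have been resolved, and the probability that a piece of knowledge is certain. This line of work is expected to deepen informetrics toward knowmetrics and to extend informetrics to potential new applications in knowledge discovery, science and technology evaluation, and artificial intelligence.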


Keywords: Uncertainty, Medical Knowledge, Informetrics, Knowledge Metrics (Knowmetrics), Medical Text Mining

Abstract (English version):

[Objective] This article reviews the theory, research progress, and potential applications of measuring the uncertainty of medical knowledge from scientific publications. [Coverage] We searched PubMed, Web of Science, Microsoft Academic, CNKI, and Wanfang Data for English and Chinese publications with 1) the keywords “uncertain* AND knowledge AND *medical” in the title, and 2) the cited reference “Representing Scientific Knowledge: The Role of Uncertainty”. [Methods] First, we categorized this literature into computational linguistics studies and informetrics studies. Then, we summarized their research designs, data analytics, and conclusions. [Results] The idea of paradigm shift and Bayesian causal networks form the foundation for measuring the uncertainty of medical knowledge. The latest developments include: identifying uncertainty cues in biomedical literature; extracting structured knowledge from unstructured biomedical texts; and measuring the uncertainty level of the scientific text from which Subject-Predicate-Object (SPO) triples are extracted. [Limitations] Our discussion focused on research driven by the Data-Information-Knowledge-Wisdom model, such as information science, knowledge engineering, and artificial intelligence. [Conclusions] The uncertainty of scientific knowledge and its evolution over time indirectly reflect the strength of competing knowledge claims, the contribution to filling knowledge gaps, and the probability that a given knowledge claim is certain. Measuring it should promote the development of informetrics and knowmetrics, as well as their applications in emerging fields such as detecting research fronts, evaluating academic contributions, and improving the efficacy of decision support driven by computable knowledge.

Key words: Uncertainty, Medical Knowledge, Informetrics, Knowledge Metrics (Knowmetrics), Medical Text Mining
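The first line of work named in the abstract, identifying uncertainty cues in biomedical text, can be illustrated with a minimal sketch. It assumes a small hand-picked cue list and simple word matching; the names `HEDGE_CUES`, `find_cues`, and `uncertainty_rate` are hypothetical and are not taken from any of the reviewed studies, which rely on annotated corpora and trained models rather than a fixed list.

```python
import re

# Illustrative hedging cues only (an assumption for this sketch);
# not the lexicon of any reviewed study.
HEDGE_CUES = {
    "may", "might", "could", "suggest", "suggests", "possibly",
    "likely", "appear", "appears", "potential", "unclear",
}


def find_cues(sentence):
    """Return the hedging cue words found in a sentence, in order of appearance."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t in HEDGE_CUES]


def uncertainty_rate(sentences):
    """Share of sentences that contain at least one hedging cue."""
    if not sentences:
        return 0.0
    hedged = sum(1 for s in sentences if find_cues(s))
    return hedged / len(sentences)
```

Applied to the sentences that mention a given knowledge claim, a rate of this kind gives one rough proxy for how tentatively the claim is stated; tracking it by publication year would approximate the "evolution over time" the abstract refers to.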

Received: 2020-03-20 Published: 2020-08-06

CLC number (ZTFLH): G350

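Funding: *This work is one of the research outcomes of the China Association for Science and Technology Young Elite Scientists Sponsorship Program project "Research on Structured Representation and Intelligent Computing Models of Medical Knowledge" (2017QNRC001) and the Beijing advanced discipline construction project "Health Data Science" (BMU2019GJJXK001).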

Corresponding author: Du Jian, E-mail: dujian@bjmu.edu.cn

Cite this article:

Du Jian. Measuring Uncertainty of Medical Knowledge: A Literature Review. Data Analysis and Knowledge Discovery, 2020, 4(10): 14-27.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0222 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I10/14

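Fig.1 Relationship between the degree of paradigm shift and the degree of knowledge uncertainty

Table 1 Representative studies in computational linguistics on identifying biomedical uncertainty cues and their linguistic scope

Table 2 Studies in informetrics on measuring uncertainty in full-text citing sentences

Table 3 Models for fine-grained, structured representation of knowledge objects

Table 4 Studies that identify contradictory medical knowledge using the rule "same subject and object, opposite predicate" in triples

Fig.2 Basic structure of the two SemMedDB data tables, "sentence" and "triple"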
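The "same subject and object, opposite predicate" rule named in the Table 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions: triples are plain `(subject, predicate, object)` tuples, and the opposite-predicate pairs in `OPPOSITES` are examples chosen for this sketch, not the pair lists used in the studies the table summarizes. The function name `find_contradictions` is likewise hypothetical.

```python
from collections import defaultdict

# Example opposite-predicate pairs (an assumption for this sketch).
OPPOSITES = [
    ("STIMULATES", "INHIBITS"),
    ("CAUSES", "PREVENTS"),
    ("TREATS", "NEG_TREATS"),
]

# Build a symmetric lookup so either member of a pair finds the other.
OPPOSITE_OF = {}
for a, b in OPPOSITES:
    OPPOSITE_OF[a] = b
    OPPOSITE_OF[b] = a


def find_contradictions(triples):
    """Return sorted (subject, predicate, opposite_predicate, object) tuples
    for every subject-object pair asserted with two opposite predicates."""
    predicates_by_pair = defaultdict(set)
    for subj, pred, obj in triples:
        predicates_by_pair[(subj, obj)].add(pred)

    found = []
    for (subj, obj), preds in predicates_by_pair.items():
        for pred in preds:
            opp = OPPOSITE_OF.get(pred)
            # "pred < opp" reports each opposite pair once, not twice.
            if opp in preds and pred < opp:
                found.append((subj, pred, opp, obj))
    return sorted(found)
```

A pair flagged this way is only an apparent contradiction: the two source sentences may differ in population, dosage, or other context, so each hit would still need to be checked against its source text.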

