Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 3: 79-86     https://doi.org/10.11925/infotech.2096-3467.2017.1047
Application Paper
Using Text Mining to Discover Drug Side Effects: Case Study of PubMed
Fan Xinyue, Cui Lei
School of Medical Informatics, China Medical University, Shenyang 110122, China
Abstract

[Objective] This study uses text mining to discover potential drug-side effect relationships, offering a way to supplement existing drug-side effect databases and to predict drug side effects early. [Methods] We retrieved 100,873 articles on human drug therapy and side effects, published from 2011 to 2016, from the PubMed database. The article set was segmented with Perl, drug and side-effect names were identified by dictionary-based named entity recognition, a drug-side effect co-occurrence matrix was generated in R, and the matrix was analyzed by bi-clustering in gCLUTO. [Results] Taking one cluster as an example, the method extracted drug-side effect pairs with a precision of 75.65%, and potential drug-side effect relationships accounted for 13.91% of the extracted pairs. [Limitations] Only dictionary-based named entity recognition was used, and grammatical and lexical factors were not considered, which led to a high false positive rate. [Conclusions] The method can reveal drug side effects not yet recorded in databases, supports their early detection, and offers a feasible basis for extracting drug-side effect relationships more accurately with automatic learning methods.

Key words: Drug Side Effects; Text Mining; Named Entity Recognition; Cluster Analysis
Received: 2017-10-20      Published: 2018-04-03
Chinese Library Classification (CLC): TP391, G353
Cite this article:
Fan Xinyue, Cui Lei. Using Text Mining to Discover Drug Side Effects: Case Study of PubMed[J]. Data Analysis and Knowledge Discovery, 2018, 2(3): 79-86.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.1047      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I3/79
Figure: Flowchart of drug side effect extraction using text mining
Figure: File format after Perl segmentation (partial)
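The segmentation step turns each article into labelled lines (a PMID line, a TI line for the title, then AB 1, AB 2, ... for the abstract sentences), as the matching tables on this page suggest. The authors did this in Perl; the sketch below is a Python stand-in, not their script. The function name `segment_record`, the naive sentence-splitting rule, and the sample text are all assumptions made for illustration.

```python
import re

def segment_record(pmid, title, abstract):
    """Split one article into labelled lines: PMID, TI, then AB 1..n.

    Hypothetical helper. The sentence splitter is deliberately naive:
    it breaks after '.', '!' or '?' when followed by whitespace and an
    uppercase letter or digit, so abbreviations can be split wrongly.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", abstract)
    sentences = [s.strip() for s in parts if s.strip()]
    lines = [("PMID", pmid), ("TI", title)]
    lines += [(f"AB {i}", s) for i, s in enumerate(sentences, start=1)]
    return lines

# Invented example text, not taken from any real PubMed record.
example = segment_record(
    "0000000",
    "An example title",
    "Drug A caused pain. It resolved later.",
)
for label, text in example:
    print(label, text)
```

Keeping one sentence per labelled line makes it possible to report later where in an article a dictionary term was matched.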
| MeSH ID | Drug MeSH heading | Drug entry terms |
| --- | --- | --- |
| D000935 | Antifungal Agents | Agents, Antifungal; Therapeutic Fungicides; Fungicides, Therapeutic; Antibiotics, Antifungal; Antifungal Antibiotics |
| D001569 | Benzodiazepines | Benzodiazepine Compounds; Benzodiazepine |
| D006493 | Heparin | Unfractionated Heparin; Heparin, Unfractionated; Heparinic Acid; Liquaemin; Sodium Heparin; Heparin, Sodium; Heparin Sodium; alpha-Heparin; alpha Heparin |

Table: Correspondence of MeSH IDs, MeSH headings, and entry terms (partial)
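A dictionary like the one above is typically used to normalize every synonym to a single heading before matching. The following is a minimal sketch of such a lookup, assuming case-insensitive matching; the function name `build_term_lookup` and the record layout are hypothetical, and only a few of the entry terms from the table are included.

```python
def build_term_lookup(records):
    """Map every heading and entry term (lowercased) to (MeSH ID, heading).

    `records` is an iterable of (mesh_id, heading, entry_terms) tuples.
    Hypothetical helper for illustration only.
    """
    lookup = {}
    for mesh_id, heading, entry_terms in records:
        for term in [heading, *entry_terms]:
            lookup[term.lower()] = (mesh_id, heading)
    return lookup

# A subset of the rows shown in the table.
records = [
    ("D001569", "Benzodiazepines", ["Benzodiazepine Compounds", "Benzodiazepine"]),
    ("D006493", "Heparin", ["Unfractionated Heparin", "Heparin Sodium", "Liquaemin"]),
]
lookup = build_term_lookup(records)
print(lookup["liquaemin"])
```

With this mapping, a match on any synonym can be counted under one drug, which keeps the later co-occurrence matrix from splitting a drug across several rows.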
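| PubMed MEDLINE | SIDER |
| --- | --- |
| PMID: 24739449 | |
| TI | |
| AB 1 | Depression |
| AB 2 | |
| AB 3 | |
| AB 4 | |
| AB 5 | Epilepsy |
| AB 6 | |
| AB 7 | |
| AB 8 | |
| AB 9 | |
| AB 10 | |
| AB 11 | |
| AB 12 | |

Table: Side-effect dictionary matching results (partial)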
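| PubMed MEDLINE | Drug |
| --- | --- |
| PMID: 24739449 | |
| TI | tianeptine |
| AB 1 | |
| AB 2 | |
| AB 3 | tianeptine |
| AB 4 | tianeptine |
| AB 5 | tianeptine |
| AB 6 | tianeptine |
| AB 7 | |
| AB 8 | |
| AB 9 | tianeptine |
| AB 10 | tianeptine |
| AB 11 | tianeptine |
| AB 12 | tianeptine |

Table: Drug dictionary matching results (partial)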
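The two matching tables record, line by line, which dictionary terms occur in the title and in each abstract sentence. A minimal sketch of dictionary-based matching is given below, assuming whole-word, case-insensitive matching with longer terms tried first. The helper names `compile_dictionary` and `match_lines` are hypothetical, and the sentences are invented, not the text of the article in the tables.

```python
import re

def compile_dictionary(terms):
    """Build one regex that matches any dictionary term as a whole word."""
    # Longest terms first so multi-word names win over their substrings.
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)

def match_lines(lines, matcher):
    """Return {label: sorted list of matched terms} for each labelled line."""
    return {
        label: sorted({m.group(0).lower() for m in matcher.finditer(text)})
        for label, text in lines
    }

# Invented sentences for illustration only.
lines = [
    ("TI", "Tianeptine for mood symptoms"),
    ("AB 1", "Depression and epilepsy often coexist."),
    ("AB 2", "Patients received tianeptine daily."),
]
drug_hits = match_lines(lines, compile_dictionary(["tianeptine"]))
effect_hits = match_lines(lines, compile_dictionary(["Depression", "Epilepsy"]))
print(drug_hits)
print(effect_hits)
```

Pure string matching of this kind ignores grammar and context, which is consistent with the high false positive rate the abstract reports as a limitation.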
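| Drug | Lyme disease | Polyps | General Surgery | Hypothermia |
| --- | --- | --- | --- | --- |
| clarithromycin | 1 | 0 | 0 | 0 |
| ceftriaxone | 1 | 0 | 0 | 0 |
| doxycycline | 1 | 0 | 0 | 0 |
| erlotinib | 0 | 1 | 0 | 0 |
| thyroid | 0 | 0 | 1 | 0 |
| tyrosine | 0 | 0 | 1 | 0 |
| cabozantinib | 0 | 0 | 1 | 0 |
| morphine | 0 | 0 | 1 | 1 |

Table: Drug-side effect co-occurrence matrix (partial)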
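The authors generated this matrix in R. The Python sketch below shows one way such a matrix can be assembled from per-article match results. It assumes that co-occurrence is counted once per article in which a drug and a side-effect term both appear; the page does not state the counting unit, so that is an assumption, as are the function name `cooccurrence_matrix` and the three sample articles.

```python
from collections import Counter

def cooccurrence_matrix(articles):
    """Count drug/side-effect co-occurrences across articles.

    `articles` is an iterable of (drugs, side_effects) pairs of sets,
    one pair per article. Returns (row_labels, column_labels, matrix).
    """
    counts = Counter()
    for drugs, effects in articles:
        for drug in drugs:
            for effect in effects:
                counts[(drug, effect)] += 1
    rows = sorted({drug for drug, _ in counts})
    cols = sorted({effect for _, effect in counts})
    # Counter returns 0 for pairs that never co-occurred.
    matrix = [[counts[(drug, effect)] for effect in cols] for drug in rows]
    return rows, cols, matrix

# Hypothetical match results, chosen to resemble a few rows of the table.
articles = [
    ({"clarithromycin", "ceftriaxone", "doxycycline"}, {"Lyme disease"}),
    ({"erlotinib"}, {"Polyps"}),
    ({"morphine"}, {"General Surgery", "Hypothermia"}),
]
rows, cols, matrix = cooccurrence_matrix(articles)
for drug, values in zip(rows, matrix):
    print(drug, values)
```

The resulting drug-by-side-effect matrix is the kind of input a bi-clustering tool such as gCLUTO takes.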
Figure: Mountain visualization of the clustering results for 1,287 drugs
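| Cluster 0 | Pain | Postoperative pain | Rheumatism disease | Headache |
| --- | --- | --- | --- | --- |
| ceftriaxone | (+) | (-) | (-) | (+) |
| hyaluronic acid | | | | |
| naloxone | (-) | (-) | (-) | (-) |
| fluconazole | (+) | (-) | (-) | (+) |
| ciclosporin | (+) | (-) | (+) | (+) |
| palonosetron | (+) | (-) | (-) | (+) |
| dinoprostone | (+) | (-) | (-) | (-) |

Table: Comparison with the SIDER database (partial)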
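The abstract reports a precision of 75.65% and a 13.91% share of potential relationships for one cluster, but this page does not give the underlying counts or exact definitions. The sketch below shows one plausible way to compute such figures. It assumes that a pair counts as correct if it is either recorded in the reference database or absent from it but confirmed by manual review, and that the potential share is the second group divided by all extracted pairs. The function name `evaluate_pairs` and all sample pairs are hypothetical.

```python
def evaluate_pairs(extracted, reference, verified):
    """Score extracted (drug, side effect) pairs.

    extracted: pairs produced by text mining
    reference: pairs recorded in a reference database
    verified:  pairs confirmed by manual review (assumed step)
    """
    if not extracted:
        raise ValueError("no extracted pairs to evaluate")
    known = extracted & reference
    potential = (extracted - reference) & verified
    total = len(extracted)
    return {
        "precision": (len(known) + len(potential)) / total,
        "potential_share": len(potential) / total,
    }

# Hypothetical pairs for illustration; not results from the paper.
extracted = {("drug_a", "pain"), ("drug_b", "pain"),
             ("drug_c", "headache"), ("drug_d", "headache")}
reference = {("drug_a", "pain"), ("drug_b", "pain")}
verified = {("drug_c", "headache")}
scores = evaluate_pairs(extracted, reference, verified)
print(scores)
```

Under these assumptions, pairs that are neither in the database nor confirmed by review are treated as false positives.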