Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (3): 79-86    DOI: 10.11925/infotech.2096-3467.2017.1047
Using Text Mining to Discover Drug Side Effects: Case Study of PubMed
Xinyue Fan, Lei Cui
School of Medical Informatics, China Medical University, Shenyang 110122, China

[Objective] This paper uses text mining to identify potential side effects of drugs, aiming to supplement existing databases and support early prediction of drug side effects. [Methods] We retrieved 100,873 articles from the PubMed database covering about five years (2011-2016). Using Perl for word segmentation, dictionary-based named entity recognition, and the R language, we generated a drug-side effect co-occurrence matrix and ran bi-clustering analysis with gCLUTO. [Results] For one category of results, the precision of the proposed method reached 75.65%, with 13.91% identified as potential side effects. [Limitations] The study used only dictionary-based named entity recognition and did not consider grammatical or lexical factors, which led to a high false positive rate. [Conclusions] This paper proposes a new approach for detecting unreported drug side effects automatically and effectively.
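
The matching and counting steps described under [Methods] can be illustrated with a minimal sketch. It is written in Python rather than the Perl and R used in the study, and it is not the authors' pipeline: the dictionaries, the sample sentences, and the function names `find_terms` and `cooccurrence` are all hypothetical. It shows how dictionary-based matching can yield drug-side effect co-occurrence counts, which could then be arranged as a matrix for bi-clustering.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; the dictionaries used in the study are not given here.
DRUGS = {"morphine", "cefazolin"}
SIDE_EFFECTS = {"nausea", "constipation", "abdominal pain"}


def find_terms(text, dictionary):
    """Return the dictionary terms that occur in text as whole words or phrases."""
    lowered = text.lower()
    return {
        term
        for term in dictionary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def cooccurrence(abstracts, drugs, side_effects):
    """Count, for each (drug, side effect) pair, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        found_drugs = find_terms(text, drugs)
        found_effects = find_terms(text, side_effects)
        for drug in found_drugs:
            for effect in found_effects:
                counts[(drug, effect)] += 1
    return counts


# Invented sample sentences, for illustration only.
abstracts = [
    "Morphine was associated with nausea and constipation.",
    "Patients given cefazolin reported abdominal pain.",
    "Nausea was common after morphine administration.",
]
counts = cooccurrence(abstracts, DRUGS, SIDE_EFFECTS)
for (drug, effect), n in sorted(counts.items()):
    print(drug, effect, n)
```

Because matching of this kind ignores sentence structure, a drug and a symptom that merely appear in the same abstract are counted as a pair even when the symptom is the condition being treated, which is consistent with the false positives noted under [Limitations].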

Key words: Drug-Side Effects; Text Mining; Named Entity Recognition; Cluster Analysis
Received: 20 October 2017      Published: 03 April 2018

Cite this article:

Xinyue Fan, Lei Cui. Using Text Mining to Discover Drug Side Effects: Case Study of PubMed. Data Analysis and Knowledge Discovery, 2018, 2(3): 79-86.
