Research on Drug Knowledge Discovery Method Fusing Meta-path Features of Heterogeneous Knowledge Network: Taking the Prediction of Drug-Target Relations as An Example

doi:10.11925/infotech.2096-3467.2023.0869

Zhu Xiang, Zhang Yunqiu, Sun Shaodan, Zhang Liman

(School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094) (Department of Medical Informatics, School of Public Health, Jilin University, Changchun, Jilin 130021)

Abstract:

[Objective] This paper proposes a drug knowledge discovery method that fuses meta-path features of a heterogeneous knowledge network, with the aim of further improving drug knowledge discovery performance.

[Methods] First, a heterogeneous drug knowledge network with 4 entity types and 6 relation types is constructed. Then, for the meta-paths that connect a drug to a target entity in the network, the HeteSim algorithm is used to compute multi-dimensional meta-path features (semantic similarities) for each drug-target entity pair. Finally, these meta-path features are fused with drug similarity and target entity similarity features and used as the feature input of machine learning models to perform drug knowledge discovery.
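The HeteSim step can be illustrated with a minimal sketch. It assumes a single even-length meta-path Drug -> Disease <- Target on a toy network; the "Disease" middle type and both adjacency matrices are hypothetical, since the abstract does not name the entity types or the 21 meta-paths. Normalized HeteSim is computed here as the cosine between the reachable-probability vectors of the drug and the target at the middle node type. Odd-length meta-paths, which the original algorithm handles by splitting the middle relation, are not covered.

```python
import math

def row_normalize(matrix):
    """Turn each row of an adjacency matrix into transition probabilities."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def reach_prob(adjacency_chain):
    """Reachable-probability matrix along a chain of relations."""
    result = row_normalize(adjacency_chain[0])
    for adj in adjacency_chain[1:]:
        result = matmul(result, row_normalize(adj))
    return result

def hetesim(left_chain, right_chain_reversed):
    """Normalized HeteSim for an even-length meta-path.

    left_chain: adjacency matrices from the source type to the middle type.
    right_chain_reversed: adjacency matrices from the target type back to
    the middle type (the right half of the path, walked in reverse).
    """
    left = reach_prob(left_chain)
    right = reach_prob(right_chain_reversed)
    scores = []
    for u in left:
        row = []
        for v in right:
            dot = sum(x * y for x, y in zip(u, v))
            norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
            row.append(dot / norm if norm else 0.0)
        scores.append(row)
    return scores

# Hypothetical toy data: 2 drugs, 2 targets, 3 diseases.
drug_disease = [[1, 1, 0],
                [0, 0, 1]]
target_disease = [[1, 1, 0],
                  [0, 1, 1]]

# scores[i][j] is the HeteSim value of drug i and target j on this meta-path.
scores = hetesim([drug_disease], [target_disease])
print(scores)
```

Repeating this for every meta-path and stacking the values gives one feature vector per drug-target pair (21 dimensions in the paper), which can then be concatenated with the drug and target similarity features before training a classifier.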

[Results] The heterogeneous drug knowledge network contains 12,015 nodes and 1,895,445 edges. Taking drug-target relation prediction as an example, 21-dimensional HeteSim features between drugs and targets were computed. The method achieved the highest AUC on all three machine learning models (XGBoost = 0.993, RF = 0.990, SVM = 0.975). Its accuracy, precision, and F-score were also higher than those of the two comparison methods. A literature search on 20 predicted results found that some of the predictions are supported by evidence in previous literature.
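For readers unfamiliar with the reported metrics, the following minimal sketch shows how AUC, accuracy, precision, and F-score can be computed from predicted scores. The labels, scores, and the 0.5 decision threshold are made-up illustration values, and the F-score is assumed to be F1; none of this reproduces the paper's experiments.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, recall, and F1 at a fixed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels (1 = known drug-target pair) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc_value = auc(labels, scores)
metrics = threshold_metrics(labels, scores)
print(auc_value, metrics)
```

AUC depends only on how the pairs are ranked, whereas accuracy, precision, and F-score depend on the chosen threshold, which is why the two kinds of metric are reported separately.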

[Limitations] Although a PU (positive-unlabeled) learning strategy is used to reduce the impact of sample imbalance, some distortion in the results remains.
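The abstract does not say which PU learning strategy was used. As one possible illustration only, the sketch below shows a bagging-style PU scheme: in each round a random subset of unlabeled pairs is treated as provisional negatives, a simple scorer is fitted, and only the out-of-bag unlabeled pairs are scored. The nearest-centroid scorer, the 2-dimensional feature vectors, and all parameter values are assumptions chosen to keep the example self-contained.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def pu_bagging_scores(positives, unlabeled, rounds=50, seed=0):
    """Average out-of-bag score for each unlabeled sample.

    Higher scores mean the sample looks more like the known positives.
    The base scorer (distance to the negative centroid minus distance to
    the positive centroid) is a stand-in for a real classifier.
    """
    rng = random.Random(seed)
    bag_size = min(len(positives), len(unlabeled) - 1)
    pos_center = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(rounds):
        # Provisional negatives for this round.
        bag = set(rng.sample(range(len(unlabeled)), bag_size))
        neg_center = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # score only samples not used as negatives
            totals[i] += math.dist(x, neg_center) - math.dist(x, pos_center)
            counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]

# Hypothetical feature vectors for known drug-target pairs (positives).
positives = [(0.9, 0.8), (1.0, 1.0), (0.8, 0.9)]
# Unlabeled pairs; the first one sits close to the positives on purpose.
unlabeled = [(0.95, 0.9), (0.1, 0.2), (0.0, 0.1),
             (0.2, 0.0), (0.15, 0.1), (0.05, 0.05)]

pu_scores = pu_bagging_scores(positives, unlabeled)
best = pu_scores.index(max(pu_scores))
print(best, pu_scores)
```

Because unlabeled pairs may contain undiscovered positives, any such scheme can mislabel some of them as negatives in a given round, which is consistent with the residual distortion noted above.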

[Conclusions] The proposed drug knowledge discovery method shows a degree of advancement and effectiveness, and it offers a theoretical and methodological reference for related research.

Key words: Drug knowledge discovery; Heterogeneous knowledge network; Meta-path; Machine learning; Drug-target

Published: 2024-04-18

Chinese Library Classification (CLC) number: G351

Cite this article:

Zhu Xiang, Zhang Yunqiu, Sun Shaodan, Zhang Liman. Research on Drug Knowledge Discovery Method Fusing Meta-path Features of Heterogeneous Knowledge Network: Taking the Prediction of Drug-Target Relations as An Example [J]. Data Analysis and Knowledge Discovery, doi:10.11925/infotech.2096-3467.2023.0869.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0869 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y0/V/I/1

