Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (3): 29-37     https://doi.org/10.11925/infotech.2096-3467.2017.03.04
  Research Paper
Extracting and Visualizing Knowledge Graph Schema from Linked Data with Cytoscape Platform*
Jiang Ying, Zhang Jing, Zhu Lingxuan
School of Management, Beijing Normal University, Zhuhai, Zhuhai 519087, China
Abstract

[Objective] This paper proposes a method for generating a knowledge graph schema from linked data, so that users can understand the internal data structure of a linked data endpoint before submitting a query and can more easily query and use the large volume of linked data available across domains. [Methods] First, the domain knowledge relations contained in the linked data are retrieved through SPARQL queries. Second, knowledge graph schema triples are constructed for each relation to form a preliminary knowledge graph schema. Finally, the schema triples of each knowledge class are extracted and merged into the preliminary schema to form the complete knowledge graph schema. [Results] A Cytoscape plugin was developed to implement the method and to visualize the resulting knowledge graph schema. [Limitations] The method cannot handle the extraction of complex knowledge classes, such as anonymous nodes. [Conclusions] Three tests in the biomedical domain (single-endpoint extraction, "bridge" linkage, and "inclusion" linkage) show that the method extracts quickly and stably, achieves high recall, and requires no web crawler or additional indexing.

Key words: Linked Data; Knowledge Graph Schema; SPARQL; Cytoscape
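The two-stage extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' plugin code: it replaces the SPARQL queries with scans over a small in-memory list of triples, and it assumes that the schema triple for a relation is formed from the rdf:type classes of the relation's subjects and objects. The function names and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def relation_schema_triples(triples):
    """Stage 1: one (subject class, relation, object class) triple per typed use of a relation."""
    # Index the declared classes of every resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    schema = set()
    for s, p, o in triples:
        if p in (RDF_TYPE, SUBCLASS_OF):
            continue  # typing and class-hierarchy triples are handled elsewhere
        for s_cls in types.get(s, ()):
            for o_cls in types.get(o, ()):
                schema.add((s_cls, p, o_cls))
    return schema


def class_schema_triples(triples):
    """Stage 2: class-level triples (only rdfs:subClassOf in this sketch)."""
    return {(s, p, o) for s, p, o in triples if p == SUBCLASS_OF}


def extract_schema(triples):
    """Merge the class-level triples into the preliminary relation-level schema."""
    return relation_schema_triples(triples) | class_schema_triples(triples)


# Toy data: two hypothetical concepts linked by one relation, plus one class axiom.
data = [
    ("ex:c1", RDF_TYPE, "meshv:Concept"),
    ("ex:c2", RDF_TYPE, "meshv:Concept"),
    ("ex:c1", "meshv:broaderConcept", "ex:c2"),
    ("meshv:Concept", SUBCLASS_OF, "owl:Thing"),
]

schema = extract_schema(data)
for triple in sorted(schema):
    print(triple)
```

Using a set for the merged schema means that a triple found in both stages appears only once, which is the behaviour one would expect when the two partial schemas are combined into a single graph for display.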
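Received: 2017-01-18      Published: 2017-04-20
Chinese Library Classification (ZTFLH):  TP393
Funding: *This work is one of the research outcomes of the Guangdong Province Outstanding Young Teachers Training Program for Higher Education Institutions project "Research on Ontology Knowledge Graph Visualization of Biological Pathways for Big Data" (Grant No. YQ2015239) and the Natural Science Foundation of Guangdong Province project "Research on Financial Big Data Analysis and Prediction Based on Ontology Reasoning Evolution" (Grant No. 2016A030313386).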
Cite this article:
Jiang Ying, Zhang Jing, Zhu Lingxuan. Extracting and Visualizing Knowledge Graph Schema from Linked Data with Cytoscape Platform[J]. Data Analysis and Knowledge Discovery, 2017, 1(3): 29-37.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.03.04      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I3/29
  Approach to knowledge graph schema extraction
  Diagram of the knowledge graph schema extracted from the Pathway Commons linked data endpoint
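Linked data SPARQL endpoint    Number of RDF triples    Extraction time (minutes)
Pathway Commons                27,623,683               8.16
BioCyc                         18,532,342               9.57
MeSH                           654,198                  10.86
Reactome                       2,980,230                6.45
  Experimental results table for knowledge graph schema extraction from a single linked data endpoint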
rdfs:domain (knowledge class)    Knowledge relation           rdfs:range (knowledge class)
meshv:TreeNumber                 meshv:parentTreeNumber       meshv:TreeNumber
-                                meshv:treeNumber             meshv:TreeNumber
meshv:Concept                    meshv:broaderConcept         meshv:Concept
meshv:Concept                    meshv:narrowerConcept        meshv:Concept
meshv:Concept                    meshv:relatedConcept         meshv:Concept
meshv:Descriptor                 meshv:broaderDescriptor      meshv:Descriptor
-                                meshv:hasDescriptor          meshv:Descriptor
-                                meshv:allowableQualifier     meshv:Qualifier
-                                meshv:hasQualifier           meshv:Qualifier
meshv:Qualifier                  meshv:broaderQualifier       meshv:Qualifier
  Knowledge relations marked with rdfs:domain and rdfs:range in the MeSH linked data
Subclass (knowledge class)          Superclass (knowledge class)
meshv:TreeNumber                    owl:Thing
meshv:Concept                       owl:Thing
meshv:Descriptor                    owl:Thing
meshv:DescriptorQualifierPair       owl:Thing
meshv:SupplementaryConceptRecord    owl:Thing
meshv:Qualifier                     owl:Thing
meshv:Term                          owl:Thing
meshv:broaderQualifier              meshv:Qualifier
  Knowledge classes marked with rdfs:subClassOf in the MeSH linked data
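Declarations like those in the two MeSH tables above could be retrieved from an endpoint with queries along the following lines. This is a hedged sketch in standard SPARQL 1.1 syntax wrapped in Python strings, not the queries used by the authors. The helper `rows_to_schema_triples` and the dictionary row format are hypothetical, and no network request is made; the rows are typed in by hand from the tables.

```python
# Standard RDFS namespace; the queries themselves are illustrative only.
PREFIXES = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"

# Relations with a declared range; the domain is OPTIONAL because some
# relations in the table above list a range but no domain.
RELATION_QUERY = PREFIXES + """
SELECT ?domain ?relation ?range WHERE {
  ?relation rdfs:range ?range .
  OPTIONAL { ?relation rdfs:domain ?domain . }
}
"""

# Class hierarchy declarations.
SUBCLASS_QUERY = PREFIXES + """
SELECT ?sub ?super WHERE {
  ?sub rdfs:subClassOf ?super .
}
"""


def rows_to_schema_triples(relation_rows, subclass_rows):
    """Turn query result rows (plain dicts here) into schema triples."""
    triples = []
    for row in relation_rows:
        # A missing domain is kept as None rather than guessed.
        triples.append((row.get("domain"), row["relation"], row["range"]))
    for row in subclass_rows:
        triples.append((row["sub"], "rdfs:subClassOf", row["super"]))
    return triples


# Hand-written rows copied from the tables above.
relation_rows = [
    {"domain": "meshv:Concept", "relation": "meshv:broaderConcept", "range": "meshv:Concept"},
    {"relation": "meshv:treeNumber", "range": "meshv:TreeNumber"},
]
subclass_rows = [{"sub": "meshv:Concept", "super": "owl:Thing"}]

schema_triples = rows_to_schema_triples(relation_rows, subclass_rows)
for triple in schema_triples:
    print(triple)
```

Keeping an undeclared domain as `None` leaves the decision about how to draw such a relation to the visualization step instead of inventing a class for it.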
  Knowledge graph schema triples that method B failed to extract
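Comparison item                                          Method B1    Method B2    Method B (B1+B2)    Proposed method
Number of knowledge graph schema triples extracted       8            6            14                  33
Extraction recall                                        22.86%       17.14%       40.00%              94.28%
  Comparison table of recall for different extraction methods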
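The recall values in the comparison table can be checked with a short calculation. The size of the reference set is not given on this page; the sketch below assumes it is 35 schema triples, a figure inferred from the reported percentages (for example, 14 / 35 = 40.00%) rather than taken from the paper.

```python
# Assumption: 35 reference triples, inferred from the reported percentages.
REFERENCE_TOTAL = 35

# Extracted triple counts as listed in the comparison table.
extracted = {
    "B1": 8,
    "B2": 6,
    "B (B1+B2)": 14,
    "Proposed method": 33,
}


def recall_percent(found, total=REFERENCE_TOTAL):
    """Recall as a percentage: extracted triples divided by all reference triples."""
    return round(100 * found / total, 2)


recalls = {name: recall_percent(count) for name, count in extracted.items()}
for name, value in recalls.items():
    print(f"{name}: {value:.2f}%")
```

Under this assumption the first three values agree with the table. For the proposed method, 33 / 35 rounds to 94.29%, which differs by 0.01 from the reported 94.28%; the reported value may have been truncated rather than rounded.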
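  Visualization of the knowledge graph schema extracted for the "inclusion" linkage between HGNC and BioCyc
Linked data SPARQL endpoint    Number of RDF triples    Node shape in figure
HGNC                           922,523                  Circle
MeSH                           654,198                  Triangle
"Inclusion" linkage            -                        Circle
  Experimental results of knowledge graph schema extraction for the "inclusion" linkage between HGNC and BioCyc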
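  Visualization of the knowledge graph schema extracted for the "bridge" linkage
Linked data SPARQL endpoint    Number of RDF triples    Node shape in figure
BioModel                       2,380,009                Triangle
Pathway Commons                27,623,683               Square
Linkedspl                      2,174,579                Diamond
"Bridge" linkage               -                        Circle
  Experimental results of knowledge graph schema extraction for the "bridge" linkage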