Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (1): 121-130    DOI: 10.11925/infotech.2096-3467.2019.0955
Research Paper
Automatic Concept Update Strategy Towards Heterogeneous Terminology Integration*
Haixia Sun1,2, Panpan Deng2, Jiao Li2, Liu Shen2, Qing Qian2
1School of Information Management, Nanjing University, Nanjing 210093, China
2Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
Abstract

[Objective] This paper proposes a method for updating integrated concepts as source Knowledge Organization Systems (KOSs) evolve across versions, with the aim of supporting the dynamic development of heterogeneous terminology integration systems. [Methods] The method focuses on three kinds of knowledge units: terms, synonym sets, and preferred terms. First, exact string matching identifies the change patterns of source terms and preferred terms. Second, a concept vector space identifies the change patterns of source concept synonym sets. Finally, rules and similarity measures are combined to update the synonym sets and preferred terms of integrated concepts. The method was evaluated for accuracy on the medical integrated concept set of STKOS (the Super Scientific and Technological Thesaurus) and two of its important sources, MeSH and HUGO. [Results] The accuracy of merging new terms into synonym sets reached 94.96%, and the accuracy of recommending preferred terms for changed integrated concepts reached 99.91%. [Limitations] Change-pattern identification does not account for term ambiguity, and when several vocabularies are updated together, the accuracy of merging the terms of changed concepts depends on the number of vocabularies and the update order. [Conclusions] The proposed strategy can be used to upgrade the concepts of a synonymous interoperability system when its source KOSs are upgraded to new versions.

Key words: Synonymous Interoperability; Interoperability Maintenance; Integrated Concept Updating; Knowledge Organization Systems
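As an illustration of the exact-string-matching step described in the abstract, the following Python sketch sorts the terms of two vocabulary versions into deleted (DT), new (NT), and unchanged (UT) groups. The function name diff_terms and the sample terms are hypothetical, and the matching is case-sensitive with no normalization, which is an assumption rather than a detail confirmed on this page.

```python
def diff_terms(old_terms, new_terms):
    """Classify terms by exact string matching between two vocabulary versions."""
    old_set, new_set = set(old_terms), set(new_terms)
    return {
        "DT": old_set - new_set,  # deleted terms: present only in the old version
        "NT": new_set - old_set,  # new terms: present only in the new version
        "UT": old_set & new_set,  # unchanged terms: present in both versions
    }


# Hypothetical sample data, not taken from MeSH or HUGO release files.
old_version = ["Neoplasms", "Tumors", "Heart Attack"]
new_version = ["Neoplasms", "Tumors", "Myocardial Infarction"]
changes = diff_terms(old_version, new_version)
```

Set difference and intersection keep the comparison linear in the number of terms, which matters at the scale of the vocabularies evaluated here (hundreds of thousands of terms).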
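Received: 2019-08-20
CLC number: TP393
Funding: *This work is one of the research outcomes of the National Science and Technology Library advance R&D task "Research on Key Technologies for Automatic Construction and Maintenance of STKOS" (XQYF0102), part of the "Next-Generation Open Knowledge Service System for National Science and Technology Innovation"; the National Key R&D Program of China project "Construction of a Precision Medicine Ontology and Semantic Network" (2016YFC0901901); and the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences project "Research on Building a Chinese Clinical Medical Terminology System" (2017-I2M-3-014).
Corresponding author: Qing Qian     E-mail: qing@imicams.ac.cn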
Cite this article:
Haixia Sun, Panpan Deng, Jiao Li, Liu Shen, Qing Qian. Automatic Concept Update Strategy Towards Heterogeneous Terminology Integration[J]. Data Analysis and Knowledge Discovery, 2020, 4(1): 121-130. DOI: 10.11925/infotech.2096-3467.2019.0955.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.0955
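Fig. 1  Technical roadmap for automatically updating integrated concepts in a synonymous interoperability system based on the version evolution of source vocabularies
Fig. 2  Flow of the update algorithm for expanding concepts with new terms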
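| Change group | Pattern | Change category between old and new source vocabulary versions | Number of changes, MeSH 2017 | Number of changes, HUGO 2016 |
| --- | --- | --- | --- | --- |
| Term changes | | Deleted terms (DT) | 6,873 | 20,383 |
| Term changes | | New terms (NT) | 29,209 | 52,002 |
| Term changes | | Unchanged terms (UT) | 205,760 | 115,646 |
| Synonym set changes | Pattern "0" | Deleted concept synonym sets (Cdel) | 228 | 1,522 |
| Synonym set changes | Pattern "1" | Entirely new concept synonym sets (Cnew) | 4,341 | 10,186 |
| Synonym set changes | Pattern "2" | Unchanged synonym sets (Sunc) | 44,883 | 16,766 |
| Synonym set changes | Pattern "2" | Split synonym sets (Sut-split) | 135 | 0 |
| Synonym set changes | Pattern "2" | Merged synonym sets (Sut-merge) | 91 | 2 |
| Synonym set changes | Pattern "2" | Synonym sets changed by compound operations (Sut-change) | 15 | 0 |
| Synonym set changes | Pattern "3" | Changed concept synonym sets (Sc) | 5,246 | 12,862 |
| Preferred term changes | | Preferred term unchanged (PTunc) | 49,804 | 1,522 |
| Preferred term changes | | Preferred term changed (PTc) | 566 | 17,968 |
| Preferred term changes | | Preferred term of a new concept (PTnew) | 4,314 | 10,186 |

Table 1  Version change statistics for MeSH and HUGO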
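The synonym-set patterns in Table 1 can be illustrated with a small Python sketch. It treats each concept as a binary term vector (stored as a set of terms) and labels it by its best cosine match in the other version. This is only one plausible reading of the "concept vector space" mentioned in the abstract: the function names, the sample data, and the decision rules are assumptions, and the sketch does not separate the split, merge, and compound sub-cases of pattern "2".

```python
import math


def cosine(a, b):
    """Cosine similarity of two binary term vectors stored as sets of terms."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def classify_synonym_sets(old, new):
    """Label synonym sets across two versions (hypothetical decision rules).

    old, new: dicts mapping concept id -> set of term strings.
    Returns (deleted_ids, labels), where labels maps each new concept id
    to pattern "1", "2", or "3"; deleted_ids holds the pattern "0" concepts.
    """
    # Pattern "0": an old concept that shares no term with any new concept.
    deleted = [cid for cid, terms in old.items()
               if max((cosine(terms, n) for n in new.values()), default=0.0) == 0.0]
    labels = {}
    for cid, terms in new.items():
        best = max((cosine(terms, o) for o in old.values()), default=0.0)
        if best == 0.0:
            labels[cid] = "1"  # entirely new synonym set
        elif math.isclose(best, 1.0):
            labels[cid] = "2"  # identical to an old synonym set
        else:
            labels[cid] = "3"  # partial overlap: changed synonym set
    return deleted, labels


# Hypothetical sample concepts, not taken from MeSH or HUGO.
old = {"C1": {"Neoplasms", "Tumors"}, "C2": {"Heart Attack"}, "C3": {"Dropsy"}}
new = {"C1": {"Neoplasms", "Tumors"},
       "C2": {"Heart Attack", "Myocardial Infarction"},
       "C4": {"Zika Virus Infection"}}
deleted, labels = classify_synonym_sets(old, new)
```

The pairwise comparison is quadratic in the number of concepts; for vocabularies of the size shown in Table 1, an inverted index from term to concept would be the natural way to restrict comparisons to concepts that share at least one term.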
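| Integrated concept change category | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 |
| --- | --- | --- | --- | --- |
| Deleted concepts (Cdel) | 127 | 1,575 | 1,649 | 1,693 |
| New concepts (Cnew Ptnew) | 2,272 | 9,446 | 7,187 | 11,704 |
| Synonym set unchanged (Sunc), preferred term unchanged (Ptunc) | 388,867 | 16,856 | 55,037 | 55,489 |
| Synonym set unchanged (Sunc), preferred term changed (Ptc) | 2 | 0 | 2 | 2 |
| Synonym set changed (Sc), preferred term unchanged (Ptunc) | 4,438 | 11,638 | 16,492 | 16,049 |
| Synonym set changed (Sc), preferred term changed (Ptc) | 114 | 1,411 | 1,565 | 1,521 |

Table 2  Integrated concept change statistics after the four rounds of experiments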
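| No. | Integrated concept change category | Exp. 1 concepts | Exp. 1 new terms | Exp. 2 concepts | Exp. 2 new terms | Exp. 3 concepts | Exp. 3 new terms | Exp. 4 concepts | Exp. 4 new terms |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Cnew | 114 | 565 | 472 | 1,502 | 359 | 1,338 | 585 | 1,942 |
| 2 | Sunc Ptc | 2 | 0 | 0 | 0 | 2 | 0 | 2 | 0 |
| 3 | Sc Ptunc | 222 | 940 | 581 | 861 | 824 | 1,763 | 802 | 1,840 |
| 4 | Sc Ptc | 6 | 22 | 70 | 244 | 78 | 628 | 76 | 244 |
| | Total | 344 | 1,527 | 1,123 | 2,607 | 1,263 | 3,729 | 1,465 | 4,026 |

Table 3  Sampling results for the evaluation dataset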
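| Synonym set change category | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 |
| --- | --- | --- | --- | --- |
| Cnew | 100.00% | 100.00% | 100.00% | 100.00% |
| Sc | 92.00% | 87.33% | 72.48% | 85.41% |
| Total | 94.96% | 94.63% | 82.35% | 92.45% |

Table 4  Evaluation of the accuracy of merging new terms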
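The Total row of Table 4 is consistent with a weighted average of the two category rates, with the number of sampled new terms from Table 3 as weights. The Python sketch below checks this under the assumption that the Sc row pools the "Sc Ptunc" and "Sc Ptc" rows of Table 3; the variable and function names are illustrative.

```python
# New-term counts per experiment from Table 3: (Cnew, Sc Ptunc + Sc Ptc).
new_terms = {
    1: (565, 940 + 22),
    2: (1502, 861 + 244),
    3: (1338, 1763 + 628),
    4: (1942, 1840 + 244),
}
# Category accuracies per experiment from Table 4, in percent: (Cnew, Sc).
accuracy = {
    1: (100.00, 92.00),
    2: (100.00, 87.33),
    3: (100.00, 72.48),
    4: (100.00, 85.41),
}


def overall_accuracy(counts, rates):
    """Average the category rates, weighting each by its number of new terms."""
    return sum(c * r for c, r in zip(counts, rates)) / sum(counts)


totals = {k: round(overall_accuracy(new_terms[k], accuracy[k]), 2) for k in new_terms}
# totals reproduces the Total row of Table 4: 94.96, 94.63, 82.35, 92.45
```

Because every new term in a Cnew concept is counted as correctly merged, the overall rate is driven by the Sc category, which explains why Experiment 3, with the lowest Sc accuracy and the largest share of Sc terms, has the lowest total.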
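| Integrated concept change category | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 |
| --- | --- | --- | --- | --- |
| Cnew Ptnew | 98.25% | 100.00% | 98.33% | 99.66% |
| Sunc Ptc | 100.00% | - | 100.00% | 100.00% |
| Sc Ptunc | 97.30% | 99.83% | 99.88% | 100.00% |
| Sc Ptc | 50.00% | 100.00% | 94.87% | 92.11% |
| Total | 96.80% | 99.91% | 99.13% | 99.45% |

Table 5  Accuracy of preferred-term recommendation for changed integrated concepts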
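The percentages in Table 5 can be converted back into counts of correctly recommended preferred terms by combining them with the sampled concept counts in Table 3. The Python sketch below does this and recomputes the Total row as correct concepts over all sampled concepts. It assumes that each percentage was computed over the corresponding concept count in Table 3 and that the "-" cell marks a category with no sampled concepts; the names are illustrative.

```python
# Sampled integrated concepts per experiment (Table 3), in the row order
# Cnew Ptnew, Sunc Ptc, Sc Ptunc, Sc Ptc.
concepts = {
    1: [114, 2, 222, 6],
    2: [472, 0, 581, 70],
    3: [359, 2, 824, 78],
    4: [585, 2, 802, 76],
}
# Reported accuracies in percent (Table 5); None stands for the "-" cell.
reported = {
    1: [98.25, 100.00, 97.30, 50.00],
    2: [100.00, None, 99.83, 100.00],
    3: [98.33, 100.00, 99.88, 94.87],
    4: [99.66, 100.00, 100.00, 92.11],
}


def correct_counts(ns, pcts):
    """Recover the integer number of correct recommendations per category."""
    return [0 if p is None else round(n * p / 100) for n, p in zip(ns, pcts)]


def total_accuracy(ns, pcts):
    """Share of correct recommendations over all sampled concepts, in percent."""
    return round(100 * sum(correct_counts(ns, pcts)) / sum(ns), 2)


counts = {k: correct_counts(concepts[k], reported[k]) for k in concepts}
totals = {k: total_accuracy(concepts[k], reported[k]) for k in concepts}
# totals reproduces the Total row of Table 5: 96.8, 99.91, 99.13, 99.45
```

Under these assumptions the 50.00% figure for "Sc Ptc" in Experiment 1 corresponds to 3 correct recommendations out of only 6 sampled concepts, so that cell carries little statistical weight.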