The Schema Matching of XML and DTD Based on Weighted XML Data Model*

doi:10.11925/infotech.1003-3513.2010.01.11

New Technology of Library and Information Service

2010, Vol. 26

Issue (1): 57-65 https://doi.org/10.11925/infotech.1003-3513.2010.01.11

Knowledge Organization and Knowledge Management


Li Shuqing, Cheng Guoda, Wang Weimin

(College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210046, China)


Abstract

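First, the paper explains how the weighted XML data model is used to derive a standard XML reference instance and an XML data instance, and introduces how DTD constraint modifiers are expressed. Second, it details the implementation of the similarity algorithm, focusing on the algorithm that searches the XML data instance for nodes matching the standard XML reference instance and the algorithm that computes the similarity between the standard XML reference instance and the XML data instance. Finally, it summarizes the related experiments and their conclusions.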
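The abstract describes the method only at a high level, so what follows is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a hypothetical representation in which the DTD has been expanded into a reference tree whose nodes carry weights (`RefNode`), each keeping its DTD occurrence indicator (`?`, `*`, `+`, or none) as a modifier, while the data instance is a plain element tree (`DataNode`). The score computed here is a toy weighted structural similarity: the matched reference weight divided by the total reference weight. The class names, the weights, the exact-name matching, and the rule for absent optional elements are all illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

# DTD occurrence indicators (XML 1.0) mapped to (min, max) occurrence counts.
DTD_MODIFIERS = {
    "": (1, 1),          # no indicator: exactly once
    "?": (0, 1),         # optional: zero or one
    "*": (0, math.inf),  # zero or more
    "+": (1, math.inf),  # one or more
}


@dataclass
class RefNode:
    """Hypothetical weighted node of a reference instance expanded from a DTD."""
    name: str
    weight: float
    modifier: str = ""
    children: list[RefNode] = field(default_factory=list)


@dataclass
class DataNode:
    """Hypothetical node of an XML data instance (element names only)."""
    name: str
    children: list[DataNode] = field(default_factory=list)


def occurrence_ok(count: int, modifier: str) -> bool:
    """Check an element's occurrence count against its DTD indicator."""
    low, high = DTD_MODIFIERS[modifier]
    return low <= count <= high


def total_weight(ref: RefNode) -> float:
    """Sum of the weights in a reference subtree."""
    return ref.weight + sum(total_weight(c) for c in ref.children)


def matched_weight(ref: RefNode, data: DataNode) -> float:
    """Weight of ref's subtree matched by data; assumes ref.name == data.name."""
    matched = ref.weight
    for ref_child in ref.children:
        hits = [c for c in data.children if c.name == ref_child.name]
        if not hits:
            # Assumption: an absent element that the DTD allows to be absent
            # (? or *) is not held against the data instance.
            if DTD_MODIFIERS[ref_child.modifier][0] == 0:
                matched += total_weight(ref_child)
            continue
        if not occurrence_ok(len(hits), ref_child.modifier):
            continue  # Cardinality violated: no credit for this subtree.
        # Credit only the best-matching occurrence of this child element.
        matched += max(matched_weight(ref_child, h) for h in hits)
    return matched


def similarity(ref: RefNode, data: DataNode) -> float:
    """Toy weighted similarity in [0, 1]; 0 if the root elements differ."""
    if ref.name != data.name:
        return 0.0
    return matched_weight(ref, data) / total_weight(ref)


if __name__ == "__main__":
    reference = RefNode("book", 1.0, children=[
        RefNode("title", 2.0),
        RefNode("author", 1.0, "+"),
        RefNode("isbn", 1.0, "?"),
    ])
    doc = DataNode("book", [DataNode("author")])
    print(similarity(reference, doc))  # 0.6: required title (weight 2) is missing
```

Crediting only the best-matching occurrence of a repeated element keeps the score within [0, 1]. An approach like the one the abstract outlines would also need a real node-matching search (for example, to handle renamed or relocated elements), which this sketch reduces to exact name matching under the same parent.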


Keywords: weighted XML, DTD, similarity, schema matching


Received: 2009-12-07; Published: 2010-01-25

CLC number: TP391

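Funding: *This paper is one of the research outcomes of a project funded by the "Qinglan Project" of the Jiangsu Provincial Department of Education and of the Jiangsu Provincial Department of Education university natural science research project "Mining of Distributed Data Streams and Its Applications" (Project No. 06KJD520073).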

Corresponding author: Li Shuqing, E-mail: leeshuqing@163.com


Cite this article:

Li Shuqing,Cheng Guoda,Wang Weimin. The Schema Matching of XML and DTD Based on Weighted XML Data Model. New Technology of Library and Information Service, 2010, 26(1): 57-65.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.01.11 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2010/V26/I1/57
