Data Analysis and Knowledge Discovery  2023, Vol. 7, Issue (5): 105-115     https://doi.org/10.11925/infotech.2096-3467.2022.0536
基于模态内相似性与语义保留的深度跨模态哈希*
李天煜,刘立波()
宁夏大学信息工程学院 银川 750021
Deep Cross-modal Hashing Based on Intra-modal Similarity and Semantic Preservation
Li Tianyu,Liu Libo()
School of Information Engineering, Ningxia University, Yinchuan 750021, China
摘要 

【目的】 解决现有大多数跨模态哈希方法在相似性度量时仅考虑模态间相似性,且无法充分利用标签语义信息,从而忽略了异构数据细节并导致语义信息丢失的问题。【方法】 首先对来自图像和文本的数据分别采用欧氏距离和谷本系数度量其模态内相似性;接着采用二者加权值度量模态间相似性以充分利用异构数据细节信息;之后通过保留数据标签的语义信息来提高哈希码的判别性,防止语义信息丢失;最后,对生成的哈希码计算量化损失并施加哈希位平衡约束,进一步提升哈希码质量。【结果】 与11种方法进行对比,在MIR-Flickr25k数据集中文检图和图检文任务上哈希码长度为64位时,mAP最高分别提升了9.5和5.8个百分点,在NUS-WIDE数据集中则最高分别提升了4.7和1.1个百分点。【局限】 模型训练时依赖标签信息,在无监督和半监督情况下性能下降。【结论】 所提方法能保留异构数据细节信息并防止语义信息丢失,有效提升了模型检索性能。

关键词: 跨模态检索; 跨模态哈希; 模态内相似性保留; 语义信息保留
Abstract

[Objective] This paper addresses the problem that most existing cross-modal hashing methods consider only inter-modal similarity and fail to fully utilize label semantic information, thereby ignoring the details of heterogeneous data and losing semantic information. [Methods] Firstly, we measured the intra-modal similarity of image and text data with the Euclidean distance and the Tanimoto coefficient, respectively. Then, we measured the inter-modal similarity with a weighted combination of the two, so as to fully exploit the detailed information of the heterogeneous data. Next, we preserved the semantic information of the data labels to improve the discriminability of the hash codes and prevent the loss of semantic information. Finally, we computed a quantization loss on the generated hash codes and imposed a hash-bit balance constraint to further improve code quality. [Results] Compared with 11 existing methods, the proposed method improves mAP by up to 9.5 and 5.8 percentage points on the text-to-image and image-to-text retrieval tasks of the MIR-Flickr25k dataset with 64-bit hash codes, and by up to 4.7 and 1.1 percentage points on the NUS-WIDE dataset. [Limitations] Model training depends on label information, so performance degrades in unsupervised and semi-supervised settings. [Conclusions] The proposed method preserves the detailed information of heterogeneous data and prevents the loss of semantic information, effectively improving retrieval performance.
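To make the similarity construction in [Methods] concrete, the following is a minimal NumPy sketch, assuming image features are real-valued vectors compared with the Euclidean distance, text features are non-negative (e.g., bag-of-words) vectors compared with the Tanimoto coefficient, and a weight ε fuses the two intra-modal matrices into an inter-modal similarity. The function names, the distance-to-similarity normalization, and the default ε are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def image_intra_similarity(img_feats):
    """Pairwise Euclidean distances between image features, mapped to [0, 1]
    (larger = more similar); the mapping used here is an assumption."""
    diff = img_feats[:, None, :] - img_feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    return 1.0 - dist / (dist.max() + 1e-12)

def text_intra_similarity(txt_feats):
    """Pairwise Tanimoto coefficient between text features:
    T(a, b) = a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = txt_feats @ txt_feats.T
    sq = (txt_feats ** 2).sum(axis=1)
    return dot / (sq[:, None] + sq[None, :] - dot + 1e-12)

def inter_modal_similarity(s_img, s_txt, eps=0.75):
    """Weighted combination of the two intra-modal matrices; eps is the
    weighting parameter varied in Tables 3 and 4."""
    return eps * s_img + (1.0 - eps) * s_txt
```

Tables 3 and 4 below report how the choice of ε ∈ {0.25, 0.50, 0.75} affects mAP; how this weighted similarity is combined with label information during training is described in the full text and is not reproduced here.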

Key words: Cross-modal Retrieval; Cross-modal Hashing; Intra-modal Similarity Preservation; Semantic Preservation
Received: 2022-05-26      Published: 2023-07-04
CLC Number:  TP391
Funding: *National Natural Science Foundation of China (62262053); Science and Technology Innovation Leading Talents Project of Ningxia (2022GKLRLX03); Postgraduate Innovation Project of Ningxia University (CXXM202256)
Corresponding author: Liu Libo, ORCID: 0000-0003-0486-7501, E-mail: liulib@163.com.
Cite this article:
李天煜, 刘立波. 基于模态内相似性与语义保留的深度跨模态哈希*[J]. 数据分析与知识发现, 2023, 7(5): 105-115.
Li Tianyu, Liu Libo. Deep Cross-modal Hashing Based on Intra-modal Similarity and Semantic Preservation. Data Analysis and Knowledge Discovery, 2023, 7(5): 105-115.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0536      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I5/105
Fig.1  Principle of the image-text cross-modal hashing retrieval method
Layer   Configuration
Conv1   filters 64×11×11; stride 4×4; padding 0; LRN; ×2 pooling
Conv2   filters 265×5×5; stride 1×1; padding 2; LRN; ×2 pooling
Conv3   filters 265×3×3; stride 1×1; padding 1
Conv4   filters 265×3×3; stride 1×1; padding 1
Conv5   filters 265×3×3; stride 1×1; padding 1; ×2 pooling
Full6   4096
Full7   4096
Full8   hash code length k
Full9   label prediction code length c
Table 1  Architecture of the image feature extraction network
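Below is a rough PyTorch sketch of the CNN-F-style image branch in Table 1. Layer sizes follow the table (which lists 265 filters for Conv2 to Conv5, where the original CNN-F uses 256); the activation functions, pooling details, the tanh relaxation on the hash layer, and branching both output layers from Full7 are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ImageNet(nn.Module):
    """Sketch of the Table 1 image network; k = hash code length,
    c = number of label categories."""
    def __init__(self, k, c):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4, padding=0), nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5), nn.MaxPool2d(2),
            nn.Conv2d(64, 265, 5, stride=1, padding=2), nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5), nn.MaxPool2d(2),
            nn.Conv2d(265, 265, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(265, 265, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(265, 265, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.full6 = nn.Sequential(nn.Flatten(), nn.LazyLinear(4096), nn.ReLU(inplace=True))
        self.full7 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(inplace=True))
        self.full8 = nn.Linear(4096, k)  # hash layer (length k)
        self.full9 = nn.Linear(4096, c)  # label prediction layer (length c)

    def forward(self, x):
        f = self.full7(self.full6(self.features(x)))
        return torch.tanh(self.full8(f)), self.full9(f)
```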
Layer   Configuration
Full1   4096
Full2   4096
Full3   hash code length k
Full4   label prediction code length c
Table 2  Architecture of the text feature extraction network
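A corresponding sketch of the text branch in Table 2 is given below, assuming a bag-of-words input vector of dimension bow_dim; as above, the tanh relaxation and the branching of the two output layers are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextNet(nn.Module):
    """Sketch of the Table 2 text network: two 4096-d fully connected layers,
    a k-bit hash layer, and a c-way label prediction layer."""
    def __init__(self, bow_dim, k, c):
        super().__init__()
        self.full1 = nn.Sequential(nn.Linear(bow_dim, 4096), nn.ReLU(inplace=True))
        self.full2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(inplace=True))
        self.full3 = nn.Linear(4096, k)  # hash layer
        self.full4 = nn.Linear(4096, c)  # label prediction layer

    def forward(self, x):
        f = self.full2(self.full1(x))
        return torch.tanh(self.full3(f)), self.full4(f)
```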
Fig.2  Overall network framework
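The overall framework in Fig.2 is trained with a joint objective that, per the abstract, includes a quantization loss on the generated hash codes and a hash-bit balance constraint. The snippet below shows one common way such terms are written for relaxed (tanh) hash outputs; it is a sketch of the general technique, not the paper's exact objective or weighting.

```python
import torch

def quantization_and_balance_terms(h):
    """h: (batch, k) relaxed hash outputs in (-1, 1).
    The quantization term pulls h toward the discrete codes sign(h);
    the balance term pushes each bit to be half +1 / half -1 over the batch."""
    b = torch.sign(h)                       # target binary codes
    quantization = ((b - h) ** 2).sum()     # quantization loss
    balance = (h.sum(dim=0) ** 2).sum()     # hash-bit balance constraint
    return quantization, balance
```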
Fig.3  Intra-modal and inter-modal similarity measurement
ε      16-bit (I2T / T2I)     32-bit (I2T / T2I)     64-bit (I2T / T2I)
0.25   0.8025 / 0.7993        0.8123 / 0.8103        0.8312 / 0.8210
0.50   0.8044 / 0.8020        0.8154 / 0.8156        0.8349 / 0.8241
0.75   0.8047 / 0.8037        0.8179 / 0.8147        0.8381 / 0.8269
Table 3  mAP scores on the MIR-Flickr25k dataset for different values of ε
ε      16-bit (I2T / T2I)     32-bit (I2T / T2I)     64-bit (I2T / T2I)
0.25   0.6791 / 0.6757        0.6970 / 0.6951        0.7088 / 0.7034
0.50   0.6764 / 0.6755        0.6955 / 0.6938        0.7073 / 0.7042
0.75   0.6753 / 0.6742        0.6962 / 0.6931        0.7057 / 0.7038
Table 4  mAP scores on the NUS-WIDE dataset for different values of ε
Task  Method    MIR-Flickr25k (16 / 32 / 64 bits)   NUS-WIDE (16 / 32 / 64 bits)
I2T   CCQ       0.5633  0.5576  0.5568              0.3574  0.3610  0.3642
      CMFH      0.5752  0.5793  0.5781              0.3722  0.3781  0.3789
      LSRH      0.7367  0.7478  0.7659              0.6338  0.6497  0.6672
      SePH      0.7204  0.7242  0.7343              0.5869  0.5913  0.5927
      SRLCH     0.6243  0.6307  0.6359              0.3568  0.3589  0.3616
      SCRATCH   0.6461  0.6498  0.6537              0.5523  0.5631  0.5682
      DCMH      0.7051  0.7364  0.7221              0.5923  0.6131  0.6287
      DCMHGMS   0.7234  0.7453  0.7467              0.6613  0.6744  0.6768
      FSH       0.5724  0.5833  0.5793              0.3742  0.3775  0.3819
      PRDH      0.6955  0.7023  0.7137              0.5921  0.6035  0.6112
      DMSSP     0.7274  0.7488  0.7449              0.6563  0.6744  0.6770
      Ours      0.8049  0.8182  0.8385              0.6795  0.6972  0.7085
T2I   CCQ       0.5624  0.5597  0.5584              0.3533  0.3563  0.3582
      CMFH      0.5673  0.5690  0.5702              0.3714  0.3745  0.3757
      LSRH      0.7436  0.7549  0.7748              0.5697  0.5733  0.5787
      SePH      0.7193  0.7248  0.7301              0.5823  0.5866  0.5883
      SRLCH     0.6323  0.6387  0.6347              0.3498  0.3522  0.3581
      SCRATCH   0.6379  0.6439  0.6495              0.5573  0.5628  0.5664
      DCMH      0.7331  0.7483  0.7393              0.5865  0.6083  0.6150
      DCMHGMS   0.7524  0.7803  0.7793              0.6782  0.6913  0.6954
      FSH       0.5727  0.5842  0.5849              0.3693  0.3711  0.3754
      PRDH      0.7344  0.7425  0.7593              0.6008  0.6093  0.6211
      DMSSP     0.7612  0.7807  0.7771              0.6779  0.6910  0.6972
      Ours      0.8037  0.8159  0.8247              0.6813  0.6958  0.7047
Table 5  mAP comparison of the methods on the two datasets
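Table 5 reports mean average precision (mAP) over Hamming-ranking retrieval. The sketch below shows a standard way to compute mAP for binary codes, treating two items as relevant if they share at least one label; ranking depth and tie handling vary between papers, so this evaluation setup is an assumption rather than the paper's exact protocol.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """query_codes, db_codes: {-1, +1} code matrices; labels: multi-hot matrices.
    Ranks the database by Hamming distance and averages AP over all queries."""
    k = query_codes.shape[1]
    aps = []
    for q, ql in zip(query_codes, query_labels):
        hamming = 0.5 * (k - db_codes @ q)        # Hamming distance to each item
        order = np.argsort(hamming)               # ascending distance
        relevant = (db_labels[order] @ ql) > 0    # shares at least one label
        if relevant.sum() == 0:
            continue
        ranks = np.arange(1, len(relevant) + 1)
        precision = np.cumsum(relevant) / ranks   # precision at each rank
        aps.append((precision * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```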
Task  Method     MIR-Flickr25k (16 / 32 / 64 bits)   NUS-WIDE (16 / 32 / 64 bits)
I2T   Method 1   0.7902  0.7993  0.8152              0.6617  0.6842  0.6915
      Method 2   0.7958  0.8064  0.8193              0.6683  0.6893  0.7002
      Method 3   0.7946  0.8055  0.8177              0.6647  0.6877  0.6971
      Method 4   0.7951  0.8060  0.8185              0.6652  0.6901  0.6988
      Method 5   0.7953  0.8075  0.8207              0.6683  0.6897  0.6977
      Ours       0.8049  0.8182  0.8385              0.6795  0.6972  0.7085
T2I   Method 1   0.7911  0.8032  0.8144              0.6684  0.6814  0.6934
      Method 2   0.7961  0.8063  0.8151              0.6727  0.6878  0.6982
      Method 3   0.7948  0.8051  0.8162              0.6714  0.6852  0.6965
      Method 4   0.7954  0.8059  0.8167              0.6702  0.6867  0.6977
      Method 5   0.7932  0.8054  0.8128              0.6732  0.6848  0.6969
      Ours       0.8037  0.8159  0.8247              0.6813  0.6958  0.7047
Table 6  Influence of each component of the proposed method on performance
Fig.4  Qualitative results of image-to-text retrieval on the NUS-WIDE dataset
Fig.5  Qualitative results of text-to-image retrieval on the NUS-WIDE dataset
[1] 朱路, 邓芳, 刘坤, 等. 基于语义自编码哈希学习的跨模态检索方法[J]. 数据分析与知识发现, 2021, 5(12): 110-122.
[1] (Zhu Lu, Deng Fang, Liu Kun, et al. Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(12): 110-122.)
[2] 朱路, 田晓梦, 曹赛男, 等. 基于高阶语义相关的子空间跨模态检索方法研究[J]. 数据分析与知识发现, 2020, 4(5): 84-91.
[2] (Zhu Lu, Tian Xiaomeng, Cao Sainan, et al. Subspace Cross-modal Retrieval Based on High-Order Semantic Correlation[J]. Data Analysis and Knowledge Discovery, 2020, 4(5): 84-91.)
[3] Yang C, Deng Z Y, Li T Y, et al. Variational Deep Representation Learning for Cross-Modal Retrieval[C]// Proceedings of the 4th Chinese Conference on Pattern Recognition and Computer Vision. Berlin, Heidelberg: Springer, 2021: 498-510.
[4] Zhou J, Ding G G, Guo Y C. Latent Semantic Sparse Hashing for Cross-Modal Similarity Search[C]// Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. New York: ACM, 2014: 415-424.
[5] Lin Z J, Ding G G, Hu M Q, et al. Semantics-preserving Hashing for Cross-view Retrieval[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3864-3872.
[6] Ding G G, Guo Y C, Zhou J L. Collective Matrix Factorization Hashing for Multimodal Data[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 2075-2082.
[7] Liong V E, Lu J W, Duan L Y, et al. Deep Variational and Structural Hashing[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 42(3): 580-595.
[8] Zhang J, Peng Y X, Yuan M K. Unsupervised Generative Adversarial Cross-modal Hashing[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2018.
[9] Su S P, Zhong Z S, Zhang C. Deep Joint-Semantics Reconstructing Hashing for Large-scale Unsupervised Cross-Modal Retrieval[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 3027-3035.
[10] Jiang Q Y, Li W J. Deep Cross-Modal Hashing[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3232-3240.
[11] Wang X Z, Zou X T, Bakker E M, et al. Self-constraining and Attention-based Hashing Network for Bit-scalable Cross-modal Retrieval[J]. Neurocomputing, 2020, 400: 255-271.
[12] 张美佳. 基于相关性分析和结构保持的跨模态检索研究[D]. 山东: 山东师范大学, 2020.
[12] (Zhang Meijia. Cross-Modal Retrieval Research Based on Correlation Analysis and Structure Preserving[D]. Shandong: Shandong Normal University, 2020.)
[13] Prasetya D D, Wibawa A, Hirashima T. The Performance of Text Similarity Algorithms[J]. International Journal of Advances in Intelligent Informatics, 2018, 4(1): 63-69.
[14] Hu M Q, Yang Y, Shen F M, et al. Collective Reconstructive Embeddings for Cross-Modal Hashing[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2770-2784.
[15] Kryszkiewicz M. Using Non-Zero Dimensions for the Cosine and Tanimoto Similarity Search Among Real Valued Vectors[J]. Fundamenta Informaticae, 2013, 127(1-4): 307-323.
[16] Huiskes M J, Lew M S. The MIR Flickr Retrieval Evaluation[C]// Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. New York: ACM, 2008: 39-43.
[17] Chua T S, Tang J H, Hong R C, et al. NUS-WIDE: A Real-World Web Image Database from National University of Singapore[C]// Proceedings of the ACM International Conference on Image and Video Retrieval. New York: ACM, 2009: 1-9.
[18] Long M S, Cao Y, Wang J M, et al. Composite Correlation Quantization for Efficient Multimodal Retrieval[C]// Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2016: 579-588.
[19] Li K, Qi G J, Ye J, et al. Linear Subspace Ranking Hashing for Cross-Modal Retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(9): 1825-1838.
[20] Liu L C, Yang Y, Hu M Q, et al. Index and Retrieve Multimedia Data: Cross-Modal Hashing by Learning Subspace Relation[C]// Proceedings of International Conference on Database Systems for Advanced Applications. Berlin, Heidelberg: Springer, 2018: 606-621.
[21] Li C X, Chen Z D, Zhang P F, et al. SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval[C]// Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM, 2018: 1-9.
[22] Li J Z. Deep Semantic Cross Modal Hashing Based on Graph Similarity of Modal-Specific[J]. IEEE Access, 2021, 9: 96064-96075.
[23] Liu H, Ji R R, Wu Y J, et al. Cross-Modality Binary Code Learning via Fusion Similarity Hashing[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 7380-7388.
[24] Yang E K, Deng C, Liu W, et al. Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2017.
[25] Jin L, Li Z C, Tang J H. Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals[J]. IEEE Transactions on Neural Networks and Learning Systems. DOI:10.1109/TNNLS.2020.2997020.