[Objective] This paper addresses a problem of most existing cross-modal hashing methods, which consider only inter-modal similarity and fail to fully utilize label semantic information, thereby ignoring the details of heterogeneous data and causing the loss of semantic information. [Methods] First, we used the Euclidean distance and the Tanimoto coefficient to measure the intra-modal similarity of image data and text data, respectively. Then, we measured inter-modal similarity with a weighted combination of the two, so as to fully exploit the detailed information of heterogeneous data. Next, we preserved the semantic information of the data labels to improve the discriminability of the hash codes and prevent the loss of semantic information. Finally, we calculated the quantization loss of the generated hash codes and imposed a hash-bit balance constraint to further improve the quality of the hash codes. [Results] Compared with 11 existing methods, the mAP score increased by 9.5% and 5.8% on the image-retrieval-by-text and text-retrieval-by-image tasks of the MIR-Flickr25k dataset, and by 4.7% and 1.1% on the NUS-WIDE dataset. [Limitations] Model training depends on label information, so performance may degrade in unsupervised and semi-supervised settings. [Conclusions] The proposed method preserves the detailed information of heterogeneous data and prevents the loss of semantic information, effectively improving retrieval performance.
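The similarity measures and code-quality losses named in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the mapping of Euclidean distance to a similarity score, the weight `alpha`, and the unweighted sum-of-squares losses are assumptions introduced here for clarity.

```python
import numpy as np

def euclidean_similarity(X):
    """Intra-modal similarity for image features: pairwise Euclidean
    distances mapped into (0, 1] so that closer vectors score higher."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return 1.0 / (1.0 + d)

def tanimoto_similarity(T):
    """Intra-modal similarity for text vectors via the Tanimoto
    coefficient: <a, b> / (||a||^2 + ||b||^2 - <a, b>)."""
    inner = T @ T.T
    sq = np.sum(T * T, axis=1)
    return inner / (sq[:, None] + sq[None, :] - inner)

def inter_modal_similarity(S_img, S_txt, alpha=0.5):
    """Inter-modal similarity as a weighted combination of the two
    intra-modal similarity matrices (alpha is a hyperparameter)."""
    return alpha * S_img + (1.0 - alpha) * S_txt

def quantization_and_balance_loss(H):
    """H: n x k real-valued relaxed codes in [-1, 1].
    Quantization loss: squared distance from the binarized codes sign(H).
    Balance loss: each bit should be +1 and -1 equally often across the
    training set, i.e. every column sum of H should be near zero."""
    B = np.sign(H)
    quant = np.sum((H - B) ** 2)
    balance = np.sum(np.sum(H, axis=0) ** 2)
    return quant, balance
```

For identical inputs both similarity measures return 1, and a perfectly binarized, bit-balanced code matrix drives both losses to zero, which matches the roles the abstract assigns to these terms.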
Li Tianyu, Liu Libo. Deep Cross-modal Hashing Based on Intra-modal Similarity and Semantic Preservation. Data Analysis and Knowledge Discovery, 2023, 7(5): 105-115.