Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (12): 110-122    DOI: 10.11925/infotech.2096-3467.2021.0604
Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning
Zhu Lu1, Deng Fang1, Liu Kun1, He Tingting2, Liu Yuanyuan1
1School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
2External Liaison Office, East China Jiaotong University, Nanchang 330013, China
Abstract  

[Objective] This paper uses a semantic auto-encoder to examine the correlation between low-level features and high-level semantics, aiming to reduce the heterogeneity gap between data of different modalities. It combines the semantic auto-encoder with hash learning to improve both the accuracy and the speed of cross-modal retrieval. [Methods] First, we used label information to learn a joint semantic representation of the features and to construct an affinity matrix. Then, we combined the auto-encoder with linear regression to learn the hash function. Finally, we obtained the optimal hash codes with the help of similarity metrics. [Results] We evaluated our method on three open datasets (WIKI, MIRFLICKR, and NUS-WIDE) with four different code lengths. The average MAP values obtained by our method are 0.1135, 0.0278, and 0.0505 higher than the best results among LSSH, FSH, ACQ, DBRC, SPDH, SePH, and SMH on the three datasets, respectively. [Limitations] Our method is mainly applicable to linear projections of multi-modal data; it does not achieve good results on nonlinear problems. [Conclusions] The proposed method effectively improves the accuracy and speed of cross-modal retrieval.
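The pipeline described in [Methods] — learning a label-driven joint semantic representation, then a linear hash function via an auto-encoder with a regression term — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a standard semantic auto-encoder objective with a closed-form Sylvester solve, and the function names (`sae_projection`, `hash_codes`) and the weight `lam` are my own.

```python
import numpy as np

def solve_sylvester(A, B, C):
    """Solve A @ W + W @ B = C by Kronecker vectorization (small dims only)."""
    k, d = C.shape
    M = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(M, C.reshape(-1, order="F"))   # column-major vec(W)
    return w.reshape(k, d, order="F")

def sae_projection(X, S, lam=0.2):
    """Semantic auto-encoder: minimize ||X - W.T @ S||^2 + lam * ||W @ X - S||^2.

    Setting the gradient to zero gives the Sylvester equation
        (S S^T) W + W (lam X X^T) = (1 + lam) S X^T.
    X: (d, n) low-level features; S: (k, n) label/semantic representation.
    """
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    return solve_sylvester(A, B, C)   # encoder W: (k, d)

def hash_codes(X, W):
    """Binarize the encoded features into +/-1 hash codes."""
    return np.sign(W @ X)
```

The closed-form solve is what makes this family of methods fast to train compared with iterative deep hashing; the binarization step is where the similarity-preserving hash codes come from.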

Key words: Cross-Modal Retrieval; Auto-Encoder; Hash Learning; Multi-Modal
Received: 20 June 2021      Published: 20 January 2022
CLC Number: TP393
Fund: Ministry of Education Humanities and Social Sciences Research Planning Fund Project (18YJAZH150)
Corresponding Author: Deng Fang, ORCID: 0000-0003-3973-9358, E-mail: dfzoe2026@outlook.com

Cite this article:

Zhu Lu, Deng Fang, Liu Kun, He Tingting, Liu Yuanyuan. Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning. Data Analysis and Knowledge Discovery, 2021, 5(12): 110-122.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0604     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I12/110

The Model of Cross-Modal Retrieval
The Framework of Cross-Modal Retrieval Based on Semantic Auto-Encoder Hashing Learning
Performance of All Methods on WIKI Dataset with Different Hash Code Lengths (MAP)

| Method | I→T 16bit | I→T 32bit | I→T 64bit | I→T 128bit | T→I 16bit | T→I 32bit | T→I 64bit | T→I 128bit | Average |
|--------|-----------|-----------|-----------|------------|-----------|-----------|-----------|------------|---------|
| LSSH   | 0.2101    | 0.2145    | 0.2166    | 0.2092     | 0.5031    | 0.5221    | 0.5284    | 0.5325     | 0.3670  |
| FSH    | 0.2235    | 0.2316    | 0.2408    | 0.2474     | 0.4805    | 0.4804    | 0.5127    | 0.5182     | 0.3669  |
| ACQ    | 0.2672    | 0.2689    | 0.2852    | 0.2679     | 0.5464    | 0.5612    | 0.5599    | 0.5605     | 0.4146  |
| DBRC   | 0.2534    | 0.2648    | 0.2686    | 0.2878     | 0.5439    | 0.5377    | 0.5476    | 0.5488     | 0.4065  |
| SePH   | 0.2836    | 0.2859    | 0.2879    | 0.2863     | 0.5345    | 0.5351    | 0.5471    | 0.5506     | 0.4138  |
| SPDH   | 0.2592    | 0.2612    | 0.2765    | 0.2843     | 0.4209    | 0.4404    | 0.4584    | 0.4627     | 0.3579  |
| SMH    | 0.2301    | 0.2375    | 0.2464    | 0.2458     | 0.3284    | 0.3592    | 0.3612    | 0.3691     | 0.2971  |
| Ours   | 0.4761    | 0.4725    | 0.4889    | 0.4866     | 0.5725    | 0.5741    | 0.5786    | 0.5756     | 0.5281  |

MAP in Different Methods on WIKI Dataset
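The MAP scores in these tables come from ranking the database by Hamming distance to each query's hash code and averaging the precision at every relevant position. A minimal single-label sketch follows; the function name is mine, and the single-label relevance test is a simplification — on the paper's multi-label datasets, relevance is typically "shares at least one label" with the query.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP for Hamming-ranking retrieval with single-label ground truth."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        # Hamming distance = number of differing bits between the codes
        dist = np.count_nonzero(db_codes != q, axis=1)
        order = np.argsort(dist, kind="stable")
        rel = (db_labels[order] == ql).astype(float)   # 1 where relevant
        if rel.sum() == 0:
            continue
        cum = np.cumsum(rel)
        ranks = np.arange(1, len(rel) + 1)
        # Average precision: mean of precision@k over relevant positions k
        aps.append(np.sum((cum / ranks) * rel) / rel.sum())
    return float(np.mean(aps))
```

For example, a query whose two relevant items land at ranks 1 and 3 scores AP = (1/1 + 2/3) / 2 = 5/6.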
Performance of All Methods on MIRFLICKR Dataset with Different Hash Code Lengths (MAP)

| Method | I→T 16bit | I→T 32bit | I→T 64bit | I→T 128bit | T→I 16bit | T→I 32bit | T→I 64bit | T→I 128bit | Average |
|--------|-----------|-----------|-----------|------------|-----------|-----------|-----------|------------|---------|
| LSSH   | 0.5784    | 0.5804    | 0.5797    | 0.5816     | 0.5898    | 0.5927    | 0.5932    | 0.5932     | 0.5861  |
| FSH    | 0.5893    | 0.6027    | 0.6006    | 0.6022     | 0.5865    | 0.5970    | 0.5965    | 0.5969     | 0.5964  |
| ACQ    | 0.6093    | 0.6197    | 0.6002    | 0.5867     | 0.6145    | 0.6111    | 0.6013    | 0.5865     | 0.6036  |
| DBRC   | 0.5873    | 0.5898    | 0.5902    | 0.5907     | 0.5883    | 0.5963    | 0.5962    | 0.5975     | 0.5920  |
| SePH   | 0.6573    | 0.6603    | 0.6616    | 0.6637     | 0.6481    | 0.6521    | 0.6545    | 0.6534     | 0.6563  |
| SPDH   | 0.6869    | 0.6867    | 0.6948    | 0.6906     | 0.6819    | 0.6896    | 0.6865    | 0.6844     | 0.6876  |
| SMH    | 0.6002    | 0.6130    | 0.6209    | 0.6251     | 0.6205    | 0.6387    | 0.6421    | 0.6476     | 0.6260  |
| Ours   | 0.6971    | 0.6992    | 0.6968    | 0.6998     | 0.7304    | 0.7345    | 0.7307    | 0.7349     | 0.7154  |

MAP in Different Methods on MIRFLICKR Dataset
Performance of All Methods on NUS-WIDE Dataset with Different Hash Code Lengths (MAP)

| Method | I→T 16bit | I→T 32bit | I→T 64bit | I→T 128bit | T→I 16bit | T→I 32bit | T→I 64bit | T→I 128bit | Average |
|--------|-----------|-----------|-----------|------------|-----------|-----------|-----------|------------|---------|
| LSSH   | 0.4847    | 0.4949    | 0.5086    | 0.5322     | 0.5897    | 0.6175    | 0.6252    | 0.6373     | 0.5615  |
| FSH    | 0.4927    | 0.4986    | 0.5015    | 0.5057     | 0.4751    | 0.4785    | 0.4822    | 0.4879     | 0.4902  |
| ACQ    | 0.4677    | 0.4554    | 0.4417    | 0.4326     | 0.5130    | 0.4851    | 0.4776    | 0.4691     | 0.4677  |
| DBRC   | 0.3939    | 0.4087    | 0.4166    | 0.4165     | 0.4249    | 0.4294    | 0.4381    | 0.4427     | 0.4213  |
| SePH   | 0.4787    | 0.4869    | 0.4888    | 0.4932     | 0.4489    | 0.4539    | 0.4587    | 0.4621     | 0.4714  |
| SPDH   | 0.5834    | 0.6017    | 0.5930    | 0.5917     | 0.6121    | 0.5916    | 0.5981    | 0.5972     | 0.5960  |
| SMH    | 0.5151    | 0.5277    | 0.5974    | 0.5490     | 0.5902    | 0.6133    | 0.6401    | 0.6476     | 0.5850  |
| Ours   | 0.6501    | 0.6490    | 0.6503    | 0.6512     | 0.6407    | 0.6427    | 0.6434    | 0.6447     | 0.6465  |

MAP in Different Methods on NUS-WIDE Dataset
Convergence Curves of the Objective Function Value
[1] Song Y L, Soleymani M. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 1979-1988.
[2] Gu J X, Cai J F, Joty S, et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models [C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7181-7189.
[3] Ning H L, Zheng X T, Yuan Y, et al. Audio Description from Image by Modal Translation Network[J]. Neurocomputing, 2021, 423: 124-134.
doi: 10.1016/j.neucom.2020.10.053
[4] Carvalho M, Cadène R, Picard D, et al. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings [C]//Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018: 35-44.
[5] Wang B K, Yang Y, Xu X, et al. Adversarial Cross-Modal Retrieval [C]//Proceedings of the 25th ACM International Conference on Multimedia. 2017: 154-162.
[6] Peng Y X, Huang X, Zhao Y Z. An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(9): 2372-2385.
doi: 10.1109/TCSVT.2017.2705068
[7] Peng Y X, Zhai X Z, Zhao Y Z, et al. Semi-Supervised Cross-Media Feature Learning with Unified Patch Graph Regularization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(3): 583-596.
doi: 10.1109/TCSVT.2015.2400779
[8] Yang E, Deng C, Liu W, et al. Pairwise Relationship Guided Deep Hashing for Cross-modal Retrieval [C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017:1618-1625.
[9] Zhang D Q, Li W J. Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization [C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. 2014: 2177-2183.
[10] Mandal D, Chaudhury K N, Biswas S. Generalized Semantic Preserving Hashing for n-Label Cross-Modal Retrieval [C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2633-2641.
[11] Zhou J L, Ding G G, Guo Y C. Latent Semantic Sparse Hashing for Cross-Modal Similarity Search [C]//Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. 2014: 415-424.
[12] Irie G, Arai H, Taniguchi Y. Alternating Co-Quantization for Cross-Modal Hashing [C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. 2015: 1886-1894.
[13] Ding G G, Guo Y C, Zhou J L. Collective Matrix Factorization Hashing for Multimodal Data [C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014: 2083-2090.
[14] Su S P, Zhong Z S, Zhang C. Deep Joint-semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval [C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. 2019: 3027-3035.
[15] Hu D, Nie F P, Li X L. Deep Binary Reconstruction for Cross-modal Hashing[J]. IEEE Transactions on Multimedia, 2019, 21(4): 973-985.
doi: 10.1109/TMM.2018.2866771
[16] Zhang J, Peng Y X, Yuan M K. Unsupervised Generative Adversarial Cross-modal Hashing[OL]. arXiv Preprint, arXiv: 1712.00358.
[17] Shen X B, Shen F M, Sun Q S, et al. Semi-Paired Discrete Hashing: Learning Latent Hash Codes for Semi-Paired Cross-View Retrieval[J]. IEEE Transactions on Cybernetics, 2017, 47(12): 4275-4288.
doi: 10.1109/TCYB.2016.2606441
[18] Zhang P F, Li Y, Huang Z, et al. Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval[J]. IEEE Transactions on Multimedia, DOI: 10.1109/TMM.2021.3053766.
[19] Li C, Deng C, Li N, et al. Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval [C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 4242-4251.
[20] Lin Z J, Ding G G, Hu M Q, et al. Semantics-Preserving Hashing for Cross-View Retrieval [C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3864-3872.
[21] Jiang Q Y, Li W J. Deep Cross-Modal Hashing [C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3270-3278.
[22] Nie X S, Wang B W, Li J J, et al. Deep Multiscale Fusion Hashing for Cross-Modal Retrieval[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(1): 401-410.
[23] Liu X, Cheung Y M, Hu Z K, et al. Adversarial Tri-fusion Hashing Network for Imbalanced Cross-Modal Retrieval[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2021, 5(4): 607-619.
doi: 10.1109/TETCI.2020.3007143
[24] Meng M, Wang H T, Yu J, et al. Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval[J]. IEEE Transactions on Image Processing, 2021, 30: 986-1000.
[25] Guo J, Zhu W W. Collective Affinity Learning for Partial Cross-Modal Hashing[J]. IEEE Transactions on Image Processing, 2020, 29: 1344-1355.
[26] Xu H W, Feng Y, Chen J, et al. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications [C]//Proceedings of the 2018 World Wide Web Conference. 2018: 187-196.
[27] Pumsirirat A, Yan L. Credit Card Fraud Detection Using Deep Learning Based on Auto-Encoder and Restricted Boltzmann Machine[J]. International Journal of Advanced Computer Science and Applications, 2018, 9(1): 18-25.
[28] Xuan R C, Shim J, Lee S G. Deep Semantic Hashing Using Pairwise Labels[J]. IEEE Access, 2021, 9: 91934-91949.
doi: 10.1109/ACCESS.2021.3092150
[29] Chen J, Zu Y X. Local Feature Hashing with Binary Auto-Encoder for Face Recognition[J]. IEEE Access, 2020, 8: 37526-37540.
[30] Song J K, Zhang H W, Li X P, et al. Self-Supervised Video Hashing with Hierarchical Binary Auto-Encoder[J]. IEEE Transactions on Image Processing, 2018, 27(7): 3210-3221.
doi: 10.1109/TIP.2018.2814344
[31] Liu H, Lin M B, Zhang S C, et al. Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval [C]//Proceedings of the 26th ACM International Conference on Multimedia. 2018: 1589-1597.
[32] Wu Y L, Wang S H, Huang Q M. Multi-Modal Semantic Autoencoder for Cross-Modal Retrieval[J]. Neurocomputing, 2019, 331: 165-175.
doi: 10.1016/j.neucom.2018.11.042
[33] Chen Y N, Lin H T. Feature-Aware Label Space Dimension Reduction for Multi-Label Classification [C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. 2012: 1529-1537.
[34] Ringle C M, Sarstedt M, Mitchell R, et al. Partial Least Squares Structural Equation Modeling in HRM Research[J]. The International Journal of Human Resource Management, 2020, 31(12): 1617-1643.
doi: 10.1080/09585192.2017.1416655
[35] Chen L, Harshaw C, Hassani H, et al. Projection-free Online Optimization with Stochastic Gradient: From Convexity to Submodularity [C]//Proceedings of the 35th International Conference on Machine Learning. 2018: 814-823.
[36] Zhou X W, Zhu M L, Daniilidis K. Multi-Image Matching via Fast Alternating Minimization [C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. 2015: 4032-4040.
[37] Rasiwasia N, Costa Pereira J, Coviello E, et al. A New Approach to Cross-Modal Multimedia Retrieval [C]//Proceedings of the 18th ACM International Conference on Multimedia. 2010: 251-260.
[38] Huiskes M J, Lew M S. The MIR Flickr Retrieval Evaluation [C]//Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. 2008: 39-43.
[39] Chua T S, Tang J H, Hong R C, et al. NUS-WIDE: A Real-World Web Image Database from National University of Singapore [C]//Proceedings of the ACM International Conference on Image and Video Retrieval. 2009: 1-9.
[40] Yue Y S, Finley T, Radlinski F, et al. A Support Vector Method for Optimizing Average Precision [C]//Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007: 271-278.
[41] Liu H, Ji R R, Wu Y J, et al. Cross-Modality Binary Code Learning via Fusion Similarity Hashing [C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6345-6353.
[42] Zhen Y, Gao Y, Yeung D Y, et al. Spectral Multimodal Hashing and Its Application to Multimedia Retrieval[J]. IEEE Transactions on Cybernetics, 2016, 46(1): 27-38.
doi: 10.1109/TCYB.2015.2392052 pmid: 26208374