1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
2. External Liaison Office, East China Jiaotong University, Nanchang 330013, China
[Objective] This paper uses a semantic auto-encoder to model the correlation between low-level features and high-level semantics, aiming to reduce the heterogeneity gap between data of different modalities. It also combines the semantic auto-encoder with hash learning to improve the accuracy and speed of cross-modal retrieval. [Methods] First, we used label information to learn a joint semantic representation of the features and to construct an affine matrix. Then, we combined the auto-encoder with linear regression to learn the hash functions. Finally, we obtained the optimal hash codes with the help of similarity metrics. [Results] We evaluated our method on three public datasets (WIKI, MIRFLICKR and NUS-WIDE) at four different code lengths. The average MAP of our method is 0.1135, 0.0278 and 0.0505 higher, respectively, than the best results among LSSH, FSH, ACQ, DBRC, SPDH, SePH and SMH. [Limitations] Our method mainly applies to linear projections of multi-modal data and does not achieve good results on nonlinear problems. [Conclusions] The proposed method effectively improves the accuracy and speed of cross-modal retrieval tasks.
朱路, 邓芳, 刘坤, 贺婷婷, 刘媛媛. 基于语义自编码哈希学习的跨模态检索方法*[J]. 数据分析与知识发现, 2021, 5(12): 110-122.
Zhu Lu, Deng Fang, Liu Kun, He Tingting, Liu Yuanyuan. Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning. Data Analysis and Knowledge Discovery, 2021, 5(12): 110-122.