Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (12): 110-122     https://doi.org/10.11925/infotech.2096-3467.2021.0604
Research Article
Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning
Zhu Lu1,Deng Fang1(),Liu Kun1,He Tingting2,Liu Yuanyuan1
1School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
2External Liaison Office, East China Jiaotong University, Nanchang 330013, China
Abstract

[Objective] This paper uses a semantic auto-encoder to mine the correlation between low-level features and high-level semantics, aiming to reduce the heterogeneity gap between data of different modalities, and combines the auto-encoder with hash learning to improve the accuracy and speed of cross-modal retrieval. [Methods] First, we used label information to learn a joint feature-semantic representation and to construct a semantic affinity matrix. Then, we combined the auto-encoder with linear regression to learn the hash functions. Finally, we obtained the optimal hash codes through similarity measurement. [Results] We evaluated our method on three public datasets (WIKI, MIRFLICKR and NUS-WIDE) at four different code lengths. Its average MAP value exceeds the best results of LSSH, FSH, ACQ, DBRC, SPDH, SePH and SMH by 0.1135, 0.0278 and 0.0505 on the three datasets, respectively. [Limitations] The method mainly applies linear projections to multi-modal data and fails to achieve good results on nonlinear problems. [Conclusions] The proposed method narrows the heterogeneity gap between multi-modal data, maps similar data of different modalities to the same hash codes, and effectively improves the accuracy and speed of cross-modal retrieval.
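The [Methods] summary above only sketches the pipeline. As a rough illustration of the kind of linear semantic auto-encoder the paper builds on, such models can be fit in closed form: with features X and a semantic representation S tied by S ≈ WX and a tied decoder Wᵀ, the objective reduces to a Sylvester equation in W. The code below is a generic, minimal numpy sketch under these assumptions (the names X, S, lam and the dimensions are illustrative, not the authors' exact formulation):

```python
import numpy as np

# Linear semantic auto-encoder sketch: encode features X (d x n) into a
# semantic space S (k x n) with S ~ W X, decode with the tied weight W^T.
# Minimizing ||X - W^T S||^2 + lam * ||W X - S||^2 over W gives the
# Sylvester equation  (S S^T) W + lam * W (X X^T) = (1 + lam) * S X^T,
# solved here by Kronecker vectorization (fine for small d and k).
def fit_sae(X, S, lam=0.1):
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T                        # k x k
    B = lam * (X @ X.T)                # d x d
    C = (1.0 + lam) * (S @ X.T)        # k x d
    # vec(A W + W B) = (I_d kron A + B^T kron I_k) vec(W), column-major vec
    M = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(M, C.flatten(order="F"))
    return w.reshape(k, d, order="F")

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50))       # 8-dim features, 50 samples
S = rng.standard_normal((4, 50))       # 4-dim semantic/label representation
W = fit_sae(X, S)
# Residual of the Sylvester equation should be ~0 at the optimum.
residual = np.linalg.norm(S @ S.T @ W + 0.1 * W @ X @ X.T - 1.1 * S @ X.T)
```

The closed-form solve is what makes such linear formulations fast to train compared with iterative deep models, at the cost of the nonlinearity limitation noted in [Limitations].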

Key words: Cross-Modal Retrieval; Auto-Encoder; Hash Learning; Multi-Modal
Received: 2021-06-20      Published: 2022-01-20
CLC Number: TP393
Funding: * Humanities and Social Sciences Research Planning Fund of the Ministry of Education (18YJAZH150)
Corresponding author: Deng Fang, ORCID: 0000-0003-3973-9358     E-mail: dfzoe2026@outlook.com
Cite this article:
Zhu Lu, Deng Fang, Liu Kun, He Tingting, Liu Yuanyuan. Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning. Data Analysis and Knowledge Discovery, 2021, 5(12): 110-122.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0604      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I12/110
Fig.1  Cross-modal retrieval model
Fig.2  Cross-modal retrieval model based on semantic auto-encoder and hash learning
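At retrieval time, hash-based cross-modal methods of the kind modeled here binarize the projected features of each modality into k-bit codes and rank database items by Hamming distance. A generic numpy sketch (the projections Wx and Wy below are random placeholders standing in for learned hash functions, and the dimensions are illustrative):

```python
import numpy as np

def binarize(Z):
    # One row per item; sign thresholding yields k-bit hash codes.
    return (Z > 0).astype(np.uint8)

def hamming(Bq, Bd):
    # Pairwise Hamming distances: (queries x database).
    return (Bq[:, None, :] != Bd[None, :, :]).sum(axis=-1)

rng = np.random.default_rng(1)
k = 16
Wx = rng.standard_normal((k, 128))       # image projection (placeholder)
Wy = rng.standard_normal((k, 300))       # text projection (placeholder)
imgs = rng.standard_normal((5, 128))     # 5 image queries
txts = rng.standard_normal((20, 300))    # 20 text database items
Bq = binarize(imgs @ Wx.T)               # query codes, shape (5, k)
Bd = binarize(txts @ Wy.T)               # database codes, shape (20, k)
ranking = np.argsort(hamming(Bq, Bd), axis=1)  # best match first, per query
```

Because comparison reduces to bitwise operations on short codes, this lookup stays fast even for large databases, which is the speed advantage the abstract refers to.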
Method | Image→Text (16/32/64/128 bit) | Text→Image (16/32/64/128 bit) | Average
LSSH | 0.2101 / 0.2145 / 0.2166 / 0.2092 | 0.5031 / 0.5221 / 0.5284 / 0.5325 | 0.3670
FSH | 0.2235 / 0.2316 / 0.2408 / 0.2474 | 0.4805 / 0.4804 / 0.5127 / 0.5182 | 0.3669
ACQ | 0.2672 / 0.2689 / 0.2852 / 0.2679 | 0.5464 / 0.5612 / 0.5599 / 0.5605 | 0.4146
DBRC | 0.2534 / 0.2648 / 0.2686 / 0.2878 | 0.5439 / 0.5377 / 0.5476 / 0.5488 | 0.4065
SePH | 0.2836 / 0.2859 / 0.2879 / 0.2863 | 0.5345 / 0.5351 / 0.5471 / 0.5506 | 0.4138
SPDH | 0.2592 / 0.2612 / 0.2765 / 0.2843 | 0.4209 / 0.4404 / 0.4584 / 0.4627 | 0.3579
SMH | 0.2301 / 0.2375 / 0.2464 / 0.2458 | 0.3284 / 0.3592 / 0.3612 / 0.3691 | 0.2971
Ours | 0.4761 / 0.4725 / 0.4889 / 0.4866 | 0.5725 / 0.5741 / 0.5786 / 0.5756 | 0.5281
Table 1  Comparison of MAP values of different methods on the WIKI dataset
Fig.3  Results at different code lengths on the WIKI dataset
Method | Image→Text (16/32/64/128 bit) | Text→Image (16/32/64/128 bit) | Average
LSSH | 0.5784 / 0.5804 / 0.5797 / 0.5816 | 0.5898 / 0.5927 / 0.5932 / 0.5932 | 0.5861
FSH | 0.5893 / 0.6027 / 0.6006 / 0.6022 | 0.5865 / 0.5970 / 0.5965 / 0.5969 | 0.5964
ACQ | 0.6093 / 0.6197 / 0.6002 / 0.5867 | 0.6145 / 0.6111 / 0.6013 / 0.5865 | 0.6036
DBRC | 0.5873 / 0.5898 / 0.5902 / 0.5907 | 0.5883 / 0.5963 / 0.5962 / 0.5975 | 0.5920
SePH | 0.6573 / 0.6603 / 0.6616 / 0.6637 | 0.6481 / 0.6521 / 0.6545 / 0.6534 | 0.6563
SPDH | 0.6869 / 0.6867 / 0.6948 / 0.6906 | 0.6819 / 0.6896 / 0.6865 / 0.6844 | 0.6876
SMH | 0.6002 / 0.6130 / 0.6209 / 0.6251 | 0.6205 / 0.6387 / 0.6421 / 0.6476 | 0.6260
Ours | 0.6971 / 0.6992 / 0.6968 / 0.6998 | 0.7304 / 0.7345 / 0.7307 / 0.7349 | 0.7154
Table 2  Comparison of MAP values of different methods on the MIRFLICKR dataset
Fig.4  Results at different code lengths on the MIRFLICKR dataset
Method | Image→Text (16/32/64/128 bit) | Text→Image (16/32/64/128 bit) | Average
LSSH | 0.4847 / 0.4949 / 0.5086 / 0.5322 | 0.5897 / 0.6175 / 0.6252 / 0.6373 | 0.5615
FSH | 0.4927 / 0.4986 / 0.5015 / 0.5057 | 0.4751 / 0.4785 / 0.4822 / 0.4879 | 0.4902
ACQ | 0.4677 / 0.4554 / 0.4417 / 0.4326 | 0.5130 / 0.4851 / 0.4776 / 0.4691 | 0.4677
DBRC | 0.3939 / 0.4087 / 0.4166 / 0.4165 | 0.4249 / 0.4294 / 0.4381 / 0.4427 | 0.4213
SePH | 0.4787 / 0.4869 / 0.4888 / 0.4932 | 0.4489 / 0.4539 / 0.4587 / 0.4621 | 0.4714
SPDH | 0.5834 / 0.6017 / 0.5930 / 0.5917 | 0.6121 / 0.5916 / 0.5981 / 0.5972 | 0.5960
SMH | 0.5151 / 0.5277 / 0.5974 / 0.5490 | 0.5902 / 0.6133 / 0.6401 / 0.6476 | 0.5850
Ours | 0.6501 / 0.6490 / 0.6503 / 0.6512 | 0.6407 / 0.6427 / 0.6434 / 0.6447 | 0.6465
Table 3  Comparison of MAP values of different methods on the NUS-WIDE dataset
Fig.5  Results at different code lengths on the NUS-WIDE dataset
Fig.6  Convergence curve of the objective function value
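The MAP values reported in Tables 1-3 follow the standard definition: average precision per query over the ranked retrieval list, then the mean across queries. A small reference implementation with binary relevance labels (illustrative, not the authors' evaluation code):

```python
import numpy as np

def average_precision(rel):
    """rel: 0/1 relevance of each retrieved item, in ranked order."""
    rel = np.asarray(rel, dtype=float)
    if rel.sum() == 0:
        return 0.0
    # Precision at each rank, averaged over the positions of relevant items.
    precision_at_k = np.cumsum(rel) / (np.arange(rel.size) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(rankings):
    # rankings: one binary relevance list per query.
    return float(np.mean([average_precision(r) for r in rankings]))

# Relevant items at ranks 1 and 3: AP = (1/1 + 2/3) / 2
ap = average_precision([1, 0, 1])
```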
[1] Song Y L, Soleymani M. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 1979-1988.
[2] Gu J X, Cai J F, Joty S, et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models [C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7181-7189.
[3] Ning H L, Zheng X T, Yuan Y, et al. Audio Description from Image by Modal Translation Network[J]. Neurocomputing, 2021, 423: 124-134.
doi: 10.1016/j.neucom.2020.10.053
[4] Carvalho M, Cadène R, Picard D, et al. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings [C]//Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018: 35-44.
[5] Wang B K, Yang Y, Xu X, et al. Adversarial Cross-Modal Retrieval [C]//Proceedings of the 25th ACM International Conference on Multimedia. 2017: 154-162.
[6] Peng Y X, Huang X, Zhao Y Z. An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(9): 2372-2385.
doi: 10.1109/TCSVT.2017.2705068
[7] Peng Y X, Zhai X Z, Zhao Y Z, et al. Semi-Supervised Cross-Media Feature Learning with Unified Patch Graph Regularization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(3): 583-596.
doi: 10.1109/TCSVT.2015.2400779
[8] Yang E, Deng C, Liu W, et al. Pairwise Relationship Guided Deep Hashing for Cross-modal Retrieval [C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2017:1618-1625.
[9] Zhang D Q, Li W J. Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization [C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. 2014: 2177-2183.
[10] Mandal D, Chaudhury K N, Biswas S. Generalized Semantic Preserving Hashing for n-Label Cross-Modal Retrieval [C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2633-2641.
[11] Zhou J L, Ding G G, Guo Y C. Latent Semantic Sparse Hashing for Cross-Modal Similarity Search [C]//Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. 2014: 415-424.
[12] Irie G, Arai H, Taniguchi Y. Alternating Co-Quantization for Cross-Modal Hashing [C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. 2015: 1886-1894.
[13] Ding G G, Guo Y C, Zhou J L. Collective Matrix Factorization Hashing for Multimodal Data [C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014: 2083-2090.
[14] Su S P, Zhong Z S, Zhang C. Deep Joint-semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval [C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. 2019: 3027-3035.
[15] Hu D, Nie F P, Li X L. Deep Binary Reconstruction for Cross-modal Hashing[J]. IEEE Transactions on Multimedia, 2019, 21(4): 973-985.
doi: 10.1109/TMM.2018.2866771
[16] Zhang J, Peng Y X, Yuan M K. Unsupervised Generative Adversarial Cross-modal Hashing[OL]. arXiv Preprint, arXiv: 1712.00358.
[17] Shen X B, Shen F M, Sun Q S, et al. Semi-Paired Discrete Hashing: Learning Latent Hash Codes for Semi-Paired Cross-View Retrieval[J]. IEEE Transactions on Cybernetics, 2017, 47(12): 4275-4288.
doi: 10.1109/TCYB.2016.2606441
[18] Zhang P F, Li Y, Huang Z, et al. Aggregation-Based Graph Convolutional Hashing for Unsupervised Cross-Modal Retrieval[J]. IEEE Transactions on Multimedia, DOI: 10.1109/TMM.2021.3053766.
[19] Li C, Deng C, Li N, et al. Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval [C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 4242-4251.
[20] Lin Z J, Ding G G, Hu M Q, et al. Semantics-Preserving Hashing for Cross-View Retrieval [C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3864-3872.
[21] Jiang Q Y, Li W J. Deep Cross-Modal Hashing [C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3270-3278.
[22] Nie X S, Wang B W, Li J J, et al. Deep Multiscale Fusion Hashing for Cross-Modal Retrieval[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(1): 401-410.
[23] Liu X, Cheung Y M, Hu Z K, et al. Adversarial Tri-fusion Hashing Network for Imbalanced Cross-Modal Retrieval[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2021, 5(4): 607-619.
doi: 10.1109/TETCI.2020.3007143
[24] Meng M, Wang H T, Yu J, et al. Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval[J]. IEEE Transactions on Image Processing, 2021, 30: 986-1000.
[25] Guo J, Zhu W W. Collective Affinity Learning for Partial Cross-Modal Hashing[J]. IEEE Transactions on Image Processing, 2020, 29: 1344-1355.
[26] Xu H W, Feng Y, Chen J, et al. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications [C]//Proceedings of the 2018 World Wide Web Conference. 2018: 187-196.
[27] Pumsirirat A, Yan L. Credit Card Fraud Detection Using Deep Learning Based on Auto-Encoder and Restricted Boltzmann Machine[J]. International Journal of Advanced Computer Science and Applications, 2018, 9(1): 18-25.
[28] Xuan R C, Shim J, Lee S G. Deep Semantic Hashing Using Pairwise Labels[J]. IEEE Access, 2021, 9: 91934-91949.
doi: 10.1109/ACCESS.2021.3092150
[29] Chen J, Zu Y X. Local Feature Hashing with Binary Auto-Encoder for Face Recognition[J]. IEEE Access, 2020, 8: 37526-37540.
[30] Song J K, Zhang H W, Li X P, et al. Self-Supervised Video Hashing with Hierarchical Binary Auto-Encoder[J]. IEEE Transactions on Image Processing, 2018, 27(7): 3210-3221.
doi: 10.1109/TIP.2018.2814344
[31] Liu H, Lin M B, Zhang S C, et al. Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval [C]//Proceedings of the 26th ACM International Conference on Multimedia. 2018: 1589-1597.
[32] Wu Y L, Wang S H, Huang Q M. Multi-Modal Semantic Autoencoder for Cross-Modal Retrieval[J]. Neurocomputing, 2019, 331: 165-175.
doi: 10.1016/j.neucom.2018.11.042
[33] Chen Y N, Lin H T. Feature-Aware Label Space Dimension Reduction for Multi-Label Classification [C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. 2012: 1529-1537.
[34] Ringle C M, Sarstedt M, Mitchell R, et al. Partial Least Squares Structural Equation Modeling in HRM Research[J]. The International Journal of Human Resource Management, 2020, 31(12): 1617-1643.
doi: 10.1080/09585192.2017.1416655
[35] Chen L, Harshaw C, Hassani H, et al. Projection-free Online Optimization with Stochastic Gradient: From Convexity to Submodularity [C]//Proceedings of the 35th International Conference on Machine Learning. 2018: 814-823.
[36] Zhou X W, Zhu M L, Daniilidis K. Multi-Image Matching via Fast Alternating Minimization [C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. 2015: 4032-4040.
[37] Rasiwasia N, Costa Pereira J, Coviello E, et al. A New Approach to Cross-Modal Multimedia Retrieval [C]//Proceedings of the 18th ACM International Conference on Multimedia. 2010: 251-260.
[38] Huiskes M J, Lew M S. The MIR Flickr Retrieval Evaluation [C]//Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. 2008: 39-43.
[39] Chua T S, Tang J H, Hong R C, et al. NUS-WIDE: A Real-World Web Image Database from National University of Singapore [C]//Proceedings of the ACM International Conference on Image and Video Retrieval. 2009: 1-9.
[40] Yue Y S, Finley T, Radlinski F, et al. A Support Vector Method for Optimizing Average Precision [C]//Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007: 271-278.
[41] Liu H, Ji R R, Wu Y J, et al. Cross-Modality Binary Code Learning via Fusion Similarity Hashing [C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6345-6353.
[42] Zhen Y, Gao Y, Yeung D Y, et al. Spectral Multimodal Hashing and Its Application to Multimedia Retrieval[J]. IEEE Transactions on Cybernetics, 2016, 46(1): 27-38.
doi: 10.1109/TCYB.2015.2392052 pmid: 26208374