Abstract
[Objective] Most existing cross-modal hashing methods consider only inter-modal similarity and fail to make full use of label semantic information; as a result, they ignore the fine-grained details of heterogeneous data and lose semantic information. This work aims to address these limitations.
[Methods] First, the method uses the Euclidean distance and the Tanimoto coefficient to measure the intra-modal similarity of image data and text data, respectively. Then, a weighted combination of the two is used to measure inter-modal similarity, making full use of the detailed information in the heterogeneous data. Next, the discriminability of the hash codes is improved, and the loss of semantic information is prevented, by preserving the semantic information of the data labels. Finally, a quantization loss is computed on the generated hash codes and a hash-bit balance constraint is imposed to further improve the quality of the codes.
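As a minimal illustration of the similarity construction and the code-quality terms described above, the following Python sketch computes the two intra-modal similarity matrices (Euclidean-based for images, extended Tanimoto for texts), fuses them with a weight, and evaluates a quantization loss plus a bit-balance penalty. The distance-to-similarity mapping 1/(1 + d), the fusion weight `alpha`, and the exact loss forms are assumptions for illustration, not the paper's precise formulation.

```python
import numpy as np

def tanimoto_similarity(X):
    """Pairwise extended Tanimoto coefficient for real-valued rows:
    T(a, b) = <a, b> / (||a||^2 + ||b||^2 - <a, b>)."""
    inner = X @ X.T
    sq_norms = np.diag(inner)
    denom = sq_norms[:, None] + sq_norms[None, :] - inner
    return inner / np.maximum(denom, 1e-12)

def euclidean_similarity(X):
    """Pairwise Euclidean distances mapped into (0, 1] via 1 / (1 + d)
    (an assumed conversion; any monotone decreasing map would do)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return 1.0 / (1.0 + np.sqrt(d2))

def inter_modal_similarity(img_feats, txt_feats, alpha=0.5):
    """Weighted fusion of the two intra-modal similarity matrices;
    rows of the two feature matrices are paired image-text samples."""
    s_img = euclidean_similarity(img_feats)   # image branch
    s_txt = tanimoto_similarity(txt_feats)    # text branch
    return alpha * s_img + (1.0 - alpha) * s_txt

def quantization_and_balance_loss(H):
    """Quantization loss ||sign(H) - H||_F^2 plus a bit-balance penalty
    that pushes each bit toward half +1 / half -1 over the batch."""
    B = np.sign(H)                            # discrete codes
    quant = np.sum((B - H) ** 2)              # relaxation gap
    balance = np.sum(H.sum(axis=0) ** 2)      # per-bit column sums -> 0
    return quant, balance

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((4, 8))         # 4 paired samples, 8-dim image features
    txt = rng.standard_normal((4, 6))         # 6-dim text features
    S = inter_modal_similarity(img, txt, alpha=0.5)
    H = rng.standard_normal((4, 16))          # relaxed 16-bit hash codes
    print(S.shape, quantization_and_balance_loss(H))
```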
[Results] Compared with 11 baseline methods, the proposed method improves the highest mAP for image-to-text and text-to-image retrieval by 9.5% and 5.8% on MIRFlickr25k, and by 4.7% and 1.1% on NUS-WIDE, respectively.
[Limitations] Model training relies on label information, so performance degrades in unsupervised and semi-supervised settings.
[Conclusions] The proposed method retains the detailed information of heterogeneous data and prevents the loss of semantic information, effectively improving retrieval performance.