[Objective] This paper optimizes feature extraction based on the theory of cross-media fusion mechanisms, aiming to narrow the semantic gap between heterogeneous data. [Methods] We extracted text features with the LDA2Vec model and image features with the ResNet V2 model, then used a semantic association matching technique to map the heterogeneous text and image features into a common representation space. [Results] Compared with the LDA and SIFT algorithms, the proposed method raised the mean average precision (MAP) of text/image mutual retrieval to 0.454. [Limitations] The training set needs to be expanded, and optimizing feature extraction alone has a limited impact on cross-media fusion. [Conclusions] The proposed method is effective and offers a new direction for cross-media research.
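The MAP figure reported above can be made concrete with a minimal sketch of how mean average precision is commonly computed for cross-media retrieval once text and image features share a common space. This is an illustration under stated assumptions, not the paper's evaluation code: it assumes cosine similarity for ranking and treats a retrieved item as relevant when it shares the query's class label. The function names `l2_normalize` and `mean_average_precision` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def mean_average_precision(query_feats, gallery_feats, query_labels, gallery_labels):
    """MAP for one retrieval direction (e.g. text queries against an image gallery).

    Assumption: both feature sets already live in the same space, and an item
    is relevant to a query when the two share a class label.
    """
    q = l2_normalize(np.asarray(query_feats, dtype=float))
    g = l2_normalize(np.asarray(gallery_feats, dtype=float))
    gallery_labels = np.asarray(gallery_labels)

    sims = q @ g.T  # cosine similarity, shape (n_queries, n_gallery)
    average_precisions = []
    for i, label in enumerate(query_labels):
        order = np.argsort(-sims[i], kind="stable")  # most similar first
        relevant = gallery_labels[order] == label
        if not relevant.any():
            continue  # skip queries with no relevant gallery item
        hits = np.cumsum(relevant)
        ranks = np.arange(1, len(relevant) + 1)
        # Precision measured at each rank where a relevant item appears.
        average_precisions.append((hits[relevant] / ranks[relevant]).mean())
    return float(np.mean(average_precisions)) if average_precisions else 0.0
```

Mutual retrieval would call the function twice, once with text features as queries and once with image features as queries. Whether the two directions are then averaged is a further assumption that the abstract does not settle.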