1 Institute of Scientific and Technical Information of China, Beijing 100038, China
2 Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038, China
[Objective] This paper optimizes feature extraction based on the theory of cross-media fusion, aiming to narrow the semantic gap between heterogeneous data. [Methods] We used the LDA2Vec and ResNet V2 models to extract features from texts and images, respectively. We then applied a semantic association matching technique to map the heterogeneous text and image features into a common representation space. [Results] Compared with the LDA and SIFT algorithms, the proposed method raised the MAP value of text/image mutual retrieval to 0.454. [Limitations] The training set needs to be expanded, and optimizing feature extraction has only a limited impact on cross-media fusion. [Conclusions] The proposed method is effective and offers new directions for cross-media research.
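The abstract reports a MAP value for text/image mutual retrieval but does not show how such a score is obtained. The following is a minimal sketch, not the authors' implementation. It assumes that text and image features have already been mapped into the common space, that similarity is measured by cosine similarity, and that a retrieved item counts as relevant when it shares the query's class label. The function name `mean_average_precision` and the toy data are hypothetical.

```python
import numpy as np


def mean_average_precision(query_feats, gallery_feats, query_labels, gallery_labels):
    """MAP for retrieving gallery items (e.g. images) with queries (e.g. texts).

    Assumption: relevance means "same class label"; similarity is cosine.
    """
    # L2-normalize rows so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T

    average_precisions = []
    for i in range(sims.shape[0]):
        # Rank gallery items from most to least similar.
        order = np.argsort(-sims[i], kind="stable")
        relevant = (gallery_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue  # skip queries with no relevant gallery item
        precision_at_k = np.cumsum(relevant) / np.arange(1, relevant.size + 1)
        # Average the precision values at the ranks of the relevant items.
        average_precisions.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(average_precisions)) if average_precisions else 0.0


# Toy example: 2 text queries, 3 images, 2-dimensional common space.
text_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
text_labels = np.array([0, 1])
image_feats = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])
image_labels = np.array([1, 1, 0])

text_to_image = mean_average_precision(text_feats, image_feats, text_labels, image_labels)
image_to_text = mean_average_precision(image_feats, text_feats, image_labels, text_labels)
print(text_to_image, image_to_text)
```

Mutual retrieval is evaluated in both directions (text-to-image and image-to-text) by swapping the query and gallery arguments; whether a single reported figure is one direction or the mean of the two is not stated in the abstract.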