Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (10): 78-88    DOI: 10.11925/infotech.2096-3467.2019.0052
Cross-media Fusion Method Based on LDA2Vec and Residual Network
Qinghong Zhong1,2, Xiaodong Qiao1, Yunliang Zhang1,2, Mengjuan Weng1,2
1Institute of Scientific and Technical Information of China, Beijing 100038, China
2Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038, China
Abstract  

[Objective] This paper optimizes feature extraction based on the cross-media fusion mechanism, aiming to narrow the semantic gap between heterogeneous data. [Methods] We extracted text features with LDA2Vec and image features with ResNet V2, then used semantic association matching to map the heterogeneous text and image features into a common representation space. [Results] Compared with the LDA and SIFT algorithms, the proposed method raised the MAP of text/image mutual retrieval to 0.454. [Limitations] The training set needs to be expanded, and optimizing feature extraction alone has a limited effect on cross-media fusion. [Conclusions] The proposed method is effective and offers a new direction for cross-media research.

Key words: Cross-media Data; Feature Extraction; LDA2Vec; ResNet; Semantic Association
Received: 14 January 2019      Published: 25 November 2019
ZTFLH:  G354  
Corresponding Authors: Yunliang Zhang     E-mail: zhangyl@istic.ac.cn

Cite this article:

Qinghong Zhong,Xiaodong Qiao,Yunliang Zhang,Mengjuan Weng. Cross-media Fusion Method Based on LDA2Vec and Residual Network. Data Analysis and Knowledge Discovery, 2019, 3(10): 78-88.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0052     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I10/78

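Category | Method | Advantages | Disadvantages
Statistics-based methods | TF, DF, TF-IDF [10] | Measures a word's importance by how often it occurs; easy to understand, highlights important words, simple to implement | Word frequency alone cannot effectively reflect a word's importance or its distribution
Statistics-based methods | Information gain (IG) [11] | Splits on the attribute with the largest information gain; measures the document classification effect well | Results are biased toward features that take many values
Statistics-based methods | Mutual information (MI) [12] | Selects a set number of feature terms according to each term's average association with every category | Tends to select rare words or even noise words, which have high MI but carry little category information
Word-vector-based methods | Word2Vec [13] | Carries semantic information and partly addresses the semantic gap problem; efficient | Needs a large training corpus; not interpretable
Topic-model-based methods | LDA [14] | Finds latent links between two documents that share no words; mines latent words in documents | Limited by document length; cannot mine useful words from short texts
Topic-model-based methods | LDA2Vec [15] | Combines the strengths of Word2Vec's local prediction and LDA's global prediction; builds representations over both words and documents | Long training time; likewise affected by document length
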
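Category | Method | Advantages | Disadvantages
Low-level visual feature extraction: color features | RGB, HSI, HSV, CMYK [17] | Insensitive to image size and orientation; fairly robust in some cases | Color alone can hardly describe a specific object completely and accurately; does not match the characteristics of human vision
Low-level visual feature extraction: shape features | SIFT feature extraction algorithm [18] | Suited to extracting invariant features; the extracted features are stable, distinctive and information-rich; suited to processing massive feature data | Descriptor dimensionality is too high and extraction takes too long, so real-time performance is poor; cannot accurately extract feature points from targets with smooth edges
Low-level visual feature extraction: texture features | Gray-level co-occurrence matrix, random field model methods, wavelet transform | Reflects homogeneity in the image; rotation invariant with good noise resistance | Easily affected by changes in resolution, illumination and similar factors
Low-level visual feature extraction: spatial relationship features | Pose estimation | Strengthens the ability to describe and distinguish image content | Fairly sensitive to rotation, flipping and scale changes of the image target
Neural network methods | Convolutional neural network (CNN) [19] | No manual feature selection; handles high-dimensional data with ease, and is especially strong on highly unstructured, complexly distributed data such as images | Needs a large amount of training data; places high demands on computing performance
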
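Experiment | Mapping method | Image to Text | Text to Image | Avg
Ref. [5], LDA | CCA | 0.249 | 0.196 | 0.223
Ref. [5], LDA | SM | 0.225 | 0.223 | 0.224
Ref. [5], LDA | SCM | 0.277 | 0.226 | 0.252
LDA2Vec, Epoch=5 | CCA | 0.243 | 0.198 | 0.221
LDA2Vec, Epoch=5 | SM | 0.298 | 0.228 | 0.263
LDA2Vec, Epoch=5 | SCM | 0.283 | 0.227 | 0.255
LDA2Vec, Epoch=10 | CCA | 0.244 | 0.199 | 0.222
LDA2Vec, Epoch=10 | SM | 0.302 | 0.228 | 0.265
LDA2Vec, Epoch=10 | SCM | 0.288 | 0.233 | 0.261
LDA2Vec, Epoch=15 | CCA | 0.244 | 0.199 | 0.222
LDA2Vec, Epoch=15 | SM | 0.301 | 0.224 | 0.263
LDA2Vec, Epoch=15 | SCM | 0.291 | 0.225 | 0.258
LDA2Vec, Epoch=20 | CCA | 0.244 | 0.199 | 0.222
LDA2Vec, Epoch=20 | SM | 0.301 | 0.223 | 0.262
LDA2Vec, Epoch=20 | SCM | 0.287 | 0.224 | 0.256
LDA2Vec, Epoch=25 | CCA | 0.245 | 0.198 | 0.222
LDA2Vec, Epoch=25 | SM | 0.298 | 0.225 | 0.262
LDA2Vec, Epoch=25 | SCM | 0.284 | 0.226 | 0.255
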
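The scores above are MAP (mean average precision) values for each retrieval direction. As a rough illustration of how such a score can be computed, the sketch below assumes that a retrieved item counts as relevant when it shares the query's category label and that the whole ranked list is scored; the source does not state either detail, and the labels used are hypothetical toy data rather than results from the paper.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[str], query_label: str) -> float:
    """Average precision of one ranked result list.

    Assumption: an item is relevant when its label equals the query's label.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[str]],
                           query_labels: Sequence[str]) -> float:
    """Mean of the per-query average precisions."""
    if not rankings:
        return 0.0
    aps = [average_precision(r, q) for r, q in zip(rankings, query_labels)]
    return sum(aps) / len(aps)


# Hypothetical toy rankings: each inner list holds the category labels of
# the retrieved items, best match first.
rankings = [["music", "biology", "music"], ["biology", "music", "music"]]
queries = ["music", "music"]
map_score = mean_average_precision(rankings, queries)
```

Running the same function once with image queries against texts and once with text queries against images would give the two directional columns, and their mean would give the Avg column.
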
Topic1 | Topic2 | Topic3 | Topic4 | Topic5 | Topic6 | Topic7 | Topic8 | Topic9 | Topic10
king | book | water | empire | attack | park | government | album | game | specie
queen | publish | whale | roman | japanese | build | party | band | team | animal
prince | story | specie | emperor | air | creek | country | release | win | cell
son | fiction | bird | french | command | building | national | music | match | dinosaur
george | poem | north | military | navy | town | india | song | league | bird
duke | literature | range | government | german | river | school | studio | player | bone
family | letter | tree | germany | battle | railway | economic | episode | club | skull
jame | life | forest | battle | fire | centre | university | scene | score | male
william | writer | nest | power | aircraft | road | people | production | test | study
henry | author | population | treaty | gun | west | public | star | final | prey
Matched categories: Royalty & Nobility | Literature & Theater | / | Warfare | Art & Architecture | / | Music | Sport & Recreation | Biology

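Per-topic word lists like the ones above are typically read off a topic model by sorting each topic's word weights and keeping the top entries. The sketch below shows that step only; the function name `top_words` and the weights are hypothetical, since the source lists the words but not their weights or the tooling used.

```python
from typing import Dict, List


def top_words(topic_word_weights: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n highest-weighted words of one topic, best first."""
    ranked = sorted(topic_word_weights.items(),
                    key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Hypothetical weights for illustration only (not values from the paper).
topic1 = {"son": 0.6, "king": 0.9, "prince": 0.7, "queen": 0.8}
print(top_words(topic1, n=3))
```
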
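Experiment | Mapping method | Image to Text | Text to Image | Avg
Ref. [5], SIFT | CCA | 0.249 | 0.196 | 0.223
Ref. [5], SIFT | SM | 0.225 | 0.223 | 0.224
Ref. [5], SIFT | SCM | 0.277 | 0.226 | 0.252
ResNet V2, 256-dim | CCA | 0.417 | 0.410 | 0.414
ResNet V2, 256-dim | SM | 0.435 | 0.379 | 0.407
ResNet V2, 256-dim | SCM | 0.454 | 0.449 | 0.452
ResNet V2, 2048-dim | CCA | 0.392 | 0.379 | 0.386
ResNet V2, 2048-dim | SM | 0.415 | 0.366 | 0.391
ResNet V2, 2048-dim | SCM | 0.437 | 0.437 | 0.437
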
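Once a mapping method has placed image and text features in one common space, cross-modal retrieval reduces to ranking the items of one modality by their similarity to a query from the other. The sketch below assumes cosine similarity as the measure, which the source does not specify, and uses hypothetical three-dimensional vectors in place of real projected features.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query: Sequence[float],
                       candidates: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the candidates, most similar to the query first."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hypothetical vectors already mapped into the common space.
image_query = [0.9, 0.1, 0.0]
text_candidates = [[0.1, 0.8, 0.1], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]
ranking = rank_by_similarity(image_query, text_candidates)
```

Swapping the roles of the two modalities gives the text-to-image direction with the same code.
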
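Experiment | Text feature extraction | Image feature extraction | Mapping method | Image to Text | Text to Image | Avg
Ref. [5] | LDA | SIFT | SCM | 0.277 | 0.226 | 0.252
Ref. [31] | LDA | CNN-Fc6 | SCM | 0.417 | 0.352 | 0.385
Ref. [9] | LDA | CNN-Fc7 | SM | 0.430 | 0.370 | 0.400
Ref. [27] | TextNet | CNN-Fc8 | Deep-SM | 0.478 | 0.422 | 0.450
Ref. [26] | / | / | DCCA | 0.445 | 0.399 | 0.422
Section 4.2 | LDA2Vec | SIFT | SM | 0.302 | 0.228 | 0.265
Section 4.3 | LDA | ResNet V2, 256-dim | SCM | 0.454 | 0.449 | 0.452
LDA2Vec + ResNet V2, 256-dim | LDA2Vec | ResNet V2, 256-dim | CCA | 0.418 | 0.411 | 0.415
LDA2Vec + ResNet V2, 256-dim | LDA2Vec | ResNet V2, 256-dim | SM | 0.445 | 0.375 | 0.410
LDA2Vec + ResNet V2, 256-dim | LDA2Vec | ResNet V2, 256-dim | SCM | 0.462 | 0.446 | 0.454
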
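The Avg column in the comparison above appears to be the mean of the two retrieval directions, rounded to three decimals. The sketch below checks that reading against the reported rows and works out the gain of the best average over the Ref. [5] baseline; the row labels are shortened for illustration, and the half-unit rounding tolerance is an assumption.

```python
# (experiment, mapping, image-to-text MAP, text-to-image MAP, reported Avg)
rows = [
    ("Ref. [5]: LDA + SIFT", "SCM", 0.277, 0.226, 0.252),
    ("Ref. [31]: LDA + CNN-Fc6", "SCM", 0.417, 0.352, 0.385),
    ("Ref. [9]: LDA + CNN-Fc7", "SM", 0.430, 0.370, 0.400),
    ("Ref. [27]: TextNet + CNN-Fc8", "Deep-SM", 0.478, 0.422, 0.450),
    ("Ref. [26]", "DCCA", 0.445, 0.399, 0.422),
    ("Section 4.2: LDA2Vec + SIFT", "SM", 0.302, 0.228, 0.265),
    ("Section 4.3: LDA + ResNet V2", "SCM", 0.454, 0.449, 0.452),
    ("LDA2Vec + ResNet V2", "CCA", 0.418, 0.411, 0.415),
    ("LDA2Vec + ResNet V2", "SM", 0.445, 0.375, 0.410),
    ("LDA2Vec + ResNet V2", "SCM", 0.462, 0.446, 0.454),
]

# A reported Avg is consistent if it lies within half a unit of the third
# decimal place of the mean (plus a small float-error allowance).
TOLERANCE = 0.0005 + 1e-9
consistent = all(
    abs((i2t + t2i) / 2 - avg) <= TOLERANCE for _, _, i2t, t2i, avg in rows
)

best = max(rows, key=lambda row: row[4])   # row with the highest Avg
baseline_avg = rows[0][4]                  # Ref. [5] with SCM
relative_gain = (best[4] - baseline_avg) / baseline_avg

print(consistent, best[0], best[1], f"{relative_gain:.1%}")
```

Under this reading, the 0.454 average cited in the abstract corresponds to the LDA2Vec + ResNet V2 row with SCM mapping.
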
[1] Pan Gang, Zhang Yunliang, Zhong Qinghong. Thinking and Practice of Knowledge Services in Engineering Field[J]. Technology Intelligence Engineering, 2018,4(5):4-12.
[2] Yang Yi. Cross-media Information Technology and Application[M]. Beijing: Publishing House of Electronics Industry, 2014: 3-10.
[3] Xie Yuxiang, Luan Xidao, Wu Lingda. Multimedia Data Semantic Gap Analysis[J]. Journal of Wuhan University of Technology: IAME, 2011,33(6):859-863.
[4] Zhao Xueyi, Li Xi, Zhang Zhongfei. Multimedia Information Retrieval Based on Multi-label Relationship[C]// Proceedings of the 2015 Annual Conference of the Signal Processing Society of Zhejiang Province. Hangzhou: Zhejiang University Press, 2015.
[5] Rasiwasia N, Pereira J C, Coviello E, et al. A New Approach to Cross-Modal Multimedia Retrieval[C]// Proceedings of International Conference on Multimedia. Firenze: ACM, 2010: 251-260.
[6] Zhuang Ling, Zhuang Yueting, Wu Jiangqin, et al. Image Retrieval Approach Based on Sparse Canonical Correlation Analysis[J]. Journal of Software, 2012,23(5):1295-1304.
[7] Li Xiangyang, Zhuang Yueting, Pan Yunhe. The Technique and Systems of Content-based Image Retrieval[J]. Journal of Computer Research and Development, 2001,38(3):344-354.
[8] Zhuang Yueting. Research on Intelligent Multimedia Information Analysis and Retrieval[D]. Hangzhou: Zhejiang University, 1998.
[9] Wei Yunchao. Semantic Classification and Retrieval of Cross-media Data[D]. Beijing: Beijing Jiaotong University, 2016.
[10] Lu Peng, Zhuang Min, Long Gang, et al. Analysis and Prospect of Research on Text Feature Extraction[J]. Technological Innovation and Brand, 2017(4):70-74.
[11] Chen Lei, Li Jun. Research on Text Feature Selection Method Based on Word Vector[J]. Journal of Chinese Computer Systems, 2018,39(5):129-132.
[12] Chen Jinglin. Research on Cross-media Retrieval of Online Product Based on Feature Learning and Association Learning[D]. Nanchang: East China Jiaotong University, 2016.
[13] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[J]. Advances in Neural Information Processing Systems, 2013,26:3111-3119.
[14] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003,3:993-1022.
[15] Moody C E. Mixing Dirichlet Topic Models and Word Embeddings to Make Lda2vec[OL]. arXiv Preprint, arXiv: 1605.02019.
[16] Zhai Junhai, Zhao Wenxiu, Wang Xizhao. Research on the Image Feature Extraction[J]. Journal of Hebei University: Natural Science Edition, 2009,29(1):106-112.
[17] Chang Fang, Shang Zhenhong, Liu Hui, et al. An Adaptive Target Tracking Algorithm Based on Color Features[J]. Information Technology, 2018(3):10-14.
[18] Ye Yuqing, Qiu Xiaohui. Image Copy and Paste Tamper Detection Based on SIFT and K-means[J]. Computer Technology and Development, 2018,28(6):121-124.
[19] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[C]// Proceedings of International Conference on Neural Information Processing Systems. Lake Tahoe: NIPS, 2012: 84-90.
[20] Huang Xiao, Gu Shuo, Ma Xiaoye, et al. Artificial Intelligence of Diabetic Retinopathy Image Recognition Used in the Real World[J]. Technology Intelligence Engineering, 2018,4(1):24-30.
[21] Ding Liang, Yao Changqing, He Yanqing, et al. Application of Deep Learning in Statistical Machine Translation Domain Adaptation[J]. Technology Intelligence Engineering, 2017,3(3):64-76.
[22] Sun Shengli, Zhao Danxin. A New Method of Aircraft Target Detection Based on ResNet for Remote Sensing Images[J]. Electronic Design Engineering, 2018,26(22):164-168.
[23] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[OL]. arXiv Preprint, arXiv: 1512.03385.
[24] Shi Shaojie. Canonical Correlation Analysis: An Overview of Application on Machine Learning Methods[D]. Beijing: Beijing Jiaotong University, 2012.
[25] Liu Yao. Cross-modal Multimedia Information Retrieval with CCA and Adaboost[D]. Chongqing: Southwest University, 2016.
[26] Andrew G, Arora R, Bilmes J, et al. Deep Canonical Correlation Analysis[C]// Proceedings of International Conference on Machine Learning. Atlanta: ICML, 2013: 1247-1255.
[27] Wei Y, Zhao Y, Lu C, et al. Cross-Modal Retrieval with CNN Visual Features: A New Baseline[J]. IEEE Transactions on Cybernetics, 2017,47(2):449-460.
[28] Huang X, Peng Y. Deep Cross-media Knowledge Transfer[C]// Proceedings of Conference on Computer Vision and Pattern Recognition. Salt Lake City: CVPR, 2018: 8837-8846.
[29] Qi J, Peng Y. Cross-modal Bidirectional Translation via Reinforcement Learning[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: IJCAI, 2018: 2630-2636.
[30] The Most Complete English Stop Word List (891)[EB/OL]. [2018-10-03].
[31] Zou Hui. A Cross-Modal Multimedia Retrieval Method Research Based on Deep Learning and Centered Correlation[D]. Xiamen: Huaqiao University, 2016.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn