Classifying Images of Intangible Cultural Heritages with Multimodal Fusion
Fan Tao, Wang Hao, Li Yueyan, Deng Sanhong
School of Information Management, Nanjing University, Nanjing 210023, China
Abstract [Objective] This paper proposes a new method combining images and textual descriptions, aiming to improve the classification of Intangible Cultural Heritage (ICH) images. [Methods] We built a multimodal fusion model that includes a fine-tuned deep pre-trained model for extracting visual semantic features, a BERT model for extracting textual features, a fusion layer that concatenates the visual and textual features, and an output layer that predicts labels. [Results] We evaluated the proposed model on a national ICH project, New Year Prints, classifying Mianzhu, Taohuawu, Yangjiabu, and Yangliuqing prints. We found that fine-tuning the convolutional layers strengthened the visual semantic features of the ICH images, and the F1 value for classification reached 72.028%. Compared with the baseline models, our method yielded the best results, with an F1 value of 77.574%. [Limitations] The proposed model was only tested on New Year Prints and needs to be extended to more ICH projects in the future. [Conclusions] Adding textual description features improves the performance of ICH image classification, and fine-tuning the convolutional layers of a deep pre-trained image model improves the extraction of visual semantic features.
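The fusion pipeline described in [Methods] — a fine-tuned visual encoder, a BERT text encoder, a concatenation fusion layer, and a classification output layer — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature vectors and weights are random stand-ins, and the dimensions (4096 for the visual features, 768 for the BERT [CLS] vector) are assumptions about typical encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 4             # Mianzhu, Taohuawu, Yangjiabu, Yangliuqing prints
VIS_DIM, TXT_DIM = 4096, 768  # assumed sizes: CNN fc features, BERT [CLS]

def softmax(z):
    # Numerically stable softmax over class logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(visual_feat, text_feat, W, b):
    """Late fusion: concatenate the two modality vectors,
    then apply one linear output layer with softmax."""
    fused = np.concatenate([visual_feat, text_feat])  # shape (VIS_DIM + TXT_DIM,)
    return softmax(W @ fused + b)                     # class probabilities

# Stand-in features and randomly initialised weights (illustration only;
# in the paper these come from the fine-tuned CNN and BERT, trained end to end).
visual_feat = rng.standard_normal(VIS_DIM)
text_feat = rng.standard_normal(TXT_DIM)
W = rng.standard_normal((N_CLASSES, VIS_DIM + TXT_DIM)) * 0.01
b = np.zeros(N_CLASSES)

probs = fuse_and_classify(visual_feat, text_feat, W, b)
```

The design choice here is the simplest fusion strategy mentioned in the abstract: concatenation of per-modality features before a single output layer, so each modality contributes its own learned representation and the classifier weights decide how much each contributes to the label.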
Received: 25 August 2021
Published: 18 February 2022
Fund: National Natural Science Foundation of China (72074108); Fundamental Research Funds for the Central Universities (010814370113)
Corresponding Authors:
Wang Hao,ORCID: 0000-0002-0131-0823
E-mail: ywhaowang@nju.edu.cn