Classifying Images of Intangible Cultural Heritages with Multimodal Fusion
Fan Tao, Wang Hao, Li Yueyan, Deng Sanhong
School of Information Management, Nanjing University, Nanjing 210023, China
Abstract [Objective] This paper proposes a new method combining images and textual descriptions, aiming to improve the classification of Intangible Cultural Heritage (ICH) images. [Methods] We built a multimodal fusion model that includes a fine-tuned deep pre-trained model for extracting visual semantic features, a BERT model for extracting textual features, a fusion layer that concatenates the visual and textual features, and an output layer that predicts labels. [Results] We evaluated the proposed model on a national ICH project, New Year Prints, classifying Mianzhu, Taohuawu, Yangjiabu, and Yangliuqing prints. We found that fine-tuning the convolutional layers strengthened the visual semantic features of the ICH images, and the F1 value for classification reached 72.028%. Compared with the baseline models, our method yielded the best results, with an F1 value of 77.574%. [Limitations] The proposed model was only tested on New Year Prints and needs to be extended to more ICH projects in the future. [Conclusions] Adding textual description features improves the performance of ICH image classification, and fine-tuning the convolutional layers of a deep pre-trained image model improves the extraction of visual semantic features.
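The fusion pipeline described in [Methods] — a fine-tuned visual encoder, a BERT text encoder, a concatenation fusion layer, and a classification output layer — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature vectors and weights are random stand-ins, and the dimensions (4096 for the visual features, 768 for the BERT [CLS] vector) are assumptions about typical encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 4             # Mianzhu, Taohuawu, Yangjiabu, Yangliuqing prints
VIS_DIM, TXT_DIM = 4096, 768  # assumed sizes: CNN fc features, BERT [CLS]

def softmax(z):
    # Numerically stable softmax over class logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(visual_feat, text_feat, W, b):
    """Late fusion: concatenate the two modality vectors,
    then apply one linear output layer with softmax."""
    fused = np.concatenate([visual_feat, text_feat])  # shape (VIS_DIM + TXT_DIM,)
    return softmax(W @ fused + b)                     # class probabilities

# Stand-in features and randomly initialised weights (illustration only;
# in the paper these come from the fine-tuned CNN and BERT, trained end to end).
visual_feat = rng.standard_normal(VIS_DIM)
text_feat = rng.standard_normal(TXT_DIM)
W = rng.standard_normal((N_CLASSES, VIS_DIM + TXT_DIM)) * 0.01
b = np.zeros(N_CLASSES)

probs = fuse_and_classify(visual_feat, text_feat, W, b)
```

The design choice here is the simplest fusion strategy mentioned in the abstract: concatenation of per-modality features before a single output layer, so each modality contributes its own learned representation and the classifier weights decide how much each contributes to the label.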
Received: 25 August 2021
Published: 18 February 2022
Fund: National Natural Science Foundation of China (72074108); Fundamental Research Funds for the Central Universities (010814370113)
Corresponding Authors:
Wang Hao,ORCID: 0000-0002-0131-0823
E-mail: ywhaowang@nju.edu.cn