Vocal Music Classification Based on Multi-category Feature Fusion

doi:10.11925/infotech.2096-3467.2020.0902

Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue (5): 59-70

Meng Zhen, Wang Hao, Yu Wei, Deng Sanhong, Zhang Baolong

Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China

Abstract

[Objective] This paper proposes a model that combines the statistical features of audio with image features, aiming to address classification problems in music retrieval. [Methods] First, we extracted statistical features of the audio and Mel spectrogram image features with the help of machine learning methods. Then, we recast the audio classification task as an image classification task. Finally, we constructed a deep learning method that fuses the audio statistics with the Mel spectrogram image features. [Results] In vocal music classification, the F1 value of the new method based on image features was about 6 percentage points higher than that of classic machine learning methods. The F1 value of the deep learning model based on feature fusion exceeded 69%, which is 3.4 percentage points higher than that of the model using image features alone. [Limitations] The experimental dataset is small, so the advantages of deep learning methods were not fully exploited. [Conclusions] The sampling parameters of the Mel spectrogram influence the experimental results. The new feature fusion method can effectively improve the performance of vocal music classification.
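The conclusion that Mel spectrogram sampling parameters affect the results can be made concrete with a small sketch of how those parameters set the size of the image fed to a classifier. It assumes librosa-style framing with centered frames, where the frame count is 1 + floor(n_samples / hop_length). The helper name `mel_image_shape`, the 30-second clip length, and the 22050 Hz sampling rate are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: relates hop_length and n_mels to the Mel-spectrogram image size.
# Assumption: librosa-style centered framing, so
#   n_frames = 1 + floor(n_samples / hop_length).
# The clip length and sampling rate below are hypothetical, not from the paper.


def mel_image_shape(duration_s, sr, hop_length, n_mels):
    """Return (height, width) of the Mel spectrogram for one clip.

    height = number of Mel bands (n_mels)
    width  = number of analysis frames along the time axis
    """
    n_samples = int(duration_s * sr)
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames


if __name__ == "__main__":
    # A larger hop_length gives fewer frames, hence a narrower image.
    for hop in (256, 512, 1024):
        height, width = mel_image_shape(30, 22050, hop, 128)
        print(f"hop_length={hop}: image is {height} x {width}")
```

Doubling `hop_length` roughly halves the image width, while `n_mels` fixes the height, so both parameters directly control how much time and frequency detail the image-based model sees.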

Key words: Vocal Music Classification; CNN; Feature Fusion; Music Information Retrieval; Mel-Frequency Cepstrum

Received: 15 September 2020 Published: 08 March 2021

ZTFLH (Chinese Library Classification number): TP391

Fund: This work is supported by the National Social Science Fund of China (17ZDA291).

Corresponding Author: Wang Hao, E-mail: ywhaowang@nju.edu.cn

Cite this article:

Meng Zhen, Wang Hao, Yu Wei, Deng Sanhong, Zhang Baolong. Vocal Music Classification Based on Multi-category Feature Fusion. Data Analysis and Knowledge Discovery, 2021, 5(5): 59-70.

URL: https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0902 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I5/59

The Research Framework

Description of Statistical Characteristics of Speech Signals

Example of Sonogram

Description of librosa Mel Spectrum Graph Sampling Parameters

Example of Mel-Frequency Cepstrum Diagram

Diagram of Network Data Flow

Music Classification Results of Machine Learning Models

Various Vocal Recognition Indexes of SVM Model Based on Statistical Features

SVM Vocal Classification Result Confusion Matrix

Statistical Feature Visualization

Change in Learning Rate

hop_length Value Change and Experimental Results

n_mels and spec_width Value Changes and Experimental Results

Various Vocal Recognition Indicators of Deep Learning Model Based on Image Features

Vocal Music Classification Index Based on Image Pre-training Model

Feature Fusion and Single Image Feature Deep Learning Model Recognition Index

F1 Value of Classification Results of Various Vocal Music on Each Classifier
