[Objective] This paper proposes a model that combines statistical features of audio with Mel spectrogram image features, aiming to address classification problems in music retrieval. [Methods] First, we extracted statistical features from the audio and image features from its Mel spectrograms with the help of machine learning methods. Then, we recast the audio classification task as an image classification task. Finally, we constructed a deep learning model that fuses the audio statistics with the Mel spectrogram image features. [Results] In vocal music classification, the F1 value of the new method based on image features was about 6 percentage points higher than that of classic machine learning methods. The F1 value of the deep learning model based on feature fusion exceeded 69%, which is 3.4 percentage points higher than that of the image-feature model. [Limitations] The experimental dataset is small, so the advantages of deep learning methods were not fully exploited. [Conclusions] The sampling parameters of the Mel spectrogram influence the experimental results. The new feature fusion method can effectively improve the performance of vocal music classification.
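The fusion idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two feature sets are fused, so plain concatenation is assumed here, and the function names (`hz_to_mel`, `summary_stats`, `fuse`) are hypothetical. The mel conversion uses one common (HTK-style) formula; other conventions exist.

```python
import math
from statistics import mean, pstdev


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula, one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def summary_stats(frames):
    """Summarize a per-frame feature sequence by its mean and population std."""
    return [mean(frames), pstdev(frames)]


def fuse(stat_features, image_features):
    """Fuse two feature vectors by concatenation (an assumed fusion strategy)."""
    return list(stat_features) + list(image_features)


# Illustrative values only: a short per-frame feature track and a
# stand-in for features that an image model might produce.
frame_values = [1.0, 2.0, 3.0, 4.0]
image_vector = [0.1, 0.2]

fused = fuse(summary_stats(frame_values), image_vector)
```

In this sketch the fused vector would feed a downstream classifier; a real system would replace `image_vector` with features learned from Mel spectrogram images.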