Evaluation of parallel and sequential deep learning models for music subgenre classification

The second author is supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC)

Abstract / Introduction
  • In this paper, we evaluate two deep learning models that integrate convolutional and recurrent neural networks. We implement both sequential and parallel architectures for fine-grained musical subgenre classification. Because of the exceptionally low signal-to-noise ratio (SNR) of our low-level mel-spectrogram dataset, more sensitive yet robust learning models are required to generate meaningful results. We investigate the effects of three commonly applied optimizers, dropout, and batch normalization, as well as sensitivity to varying initialization distributions. The results demonstrate that the sequential model specifically requires the RMSprop optimizer, while the parallel model implemented with the Adam optimizer yielded encouraging and stable results, achieving an average F1 score of 0.63. When all factors are considered, the optimized hybrid parallel model outperformed the sequential model in classification accuracy and system stability.

    Mathematics Subject Classification: Primary: 68T07; Secondary: 68T10.
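
    The abstract singles out RMSprop as the optimizer the sequential model depends on. For reference, the standard RMSprop update can be sketched as below: each parameter keeps a running average of its squared gradient and divides its step by the square root of that average. This is a minimal illustration only. The toy two-parameter objective, the function names, and the hyperparameters (lr, rho, eps, steps) are assumptions chosen for the example, not the settings used in the paper.

```python
import math


def rmsprop_minimize(grad_fn, params, lr=0.01, rho=0.9, eps=1e-8, steps=500):
    """Run plain RMSprop for a fixed number of steps and return the new parameters.

    All hyperparameter defaults are illustrative assumptions.
    """
    params = list(params)              # copy so the caller's list is untouched
    avg_sq = [0.0] * len(params)       # running average of squared gradients
    for _ in range(steps):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            avg_sq[i] = rho * avg_sq[i] + (1.0 - rho) * g * g
            # Per-parameter step, scaled by the root of the running average.
            params[i] -= lr * g / (math.sqrt(avg_sq[i]) + eps)
    return params


def loss(p):
    """Toy objective with very different curvature along its two axes."""
    return p[0] ** 2 + 10.0 * p[1] ** 2


def grad(p):
    """Analytic gradient of the toy objective."""
    return [2.0 * p[0], 20.0 * p[1]]


start = [1.0, 1.0]
end = rmsprop_minimize(grad, start)
print("loss before:", loss(start))
print("loss after: ", loss(end))
```

    Because the step is normalized per parameter, both axes of the toy objective shrink at a similar rate even though their curvatures differ by a factor of ten.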


  • Figure 1.  Baseline CNN model

    Figure 2.  CRNN sequential architecture

    Figure 3.  Parallel CNN-RNN architecture

    Figure 4.  Visualization of one song from our dataset

    Figure 5.  RMSprop learning process on two axes

    Figure 6.  Classification accuracy across 50 epochs

    Table 1.  F1 scores for optimizer evaluation

    Optimizer   CNN    CRNN   CNN-RNN
    Adam        0.45   0.32   0.63
    Adadelta    0.30   0.31   0.35
    RMSprop     0.41   0.54   0.60

    Table 2.  Optimal classification accuracy

    Model       CNN    CRNN      CNN-RNN
    Optimizer   Adam   RMSprop   Adam
    Accuracy    0.31   0.57      0.64

    Table 3.  Macro F1 scores for effect of regularization

    Model     Data         Dropout   Batch Normalization   Dropout + Batch Normalization
    CRNN      Train        0.67      1.00                  0.98
              Validation   0.65      0.58                  0.60
              Test         0.62      0.57                  0.41
    CNN-RNN   Train        0.65      1.00                  0.90
              Validation   0.65      0.58                  0.60
              Test         0.63      0.61                  0.63
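
    The macro F1 score reported above is the unweighted mean of the per-class F1 scores, so every subgenre counts equally regardless of how many songs it contains. A minimal sketch of that computation follows. The function name and the toy labels "a", "b", and "c" are hypothetical placeholders, not the paper's code or its subgenre classes.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three hypothetical classes.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
print(round(macro_f1(y_true, y_pred), 4))
```

    In the toy example the per-class F1 values are 0.5, 0.8, and 2/3, which average to about 0.6556.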

    Table 4.  Average F1 accuracy scores for effects of initialization methods

    Initialization   CNN    CRNN   CNN-RNN
    Glorot Normal    0.31   0.63   0.63
    Glorot Uniform   0.34   0.60   0.59
    Random Normal    0.33   0.45   0.53
    Random Uniform   0.33   0.37   0.57
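
    The four initialization schemes compared above differ mainly in how the weight scale is chosen. Glorot initializers scale the distribution by layer size (fan-in plus fan-out), whereas plain random initializers use a fixed scale. The sketch below illustrates this difference. The fixed scale of 0.05, the layer sizes, and the function names are assumptions for illustration, and the Glorot normal variant here uses an untruncated Gaussian for simplicity; none of this is taken from the paper's implementation.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Uniform on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def glorot_normal(fan_in, fan_out, rng):
    """Zero-mean Gaussian with std = sqrt(2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def random_uniform(fan_in, fan_out, rng, scale=0.05):
    """Uniform on [-scale, scale]; the fixed scale is an assumed value."""
    return [[rng.uniform(-scale, scale) for _ in range(fan_out)]
            for _ in range(fan_in)]


def random_normal(fan_in, fan_out, rng, std=0.05):
    """Zero-mean Gaussian with a fixed std; the value is an assumed default."""
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def sample_std(weights):
    """Population standard deviation of all entries in a weight matrix."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))


rng = random.Random(0)
for fan_in, fan_out in [(64, 32), (1024, 256)]:
    print(fan_in, fan_out,
          "glorot_normal std:", round(sample_std(glorot_normal(fan_in, fan_out, rng)), 4),
          "random_normal std:", round(sample_std(random_normal(fan_in, fan_out, rng)), 4))
```

    Both Glorot variants target the same weight variance, 2 / (fan_in + fan_out), so their spread shrinks as layers grow, while the fixed-scale initializers keep the same spread for every layer.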
