DOI: 10.1007/978-981-99-4742-3_23
Article

Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-K Selection Discriminator

Published: 10 August 2023

Abstract

Despite steady advances in deep learning in recent years, models still need large amounts of training data to avoid overfitting. Synthetic datasets produced by generative adversarial networks (GANs) have recently been used to address this problem. However, GAN-based methods are often hard to train or fail to generate high-quality samples. In this paper, we propose a data augmentation technique for environmental sound classification (ESC) based on the diffusion probabilistic model (DPM), using DPM-Solver++ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we propose a top-k selection technique that filters out low-quality synthetic samples. According to the experimental results, the synthetic samples have features similar to those of the original dataset and significantly increase the classification accuracy of several state-of-the-art models compared with traditional data augmentation techniques. The code is publicly available at https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
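
The abstract does not spell out how the top-k selection works. The sketch below illustrates one plausible reading, under stated assumptions: a pretrained discriminator (classifier) scores each generated spectrogram, and a sample is kept only if the class it was generated for is among the discriminator's k highest-scoring classes. The function names `in_top_k` and `top_k_filter`, the score values, and the labels are all hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def in_top_k(scores: Sequence[float], label: int, k: int) -> bool:
    """Return True if `label` is among the k highest-scoring classes."""
    # Rank class indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


def top_k_filter(
    batch_scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> List[int]:
    """Return indices of synthetic samples whose intended label is in the top k."""
    return [
        i
        for i, (scores, label) in enumerate(zip(batch_scores, labels))
        if in_top_k(scores, label, k)
    ]


# Hypothetical discriminator outputs for three generated spectrograms (3 classes).
scores = [
    [0.70, 0.20, 0.10],  # generated for class 0; class 0 is ranked first
    [0.50, 0.10, 0.40],  # generated for class 1; class 1 is ranked last
    [0.45, 0.15, 0.40],  # generated for class 2; class 2 is ranked second
]
labels = [0, 1, 2]

kept_k1 = top_k_filter(scores, labels, k=1)  # strict: label must be the top class
kept_k2 = top_k_filter(scores, labels, k=2)  # looser: label may be first or second
print(kept_k1, kept_k2)
```

In this reading, a smaller k discards more synthetic samples and a larger k keeps more, so k trades off the quality of the augmented set against its size.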

Published In

Advanced Intelligent Computing Technology and Applications: 19th International Conference, ICIC 2023, Zhengzhou, China, August 10–13, 2023, Proceedings, Part II
Aug 2023
826 pages
ISBN: 978-981-99-4741-6
DOI: 10.1007/978-981-99-4742-3
Editors: De-Shuang Huang, Prashan Premaratne, Baohua Jin, Boyang Qu, Kang-Hyun Jo, Abir Hussain

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. Diffusion Probabilistic Models
  2. Data Augmentation
  3. Environmental Sound Classification

Qualifiers

  • Article
