Article

Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-K Selection Discriminator

Authors:

Yifan Huang

Advanced Intelligent Computing Technology and Applications: 19th International Conference, ICIC 2023, Zhengzhou, China, August 10–13, 2023, Proceedings, Part II

Pages 283 - 295

https://doi.org/10.1007/978-981-99-4742-3_23

Published: 10 August 2023

Abstract

Despite steady advances in deep learning in recent years, models still need large amounts of training data to avoid overfitting. Synthetic datasets generated with generative adversarial networks (GANs) have recently been used to address this problem. However, GAN-based methods are often hard to train or fail to generate high-quality samples. In this paper, we propose a data augmentation technique for environmental sound classification (ESC) based on the diffusion probabilistic model (DPM), with DPM-Solver++ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we propose a top-k selection technique that filters out low-quality synthetic samples. According to the experimental results, the synthetic samples have features similar to those of the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The code is publicly available at https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
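The top-k selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the discriminator scores a generated spectrogram, so this example assumes a hypothetical scoring function that returns one scalar per sample, where higher means better quality. The names `top_k_select`, `score_fn`, and `toy_confidence` are illustrative and are not taken from the paper or its repository.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def top_k_select(samples: Sequence[T], score_fn: Callable[[T], float], k: int) -> List[T]:
    """Return the k highest-scoring samples, best first.

    Ties are broken by original position so the result is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Score every candidate once, remembering where it came from.
    scored = [(score_fn(sample), index) for index, sample in enumerate(samples)]
    # Sort by descending score, then ascending index for stable tie-breaking.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [samples[index] for _, index in scored[:k]]


# Toy stand-ins (assumption): each generated "spectrogram" is a (name, score)
# pair, and the score plays the role of a discriminator's quality estimate.
candidates = [("gen_0", 0.91), ("gen_1", 0.35), ("gen_2", 0.78), ("gen_3", 0.12)]


def toy_confidence(sample):
    # Hypothetical scorer: in practice this would be a model's output.
    return sample[1]


kept = top_k_select(candidates, toy_confidence, k=2)
print([name for name, _ in kept])  # ['gen_0', 'gen_2']
```

In an augmentation pipeline, a filter of this kind would presumably be applied to each class separately, so that only the best-scoring synthetic samples of each class join the training set. That per-class detail is an assumption here, not something the abstract states.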



Published In

Advanced Intelligent Computing Technology and Applications: 19th International Conference, ICIC 2023, Zhengzhou, China, August 10–13, 2023, Proceedings, Part II

Aug 2023

826 pages

ISBN:978-981-99-4741-6

DOI:10.1007/978-981-99-4742-3

Editors:
De-Shuang Huang, Department of Computer Science, Eastern Institute of Technology, Zhejiang, China
Prashan Premaratne, University of Wollongong, North Wollongong, NSW, Australia
Baohua Jin, Zhengzhou University of Light Industry, Zhengzhou, China
Boyang Qu, Zhong Yuan University of Technology, Zhengzhou, China
Kang-Hyun Jo, University of Ulsan, Ulsan, Korea (Republic of)
Abir Hussain, Department of Computer Science, Liverpool John Moores University, Liverpool, UK

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023.

Publisher

Springer-Verlag

Berlin, Heidelberg
