Single-Channel Blind Image Separation Based on Transformer-Guided GAN
Abstract
1. Introduction
1.1. Related Work
1.1.1. BIS with UNet-GANs
1.1.2. Transformer
2. Method
2.1. Structure of the Generator
Transformer Scheme
2.2. Discriminator
2.3. Loss Function
3. Datasets and Experiment Settings
3.1. The MNIST Dataset
3.2. Bags–Shoes Dataset
3.3. Experimental Settings
4. Results
4.1. Qualitative Results
4.2. Quantitative Results
4.3. Ablation Test
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Dataset (PSNR in dB / SSIM) | FastICA | NMF | NES | AGAN | PDualGAN | Ours |
---|---|---|---|---|---|---|
MNIST | 14.8/0.18 | 18.5/0.36 | 21.5/0.79 | 25.2/0.87 | 24.3/0.86 | 32.9/0.94 |
Noisy MNIST | 14.1/0.04 | 17.3/0.30 | 20.0/0.74 | 23.8/0.84 | 23.7/0.85 | 30.1/0.92 |
Bags–shoes | 17.2/0.29 | 16.4/0.32 | 17.4/0.76 | 23.6/0.89 | 22.7/0.88 | 31.3/0.95 |
Noisy bags–shoes | 16.0/0.10 | 16.0/0.26 | 12.8/0.62 | 19.0/0.81 | 20.3/0.84 | 23.5/0.88 |

Dataset (PSNR in dB / SSIM) | Transformer-GAN | UNet-GAN | Ours |
---|---|---|---|
Bags–shoes | 8.6/0.41 | 22.9/0.88 | 31.3/0.95 |
MNIST | 11.1/0.30 | 24.6/0.86 | 32.9/0.94 |
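
The PSNR figures reported in these tables are in decibels. As a reading aid, the following is a minimal sketch of the standard PSNR definition, 10 · log10(MAX² / MSE), assuming 8-bit intensities (peak value 255) and grayscale images given as nested lists of rows. The function name `psnr` and its parameters are illustrative and not taken from the paper, and the authors' exact evaluation protocol is not stated here.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Both images are lists of rows of pixel intensities. max_value is the
    assumed peak intensity (255 for 8-bit images); this is an assumption,
    not a setting confirmed by the paper.
    """
    # Flatten both images so they can be compared pixel by pixel.
    ref = [pixel for row in reference for pixel in row]
    est = [pixel for row in estimate for pixel in row]
    if not ref or len(ref) != len(est):
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error between the separated image and the ground truth.
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 example: every pixel is off by 25.5 intensity levels.
ground_truth = [[0.0, 0.0], [0.0, 0.0]]
separated = [[25.5, 25.5], [25.5, 25.5]]
print(round(psnr(ground_truth, separated), 2))
```

A higher PSNR means a smaller mean squared error against the ground-truth source image. SSIM, the second number in each cell, is a separate structural measure and is not covered by this sketch.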
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Su, Y.; Jia, D.; Shen, Y.; Wang, L. Single-Channel Blind Image Separation Based on Transformer-Guided GAN. Sensors 2023, 23, 4638. https://doi.org/10.3390/s23104638