LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators
Abstract
1. Introduction
- We introduce a novel semi-supervised GAN for text classification that exploits the trained distribution of BERT to produce better distributions for fake data.
- We show that multiple generators fed by linguistically meaningful intermediate hidden layers of BERT outperform a single generator that uses only the last layer of the trained BERT (a minimal architecture sketch follows this list).
- LMGAN produces better fake-data distributions than the baseline; we analyze these distributions using t-SNE.
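To make the highlighted design concrete, here is a minimal PyTorch sketch of the multi-generator SS-GAN idea. All module shapes, names, and hyperparameters are illustrative assumptions rather than the authors' implementation: one generator per selected intermediate BERT layer maps noise toward fake hidden vectors, and a shared discriminator classifies inputs into the k real classes plus one extra "fake" class.

```python
import torch
import torch.nn as nn

HIDDEN = 768       # BERT-base hidden size (assumption: BERT-base backbone)
NOISE = 100        # latent noise dimension (illustrative)
NUM_CLASSES = 6    # e.g., QC coarse-grained has 6 classes

class Generator(nn.Module):
    """Maps noise to a fake hidden vector imitating one BERT layer's [CLS] state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE, HIDDEN),
            nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN, HIDDEN),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """SS-GAN discriminator: k real classes plus one extra class for fake samples."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.LeakyReLU(0.2))
        self.head = nn.Linear(HIDDEN, NUM_CLASSES + 1)

    def forward(self, h):
        feat = self.body(h)
        return self.head(feat), feat  # logits and intermediate features

# One generator per chosen intermediate layer, e.g., the layer set {6, 7, 9}.
layer_set = [6, 7, 9]
generators = nn.ModuleList(Generator() for _ in layer_set)
discriminator = Discriminator()

z = torch.randn(32, NOISE)
fake_batches = [g(z) for g in generators]     # one fake batch per BERT layer
logits, _ = discriminator(fake_batches[0])    # shape: (32, NUM_CLASSES + 1)
```

The k+1-class discriminator head follows the standard SS-GAN formulation; whether LMGAN shares the discriminator body across layers exactly this way is an assumption of the sketch.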
2. Related Works
2.1. Semi-Supervised Learning for Text Classification
2.2. Semi-Supervised Generative Adversarial Networks
- (1) The discriminator must classify the labeled data correctly.
- (2) The generator may imitate imperfectly, but its samples must remain close to the real data distribution (see the loss sketch after this list).
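Below is a hedged sketch of how these two conditions typically become training losses in an SS-GAN, following the k+1-class formulation and feature matching of Salimans et al.; the exact objectives and weightings in LMGAN may differ, and the function names and epsilon constant are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

EPS = 1e-8  # numerical-stability constant (illustrative)

def discriminator_loss(logits_labeled, labels, logits_unlabeled, logits_fake, k):
    """Condition (1): be accurate on labeled data (supervised cross-entropy),
    and separate real from generated samples. Class index k is the 'fake' class."""
    l_supervised = F.cross_entropy(logits_labeled, labels)
    p_fake_unlab = F.softmax(logits_unlabeled, dim=-1)[:, k]
    p_fake_gen = F.softmax(logits_fake, dim=-1)[:, k]
    l_unsupervised = (-torch.log(1.0 - p_fake_unlab + EPS).mean()
                      - torch.log(p_fake_gen + EPS).mean())
    return l_supervised + l_unsupervised

def generator_loss(real_features, fake_features):
    """Condition (2): feature matching. The generator need not fool the
    discriminator perfectly; it only matches the mean discriminator features
    of real data, which keeps its samples near the real distribution."""
    return torch.mean((real_features.mean(dim=0) - fake_features.mean(dim=0)) ** 2)
```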
2.3. BERT Hidden Layers
3. Methods
3.1. Semi-Supervised Learning
4. Results and Discussion
4.1. Datasets
4.2. Baseline Models
4.3. Detailed Settings
4.4. Results
4.5. Linguistic Information of BERT Hidden Layers
4.6. Ablation Study
4.7. Embedding Visualization
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Loss Plot
References
- Agerri, R.; Artola, X.; Beloki, Z.; Rigau, G.; Soroa, A. Big data for Natural Language Processing: A streaming approach. Knowl.-Based Syst. 2015, 79, 36–42.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Available online: https://openai.com/blog/language-unsupervised/ (accessed on 23 August 2022).
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Xie, Q.; Dai, Z.; Hovy, E.; Luong, T.; Le, Q. Unsupervised Data Augmentation for Consistency Training. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6256–6268.
- Chen, J.; Yang, Z.; Yang, D. MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Minneapolis, MN, USA, 2020; pp. 2147–2157.
- Goodfellow, I.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Miyato, T.; Dai, A.M.; Goodfellow, I. Adversarial Training Methods for Semi-Supervised Text Classification. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Park, J.; Kim, G.; Kang, J. Consistency Training with Virtual Adversarial Discrete Perturbation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, USA, 10–15 July 2022; Association for Computational Linguistics: Minneapolis, MN, USA, 2022; pp. 5646–5656.
- Croce, D.; Castellucci, G.; Basili, R. GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Minneapolis, MN, USA, 2020; pp. 2114–2119.
- Van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Odena, A. Semi-Supervised Learning with Generative Adversarial Networks. arXiv 2016, arXiv:1606.01583.
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved Techniques for Training GANs. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29.
- Liu, X.; Xiang, X. How Does GAN-based Semi-supervised Learning Work? arXiv 2020, arXiv:2007.05692.
- Dai, Z.; Yang, Z.; Yang, F.; Cohen, W.W.; Salakhutdinov, R.R. Good Semi-Supervised Learning That Requires a Bad GAN. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Donahue, J.; Krähenbühl, P.; Darrell, T. Adversarial Feature Learning. arXiv 2016, arXiv:1605.09782.
- Akcay, S.; Atapour-Abarghouei, A.; Breckon, T.P. GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; pp. 622–637.
- Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
- Jawahar, G.; Sagot, B.; Seddah, D. What Does BERT Learn about the Structure of Language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3651–3657.
- Kovaleva, O.; Romanov, A.; Rogers, A.; Rumshisky, A. Revealing the Dark Secrets of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4365–4374.
- Kim, T.; Choi, J.; Edmiston, D.; Goo Lee, S. Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction. In Proceedings of the International Conference on Learning Representations, Virtual, 26 April–1 May 2020.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27.
- Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C.A. MixMatch: A Holistic Approach to Semi-Supervised Learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
- Li, X.; Roth, D. Learning question classifiers: The role of semantic information. Nat. Lang. Eng. 2006, 12, 229–249.
- Lang, K. NewsWeeder: Learning to Filter Netnews. In Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, USA, 9–12 July 1995; Prieditis, A., Russell, S., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1995; pp. 331–339.
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.; Potts, C. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; Association for Computational Linguistics: Seattle, WA, USA, 2013; pp. 1631–1642.
- Williams, A.; Nangia, N.; Bowman, S. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 1–6 June 2018; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 1112–1122.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
Labeled Data (%) | Accuracy (%)
---|---
1 | 35.3 |
10 | 73.3 |
50 | 78.8 |
Dataset | #Train | #Test | #Class |
---|---|---|---|
20 News Group | 11,314 | 7531 | 20 |
QC coarse-grained | 5500 | 500 | 6 |
QC fine-grained | 5500 | 500 | 50 |
SST-5 | 8544 | 2210 | 5 |
MNLI mismatched | 392,702 | 10,000 | 3 |
MNLI matched | 392,702 | 10,000 | 3 |
Layer Set \ Annotated (%) | 1 | 2 | 5 | 10 | 20 | 30 | 40 | 50
---|---|---|---|---|---|---|---|---
{12} | 68.8 | 73.9 | 91.7 | 93.3 | 95.1 | 96.0 | 95.7 | 96.3 |
{1,2} | 65.7 | 75.7 | 91.3 | 92.7 | 94.9 | 95.9 | 95.5 | 95.8 |
{1,2,3} | 66.1 | 75.1 | 91.7 | 93.3 | 95.9 | 95.1 | 96.1 | 96.7 |
{1,2,3,4} | 66.9 | 73.7 | 92.7 | 92.7 | 94.6 | 95.2 | 95.5 | 95.5 |
{9,12} | 64.5 | 72.9 | 90.6 | 93.1 | 95.2 | 94.9 | 95.8 | 96.7 |
{6,9,12} | 67.5 | 71.1 | 92.7 | 93.4 | 94.7 | 96.0 | 95.4 | 96.2 |
{6,8,9,12} | 69.1 | 72.5 | 92.5 | 93.4 | 94.5 | 95.7 | 95.6 | 95.7 |
{6,9} | 65.2 | 74.9 | 92.4 | 93.2 | 94.7 | 95.3 | 95.9 | 96.3 |
{6,7,9} | 69.1 | 74.7 | 92.4 | 93.3 | 95.5 | 95.3 | 95.9 | 96.5 |
Layer Set \ Annotated (%) | 1 | 2 | 5 | 10 | 20 | 30 | 40 | 50
---|---|---|---|---|---|---|---|---
{12} | 34.6 | 40.7 | 46.1 | 51.6 | 57.4 | 60.6 | 61.8 | 62.9 |
{1,2} | 36.5 | 40.6 | 42.6 | 51.0 | 56.5 | 59.4 | 61.0 | 62.4 |
{1,2,3} | 36.8 | 41.7 | 45.6 | 51.6 | 57.2 | 60.1 | 62.5 | 62.9 |
{1,2,3,4} | 35.4 | 40.6 | 46.3 | 49.7 | 56.4 | 60.2 | 61.4 | 62.1 |
{9,12} | 35.5 | 40.9 | 45.1 | 51.6 | 56.9 | 59.0 | 61.8 | 61.9 |
{6,9,12} | 36.4 | 41.9 | 45.3 | 51.9 | 57.3 | 60.6 | 61.8 | 62.7 |
{6,8,9,12} | 34.6 | 41.5 | 45.8 | 51.6 | 57.0 | 60.6 | 61.9 | 61.5 |
{6,9} | 36.0 | 40.8 | 45.1 | 51.6 | 57.1 | 59.5 | 61.2 | 62.0 |
{6,7,9} | 35.9 | 41.8 | 45.6 | 51.6 | 57.3 | 59.8 | 62.2 | 62.7 |
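The layer sets in the two tables above index BERT's 12 transformer layers. For reference, a minimal sketch of how those intermediate hidden states can be extracted with the HuggingFace transformers library; the [CLS]-vector pooling here is an illustrative assumption, not necessarily how LMGAN conditions its generators:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of 13 tensors:
# index 0 is the embedding output; indices 1..12 are transformer layers 1..12.
layer_set = [6, 7, 9]
cls_states = [outputs.hidden_states[i][:, 0, :] for i in layer_set]  # one [CLS] vector per layer
```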
Model | Accuracy (%) |
---|---|
LMGAN | 69.1 |
- Trained BERT of GAN-BERT at Generators | 67.9 |
- Trained Discriminator of GAN-BERT | 62.9 |
- Trained BERT and Discriminator of GAN-BERT | 64.6 |
- Multiple Generators | 68.8 |
- Cross Entropy Loss at Generator | 65.0 |
- KL Divergence Loss at Generator | 67.5 |
- Both Loss at Generator | 61.2 |
Share and Cite
Cho, W.; Choi, Y. LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators. Sensors 2022, 22, 8761. https://doi.org/10.3390/s22228761