DOI: 10.1007/978-3-031-10363-6_3

Repairing Adversarial Texts Through Perturbation

Published: 08 July 2022

Abstract

It is known that neural networks are susceptible to attacks through adversarial perturbations. Worse yet, such attacks are impossible to eliminate: adversarial perturbations remain possible even after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs. Rejecting suspicious inputs, however, may not always be feasible or ideal. First, normal inputs may be rejected due to false alarms raised by the detection algorithm. Second, denial-of-service attacks may be conducted by feeding such systems a stream of adversarial inputs. To address this, in this work we focus on the text domain and propose an approach to automatically repair adversarial texts at runtime. Given a text suspected to be adversarial, we apply multiple adversarial perturbation methods in a novel, positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network classifies correctly. Experimental results show that our approach effectively repairs about 80% of adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text can be repaired in about one second on average.
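To make the abstract's idea concrete, the sketch below shows one way such a repair loop could look: words in the suspected adversarial text are repeatedly swapped for semantically similar alternatives until the classifier's prediction flips to the expected label. This is an illustrative approximation only, not the paper's algorithm; the `classify` callback, the `synonyms` table, the `target_label` argument, and the random-search budget are all hypothetical placeholders.

```python
# Minimal sketch of repairing a suspected adversarial text by positively
# applying word-level perturbations (synonym swaps). All components here
# are assumed placeholders, not the authors' implementation.
import random
from typing import Callable, Dict, List, Optional


def repair_text(
    text: str,
    classify: Callable[[str], int],      # hypothetical black-box text classifier
    synonyms: Dict[str, List[str]],      # hypothetical word -> synonyms table
    target_label: int,                   # label the repaired text should receive
    max_edits: int = 3,                  # bound on simultaneous word swaps
    trials_per_round: int = 50,          # random-search budget per edit count
) -> Optional[str]:
    """Search for a slightly mutated, semantically similar text that the
    classifier maps to target_label; return None if no repair is found."""
    words = text.split()
    positions = [i for i, w in enumerate(words) if w.lower() in synonyms]
    for num_edits in range(1, max_edits + 1):
        if len(positions) < num_edits:
            break
        for _ in range(trials_per_round):
            chosen = random.sample(positions, num_edits)
            mutated = list(words)
            for i in chosen:
                mutated[i] = random.choice(synonyms[mutated[i].lower()])
            candidate = " ".join(mutated)
            if classify(candidate) == target_label:
                return candidate  # repaired: model now predicts the expected label
    return None


# Toy usage with a keyword-based "classifier" and a tiny synonym table.
toy_synonyms = {"terrible": ["awful", "dreadful"], "boring": ["dull", "tedious"]}
toy_classify = lambda s: 0 if ("awful" in s or "dull" in s) else 1
print(repair_text("a terrible and boring movie", toy_classify, toy_synonyms, target_label=0))
```

In practice the target label would not be known in advance; the paper determines the expected classification at runtime (e.g., from the behaviour of many perturbed variants), whereas this sketch simply takes it as an input for brevity.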



Published In

Theoretical Aspects of Software Engineering: 16th International Symposium, TASE 2022, Cluj-Napoca, Romania, July 8–10, 2022, Proceedings
Jul 2022
440 pages
ISBN: 978-3-031-10362-9
DOI: 10.1007/978-3-031-10363-6

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Adversarial text
  2. Detection
  3. Repair
  4. Perturbation

Qualifiers

  • Article
