DOI: 10.1145/3590003.3590077
research-article

Gaussian-guided character erasure for data augment of industrial characters

Published: 29 May 2023

Abstract

Scene text erasure has attracted growing research interest for its applications in privacy protection, camera-based virtual reality translation, and image editing, and recent work has shown promising results. We use text removal as one component of an industrial character generation procedure that produces large-scale synthetic character images, mitigating the shortage of samples in industrial character recognition. Existing character erasure models perform well in natural scenes. In industrial scenes, however, these networks are easily affected by salient non-character regions, which causes attention shift. To overcome this limitation, we propose an attention-based character erasure network that embeds an additional region awareness layer to guide attention to the correct character regions. We also devise a Gaussian heat map supervision method for training this region awareness layer. Experiments show that the proposed method performs favourably on four industrial character datasets.
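
The abstract does not specify how the Gaussian heat map targets are built, so the following is only a minimal sketch of one plausible construction, assuming axis-aligned character boxes are available as annotations. The function names `gaussian_heatmap` and `heatmap_loss`, the `sigma_scale` parameter, and the mean-squared-error loss are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_heatmap(height, width, boxes, sigma_scale=0.25):
    """Return a (height, width) map in [0, 1] with one 2-D Gaussian per box.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixel coordinates.
    sigma_scale: assumed knob; each sigma is sigma_scale times the box side.
    """
    # Pixel coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x_min, y_min, x_max, y_max in boxes:
        cx = (x_min + x_max) / 2.0
        cy = (y_min + y_max) / 2.0
        # Guard against zero-size boxes so the division below stays finite.
        sx = max((x_max - x_min) * sigma_scale, 1e-3)
        sy = max((y_max - y_min) * sigma_scale, 1e-3)
        g = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                     + (ys - cy) ** 2 / (2 * sy ** 2)))
        # Overlapping characters keep the stronger response.
        heatmap = np.maximum(heatmap, g)
    return heatmap


def heatmap_loss(pred, target):
    """Mean squared error between a predicted and a target heat map."""
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    target = gaussian_heatmap(32, 64, [(10, 8, 20, 24), (30, 8, 40, 24)])
    pred = np.zeros_like(target)  # stand-in for a region awareness output
    print(target.shape, heatmap_loss(pred, target))
```

In a setup like this, the target peaks at each character centre and decays toward the box edges, so a layer trained against it would be pushed to respond to character regions rather than to other salient areas of the image.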



    Published In

    CACML '23: Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning
    March 2023
    598 pages
    ISBN:9781450399449
    DOI:10.1145/3590003

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attention shift
    2. character erasure
    3. neural networks
    4. region awareness

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • jointly funded by Guangdong and Macao.

    Conference

    CACML 2023

    Acceptance Rates

    CACML '23 Paper Acceptance Rate 93 of 241 submissions, 39%;
    Overall Acceptance Rate 93 of 241 submissions, 39%
