DOI: 10.1145/3397481.3450675
research-article

DeepNAG: Deep Non-Adversarial Gesture Generation

Published: 14 April 2021

Abstract

Synthetic data generation to improve classification performance (data augmentation) is a well-studied problem. Recently, generative adversarial networks (GANs) have shown superior image data augmentation performance, but their suitability for gesture synthesis has received inadequate attention. Further, GANs require the generator and discriminator networks to be trained simultaneously, which can be prohibitively expensive. We tackle both issues in this work. We first discuss a novel, device-agnostic GAN model for gesture synthesis called DeepGAN. Thereafter, we formulate DeepNAG by introducing a new differentiable loss function based on dynamic time warping and the average Hausdorff distance, which allows us to train DeepGAN's generator without requiring a discriminator. Through evaluations, we compare the utility of DeepGAN and DeepNAG against two alternative techniques for training five recognizers using data augmentation over six datasets. We further investigate the perceived quality of synthesized samples via an Amazon Mechanical Turk user study based on the HYPE∞ benchmark. We find that DeepNAG outperforms DeepGAN in accuracy, training time (up to 17× faster), and realism, thereby opening the door to a new line of research in generator network design and training for gesture synthesis. Our source code is available at https://www.deepnag.com.
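
To make the two ingredients of the loss concrete, the following is a minimal NumPy sketch of a soft (smoothed) dynamic time warping cost and an average Hausdorff distance between two gesture sequences. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `gamma` smoothing parameter, the choice to sum the two directed Hausdorff averages, and the weighted sum in `gesture_loss` are all hypothetical, and the abstract does not say how the paper combines the two terms. The sketch only computes loss values; training a generator with it would additionally require an automatic differentiation framework.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distance between every frame of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def soft_min(values, gamma):
    """Smoothed minimum, -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = -np.asarray(values, dtype=float) / gamma
    m = np.max(v)
    return -gamma * (m + np.log(np.sum(np.exp(v - m))))


def soft_dtw(a, b, gamma=1.0):
    """Soft-DTW alignment cost; approaches classic DTW as gamma goes to 0."""
    cost = pairwise_sq_dists(a, b)
    n, m = cost.shape
    # r[i, j] holds the smoothed cost of aligning a[:i] with b[:j].
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            neighbours = [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]]
            r[i, j] = cost[i - 1, j - 1] + soft_min(neighbours, gamma)
    return r[n, m]


def average_hausdorff(a, b):
    """Sum of the two directed mean nearest-neighbour distances (one common variant)."""
    d = np.sqrt(pairwise_sq_dists(a, b))
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def gesture_loss(generated, real, gamma=1.0, weight=1.0):
    """Hypothetical combination: the weighted sum is an assumption of this sketch."""
    return soft_dtw(generated, real, gamma) + weight * average_hausdorff(generated, real)
```

Each sequence is an array of shape (frames, features), so sequences of different lengths can be compared directly. The soft minimum is what makes the alignment cost smooth in its inputs, which a hard minimum is not.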


            Published In

            IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces
            April 2021
            618 pages
            ISBN:9781450380171
            DOI:10.1145/3397481
            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Author Tags

            1. deep neural networks
            2. dynamic time warping
            3. generative adversarial networks
            4. generative modeling
            5. gesture generation

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            IUI '21
            Acceptance Rates

            Overall Acceptance Rate 746 of 2,811 submissions, 27%
