DeepNAG: Deep Non-Adversarial Gesture Generation

Authors: Mehran Maghoumi, Eugene Matthew Taranta, Joseph LaViola

IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces

Pages 213 - 223

https://doi.org/10.1145/3397481.3450675

Published: 14 April 2021

Abstract

Synthetic data generation to improve classification performance (data augmentation) is a well-studied problem. Recently, generative adversarial networks (GANs) have shown superior image data augmentation performance, but their suitability for gesture synthesis has received inadequate attention. Further, GANs prohibitively require simultaneous generator and discriminator network training. We tackle both issues in this work. We first discuss a novel, device-agnostic GAN model for gesture synthesis called DeepGAN. Thereafter, we formulate DeepNAG by introducing a new differentiable loss function based on dynamic time warping and the average Hausdorff distance, which allows us to train DeepGAN’s generator without requiring a discriminator. Through evaluations, we compare the utility of DeepGAN and DeepNAG against two alternative techniques for training five recognizers using data augmentation over six datasets. We further investigate the perceived quality of synthesized samples via an Amazon Mechanical Turk user study based on the HYPE∞ benchmark. We find that DeepNAG outperforms DeepGAN in accuracy, training time (up to 17× faster), and realism, thereby opening the door to a new line of research in generator network design and training for gesture synthesis. Our source code is available at https://www.deepnag.com.
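
The abstract names the two ingredients of the DeepNAG loss, dynamic time warping and the average Hausdorff distance, but does not state the formula. As a rough illustration only, the sketch below shows one way these two pieces could be combined: a soft (smoothed) DTW cost between two point sequences, plugged into an average Hausdorff distance between a set of generated gestures and a set of real ones. This is not the authors' implementation. The function names, the `gamma` smoothing parameter, the squared Euclidean point cost, and the choice to sum the two directed averages are all assumptions made for this example.

```python
import math


def sq_euclidean(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def soft_min(values, gamma):
    """Smooth minimum, -gamma * log(sum(exp(-v / gamma))), computed stably."""
    lo = min(values)
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW cost between two point sequences (assumed recursion, see lead-in)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the smoothed alignment cost of x[:i] against y[:j].
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sq_euclidean(x[i - 1], y[j - 1])
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma
            )
    return r[n][m]


def average_hausdorff(set_a, set_b, dist):
    """Sum of the two directed average nearest-neighbour distances between sets."""
    a_to_b = sum(min(dist(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(min(dist(a, b) for a in set_a) for b in set_b) / len(set_b)
    return a_to_b + b_to_a


if __name__ == "__main__":
    # Hypothetical toy data: each gesture is a short list of 2D points.
    real = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
            [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]]
    fake = [[(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
            [(0.0, 0.0), (1.0, 0.9), (2.0, 2.1)]]
    loss = average_hausdorff(fake, real, lambda g, r: soft_dtw(g, r, gamma=0.1))
    print(loss)
```

Because every step is a smooth function of the generated points, the same computation written in an automatic differentiation framework would yield gradients with respect to a generator's output, which is what makes training without a discriminator plausible. The pure-Python version here only illustrates the arithmetic and is not suitable for training.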


Cited By

Mohammadi S, Belgiu M, Stein A (2024). Few-Shot Learning for Crop Mapping from Satellite Image Time Series. Remote Sensing 16:6 (1026). DOI: 10.3390/rs16061026. Online publication date: 14-Mar-2024
https://doi.org/10.3390/rs16061026
Bachert M, Hesenius M (2024). Towards a Framework for Evaluating Synthetic Surface Gestures. Companion Proceedings of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (22-30). DOI: 10.1145/3660515.3661327. Online publication date: 24-Jun-2024
https://dl.acm.org/doi/10.1145/3660515.3661327
Lu H, Xu S, Zhao S, Hu X, Ma R, Hu B (2024). EPIC: Emotion Perception by Spatio-Temporal Interaction Context of Gait. IEEE Journal of Biomedical and Health Informatics 28:5 (2592-2601). DOI: 10.1109/JBHI.2022.3233597. Online publication date: May-2024
https://doi.org/10.1109/JBHI.2022.3233597

Index Terms

DeepNAG: Deep Non-Adversarial Gesture Generation
1. Computing methodologies
2. Human-centered computing

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

IUI '21: Proceedings of the 26th International Conference on Intelligent User Interfaces

April 2021

618 pages

ISBN:9781450380171

DOI:10.1145/3397481

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IUI '21: 26th International Conference on Intelligent User Interfaces

April 14 - 17, 2021

College Station, TX, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 266
Downloads (Last 12 months): 43
Downloads (Last 6 weeks): 5

Reflects downloads up to 06 Jan 2025

Citations

Meghanani A, Hain T (2024). SCORE: Self-Supervised Correspondence Fine-Tuning for Improved Content Representations. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (12086-12090). DOI: 10.1109/ICASSP48485.2024.10448060. Online publication date: 14-Apr-2024
https://doi.org/10.1109/ICASSP48485.2024.10448060
Cho C, Chang E, Anumanchipalli G, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (2023). Neural latent aligner. Proceedings of the 40th International Conference on Machine Learning (5661-5676). DOI: 10.5555/3618408.3618634. Online publication date: 23-Jul-2023
https://dl.acm.org/doi/10.5555/3618408.3618634
Chioccarello S, Sluÿters A, Testolin A, Vanderdonckt J, Lambot S (2023). FORTE: Few Samples for Recognizing Hand Gestures with a Smartphone-attached Radar. Proceedings of the ACM on Human-Computer Interaction 7:EICS (1-25). DOI: 10.1145/3593231. Online publication date: 19-Jun-2023
https://dl.acm.org/doi/10.1145/3593231
Ko H, Park G, Jeon H, Jo J, Kim J, Seo J (2023). Large-scale Text-to-Image Generation Models for Visual Artists’ Creative Works. Proceedings of the 28th International Conference on Intelligent User Interfaces (919-933). DOI: 10.1145/3581641.3584078. Online publication date: 27-Mar-2023
https://dl.acm.org/doi/10.1145/3581641.3584078
Maslych M, Taranta E, Aldilati M, Laviola J (2023). Effective 2D Stroke-based Gesture Augmentation for RNNs. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-13). DOI: 10.1145/3544548.3581358. Online publication date: 19-Apr-2023
https://dl.acm.org/doi/10.1145/3544548.3581358
Chu J, An D, Ma Y, Cui W, Zhai S, Gu X, Bi X (2023). WordGesture-GAN: Modeling Word-Gesture Movement with Generative Adversarial Network. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-15). DOI: 10.1145/3544548.3581279. Online publication date: 19-Apr-2023
https://dl.acm.org/doi/10.1145/3544548.3581279
Krause M, Weiß C, Müller M (2023). Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1-5). DOI: 10.1109/ICASSP49357.2023.10095907. Online publication date: 4-Jun-2023
https://doi.org/10.1109/ICASSP49357.2023.10095907
