Research article | Open access

MoGlow: probabilistic and controllable motion synthesis using normalising flows

Published: 27 November 2020

Abstract

Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics. This paper introduces a new class of probabilistic, generative, and controllable motion-data models based on normalising flows. Models of this kind can describe highly complex distributions, yet can be trained efficiently using exact maximum likelihood, unlike GANs or VAEs. Our proposed model is autoregressive and uses LSTMs to enable arbitrarily long time-dependencies. Importantly, it is also causal, meaning that each pose in the output sequence is generated without access to poses or control inputs from future time steps; this absence of algorithmic latency is important for interactive applications with real-time motion control. The approach can in principle be applied to any type of motion since it does not make restrictive, task-specific assumptions regarding the motion or the character morphology. We evaluate the models on motion-capture datasets of human and quadruped locomotion. Objective and subjective results show that randomly-sampled motion from the proposed method outperforms task-agnostic baselines and attains a motion quality close to recorded motion capture.

Supplementary Material

ZIP File (a236-henter.zip): supplemental material.
MP4 File (a236-henter.mp4)
MP4 File (3414685.3417836.mp4): presentation video.

      Published In

ACM Transactions on Graphics, Volume 39, Issue 6
December 2020, 1605 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3414685
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 November 2020
      Published in TOG Volume 39, Issue 6

      Author Tags

      1. data dropout
      2. footstep analysis
      3. generative models
      4. glow
      5. machine learning
      6. normalising flows
