DOI: 10.1145/3474085.3475180

Dual Learning Music Composition and Dance Choreography

Published: 17 October 2021

Abstract

Music and dance have always co-existed as pillars of human activity, contributing immensely to the cultural, social, and entertainment functions of virtually all societies. Notwithstanding the gradual systematization of music and dance into two independent disciplines, their intimate connection is undeniable, and one art form often appears incomplete without the other. Recent work has studied generative models for dance sequences conditioned on music. The dual task of composing music for given dances, however, has been largely overlooked. In this paper, we propose a novel extension in which we jointly model both tasks in a dual learning approach. To leverage the duality of the two modalities, we introduce an optimal transport objective to align feature embeddings, as well as a cycle consistency loss to foster overall consistency. Experimental results demonstrate that our dual learning framework improves individual task performance, delivering generated music compositions and dance choreographies that are realistic and faithful to their conditioning inputs.
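As a rough illustration of the two auxiliary objectives named in the abstract, the sketch below shows how an optimal transport alignment term and a cycle consistency term could be written in PyTorch. This is not the authors' released code: the generator names music2dance and dance2music, the Sinkhorn routine, and all hyper-parameter values are illustrative assumptions.

# Illustrative sketch only -- names and hyper-parameters are assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def sinkhorn_plan(cost, eps=0.1, n_iters=50):
    # Entropic optimal-transport plan between uniform marginals for an n x m cost matrix.
    n, m = cost.shape
    K = torch.exp(-cost / eps)                        # Gibbs kernel
    a = cost.new_full((n,), 1.0 / n)                  # uniform source marginal
    b = cost.new_full((m,), 1.0 / m)                  # uniform target marginal
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(n_iters):                          # Sinkhorn iterations
        u = a / (K @ v + 1e-8)
        v = b / (K.t() @ u + 1e-8)
    return u.unsqueeze(1) * K * v.unsqueeze(0)        # transport plan

def ot_alignment_loss(music_emb, dance_emb):
    # Align the music and dance feature embeddings by minimising a transport cost between them.
    cost = torch.cdist(music_emb, dance_emb, p=2)     # pairwise Euclidean costs
    plan = sinkhorn_plan(cost.detach())               # plan treated as a constant for gradients
    return (plan * cost).sum()

def cycle_consistency_loss(music, dance, music2dance, dance2music):
    # Round-trip each modality through both (hypothetical) generators and penalise reconstruction error.
    music_rec = dance2music(music2dance(music))
    dance_rec = music2dance(dance2music(dance))
    return F.l1_loss(music_rec, music) + F.l1_loss(dance_rec, dance)

A full training objective would combine the task-specific generation losses with these two terms under some weighting; the weights, the entropic regularization eps, and the number of Sinkhorn iterations above are placeholders rather than values reported in the paper.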

Supplementary Material

MP4 File (MM21-fp0119.mp4)
Presentation Video




Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cross-modal generation
  2. dual learning
  3. optimal transport

Qualifiers

  • Research-article

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)

