Motion-Sound Mapping through Interaction: An Approach to User-Centered Design of Auditory Feedback Using Machine Learning

Published: 13 June 2018

Abstract

Technologies for sensing movement are expanding toward everyday use in virtual reality, gaming, and artistic practices. In this context, there is a need for methodologies that help designers and users create meaningful movement experiences. This article presents a user-centered approach to the design of interactive auditory feedback using interactive machine learning. We discuss Mapping through Interaction, a method for crafting sonic interactions from corporeal demonstrations of embodied associations between motion and sound. The method builds the mapping from user demonstrations through interactive machine learning, emphasizing an iterative design process that integrates acted and interactive experiences of the relationships between movement and sound. We examine Gaussian Mixture Regression and Hidden Markov Regression for continuous movement recognition and real-time sound parameter generation. We illustrate and evaluate this approach with an application in which novice users create interactive sound feedback from coproduced gestures and vocalizations. Results indicate that Gaussian Mixture Regression and Hidden Markov Regression can efficiently learn complex motion-sound mappings from a few examples.
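As a concrete illustration of the regression step named in the abstract, the sketch below shows one common formulation of Gaussian Mixture Regression: a joint Gaussian mixture is fit on concatenated motion and sound-parameter frames, and sound parameters are then generated as the conditional expectation of the sound dimensions given an incoming motion frame. This is a minimal sketch using NumPy, SciPy, and scikit-learn, not the authors' implementation; the function names, the fixed number of components, and the frame layout are illustrative assumptions.

    # Minimal Gaussian Mixture Regression (GMR) sketch; illustrative only,
    # not the system described in the article.
    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    def fit_joint_gmm(motion, sound, n_components=4):
        # Fit a joint GMM on concatenated [motion, sound] frames (rows = time frames).
        joint = np.hstack([motion, sound])
        return GaussianMixture(n_components=n_components, covariance_type="full").fit(joint)

    def gmr_predict(gmm, x, dim_motion):
        # Conditional mean of the sound parameters given one motion frame x.
        dim_sound = gmm.means_.shape[1] - dim_motion
        resp = np.zeros(gmm.n_components)            # per-component responsibilities
        cond = np.zeros((gmm.n_components, dim_sound))
        for k in range(gmm.n_components):
            mx, my = gmm.means_[k, :dim_motion], gmm.means_[k, dim_motion:]
            Sxx = gmm.covariances_[k][:dim_motion, :dim_motion]
            Syx = gmm.covariances_[k][dim_motion:, :dim_motion]
            resp[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mean=mx, cov=Sxx)
            cond[k] = my + Syx @ np.linalg.solve(Sxx, x - mx)
        resp /= resp.sum()                           # normalize responsibilities
        return resp @ cond                           # responsibility-weighted conditional means

At run time, gmr_predict would be called on each incoming motion frame to stream sound-synthesis parameters; the Hidden Markov Regression variant discussed in the article additionally conditions the regression on a hidden temporal state estimated over the course of the gesture.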


Published In

ACM Transactions on Interactive Intelligent Systems, Volume 8, Issue 2
Special Issue on Human-Centered Machine Learning
June 2018
259 pages
ISSN:2160-6455
EISSN:2160-6463
DOI:10.1145/3232718

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2018
Accepted: 01 March 2018
Revised: 01 March 2018
Received: 01 December 2016
Published in TIIS Volume 8, Issue 2


Author Tags

  1. Interactive machine learning
  2. movement
  3. programming-by-demonstration
  4. sonification
  5. sound and music computing
  6. user-centered design

Qualifiers

  • Research-article
  • Research
  • Refereed
