Research Article
DOI: 10.1145/3382507.3418874

Bring the Environment to Life: A Sonification Module for People with Visual Impairments to Improve Situation Awareness

Published: 22 October 2020

Abstract

Digital navigation tools for helping people with visual impairments have become increasingly popular in recent years. While conventional navigation solutions give routing instructions to the user, systems such as Google Maps, BlindSquare, or Soundscape offer additional information about the surroundings and thereby improve the orientation of people with visual impairments. However, these systems only provide information about static environments; dynamic scenes comprising objects such as bikes, dogs, and persons are not considered. In addition, both the routing and the information about the environment are usually conveyed by speech. We address this gap and implement a mobile system that combines object identification with a sonification interface. Our system can be used in three different scenarios of macro and micro navigation: orientation, obstacle avoidance, and exploration of known and unknown routes. It leverages popular computer vision methods to localize 18 static and dynamic object classes in real time. At the heart of the system is a mixed reality sonification interface that adapts to the user's needs and conveys the recognized semantic information to the user. The system was designed following a user-centered approach. An exploratory user study showed that our object-to-sound mapping with auditory icons is intuitive. On average, users perceived the system as useful and indicated that they want to know more about their environment beyond wayfinding and points of interest.
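The abstract does not fix a specific detector, so the following Python sketch is only illustrative: it uses an off-the-shelf torchvision model as a stand-in for the "popular computer vision methods" mentioned above, and the `detect_objects` helper, its score threshold, and its output format are assumptions, not the authors' implementation.

```python
# Illustrative sketch only: an off-the-shelf detector standing in for the
# paper's unspecified "popular computer vision methods". The helper name,
# threshold, and output format are assumptions.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()  # inference mode: the model returns boxes, labels, and scores

def detect_objects(frame, score_threshold=0.6):
    """Return (label, box, score) triples for one camera frame."""
    with torch.no_grad():
        pred = model([to_tensor(frame)])[0]
    detections = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if float(score) >= score_threshold:
            detections.append((int(label), [float(v) for v in box], float(score)))
    return detections
```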

Supplementary Material

MP4 File (3382507.3418874.mp4)
In this presentation, we talk about our research project at the Karlsruhe Institute of Technology in Germany. We created an assistive prototype based on computer vision and sonification that helps people with visual impairments improve their awareness of the environment. The prototype can localize 18 different classes of objects at once in real time. For each recognized object, it conveys the object class, distance, and position on the horizontal axis to the user through parameterized auditory icons or spearcons. We show that: (1) our system can be used for both micro and macro navigation; (2) the chosen interface is suited to convey information about the environment; (3) configurability of the system is paramount; (4) the objects rated most useful are those relevant for the user's safety.
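As a rough illustration of how class, distance, and horizontal position might be folded into a parameterized auditory icon, here is a minimal sketch. The `SoundCue` structure, the mapping curves, the icon file names, and the assumption that a distance estimate is already available are all hypothetical, not the authors' published design.

```python
# Hypothetical object-to-sound mapping: pan from the horizontal box centre,
# gain from an externally supplied distance estimate. Curves, ranges, and
# icon names are illustrative assumptions, not taken from the paper.
from dataclasses import dataclass

@dataclass
class SoundCue:
    icon: str    # auditory icon / spearcon file chosen by object class
    pan: float   # stereo position in [-1.0 (left), +1.0 (right)]
    gain: float  # loudness in [0.0, 1.0]; louder means closer

ICONS = {1: "person.wav", 2: "bicycle.wav", 18: "dog.wav"}  # example subset

def map_detection_to_cue(label, box, distance_m, frame_width, max_distance_m=10.0):
    """Derive a sound cue from one detection (box = [x1, y1, x2, y2])."""
    x_center = (box[0] + box[2]) / 2.0
    pan = 2.0 * x_center / frame_width - 1.0            # left edge -> -1, right -> +1
    gain = max(0.0, 1.0 - distance_m / max_distance_m)  # fades out with distance
    return SoundCue(icon=ICONS.get(label, "generic.wav"),
                    pan=max(-1.0, min(1.0, pan)),
                    gain=gain)

# Example: a person detected slightly left of centre, 3.5 m away,
# in a 640-pixel-wide frame.
cue = map_detection_to_cue(1, [120.0, 80.0, 280.0, 400.0], 3.5, 640)
```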



Published In

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
October 2020
920 pages
ISBN:9781450375818
DOI:10.1145/3382507
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. assistive technologies
  2. computer vision
  3. evaluations of intelligent user interfaces
  4. mixed reality
  5. object localization
  6. sonification

Qualifiers

  • Research-article

Conference

ICMI '20: International Conference on Multimodal Interaction
October 25–29, 2020
Virtual Event, Netherlands

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)
