Research article · DOI: 10.1145/3640543.3645152

Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual Drivers

Published: 05 April 2024

Abstract

The rapid advancement of the automotive industry toward automated and semi-automated vehicles has rendered traditional interaction methods, such as touch-based and voice-command systems, inadequate for a widening range of non-driving-related tasks, such as referencing objects outside the vehicle. Consequently, research has shifted toward gestural input (e.g., hand, gaze, and head-pose gestures) as a more suitable mode of interaction while driving. However, the dynamic nature of driving and individual variation lead to significant differences in drivers’ gestural input performance. While this inherent variability could, in theory, be moderated by large data-driven machine learning models, prevalent methodologies lean toward constrained, single-instance trained models for object referencing. Such models have limited capacity to continuously adapt to the divergent behaviors of individual drivers and the variety of driving scenarios. To address this, we propose IcRegress, a novel regression-based incremental learning approach that adapts to the changing behavior and unique characteristics of drivers engaged in the dual task of driving and referencing objects. We offer a more personalized and adaptable solution for multimodal gestural interfaces, employing continuous lifelong learning to enhance driver experience, safety, and convenience. We evaluated our approach on an outside-the-vehicle object referencing use case, showing that incrementally adapted models outperform a single trained model across driver traits such as handedness and driving experience, and across numerous driving conditions. Finally, to facilitate reproducibility, ease deployment, and promote further research, we release our approach as an open-source framework at https://github.com/amrgomaaelhady/IcRegress.
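The abstract contrasts single-instance trained models with models that keep adapting to each driver. As a rough, hypothetical illustration of that incremental-update idea (not the paper's actual IcRegress implementation, whose details live in the linked repository), the sketch below uses scikit-learn's partial_fit to refine a regressor with successive driver-specific batches; the three-feature layout (gaze yaw, head yaw, pointing yaw) and the angle target are illustrative assumptions, not the paper's feature set.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Illustrative only: a generic incremental regressor that refines its
# estimate of a referenced object's horizontal angle as new samples from
# one driver arrive. The (gaze yaw, head yaw, pointing yaw) feature
# layout is a hypothetical stand-in for the paper's multimodal inputs.
scaler = StandardScaler()
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

def adapt(features: np.ndarray, target_angle: np.ndarray) -> None:
    """Update the model in place with one driver-specific batch."""
    X = scaler.partial_fit(features).transform(features)
    model.partial_fit(X, target_angle)

def predict(features: np.ndarray) -> np.ndarray:
    """Estimate the referenced object's angle under the current model."""
    return model.predict(scaler.transform(features))

# Simulated usage: a generic pre-training batch, then per-driver updates.
rng = np.random.default_rng(0)
X0, y0 = rng.normal(size=(100, 3)), rng.normal(size=100)
adapt(X0, y0)                       # generic, driver-agnostic batch
for _ in range(5):                  # successive sessions with one driver
    Xi, yi = rng.normal(size=(10, 3)), rng.normal(size=10)
    adapt(Xi, yi)                   # incremental, driver-specific update
print(predict(rng.normal(size=(1, 3))))
```

The point of the pattern is that each partial_fit call updates the existing weights rather than retraining from scratch, which is what would let a deployed referencing model drift toward an individual driver's pointing and gaze habits over time.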



Published In

IUI '24: Proceedings of the 29th International Conference on Intelligent User Interfaces
March 2024 · 955 pages
ISBN: 9798400705083
DOI: 10.1145/3640543

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Adaptive Models
  2. Gaze Tracking
  3. Human-Centered Artificial Intelligence
  4. Incremental Learning
  5. Object Referencing
  6. Online Learning
  7. Personalization
  8. Pointing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IUI '24

Acceptance Rates

Overall acceptance rate: 746 of 2,811 submissions (27%)

