DOI: 10.1145/3382507.3421156

Towards a Multimodal and Context-Aware Framework for Human Navigational Intent Inference

Published: 22 October 2020

Abstract

A socially acceptable robot needs to make correct decisions and understand human intent in order to interact with and navigate around humans safely. Although research in computer vision and robotics has made major advances in recent years, today's robotic systems still need a better understanding of human intent to be more effective and more widely accepted. Currently, such inference is typically performed with only one mode of perception, such as vision or the human movement trajectory. In this extended abstract, I describe my PhD research plan for developing a novel multimodal and context-aware framework in which a robot infers human navigational intentions through multimodal perception comprising temporal human facial, body pose, and gaze features, human motion features, and environmental context. To facilitate this framework, a data collection experiment is designed to acquire multimodal human-robot interaction data. Our initial design of the framework is based on a temporal neural network model with human motion, body pose, and head orientation features as input, and we will increase the complexity of the neural network model as well as the input features along the way. In the long term, this framework can benefit a variety of settings such as autonomous driving, service robots, and household robots.
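The paper does not include code; the following is a minimal sketch of the kind of temporal model the abstract outlines, assuming a PyTorch LSTM over concatenated per-frame motion, body pose, and head orientation features. The class name, feature dimensions, and intent labels are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not from the paper): a temporal classifier that maps
# per-frame motion, body-pose and head-orientation features to navigational
# intent logits. Feature sizes and class names are assumptions.
import torch
import torch.nn as nn

class NavIntentLSTM(nn.Module):
    def __init__(self, motion_dim=4, pose_dim=36, head_dim=3,
                 hidden_dim=128, num_intents=3):
        super().__init__()
        in_dim = motion_dim + pose_dim + head_dim   # concatenated per-frame features
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_intents)

    def forward(self, motion, pose, head_orient):
        # Each input: (batch, time, feature_dim); concatenate along the feature axis.
        x = torch.cat([motion, pose, head_orient], dim=-1)
        _, (h_n, _) = self.lstm(x)          # final hidden state summarizes the sequence
        return self.head(h_n[-1])           # logits over hypothetical intent classes

# Usage sketch: 8 sequences of 30 frames each.
model = NavIntentLSTM()
logits = model(torch.randn(8, 30, 4), torch.randn(8, 30, 36), torch.randn(8, 30, 3))
intent = logits.argmax(dim=-1)              # e.g. approach / pass-by / avoid (assumed labels)

The abstract notes that model complexity and input features will grow over time; in a sketch like this, richer modalities (e.g. gaze or environmental context) would simply widen the per-frame feature vector or be fused by a more expressive temporal model.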

Supplementary Material

MP4 File (3382507.3421156.mp4)
This video presents the paper "Towards a Multimodal and Context-Aware Framework for Human Navigational Intent Inference". The presentation covers three major parts: the motivation, research plan, and preliminary results of the author's PhD research.


        Published In

        ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
        October 2020
        920 pages
        ISBN:9781450375818
        DOI:10.1145/3382507
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. human-robot interaction
        2. intent inference
        3. machine learning
        4. multimodal perception

        Qualifiers

        • Short-paper

        Conference

        ICMI '20
        Sponsor:
        ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
        October 25 - 29, 2020
        Virtual Event, Netherlands

        Acceptance Rates

        Overall Acceptance Rate 453 of 1,080 submissions, 42%
