DOI: 10.1145/3382507.3421156

Towards a Multimodal and Context-Aware Framework for Human Navigational Intent Inference

Published: 22 October 2020

Abstract

A socially acceptable robot needs to make correct decisions and understand human intent in order to interact with and navigate around humans safely. Although research in computer vision and robotics has made major advances in recent years, today's robotic systems still need a better understanding of human intent to be more effective and more widely accepted. Currently, such inference is typically performed with only one mode of perception, such as vision or the human movement trajectory. In this extended abstract, I describe my PhD research plan for developing a novel multimodal and context-aware framework in which a robot infers human navigational intentions through multimodal perception comprising temporal human facial, body pose, and gaze features, human motion features, and environmental context. To facilitate this framework, a data collection experiment is designed to acquire multimodal human-robot interaction data. Our initial design of the framework is based on a temporal neural network model with human motion, body pose, and head orientation features as input, and we will increase the complexity of the neural network model as well as the input features along the way. In the long term, this framework can benefit a variety of settings such as autonomous driving, service robots, and household robots.
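The paper does not include code; the following is a minimal sketch of the kind of temporal model the abstract outlines, assuming a PyTorch LSTM over concatenated per-frame motion, body pose, and head orientation features. The class name, feature dimensions, and intent labels are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not from the paper): a temporal classifier that maps
# per-frame motion, body-pose and head-orientation features to navigational
# intent logits. Feature sizes and class names are assumptions.
import torch
import torch.nn as nn

class NavIntentLSTM(nn.Module):
    def __init__(self, motion_dim=4, pose_dim=36, head_dim=3,
                 hidden_dim=128, num_intents=3):
        super().__init__()
        in_dim = motion_dim + pose_dim + head_dim   # concatenated per-frame features
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_intents)

    def forward(self, motion, pose, head_orient):
        # Each input: (batch, time, feature_dim); concatenate along the feature axis.
        x = torch.cat([motion, pose, head_orient], dim=-1)
        _, (h_n, _) = self.lstm(x)          # final hidden state summarizes the sequence
        return self.head(h_n[-1])           # logits over hypothetical intent classes

# Usage sketch: 8 sequences of 30 frames each.
model = NavIntentLSTM()
logits = model(torch.randn(8, 30, 4), torch.randn(8, 30, 36), torch.randn(8, 30, 3))
intent = logits.argmax(dim=-1)              # e.g. approach / pass-by / avoid (assumed labels)

The abstract notes that model complexity and input features will grow over time; in a sketch like this, richer modalities (e.g. gaze or environmental context) would simply widen the per-frame feature vector or be fused by a more expressive temporal model.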

Supplementary Material

MP4 File (3382507.3421156.mp4)
This video presents the paper "Towards a Multimodal and Context-Aware Framework for Human Navigational Intent Inference". The presentation covers three major parts: the motivation, research plan, and preliminary results of the author's PhD research.


        Published In

        ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
        October 2020
        920 pages
        ISBN:9781450375818
        DOI:10.1145/3382507
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. human-robot interaction
        2. intent inference
        3. machine learning
        4. multimodal perception

        Qualifiers

        • Short-paper

        Conference

        ICMI '20
        Sponsor:
        ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
        October 25 - 29, 2020
        Virtual Event, Netherlands

        Acceptance Rates

        Overall Acceptance Rate 453 of 1,080 submissions, 42%
