DOI: 10.1145/3343031.3351073

Prediction-CGAN: Human Action Prediction with Conditional Generative Adversarial Networks

Published: 15 October 2019

Abstract

The underlying challenge of human action prediction, i.e., maintaining prediction accuracy at the very beginning of an action execution, is still not well handled. In this paper, we propose a Prediction Conditional Generative Adversarial Network (Prediction-CGAN) for action prediction, which shares information between completely observed and partially observed videos. Instead of generating future frames, we aim to complete the visual representation of an unfinished video, which can be used directly to predict the action label at any progress level. The Prediction-CGAN incorporates a completion constraint to learn a transformation from incomplete actions to complete actions; an adversarial constraint to ensure the generated representation has discriminative power similar to the complete representation; a label consistency constraint to encourage label agreement between each segment and its corresponding complete video; and a confidence monotonically increasing constraint to yield increasingly accurate predictions as more frames are observed. In addition, we introduce a novel adversarial criterion tailored to the prediction task, which requires the generated representation to be more discriminative than its corresponding incomplete representation, yet less discriminative than the real complete representation. In experiments, extensive evaluations show that the proposed Prediction-CGAN outperforms state-of-the-art methods in action prediction.
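To make the training objective concrete, the following is a minimal PyTorch-style sketch of how the four constraints described in the abstract could be combined. It is an illustration under stated assumptions, not the authors' implementation: the generator G, discriminator D, classifier C, the feature shapes, and the equal weighting of the loss terms are all hypothetical.

# Illustrative sketch only: G (generator), D (discriminator), C (classifier)
# and the feature shapes are hypothetical stand-ins for the paper's networks.
import torch.nn.functional as F

def prediction_cgan_loss(G, D, C, partial_feats, full_feat, label):
    # partial_feats: features of one video at increasing observation ratios,
    # each of shape (1, d); full_feat: complete-video feature, shape (1, d);
    # label: ground-truth action class, LongTensor of shape (1,).
    loss, confidences = 0.0, []
    for x_p in partial_feats:
        x_g = G(x_p)  # complete the partial representation
        # completion constraint: pull the completed feature toward the real one
        loss = loss + F.mse_loss(x_g, full_feat)
        # adversarial criterion for prediction: the completed feature should
        # score higher than the incomplete one, but lower than the real
        # complete one
        loss = loss + F.relu(D(x_p) - D(x_g)).mean() \
                    + F.relu(D(x_g) - D(full_feat)).mean()
        logits = C(x_g)
        # label consistency constraint: every segment keeps the video's label
        loss = loss + F.cross_entropy(logits, label)
        confidences.append(logits.softmax(-1)[0, label])
    # confidence monotonically increasing constraint: penalize any drop in
    # true-class confidence as more frames are observed
    for c_prev, c_next in zip(confidences, confidences[1:]):
        loss = loss + F.relu(c_prev - c_next).mean()
    return loss

In practice each term would carry its own weight and D would be trained adversarially against G in alternating steps; the sketch only shows how the four constraints interact on a single video.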



    Published In

    MM '19: Proceedings of the 27th ACM International Conference on Multimedia
    October 2019, 2794 pages
    ISBN: 9781450368896
    DOI: 10.1145/3343031

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2019


    Author Tags

    1. action prediction
    2. completion transformation
    3. conditional generative adversarial networks
    4. label consistency preservation
    5. temporal evolution preservation

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    MM '19

    Acceptance Rates

    MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
    Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
