DOI: 10.1145/3343031.3351073

Prediction-CGAN: Human Action Prediction with Conditional Generative Adversarial Networks

Published: 15 October 2019

Abstract

The underlying challenge of human action prediction, i.e., maintaining prediction accuracy at the very beginning of an action execution, is still not well handled. In this paper, we propose a Prediction Conditional Generative Adversarial Network (Prediction-CGAN) for action prediction, which shares information between completely observed and partially observed videos. Instead of generating future frames, we aim to complete the visual representation of an unfinished video, which can be used directly to predict the action label at any progress level. The Prediction-CGAN incorporates a completion constraint to learn a transformation from incomplete actions to complete actions; an adversarial constraint to ensure the generated representation has discriminative power similar to the complete representation; a label consistency constraint to encourage label agreement between each segment and its corresponding complete video; and a confidence monotonically increasing constraint to yield increasingly accurate predictions as more frames are observed. In addition, we introduce a novel adversarial criterion tailored to the prediction task, which requires the generated representation to be more discriminative than its corresponding incomplete representation, yet less discriminative than the real complete representation. In experiments, extensive evaluations show that the proposed Prediction-CGAN outperforms state-of-the-art methods in action prediction.
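To make the training objective concrete, the following is a minimal PyTorch-style sketch of how the four constraints described in the abstract could be combined. It is an illustration under stated assumptions, not the authors' implementation: the generator G, discriminator D, classifier C, the feature shapes, and the equal weighting of the loss terms are all hypothetical.

# Illustrative sketch only: G (generator), D (discriminator), C (classifier)
# and the feature shapes are hypothetical stand-ins for the paper's networks.
import torch.nn.functional as F

def prediction_cgan_loss(G, D, C, partial_feats, full_feat, label):
    # partial_feats: features of one video at increasing observation ratios,
    # each of shape (1, d); full_feat: complete-video feature, shape (1, d);
    # label: ground-truth action class, LongTensor of shape (1,).
    loss, confidences = 0.0, []
    for x_p in partial_feats:
        x_g = G(x_p)  # complete the partial representation
        # completion constraint: pull the completed feature toward the real one
        loss = loss + F.mse_loss(x_g, full_feat)
        # adversarial criterion for prediction: the completed feature should
        # score higher than the incomplete one, but lower than the real
        # complete one
        loss = loss + F.relu(D(x_p) - D(x_g)).mean() \
                    + F.relu(D(x_g) - D(full_feat)).mean()
        logits = C(x_g)
        # label consistency constraint: every segment keeps the video's label
        loss = loss + F.cross_entropy(logits, label)
        confidences.append(logits.softmax(-1)[0, label])
    # confidence monotonically increasing constraint: penalize any drop in
    # true-class confidence as more frames are observed
    for c_prev, c_next in zip(confidences, confidences[1:]):
        loss = loss + F.relu(c_prev - c_next).mean()
    return loss

In practice each term would carry its own weight and D would be trained adversarially against G in alternating steps; the sketch only shows how the four constraints interact on a single video.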



    Published In

    MM '19: Proceedings of the 27th ACM International Conference on Multimedia
    October 2019, 2794 pages
    ISBN: 9781450368896
    DOI: 10.1145/3343031

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2019


    Author Tags

    1. action prediction
    2. completion transformation
    3. conditional generative adversarial networks
    4. label consistency preservation
    5. temporal evolution preservation

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    MM '19

    Acceptance Rates

    MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
    Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
