The Power of Next-Frame Prediction for Learning Physical Laws

Winterbottom, Thomas; Hudson, G. Thomas; Kluvanec, Daniel; Slack, Dean; Sterling, Jamie; Shentu, Junjie; Xiao, Chenghao; Zhou, Zheming; Moubayed, Noura Al

Computer Science > Computer Vision and Pattern Recognition

arXiv:2405.17450 (cs)

[Submitted on 21 May 2024]

Abstract: Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the visual world. To quantify the specific visual understanding induced by next-frame prediction, we introduce six diagnostic simulation video datasets derived from fundamental physical laws and created by varying physical constants such as gravity and mass. We demonstrate that our models, trained only on next-frame prediction, are capable of predicting the values of these physical constants (e.g. gravity) without having been trained directly to learn them via a regression task. We find that the generative training phase alone induces a model state that predicts physical constants significantly better than a random model does, improving the loss by a factor of between 1.28 and 6.24. We conclude that next-frame prediction shows great promise as a general learning strategy for inducing an understanding of the many 'laws' that govern the visual domain without the need for explicit labelling.
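Why should next-frame prediction encode a constant such as gravity at all? The following Python sketch is a toy illustration under stated assumptions, not the authors' datasets, models, or code. It replaces rendered video with a single ball height per frame (the hypothetical `simulate_drop`), shows that exactly predicting the next frame under constant acceleration requires knowing g, and reads g back out of three consecutive frames. The `improvement_factor` helper is also hypothetical and assumes that "improving the loss by a factor" means the probe loss on a random model divided by the probe loss on the pretrained model.

```python
from typing import List


def simulate_drop(g: float, n_frames: int = 10, dt: float = 0.1, y0: float = 10.0) -> List[float]:
    """Hypothetical toy 'video': the height of a ball released from rest at y0,
    sampled once per frame under constant gravitational acceleration g."""
    return [y0 - 0.5 * g * (t * dt) ** 2 for t in range(n_frames)]


def predict_next_frame(prev: float, curr: float, g: float, dt: float = 0.1) -> float:
    """Constant-acceleration update: y[t+1] = 2*y[t] - y[t-1] - g*dt**2.
    Any predictor that gets the next frame exactly right must encode g."""
    return 2.0 * curr - prev - g * dt ** 2


def infer_gravity(frames: List[float], dt: float = 0.1) -> float:
    """Recover g from the second finite difference of three consecutive frames."""
    a, b, c = frames[0], frames[1], frames[2]
    return -(c - 2.0 * b + a) / dt ** 2


def improvement_factor(random_probe_loss: float, trained_probe_loss: float) -> float:
    """Assumed reading of the reported factor: random-model probe loss
    divided by pretrained-model probe loss (values > 1 favour pretraining)."""
    return random_probe_loss / trained_probe_loss


if __name__ == "__main__":
    # A tiny 'dataset' in which only the physical constant g varies.
    for g in (4.9, 9.81, 19.6):
        frames = simulate_drop(g)
        true_next = simulate_drop(g, n_frames=len(frames) + 1)[-1]
        pred = predict_next_frame(frames[-2], frames[-1], g)
        print(f"g={g:5.2f}  next-frame error={abs(pred - true_next):.2e}  "
              f"inferred g={infer_gravity(frames):.2f}")
```

Plugging a wrong value of g into `predict_next_frame` leaves an error of |g_guess - g| * dt**2 on every frame, which is the intuition for why minimising next-frame loss pushes a model toward representing g. The abstract indicates that the paper instead predicts the constants from the state of a model trained only on next-frame prediction and compares this against a random model; the toy above only shows why such a state would need to carry that information.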

Comments:	7 Figures, 12 Pages, 1 Table
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
MSC classes:	68T45
ACM classes:	I.2.6; I.2.10
Cite as:	arXiv:2405.17450 [cs.CV]
	(or arXiv:2405.17450v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.17450

Submission history

From: Tom Winterbottom
[v1] Tue, 21 May 2024 17:55:54 UTC (1,941 KB)