Showing 1–26 of 26 results for author: van Steenkiste, S

  1. arXiv:2407.10920  [pdf, other]

    cs.CV cs.AI cs.CL

    Benchmarking Vision Language Models for Cultural Understanding

    Authors: Shravan Nayak, Kanishk Jain, Rabiul Awal, Siva Reddy, Sjoerd van Steenkiste, Lisa Anne Hendricks, Karolina Stańczak, Aishwarya Agrawal

    Abstract: Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering…

    Submitted 18 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2406.09292  [pdf, other]

    cs.CV cs.AI cs.LG

    Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

    Authors: Ziyi Wu, Yulia Rubanova, Rishabh Kabra, Drew A. Hudson, Igor Gilitschenski, Yusuf Aytar, Sjoerd van Steenkiste, Kelsey R. Allen, Thomas Kipf

    Abstract: We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are train…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Additional details and video results are available at https://neural-assets-paper.github.io/

  3. arXiv:2311.17946  [pdf, other]

    cs.CV cs.AI cs.CL

    DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

    Authors: Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

    Abstract: Despite their wide-spread success, Text-to-Image models (T2I) still struggle to produce images that are both aesthetically pleasing and faithful to the user's input text. We introduce DreamSync, a model-agnostic training algorithm by design that improves T2I models to be faithful to the text input. DreamSync builds off a recent insight from TIFA's evaluation framework -- that large vision-language…

    Submitted 28 November, 2023; originally announced November 2023.

  4. arXiv:2311.00445  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

    Authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen

    Abstract: A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate such human biases, or are they able to overcome them? Focusing on the case of sy…

    Submitted 11 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  5. arXiv:2310.19956  [pdf, other]

    cs.CL

    The Impact of Depth on Compositional Generalization in Transformer Language Models

    Authors: Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

    Abstract: To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by theoretical and empirical work, that deeper transformers generalize more compositionally. Simply adding layers increases the total number o…

    Submitted 10 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NAACL 2024

  6. arXiv:2310.06020  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

    Authors: Maximilian Seitzer, Sjoerd van Steenkiste, Thomas Kipf, Klaus Greff, Mehdi S. M. Sajjadi

    Abstract: Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transformer (DyST) model leverages recent work in neural scene representation to learn a latent decomposition of monocular real-world videos into scene content…

    Submitted 15 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 spotlight. Project website: https://dyst-paper.github.io/

  7. arXiv:2306.08068  [pdf, other]

    cs.CV cs.AI cs.LG

    DORSal: Diffusion for Object-centric Representations of Scenes et al

    Authors: Allan Jabri, Sjoerd van Steenkiste, Emiel Hoogeboom, Mehdi S. M. Sajjadi, Thomas Kipf

    Abstract: Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically…

    Submitted 2 May, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted to ICLR 2024. Project page: https://www.sjoerdvansteenkiste.com/dorsal

  8. arXiv:2305.18890  [pdf, other]

    cs.CV cs.LG

    Sensitivity of Slot-Based Object-Centric Models to their Number of Slots

    Authors: Roland S. Zimmermann, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Thomas Kipf, Klaus Greff

    Abstract: Self-supervised methods for learning object-centric representations have recently been applied successfully to various datasets. This progress is largely fueled by slot-based methods, whose ability to cluster visual scenes into meaningful objects holds great promise for compositional generalization and downstream learning. In these methods, the number of slots (clusters) $K$ is typically chosen to…

    Submitted 30 May, 2023; originally announced May 2023.

  9. arXiv:2302.05442  [pdf, other]

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al…

    Submitted 10 February, 2023; originally announced February 2023.

  10. arXiv:2302.04973  [pdf, other]

    cs.CV cs.AI cs.LG

    Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

    Authors: Ondrej Biza, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin F. Elsayed, Aravindh Mahendran, Thomas Kipf

    Abstract: Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this direction. However, they typically fall short at adequately capturing spatial symmetries present in the visual world, which leads to sample inefficiency…

    Submitted 20 July, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. Project page: https://invariantsa.github.io/

  11. arXiv:2211.10282  [pdf, other]

    cs.LG

    Exploring through Random Curiosity with General Value Functions

    Authors: Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Efficient exploration in reinforcement learning is a challenging problem commonly addressed through intrinsic rewards. Recent prominent approaches are based on state novelty or variants of artificial curiosity. However, directly applying them to partially observable environments can be ineffective and lead to premature dissipation of intrinsic rewards. Here we propose random curiosity with general…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2022

  12. arXiv:2206.07764  [pdf, other]

    cs.CV cs.LG

    SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos

    Authors: Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf

    Abstract: The visual world can be parsimoniously characterized in terms of distinct entities with sparse interactions. Discovering this compositional structure in dynamic visual scenes has proven challenging for end-to-end computer vision approaches unless explicit instance-level supervision is provided. Slot-based models leveraging motion cues have recently shown great promise in learning to represent, seg…

    Submitted 23 December, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Project page at https://slot-attention-video.github.io/savi++/

  13. arXiv:2206.06922  [pdf, other]

    cs.CV cs.AI cs.LG

    Object Scene Representation Transformer

    Authors: Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetić, Mario Lučić, Leonidas J. Guibas, Klaus Greff, Thomas Kipf

    Abstract: A compositional understanding of the world in terms of objects and their geometry in 3D space is considered a cornerstone of human cognition. Facilitating the learning of such a representation in neural networks holds promise for substantially improving labeled data efficiency. As a key step in this direction, we make progress on the problem of learning 3D-consistent decompositions of complex scen…

    Submitted 12 October, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at NeurIPS '22. Project page: https://osrt-paper.github.io/

  14. arXiv:2203.13573  [pdf, other]

    cs.LG cs.AI cs.NE

    Unsupervised Learning of Temporal Abstractions with Slot-based Transformers

    Authors: Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber, Sjoerd van Steenkiste

    Abstract: The discovery of reusable sub-routines simplifies decision-making and planning in complex reinforcement learning problems. Previous approaches propose to learn such temporal abstractions in a purely unsupervised fashion through observing state-action trajectories gathered from executing a policy. However, a current limitation is that they process each trajectory in an entirely sequential manner, w…

    Submitted 22 November, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to the journal Neural Computation

  15. arXiv:2203.11194  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Test-time Adaptation with Slot-Centric Models

    Authors: Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki

    Abstract: Current visual detectors, though impressive within their training distribution, often fail to parse out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the ta…

    Submitted 27 June, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted at ICML 2023. Project website at https://slot-tta.github.io/

  16. arXiv:2012.05208  [pdf, other]

    cs.NE cs.AI cs.LG

    On the Binding Problem in Artificial Neural Networks

    Authors: Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Contemporary neural networks still fall short of human-level generalization, which extends far beyond our direct experiences. In this paper, we argue that the underlying cause for this shortcoming is their inability to dynamically and flexibly bind information that is distributed throughout the network. This binding problem affects their capacity to acquire a compositional understanding of the wor…

    Submitted 9 December, 2020; originally announced December 2020.

    ACM Class: I.2.6

  17. arXiv:2011.12930  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Unsupervised Object Keypoint Learning using Local Spatial Predictability

    Authors: Anand Gopalakrishnan, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: We propose PermaKey, a novel approach to representation learning based on object keypoints. It leverages the predictability of local image regions from spatial neighborhoods to identify salient regions that correspond to object parts, which are then converted to keypoints. Unlike prior approaches, it utilizes predictability to discover object keypoints, an intrinsic property of objects. This ensur…

    Submitted 8 March, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: Accepted to ICLR 2021

  18. arXiv:2010.03635  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Relational Inference

    Authors: Aleksandar Stanić, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may…

    Submitted 14 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to AAAI 2021

    ACM Class: I.2.6

  19. arXiv:2010.02066  [pdf, other]

    cs.NE cs.AI cs.LG

    Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

    Authors: Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to li…

    Submitted 6 March, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

  20. arXiv:1910.04098  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Improving Generalization in Meta Reinforcement Learning using Learned Objectives

    Authors: Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn. Unlike recent meta-RL algorithms,…

    Submitted 14 February, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: Accepted to ICLR 2020

    ACM Class: I.2.6

  21. arXiv:1906.01035  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    A Perspective on Objects and Systematic Generalization in Model-Based RL

    Authors: Sjoerd van Steenkiste, Klaus Greff, Jürgen Schmidhuber

    Abstract: In order to meet the diverse challenges in solving many real-world problems, an intelligent agent has to be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: Accepted to the ICML 2019 Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI

    ACM Class: I.2.6

  22. arXiv:1905.12506  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Are Disentangled Representations Helpful for Abstract Visual Reasoning?

    Authors: Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem

    Abstract: A disentangled representation encodes information about the salient factors of variation in the data independently. Although it is often argued that this representational format is useful in learning to solve many real-world down-stream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representa…

    Submitted 7 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted to NeurIPS 2019

    MSC Class: I.2.6 ACM Class: I.2.6

  23. arXiv:1812.01717  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE stat.ML

    Towards Accurate Generative Models of Video: A New Metric & Challenges

    Authors: Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, Sylvain Gelly

    Abstract: Recent advances in deep generative models have led to remarkable progress in synthesizing high quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene, in addition to the visual presen…

    Submitted 27 March, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

  24. Investigating Object Compositionality in Generative Adversarial Networks

    Authors: Sjoerd van Steenkiste, Karol Kurach, Jürgen Schmidhuber, Sylvain Gelly

    Abstract: Deep generative models seek to recover the process with which the observed data was generated. They may be used to synthesize new samples or to subsequently extract representations. Successful approaches in the domain of images are driven by several core inductive biases. However, a bias to account for the compositional way in which humans structure a visual scene in terms of objects has frequentl…

    Submitted 24 July, 2020; v1 submitted 17 October, 2018; originally announced October 2018.

    Comments: A preliminary version of this work (arXiv v1) appeared under the title "A Case for Object Compositionality in Deep Generative Models of Images" as a workshop paper at the NeurIPS 2018 workshop on "Modeling the Physical World: Perception, Learning, and Control", and at the NeurIPS 2018 workshop on "Relational Representation Learning"

    MSC Class: I.2.6 ACM Class: I.2.6

  25. arXiv:1802.10353  [pdf, other]

    cs.LG cs.AI cs.NE

    Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions

    Authors: Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber

    Abstract: Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real-world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently unobserved. In order to match real-world conditions this causal knowledge must be learned without access to supervised data. To address this problem we present a nove…

    Submitted 28 February, 2018; originally announced February 2018.

    Comments: Accepted to ICLR 2018

    ACM Class: I.2.6

  26. arXiv:1708.03498  [pdf, other]

    cs.LG cs.NE stat.ML

    Neural Expectation Maximization

    Authors: Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on…

    Submitted 4 November, 2017; v1 submitted 11 August, 2017; originally announced August 2017.

    Comments: Accepted to NIPS 2017

    ACM Class: I.2.6