Object-centric representation learning

Interest in unsupervised learning of object-centric representations has grown in recent years [1, 2, 3, 4, 5]. The key insight of these methods is that the compositionality of visual scenes can be exploited both to discover objects in images and videos without supervision and to represent them independently of one another within a common representational structure. To make these models easier to compare and evaluate, a first step is to build a unified framework in which components of the different models can be swapped and all models are trained in the same way.
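As a rough illustration of what "swappable components" could mean in practice, the following minimal sketch registers interchangeable encoders and decoders by name and evaluates every combination with the same routine. It is a toy in plain Python, not the project's design: the registries, the component names ("chunk", "stride", "sum", "max"), and the class ObjectCentricModel are all hypothetical, and the hand-written functions stand in for learned PyTorch modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[float]        # flattened toy image
Slots = List[List[float]]  # one full-size layer per slot

# Hypothetical registries mapping a component name to its implementation.
ENCODERS: Dict[str, Callable[[Image, int], Slots]] = {}
DECODERS: Dict[str, Callable[[Slots], Image]] = {}


def register(registry: dict, name: str):
    """Decorator that stores a component in the given registry under `name`."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


@register(ENCODERS, "chunk")
def chunk_encoder(image: Image, num_slots: int) -> Slots:
    # Slot k keeps one contiguous block of pixels and zeros elsewhere.
    size = -(-len(image) // num_slots)  # ceiling division
    return [[p if i // size == k else 0.0 for i, p in enumerate(image)]
            for k in range(num_slots)]


@register(ENCODERS, "stride")
def stride_encoder(image: Image, num_slots: int) -> Slots:
    # Slot k keeps every num_slots-th pixel, starting at index k.
    return [[p if i % num_slots == k else 0.0 for i, p in enumerate(image)]
            for k in range(num_slots)]


@register(DECODERS, "sum")
def sum_decoder(slots: Slots) -> Image:
    # Compose the scene by adding the per-slot layers pixel by pixel.
    return [sum(pixels) for pixels in zip(*slots)]


@register(DECODERS, "max")
def max_decoder(slots: Slots) -> Image:
    # Compose the scene by taking the brightest layer at each pixel.
    return [max(pixels) for pixels in zip(*slots)]


@dataclass
class ObjectCentricModel:
    """A model is just a named encoder plus a named decoder."""
    encoder: str
    decoder: str
    num_slots: int = 2

    def reconstruct(self, image: Image) -> Image:
        slots = ENCODERS[self.encoder](image, self.num_slots)
        return DECODERS[self.decoder](slots)


def mse(a: Image, b: Image) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def evaluate_all(image: Image, num_slots: int = 2) -> Dict[Tuple[str, str], float]:
    """Score every encoder/decoder combination with the same metric."""
    return {
        (enc, dec): mse(image, ObjectCentricModel(enc, dec, num_slots).reconstruct(image))
        for enc in ENCODERS
        for dec in DECODERS
    }
```

Calling evaluate_all([0.1, 0.5, 0.9, 0.3]) returns one reconstruction error per encoder/decoder pair. Because models are selected by name, a new component only needs to be registered to take part in the same comparison.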


[1] Greff et al., 2019: Multi-object representation learning with iterative variational inference.
[2] Burgess et al., 2019: MONet: Unsupervised scene decomposition and representation.
[3] Veerapaneni et al., 2019: Entity abstraction in visual model-based reinforcement learning.
[4] Jiang et al., 2020: SCALOR: Generative world models with scalable object representations.
[5] Locatello et al., 2020: Object-centric learning with slot attention.


  • Good mathematical understanding (in particular statistics and linear algebra)
  • Python programming (including PyTorch)
  • Experience in machine learning


Marissa Weis

Neural Data Science Group
Institute of Computer Science
University of Goettingen