Background-free latents for improved generalization

CLIP model

Source: https://openai.com/index/clip/

Motivation

Background bias, where a model learns spurious correlations with the image background instead of the intended classification task, is an important and very common issue in computer vision. In recent years, it has become increasingly common to feed latent representations from large pretrained models such as CLIP/SigLIP2 or DINOv2, rather than raw pixels, into downstream models for various computer vision tasks. This project asks: can we reduce background bias and improve generalization by removing background information from latent representations before the downstream model sees them?

Project

We are exploring ways to remove background information from CLIP (and possibly DINOv2) latent representations. To do this, we try to exploit textual descriptions of the background and foreground (CLIP image embeddings are aligned with those of a text encoder) and/or paired background-only images. We will likely work with the PanAf-FGBG dataset and similar background-bias datasets.
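One simple way to picture "removing background information" is to project each latent onto the orthogonal complement of a few background directions. The sketch below is only an illustration under stated assumptions, not the method this project will necessarily use: the function name remove_background_directions is hypothetical, the random arrays stand in for real CLIP image latents and for CLIP text embeddings of background prompts, and the background directions are assumed to be linearly independent.

```python
import numpy as np


def remove_background_directions(latents, background_dirs):
    """Project latents onto the orthogonal complement of the background subspace.

    latents:         array of shape (n, d), one latent per row.
    background_dirs: array of shape (k, d), assumed linearly independent,
                     e.g. text embeddings of background descriptions.
    """
    # Orthonormal basis of the background subspace (reduced QR), shape (d, k).
    basis, _ = np.linalg.qr(background_dirs.T)
    # Subtract the component of every latent that lies in that subspace.
    return latents - (latents @ basis) @ basis.T


rng = np.random.default_rng(0)
dim = 512

# Stand-ins (assumptions): in practice these would come from a pretrained model.
background_dirs = rng.normal(size=(3, dim))  # e.g. embeddings of background prompts
latents = rng.normal(size=(8, dim))          # e.g. image embeddings

cleaned = remove_background_directions(latents, background_dirs)

# After projection, the latents have no component along any background direction.
residual = np.abs(cleaned @ background_dirs.T).max()
print(f"largest remaining background component: {residual:.2e}")
```

With paired background-only images, the same function could be applied with background_dirs set to the embeddings of those images instead of text embeddings. Whether such a linear projection actually removes the relevant background information is an open question of the project.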

This project will involve: extracting latent representations for datasets using large pretrained models; manipulating those latent representations using PyTorch or NumPy; training Transformer-based classification models that take the modified latent representations as input.
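The last step above can be sketched as a small Transformer encoder that classifies a sequence of precomputed latent tokens. This is a minimal sketch under assumptions, not the project's actual architecture: the class name LatentTokenClassifier, all hyperparameters, and the random tensors standing in for (modified) latents and labels are hypothetical, and positional embeddings are omitted for brevity.

```python
import torch
from torch import nn


class LatentTokenClassifier(nn.Module):
    """Transformer encoder over precomputed latent tokens with a learned class token."""

    def __init__(self, latent_dim, num_classes, d_model=128, num_heads=4, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(latent_dim, d_model)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=num_heads,
            dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, tokens):
        # tokens: (batch, sequence_length, latent_dim)
        x = self.input_proj(tokens)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)  # prepend the class token
        x = self.encoder(x)
        return self.head(x[:, 0])       # classify from the class-token output


torch.manual_seed(0)

# Stand-ins (assumptions): 4 samples, 16 latent tokens each, 512-dim latents, 5 classes.
tokens = torch.randn(4, 16, 512)
labels = torch.randint(0, 5, (4,))

model = LatentTokenClassifier(latent_dim=512, num_classes=5)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One illustrative training step.
logits = model(tokens)
loss = nn.functional.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a real pipeline, the tokens tensor would hold latents extracted once from the frozen pretrained model and then modified, so only the small classifier is trained.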

Prerequisites

The following skills are required:

  • Familiarity with Python (experience with PyTorch, NumPy, or SciPy would be helpful)
  • Basic machine learning and deep learning knowledge

Contact

To apply, please email Felix Müller, stating your interest in this project and detailing your relevant skills. Part of this project could also be done as a lab rotation.

Neural Data Science Group
Institute of Computer Science
University of Goettingen