Approach 2D Multi-Object Tracking in 3D

Motivation

One of the main challenges in multi-object tracking (MOT) is handling occlusions, i.e., situations where one object temporarily blocks another from view. When an object reappears after being occluded, maintaining its correct identity becomes difficult. Most current MOT approaches combine motion prediction and re-identification (ReID) to address this issue. However, because image data is a 2D projection of a 3D world, important geometric information is lost, which makes accurate motion estimation challenging. Scenarios that are trivial to resolve in 3D often become ambiguous in 2D space. Recent advances in monocular depth estimation offer a promising direction: by reconstructing approximate 3D geometry from single images, it may be possible to perform tracking directly in 3D (or 2.5D) space, potentially reducing occlusion-related errors and improving association robustness.

Project

This project explores the integration of depth information into multi-object tracking systems to improve association performance in complex visual scenes. The key idea is that estimated depth provides valuable cues for distinguishing overlapping objects and predicting motion in 3D space. While several emerging methods have begun incorporating monocular depth estimation into MOT, their performance improvements have so far been limited, often because the available depth information is not effectively exploited. The project will begin with a simple heuristic baseline that uses depth to enhance object association. From there, the work can branch into different directions depending on the student’s interests and findings:

  • Zero-shot methods (e.g., McByte)
  • Learnable association methods (e.g., CamelTrack)
  • Tracking-by-propagation methods (e.g., MOTE)
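
To make the idea of a simple heuristic baseline more concrete, the following is a minimal sketch and not the method the project prescribes. It assumes each track and each detection carries a 2D bounding box plus a single depth value (for example, the median of an estimated depth map inside the box). The names Box, association_cost, and greedy_associate, as well as the parameters depth_weight, depth_scale, and max_cost, are hypothetical and chosen only for illustration. The cost combines 2D overlap with a penalty for depth disagreement, and matches are assigned greedily.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    depth: float  # assumption: one depth value per box, e.g. median estimated depth


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def association_cost(track, det, depth_weight=0.5, depth_scale=5.0):
    """(1 - IoU) plus a weighted depth-difference penalty clipped to [0, 1]."""
    # depth_scale is a hypothetical normaliser in the same unit as the depth values.
    depth_term = min(abs(track.depth - det.depth) / depth_scale, 1.0)
    return (1.0 - iou(track, det)) + depth_weight * depth_term


def greedy_associate(tracks, detections, depth_weight=0.5, max_cost=0.9):
    """Match tracks to detections greedily in order of increasing cost."""
    pairs = sorted(
        (association_cost(t, d, depth_weight), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # all remaining pairs are even more expensive
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


# Two people crossing: the boxes overlap heavily in 2D but differ in depth.
tracks = [Box(100, 100, 200, 300, depth=5.0), Box(120, 100, 220, 300, depth=12.0)]
detections = [Box(104, 100, 204, 300, depth=11.5), Box(116, 100, 216, 300, depth=5.2)]

print(greedy_associate(tracks, detections))                    # {0: 1, 1: 0}
print(greedy_associate(tracks, detections, depth_weight=0.0))  # {0: 0, 1: 1}
```

In this toy example, IoU alone (depth_weight=0.0) pairs each track with the box that overlaps it most, which swaps the two identities, while the depth term keeps each track with the detection at a similar distance. Greedy matching is used here only to keep the sketch dependency-free; an optimal assignment (for example, with scipy.optimize.linear_sum_assignment) would be the more common choice. Note also that many monocular depth estimators predict relative rather than metric depth, so a fixed depth_scale is a strong assumption.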

Thesis

Within the context of this project, several research questions could be explored. The focus of the thesis can be adapted to your interests and your own ideas.

  • In which scenarios do monocular depth estimation methods excel or fail?
  • What could be a simple baseline for depth-based MOT?
  • Can meaningful performance gains be achieved by incorporating depth information?
  • Is it possible to leverage depth effectively without introducing learnable components?
  • Can depth information enhance tracking-by-propagation models in detecting and maintaining object trajectories?

Prerequisites

The following skills are required:

  • Familiarity with Python (experience with PyTorch, NumPy, and SciPy is helpful)
  • Basic knowledge of machine learning and deep learning

Contact

To apply, please email Jan Frederik Meier, stating your interest in this project and detailing your relevant skills. Part of this project could also be carried out as a lab rotation.

Neural Data Science Group
Institute of Computer Science
University of Goettingen