Few-Shot Object Detection for the Primate Domain

Motivation

Training object detection models typically requires large amounts of annotated data, which are time-consuming to collect and often impractical to obtain. Recent advances in zero-shot object detection allow models to detect previously unseen classes using only text prompts. While these models perform impressively in general domains, they often struggle when applied to specialized or visually distinct settings. Domain shifts, caused by factors such as low resolution, unusual camera angles, or overexposure in camera trap videos, can lead to missed detections or misclassifications. Moreover, zero-shot performance depends heavily on the choice of text prompt: significant differences can arise depending on whether the prompt uses “ape,” “primate,” or “chimpanzee”. Current state-of-the-art models also fail on semantically more complex descriptions, e.g., “lemur without tail” still leads to bounding boxes around lemurs together with their tails.

Project

This project explores few-shot object detection (FSOD) as a way to adapt zero-shot models more effectively to the primate domain. The idea is to fine-tune a pre-trained zero-shot model using only a small number of labeled examples from the target domain, thereby aligning the model to specific visual and semantic concepts. The work will focus on the Grounding DINO model, a state-of-the-art zero-shot detector, and investigate parameter-efficient fine-tuning (PEFT) techniques such as Visual Prompt Tuning. Extensions like Language Prompt Tuning may also be explored to optimize text-based alignment. The developed approaches will be evaluated across diverse open-source primate datasets as well as internal datasets from our lab.
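
To make the zero-shot starting point concrete, the following is a minimal inference sketch using the Hugging Face transformers port of Grounding DINO. The checkpoint name, image path, and text prompt are illustrative assumptions, and the exact post-processing API may differ between library versions and from the implementation used in the lab.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

    # Illustrative checkpoint and image path (assumptions, not part of the project setup).
    model_id = "IDEA-Research/grounding-dino-tiny"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

    image = Image.open("camera_trap_frame.jpg")
    # Grounding DINO expects lower-case phrases, each terminated by a period;
    # the choice of phrase (e.g. "ape" vs. "chimpanzee") can change the detections.
    text = "chimpanzee. lemur."

    inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert raw outputs into boxes and matched phrases; the argument names of this
    # post-processing call vary slightly across transformers versions.
    results = processor.post_process_grounded_object_detection(
        outputs, inputs.input_ids, target_sizes=[image.size[::-1]]
    )[0]
    print(results["scores"], results["labels"], results["boxes"])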

A trainable Grounding DINO implementation will be provided, so the main task will consist of designing, implementing, and evaluating the proposed modifications.
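
As an illustration of the kind of modification meant here, below is a minimal sketch of the Visual Prompt Tuning idea on a generic transformer encoder layer: a small set of learnable prompt tokens is prepended to the visual token sequence while the pre-trained layer itself stays frozen. The class name, prompt count, and the use of a plain nn.TransformerEncoderLayer as a stand-in are assumptions for illustration; how to transfer this idea to Grounding DINO's DETR-based encoder is one of the research questions listed below.

    import torch
    import torch.nn as nn

    class VisualPromptWrapper(nn.Module):
        """Hypothetical VPT-style wrapper: learnable prompt tokens are prepended to
        the visual token sequence while the wrapped pre-trained layer stays frozen."""

        def __init__(self, encoder_layer: nn.Module, num_prompts: int = 10, dim: int = 256):
            super().__init__()
            self.encoder_layer = encoder_layer
            for p in self.encoder_layer.parameters():
                p.requires_grad = False                      # freeze the pre-trained weights
            self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            # tokens: (batch, seq_len, dim) visual tokens from the image backbone
            batch = tokens.shape[0]
            prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
            extended = torch.cat([prompts, tokens], dim=1)   # (batch, num_prompts + seq_len, dim)
            out = self.encoder_layer(extended)
            return out[:, self.prompts.shape[0]:, :]         # drop the prompt positions again

    # Toy usage with a plain PyTorch encoder layer as a stand-in for a detector block.
    layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    wrapped = VisualPromptWrapper(layer, num_prompts=10, dim=256)
    x = torch.randn(2, 100, 256)
    print(wrapped(x).shape)  # torch.Size([2, 100, 256])

In a setup like this, only the prompt parameters (and possibly a detection head) would be passed to the optimizer, which is what keeps the number of trainable parameters small compared to full fine-tuning.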

Thesis

Within the context of this project, several research questions could be explored. Depending on your personal interests and ideas, the focus of the thesis can be shifted accordingly.

  • In which scenarios do zero-shot methods excel or fail?
  • How should the labeled few-shot examples be selected for optimal adaptation?
  • How should Visual Prompt Tuning be applied to DETR-based architectures?
  • Can few-shot learning improve detection performance on primate data?
  • Is it possible to generalize across multiple datasets using labeled examples from only one?
  • How does few-shot adaptation compare to full fine-tuning in terms of performance and efficiency?

Prerequisites

The following skills are helpful:

  • Familiarity with Python (PyTorch, NumPy, and SciPy would be helpful)
  • Basic machine learning and deep learning knowledge

Contact

To apply, please email Jan Frederik Meier stating your interest in this project and detailing your relevant skills. Part of this project could also be done as a lab rotation.

Neural Data Science Group
Institute of Computer Science
University of Goettingen