Self-supervised learning on video
Many computer vision tasks rely on pre-trained feature extractors, typically implemented as convolutional neural networks. Contrastive methods have recently emerged as a promising approach to unsupervised feature learning [1,2]. In contrast to generative approaches, no reconstruction from a latent representation needs to be carried out.
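To make the contrastive objective concrete, the following is a minimal NumPy sketch of the NT-Xent loss used in [1]: two augmented views of the same input form a positive pair, and all other samples in the batch serve as negatives. Function and parameter names are illustrative, not taken from any of the referenced implementations.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) embeddings of two augmented views; row i of z1 and
    row i of z2 form a positive pair, all other rows act as negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # the positive partner of sample i sits at index (i + N) mod 2N
    targets = (np.arange(2 * n) + n) % (2 * n)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()
```

The loss is low when each embedding is closer to its positive partner than to any negative, which is what drives the encoder to learn useful features without labels.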
A computationally challenging, yet fairly under-explored setting is learning video features. While [3] uses video data to learn image representations, [4,5] are first examples of approaches that learn video features. Training could be conducted on the Kinetics or HowTo100M datasets.
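In the video setting, positive pairs can come from temporal structure rather than image augmentation, e.g. by treating two temporally close clips of the same video as a positive pair. The sketch below shows one such sampling scheme; the function name, `clip_len`, and `max_gap` are hypothetical choices for illustration, not the exact procedure of [3-5].

```python
import numpy as np

def sample_clip_pair(num_frames, clip_len=16, max_gap=32, rng=None):
    """Sample start indices of two temporally close clips from one video.

    The two clips can then be encoded and used as a positive pair in a
    contrastive loss, with clips from other videos as negatives.
    (Illustrative sampling scheme, not tied to a specific paper.)
    """
    if rng is None:
        rng = np.random.default_rng()
    last = num_frames - clip_len            # last valid start index
    s1 = int(rng.integers(0, last + 1))
    lo = max(0, s1 - max_gap)               # keep the second clip within
    hi = min(last, s1 + max_gap)            # max_gap frames of the first
    s2 = int(rng.integers(lo, hi + 1))
    return s1, s2
```

Such a scheme encodes the assumption that nearby clips share semantic content, which is the temporal analogue of the augmentation invariance exploited in [1,2].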
[1] T. Chen et al.: A simple framework for contrastive learning of visual representations
[2] X. Chen et al.: Improved baselines with momentum contrastive learning
[3] Gordon et al.: Watching the World Go By: Representation Learning from Unlabeled Videos
[4] Knights et al.: Temporally Coherent Embeddings for Self-Supervised Video Representation Learning
[5] Miech et al.: End-to-End Learning of Visual Representations from Uncurated Instructional Videos
- Good mathematical understanding (in particular statistics and linear algebra)
- Python programming
- Experience in machine learning