In the framework of a collaboration with the University of Oxford, we are looking for a PhD candidate to join this fall on the topics of open-ended visual knowledge and self-supervised learning. The student would be jointly supervised by Andrea Vedaldi from the University of Oxford and Diane Larlus from the Computer Vision group of NAVER LABS Europe.
Machines are increasingly good at recognizing pre-programmed patterns in data such as text, sounds, and images. For example, modern deep learning methods have been demonstrated to achieve superhuman performance in certain problems that involve the recognition of object categories in images. However, in a more fundamental sense, machine perception is still far inferior to the level of perception in humans. The advantage of machines is that they can acquire expert-level domain-specific knowledge, such as the breeds of dogs or the species of flowers, that the average human may not possess, and exploit this advantage to outperform most humans in artificial benchmarks.

However, despite the strength of specialization and memorization, perception in machines is not nearly as insightful and flexible as vision in humans for many standard tasks. 

In this PhD project, we will thus investigate the problem of designing more flexible machine vision systems that are not limited to a baggage of pre-programmed knowledge, but can automatically extend their understanding of images over time and along several complementary dimensions, covering both low-level and high-level properties.

We will do so by incrementally learning from unlabelled data streams such as images and videos available e.g. on the Internet. The goal is to densely label images and videos with complementary types of information (e.g. category labels, object instances, and keypoints). The supervisory signal will be extracted from the data itself in a self-supervised manner, for example by predicting the audio stream from the video stream (or a subtitle/caption stream if available), or by predicting future frames from past ones using conditional image generation. All of these techniques have been shown to be suitable for self-supervision in smaller contexts. The goal will be to learn a dense image embedding shared between different and variable task heads, which can serve both pretext problems for self-supervision and “production” analysis.


Required skills

  • Knowledge of computer vision and machine learning techniques
  • Good programming background in Python and experience with deep learning and associated frameworks (preferably PyTorch)
  • Highly motivated, with good teamwork skills

Start date

Fall 2019



Application instructions

To submit an application, please send your CV and cover letter to and

NAVER LABS Europe has positions, PhD and postdoc opportunities throughout the year, which are advertised here and on the sites of international conferences that we sponsor, such as CVPR, ICCV, NIPS and EMNLP. Internships are posted on a separate page.

The Labs are in Grenoble in the French Alps. We have a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, data analytics, natural language processing, ethnography and UX working together to create next-generation ambient intelligence technology and services that deeply understand users and their contexts.


Read this blog by Florent Perronnin about what makes NAVER LABS Europe a great place to be part of.