Attribute prediction and image captioning for outdoor scenes
Maps have become a useful companion in our daily lives, as they provide a convenient and searchable representation of our physical world. Typically, a map displays the locations of a number of Points of Interest (POIs) within the mapped area, and provides navigation instructions on how to reach them. These POIs can be shops or stores of all kinds, including restaurants, cafes, banks, and so forth.
Currently, most of the information regarding POIs (e.g. summary descriptions, opening/closing hours, type of goods being sold) is gathered and entered manually into a map database, a process that is tedious and expensive. The goal of this internship is therefore to investigate ways in which this kind of information could be discovered automatically through computer vision. Doing so would involve developing or adapting models for automatic captioning, generation of textual descriptions, and attribute prediction on images of shop façades in outdoor scenes.
We are looking for candidates who are highly motivated by research and innovation and who are enrolled in a PhD program. Knowledge of deep learning and computer vision is required. Working knowledge of Python and PyTorch is a plus. During the internship, the candidate will acquire significant knowledge of computer vision techniques, and of how to develop them using recent libraries and frameworks, while working closely with researchers and engineers.
- Enrolled in a graduate program. PhD students are welcome to complement their training.
- Knowledge of computer vision and machine learning techniques.
- Good programming background in Python and experience with deep learning and associated frameworks (preferably PyTorch).