David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi
CVPR 2018, Salt Lake City, USA, 18 - 22 June 2018
One of the most promising directions in deep learning is the development of self-supervised methods, which can substantially reduce the quantity of manually labeled training data required to learn a model. Several recent contributions, in particular, have proposed self-supervision techniques suitable for tasks such as image classification. In this work, we look instead at self-supervision for geometrically-oriented tasks such as semantic matching and part detection. We develop a new approach that combines the strengths of recent methods for discovering object landmarks automatically. This approach learns, from an unlabeled dataset of images of object categories, dense distinctive visual descriptors that are invariant to synthetic image transformations. It does so by means of a robust probabilistic formulation that can introspectively determine which image regions are likely to result in stable matching. We show empirically that a network pre-trained in this manner requires significantly less supervision to learn semantic object parts than numerous pre-training alternatives. We also show that the pre-trained representation is excellent for semantic object matching.
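To give a concrete flavour of the kind of objective described above, the sketch below shows a minimal self-supervised setup in which a fully convolutional network predicts dense descriptors together with a per-pixel reliability map, and descriptors at corresponding pixels of an image and its synthetic warp are encouraged to match, with unreliable regions down-weighted. The network (DenseDescriptorNet), the loss (warp_invariance_loss), and the heteroscedastic-style uncertainty weighting are illustrative assumptions, not the paper's exact probabilistic formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseDescriptorNet(nn.Module):
    """Toy fully convolutional backbone producing dense descriptors
    plus a per-pixel reliability (log-variance) map."""
    def __init__(self, desc_dim=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.desc_head = nn.Conv2d(32, desc_dim, 1)  # dense descriptors
        self.conf_head = nn.Conv2d(32, 1, 1)         # introspective reliability

    def forward(self, x):
        h = self.body(x)
        desc = F.normalize(self.desc_head(h), dim=1)  # unit-norm descriptors
        log_var = self.conf_head(h)                   # larger => less reliable region
        return desc, log_var


def warp_invariance_loss(net, img, grid):
    """Self-supervised loss: descriptors at corresponding pixels of an image
    and its synthetic warp should agree; mismatches are down-weighted where
    the network predicts the region to be unreliable (hypothetical
    heteroscedastic-style weighting, not the paper's formulation)."""
    warped = F.grid_sample(img, grid, align_corners=False)  # synthetic transform
    d1, lv1 = net(img)
    d2, _ = net(warped)
    # Resample the original image's descriptors and reliabilities at the warped
    # locations so that d1_aligned[p] and d2[p] describe the same physical point.
    d1_aligned = F.grid_sample(d1, grid, align_corners=False)
    lv_aligned = F.grid_sample(lv1, grid, align_corners=False)
    sq_err = ((d1_aligned - d2) ** 2).sum(dim=1, keepdim=True)  # per-pixel mismatch
    # Gaussian-style negative log-likelihood: mismatch is tolerated where the
    # network declares the region unreliable, at the cost of a log-variance penalty.
    return (0.5 * torch.exp(-lv_aligned) * sq_err + 0.5 * lv_aligned).mean()


if __name__ == "__main__":
    net = DenseDescriptorNet()
    img = torch.rand(2, 3, 64, 64)
    # Identity sampling grid as a stand-in for a random synthetic warp.
    theta = torch.eye(2, 3).unsqueeze(0).repeat(2, 1, 1)
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    loss = warp_invariance_loss(net, img, grid)
    loss.backward()
    print(loss.item())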