José A. Rodriguez, Diane Larlus, Zhenwen Dai
Published in IEEE Transactions on Pattern Analysis and Machine Intelligence.
Full paper available on IEEE Xplore digital library:
This article is concerned with the detection of prominent objects in images. As opposed to the standard approaches
based on sliding windows, we study a fundamentally different solution by formulating the supervised prediction of a bounding box
as an image retrieval task. Indeed, given a global image descriptor, we find the most similar images in an annotated dataset, and
transfer the object bounding boxes. We refer to this approach as data-driven detection (DDD). The key novelty of the work is to
design or learn image similarities that explicitly optimize the accuracy of the transfer – as opposed to previous work which uses
generic representations and unsupervised similarities. This is done in two senses: first, we explicitly learn to transfer, by adapting
a metric learning approach to work with image and bounding box pairs. Second, we use an image representation designed to
be more consistent with the objective of transferring bounding boxes: a representation of images as object probability maps
computed from low-level patch classifiers. We show experimentally that these two contributions are crucial enablers of DDD as a
very competitive method for prominent object detection, in some cases yielding comparable or better results than state-of-the-art
detectors – despite its conceptual simplicity and efficiency at runtime. Our third contribution is an application of prominent object
detection, where we improve fine-grained categorization by pre-cropping images with the proposed approach. We also discuss
and evaluate experimentally an extension of the proposed approach to detect multiple parts of rigid objects.
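
The retrieve-and-transfer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `transfer_box`, the use of cosine similarity, the `(x0, y0, x1, y1)` box encoding in normalized coordinates, and the similarity-weighted average of the retrieved boxes are all assumptions made for illustration. In the paper the similarity is designed or learned rather than fixed, which this sketch does not attempt.

```python
import numpy as np


def transfer_box(query, db_descriptors, db_boxes, k=3):
    """Predict a bounding box for `query` from its k most similar annotated images.

    query          : (d,) global descriptor of the test image
    db_descriptors : (n, d) global descriptors of the annotated images
    db_boxes       : (n, 4) boxes as (x0, y0, x1, y1), normalized to [0, 1]
    Returns the transferred box and the indices of the retrieved images.
    """
    # Assumption: cosine similarity stands in for the learned similarity.
    q = query / np.linalg.norm(query)
    db = db_descriptors / np.linalg.norm(db_descriptors, axis=1, keepdims=True)
    sims = db @ q

    # Retrieve the k most similar annotated images.
    top = np.argsort(-sims)[:k]

    # Assumption: combine the retrieved boxes by a similarity-weighted average.
    weights = np.maximum(sims[top], 0.0)
    if weights.sum() == 0.0:
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    return weights @ db_boxes[top], top


if __name__ == "__main__":
    # Toy annotated set: three images with 2-D descriptors (hypothetical data).
    db = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                      [0.4, 0.4, 0.9, 0.9],
                      [0.2, 0.2, 0.6, 0.6]])
    box, neighbours = transfer_box(np.array([2.0, 0.0]), db, boxes, k=2)
    print("retrieved images:", neighbours)
    print("transferred box:", box)
```

With `k=1` the sketch simply copies the box of the single nearest neighbour; larger `k` averages several retrieved boxes, which trades sharpness for robustness to a bad match.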