Jorge Sanchez, Florent Perronnin, Teofilo de Campos
ICCV, 13th International Conference on Computer Vision, Barcelona, Spain, 6-13 November 2011.
Several state-of-the-art image representations consist in
averaging local statistics computed from patch-level descriptors.
It has been shown by Boureau et al. that such
average statistics suffer from two sources of variance. The
first one comes from the fact that a finite set of local statistics
is averaged. The second one is due to the variation
in the proportion of object-dependent information between
different images of the same class. For the problem of object
classification, these sources of variance negatively affect
accuracy since they increase the overlap between class-conditional
distributions.
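
As a rough illustration of the first source of variance, suppose (a simplifying assumption not made in the abstract) that an image yields N local statistics z_1, ..., z_N that are independent and identically distributed with mean \mu and covariance \Sigma. The symbols N, z_i, \mu and \Sigma are introduced here only for this sketch. The averaged signature then satisfies:

```latex
\bar{z} = \frac{1}{N}\sum_{i=1}^{N} z_i, \qquad
\mathbb{E}[\bar{z}] = \mu, \qquad
\operatorname{Cov}[\bar{z}] = \frac{1}{N}\,\Sigma .
```

Under this assumption, averaging over fewer patches gives a noisier signature. Real patches overlap and are correlated, so the 1/N rate should be read as an idealized best case rather than a description of actual images.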
Our goal is to include information about the spatial layout
of images in image signatures based on average statistics.
We show that the traditional approach to including
the spatial layout, the Spatial Pyramid (SP), increases
the first source of variance while only weakly reducing the
second one. We therefore propose two complementary approaches
to account for the spatial layout which are compatible
with our goal of variance reduction. The first one
models the spatial layout in an image-independent manner
(as is the case for the SP) while the second one adapts to
the image content. A significant benefit of these approaches
with respect to the SP is that they do not incur an increase
of the image signature dimensionality. We show on PASCAL
VOC 2007, 2008 and 2009 the benefits of our approach.
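
The two claims about the SP (larger signatures and higher finite-sample variance) can be illustrated with a small synthetic sketch. This is not the authors' method or experimental setup: the local statistics are random Gaussian vectors, patch positions are uniform, only a single grid level is used instead of a full multi-level pyramid, and all function names and parameters (`average_signature`, `spatial_pyramid_signature`, `simulate`, `grid`, and so on) are hypothetical.

```python
import numpy as np


def average_signature(stats):
    """Global average pooling: one D-dimensional signature per image."""
    return stats.mean(axis=0)


def spatial_pyramid_signature(stats, xy, grid=2):
    """Average the statistics inside each cell of a grid x grid partition
    and concatenate the per-cell averages (single pyramid level only).

    xy holds patch centres normalised to [0, 1).
    """
    d = stats.shape[1]
    cells = np.minimum((xy * grid).astype(int), grid - 1)
    cell_ids = cells[:, 1] * grid + cells[:, 0]
    sig = np.zeros((grid * grid, d))
    for c in range(grid * grid):
        mask = cell_ids == c
        if mask.any():  # an empty cell keeps a zero vector
            sig[c] = stats[mask].mean(axis=0)
    return sig.ravel()


def simulate(n_images=2000, n_patches=64, d=8, grid=2, seed=0):
    """Return (global dim, SP dim, mean variance of the global signature,
    mean variance of the SP signature) over synthetic images."""
    rng = np.random.default_rng(seed)
    global_sigs, sp_sigs = [], []
    for _ in range(n_images):
        # Assumption: i.i.d. unit-variance local statistics, uniform positions.
        stats = rng.normal(size=(n_patches, d))
        xy = rng.random(size=(n_patches, 2))
        global_sigs.append(average_signature(stats))
        sp_sigs.append(spatial_pyramid_signature(stats, xy, grid))
    global_sigs = np.asarray(global_sigs)
    sp_sigs = np.asarray(sp_sigs)
    return (
        global_sigs.shape[1],
        sp_sigs.shape[1],
        global_sigs.var(axis=0).mean(),
        sp_sigs.var(axis=0).mean(),
    )


if __name__ == "__main__":
    dim_global, dim_sp, var_global, var_sp = simulate()
    print(f"global: dim={dim_global}, mean variance={var_global:.4f}")
    print(f"SP 2x2: dim={dim_sp}, mean variance={var_sp:.4f}")
```

In this toy setting the 2x2 grid multiplies the signature dimensionality by four, and each cell averages roughly a quarter of the patches, so each per-cell average fluctuates more across images than the global one does. The sketch says nothing about the second source of variance, which depends on image content.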