Naila Murray, Florent Perronnin
CVPR 2014 : 27th IEEE Conference on Computer Vision and Pattern Recognition, Colombus, USA, June 24-27, 2014.
State-of-the-art patch-based image representations involve
a pooling operation that aggregates statistics computed
from local descriptors. Standard pooling operations
include sum- and max-pooling. Sum-pooling lacks discriminability
because the resulting representation is strongly influenced
by frequent yet often uninformative descriptors,
but only weakly influenced by rare yet potentially highlyinformative
ones. Max-pooling equalizes the influence of
frequent and rare descriptors but is only applicable to representations
that rely on count statistics, such as the bag-ofvisual-
words (BOV) and its soft- and sparse-coding extensions.
We propose a novel pooling mechanism that achieves
the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector
– hence the name Generalized Max Pooling (GMP). It involves
equalizing the similarity between each patch and the
pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public
image classification benchmarks that the proposed GMP
can lead to significant performance gains with respect to
heuristic alternatives.
Report number: