Florent Perronnin
To appear in the IEEE Transactions on Pattern Analysis and Machine Intelligence.
Several state-of-the-art generic visual categorization (GVC) systems use a vocabulary of visual terms - a codebook of local image features - to characterize images with a histogram of visual word counts. We propose a novel, practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained by adapting the universal vocabulary with class-specific data. The main novelty is that an image is characterized by a set of histograms - one per class - where each histogram describes whether the image content is best modeled by the universal vocabulary or by the corresponding class vocabulary. This framework is applied to two types of local image features: low-level descriptors such as the popular SIFT, and high-level histograms of word co-occurrences in a spatial neighborhood. It is shown experimentally on two challenging datasets (an in-house database of 19 categories and the recently released PASCAL VOC 2006) that the proposed approach exhibits state-of-the-art performance at a modest computational cost.
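The per-class histogram idea can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the vocabulary model or the adaptation procedure, so the sketch assumes each vocabulary is simply a list of word centers and that each descriptor is hard-assigned to its nearest word by Euclidean distance. The function names `bipartite_histogram` and `image_signature` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def bipartite_histogram(descriptors, universal_words, class_words):
    """Normalized histogram over the merged universal + class vocabulary.

    Assumption: hard nearest-word assignment. Bins [0, K) count descriptors
    closest to a universal word; the remaining bins count descriptors
    closest to a class-specific word.
    """
    merged = list(universal_words) + list(class_words)
    counts = [0.0] * len(merged)
    for d in descriptors:
        nearest = min(range(len(merged)), key=lambda i: dist(d, merged[i]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def image_signature(descriptors, universal_words, class_vocabularies):
    """One histogram per class, as described in the abstract."""
    return {
        name: bipartite_histogram(descriptors, universal_words, words)
        for name, words in class_vocabularies.items()
    }


# Toy 2-D "descriptors" and vocabularies (hypothetical values).
universal = [(0.0, 0.0), (10.0, 10.0)]
class_vocabs = {
    "cat": [(1.0, 1.0), (10.0, 9.0)],
    "car": [(5.0, 5.0), (20.0, 20.0)],
}
descriptors = [(1.0, 1.0), (0.9, 1.2), (10.0, 10.0)]

signature = image_signature(descriptors, universal, class_vocabs)
k = len(universal)
for name, hist in signature.items():
    # Mass in the class half indicates how well the class vocabulary
    # explains the image relative to the universal vocabulary.
    print(name, "class-vocabulary mass:", sum(hist[k:]))
```

In this toy setup, the share of histogram mass that falls in the class-specific bins gives a rough indication of whether the image is better explained by that class vocabulary than by the universal one; a classifier would normally be trained on the full histograms rather than on this single number.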