The vast majority of visual content, whether family photographs shared on social websites or videos captured by CCTV cameras for surveillance purposes, comes with very little annotation beyond a GPS location or a time stamp. As a result, organizing and accessing this content is almost impossible unless humans are willing to annotate it. However, human annotation is a costly, tedious and error-prone task. Automatic visual classification (also referred to as visual annotation or categorization) is the problem of associating an input image or video with a set of output keywords describing its content. Such keywords can subsequently be used to access the visual data or to make decisions based on its content.
One major research challenge in visual classification is handling large datasets, meaning datasets that contain millions of images and depict thousands of concepts. Our group has made several key contributions to this topic. To handle a large number of images, we proposed the use of explicit input embeddings, which enable learning from vast quantities of data at an affordable cost. To handle a very large number of output keywords, we also proposed the use of output embeddings. In 2011, we were among the first labs to publicly report image classification results on several million images and on the order of 10,000 keywords.
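To illustrate the general idea of an explicit input embedding, here is a minimal sketch in Python. It is not the group's implementation. It assumes one well-known example, the Bhattacharyya (Hellinger) kernel on non-negative histograms, which becomes an exact dot product after an element-wise square root, so that a cheap linear classifier trained with stochastic gradient descent can stand in for a kernel machine. The function names `hellinger_embedding` and `train_linear_sgd`, the hyperparameters, and the toy data are all hypothetical.

```python
import numpy as np


def hellinger_embedding(histograms):
    """Explicit feature map for the Bhattacharyya (Hellinger) kernel.

    For non-negative x and y, k(x, y) = sum_i sqrt(x_i * y_i) equals the
    dot product <sqrt(x), sqrt(y)>, so the kernel is linear after an
    element-wise square root.
    """
    histograms = np.asarray(histograms, dtype=float)
    return np.sqrt(histograms)


def train_linear_sgd(features, labels, epochs=20, lr=0.1, reg=1e-4, seed=0):
    """Train a binary linear classifier (hinge loss) with plain SGD.

    `labels` must contain +1 / -1. Each step touches a single sample, so
    the cost per epoch grows linearly with the number of samples.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_dims = features.shape
    w = np.zeros(n_dims)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            margin = labels[i] * (features[i] @ w + b)
            w *= 1.0 - lr * reg          # L2 regularization (weight decay)
            if margin < 1.0:             # hinge loss is active for this sample
                w += lr * labels[i] * features[i]
                b += lr * labels[i]
    return w, b


# Toy data (hypothetical): two-bin histograms, one class per dominant bin.
toy_histograms = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
toy_labels = np.array([1.0, 1.0, -1.0, -1.0])

toy_features = hellinger_embedding(toy_histograms)
toy_w, toy_b = train_linear_sgd(toy_features, toy_labels)
toy_predictions = np.sign(toy_features @ toy_w + toy_b)
```

The design choice this sketch is meant to convey is that the non-linearity is moved into a fixed, per-sample transformation, after which training reduces to a linear problem whose cost scales with the number of images rather than with the number of image pairs.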
Another challenge that has recently emerged in the computer vision community is fine-grained visual classification, the problem of recognizing the subordinate categories of a base-level category. Examples of fine-grained classification problems include recognizing car models, brands of clothes or bird species. This is a very challenging research problem because the object classes considered can be visually very similar, and distinguishing the most similar classes requires taking minute details into account. We have developed technology, in particular visual representations, that captures such fine-grained details, leading to state-of-the-art results in the recent Fine-Grained Competition.
“Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture”, Florent Perronnin, Diane Larlus, CVPR, 2015.
“Good Practice in Large-Scale Learning for Image Classification”, Zeynep Akata, Florent Perronnin, Zaïd Harchaoui, Cordelia Schmid, IEEE TPAMI, 2014.
“Image Classification with the Fisher Vector: Theory and Practice”, Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob J. Verbeek, IJCV, 2013.
“Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost”, Thomas Mensink, Jakob J. Verbeek, Florent Perronnin, Gabriela Csurka, IEEE TPAMI, 2013.
“Tree-Structured CRF Models for Interactive Image Labeling”, Thomas Mensink, Jakob J. Verbeek, Gabriela Csurka, IEEE TPAMI, 2013.
“Label-Embedding for Attribute-Based Classification”, Zeynep Akata, Florent Perronnin, Zaïd Harchaoui, Cordelia Schmid, CVPR, 2013.
“Towards good practice in large-scale learning for image classification”, Florent Perronnin, Zeynep Akata, Zaïd Harchaoui, Cordelia Schmid, CVPR, 2012.
“Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost”, Thomas Mensink, Jakob J. Verbeek, Florent Perronnin, Gabriela Csurka, ECCV, 2012.
“Learning structured prediction models for interactive image labeling”, Thomas Mensink, Jakob J. Verbeek, Gabriela Csurka, CVPR, 2011.
“High-dimensional signature compression for large-scale image classification”, Jorge Sánchez, Florent Perronnin, CVPR, 2011.
“Improving the Fisher Kernel for Large-Scale Image Classification”, Florent Perronnin, Jorge Sánchez, Thomas Mensink, ECCV, 2010.
“Large-scale image categorization with explicit data embedding”, Florent Perronnin, Jorge Sánchez, Yan Liu, CVPR, 2010.
“Trans Media Relevance Feedback for Image Autoannotation”, Thomas Mensink, Jakob J. Verbeek, Gabriela Csurka, BMVC, 2010.