DEEP IMAGE RETRIEVAL

In this project, we aim to learn end-to-end deep visual representations specifically for the task of instance-level image retrieval.

[Figure: Deep Image Retrieval illustration]

Principle:

The papers listed below tackle the problem of image retrieval and explore different ways to learn deep visual representations for this task. In all cases, a CNN extracts a feature map, which a global-aggregation layer aggregates into a compact, fixed-length representation. This representation is then projected with a fully-connected layer and L2-normalized, so that images can be compared efficiently with the dot product.

[Figure: network architecture]
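The pipeline described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code of the library: random arrays stand in for the CNN feature maps, plain average pooling stands in for the global-aggregation layer, and the names `describe`, `W`, `b`, `C` and `D` are hypothetical.

```python
import numpy as np

def describe(feature_map, proj_weight, proj_bias):
    """Turn a (C, H, W) feature map into an L2-normalized global descriptor."""
    c = feature_map.shape[0]
    # Global aggregation: average pooling is used here only as a stand-in.
    pooled = feature_map.reshape(c, -1).mean(axis=1)
    # Fully-connected projection.
    projected = proj_weight @ pooled + proj_bias
    # L2 normalization: the dot product of two descriptors is then their cosine similarity.
    return projected / np.linalg.norm(projected)

rng = np.random.default_rng(0)
C, D = 8, 4  # hypothetical channel count and descriptor size
W = rng.standard_normal((D, C))
b = rng.standard_normal(D)

# Random arrays standing in for the CNN output of three database images.
database = np.stack([describe(rng.standard_normal((C, 5, 5)), W, b) for _ in range(3)])
query = database[1]  # reuse one database descriptor as the query

scores = database @ query      # one dot product per database image
ranking = np.argsort(-scores)  # indices of database images, best match first
```

Because every descriptor has unit norm, comparing a query against the whole database reduces to a single matrix-vector product.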

All components in this network, including the aggregation layer, are differentiable, which makes it end-to-end trainable. In the first papers [2,3], a Siamese architecture that combines three streams with a triplet loss was proposed to train this network.
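As a rough illustration of the triplet loss mentioned above, the sketch below uses the common hinge form on squared Euclidean distances, applied to the descriptors of a query, a relevant image and a non-relevant image. The function name, the margin value and the toy vectors are assumptions made for this example; the exact formulation and hyperparameters used in [2,3] should be taken from the papers.

```python
import numpy as np

def triplet_loss(query, positive, negative, margin=0.1):
    """Hinge loss: zero once the negative is farther than the positive by `margin`."""
    d_pos = np.sum((query - positive) ** 2)  # squared distance to the relevant image
    d_neg = np.sum((query - negative) ** 2)  # squared distance to the non-relevant image
    return float(max(0.0, margin + d_pos - d_neg))

# Toy unit-norm descriptors (hypothetical values).
q = np.array([1.0, 0.0])
p = np.array([0.8, 0.6])
n = np.array([0.0, 1.0])

easy = triplet_loss(q, p, n)  # relevant image is already much closer: no penalty
hard = triplet_loss(q, n, p)  # roles swapped: the triplet is violated and penalized
```

In a Siamese setup the three descriptors come from the same network with shared weights, so the gradient of this loss updates a single set of parameters.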

In the more recent [1], this work was extended by replacing the triplet loss with a new loss that directly optimizes for Average Precision, as illustrated below.

[Figure: mAP loss]
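For reference, the Average Precision of a single ranked list can be computed as in the sketch below. This is the exact metric, which relies on sorting and is therefore not differentiable; it is not the loss of [1], which has to work with a differentiable approximation of this quantity. The function name and the toy scores are assumptions made for this example.

```python
def average_precision(scores, labels):
    """Exact AP of the ranking obtained by sorting `scores` in decreasing order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of each relevant item
    return precision_sum / hits if hits else 0.0

# Similarities of four database images to a query, with 1 marking the relevant ones.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
ap = average_precision(scores, labels)  # relevant items sit at ranks 1 and 3
```

Unlike a triplet loss, which looks at three images at a time, this quantity depends on the ranking of the whole list, which is why a loss built on it is called listwise.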

Relevant papers:

[1] Learning with Average Precision: Training Image Retrieval with a Listwise Loss. Jerome Revaud, Rafael S. Rezende, Cesar de Souza, Jon Almazan.
ICCV 2019 [PDF]

This most recent reference corresponds to the code and models shared below.

@inproceedings{revaud2019learning,
author = {Jerome Revaud and Jon Almazan and Rafael Sampaio de Rezende and Cesar Roberto de Souza},
title = {Learning with Average Precision: Training Image Retrieval with a Listwise Loss},
booktitle={ICCV},
year={2019}}

[2] End-to-end Learning of Deep Visual Representations for Image Retrieval. Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus.
International Journal of Computer Vision. Volume 124, Issue 2, September 2017.
arXiv preprint [pdf] [Publication entry]

@article{gordo2017end,
author = {Gordo, Albert and Almaz\'{a}n, Jon and Revaud, Jerome and Larlus, Diane},
title = {End-to-End Learning of Deep Visual Representations for Image Retrieval},
journal = {International Journal of Computer Vision},
issue_date = {September 2017},
volume = {124},
number = {2},
month = sep,
year = {2017},
pages = {237--254},
}

[3] Deep Image Retrieval: Learning global representations for image search. Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus.
ECCV 2016 [pdf]

@inproceedings{gordo2016deep,
author = {Albert Gordo and Jon Almaz{\'{a}}n and J{\'{e}}rome Revaud and Diane Larlus},
title = {Deep Image Retrieval: Learning global representations for image search},
booktitle={ECCV},
year={2016}}

Library:

This GitHub repository links to a library that implements the following two papers in Python 3 and PyTorch 1.0:

[1] Learning with Average Precision: Training Image Retrieval with a Listwise Loss. Jerome Revaud, Rafael S. Rezende, Cesar de Souza, Jon Almazan. ICCV 2019 [PDF]

[2] End-to-end Learning of Deep Visual Representations for Image Retrieval. Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus. IJCV 2017 [PDF]

Please note that [2] originally used R-MAC pooling [4] as the global-aggregation layer. However, because the Generalized-mean pooling layer (GeM) proposed in [5] is more efficient and performs better, we have replaced the R-MAC pooling layer with GeM.
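To make the GeM layer concrete, the sketch below applies generalized-mean pooling, (mean of x^p)^(1/p) per channel, to a single feature map in NumPy. It is a minimal illustration rather than the implementation of the library: the function name, the default exponent `p=3.0` and the clamping constant `eps` are assumptions made for this example.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling of a (C, H, W) feature map into a C-dimensional vector."""
    c = feature_map.shape[0]
    # Clamp to a small positive value so that x ** p is well defined for any p.
    x = np.clip(feature_map.reshape(c, -1), eps, None)
    return np.mean(x ** p, axis=1) ** (1.0 / p)

# One channel with activations 1, 2, 3, 4 (toy values).
fmap = np.array([[[1.0, 2.0], [3.0, 4.0]]])

avg_like = gem_pool(fmap, p=1.0)    # p = 1 is exactly average pooling
max_like = gem_pool(fmap, p=100.0)  # a large p approaches max pooling
```

The exponent therefore interpolates between average pooling and max pooling, and since the operation is differentiable with respect to `p`, the exponent can in principle be learned together with the rest of the network.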

If you’d like to compare to older versions of the work, the exact models used in [2] are still available in Caffe format (Download old model, evaluation script and dataset).

[4] Particular object retrieval with integral max-pooling of CNN activations. Tolias, G., Sicre, R., Jegou, H., ICLR 2016

[5] Fine-tuning CNN Image Retrieval with No Human Annotation. Radenovic, F., Tolias, G., Chum, O., TPAMI 2018

Visual Localization:

The image retrieval method described in [1] is used in the visual localization benchmark introduced in [6].
The benchmark and evaluation protocols are available at github.com/naver/kapture-localization.

[6] Benchmarking image retrieval for visual localization. Noe Pion, Martin Humenberger, Gabriela Csurka Khedari, Yohann Cabon, Torsten Sattler. 3DV 2020. [PDF]
