Gabriela Csurka, Stéphane Clinchant, Guillaume Jacquet
Conference on Multilingual and Multimodal Information Access Evaluation, Amsterdam, Netherlands, 19-22 September 2011.
The aim of this document is to describe the methods we used in the Medical Image
Modality Classification and Ad-hoc Image Retrieval tasks of ImageCLEF 2011.
The main novelty in medical image modality classification this year was that there
were more classes (18 modalities), organized in a hierarchy, and that for some categories
only a few annotated examples were available. Therefore, our strategy in image categorization
was to use a semi-supervised approach. In our experiments, we investigated
mono-modal (text and image) and mixed modality based classification. The image
classification was based on Fisher Vectors built on SIFT-like local orientation histograms
and local color statistics. For text representation we used a binarized bag-of-words
representation where each element indicated whether the term appeared in the
image caption or not. In the case of multi-modal classification, we simply averaged
the text and image classification scores.
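The binarized text representation and the score-averaging fusion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary, caption, and per-class scores below are hypothetical placeholders, and the actual Fisher Vector image pipeline is omitted.

```python
import numpy as np

# Hypothetical vocabulary and caption; the actual vocabulary and text
# preprocessing are not specified in this abstract.
vocabulary = ["radiograph", "ct", "mri", "ultrasound", "histology"]
caption = "axial ct scan of the abdomen"

# Binarized bag-of-words: each element is 1 if the term appears in the
# image caption, 0 otherwise (presence, not term frequency).
tokens = set(caption.split())
text_features = np.array([1.0 if term in tokens else 0.0 for term in vocabulary])

# Late fusion for multi-modal classification: the abstract states that
# text and image classification scores are simply averaged.
# The per-class scores here are illustrative values only.
text_scores = np.array([0.10, 0.70, 0.10, 0.05, 0.05])   # from the text classifier
image_scores = np.array([0.20, 0.50, 0.20, 0.05, 0.05])  # from the image classifier
fused_scores = (text_scores + image_scores) / 2.0
predicted_modality = vocabulary[int(np.argmax(fused_scores))]
```

Averaging the two per-class score vectors is the simplest late-fusion rule; it requires no extra training data, which matters here given the categories with very few annotated examples.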
For the ad-hoc retrieval task, we used the image captions for text retrieval and Fisher
Vectors for visual similarity and modality detection. Our text runs were based on a
late fusion of different state-of-the-art text experts and the Lexical Entailment model.
This Lexical Entailment model used last year's articles to compute similarities between
terms and ranked first at the previous challenge.
Concerning the submitted runs, we realized that we inadvertently forgot to submit
our best run from last year [3]. Nor did we submit the improvement over this
run that was proposed in [6]. Overall, this explains the medium performance of our
submitted runs. In this document, we show that our system from last year and its
improvements would have achieved top performance. We did not tune the parameters
of this system for this year's task; we simply evaluated the runs we did not submit.
Finally, we experimented with different strategies for fusing our textual expert, visual
expert and image modality classification scores, which gave results consistent with last
year's results and with our analysis presented in [6].