Gabriela Csurka, Jean-Michel Renders, Guillaume Jacquet
Conference on Multilingual and Multimodal Information Access Evaluation, Amsterdam, Netherlands, 19-22 September 2011.
This document describes the methods we used in the Patent Image
Classification and Image-based Patent Retrieval tasks of the CLEF-IP 2011 track.
The patent image classification task consisted of categorizing patent images into predefined
categories such as abstract drawing, graph, flowchart, table, etc. Our main
aim in participating in this sub-task was to test how our image categorizer performs
on this type of categorization problem. We therefore used SIFT-like local orientation
histograms as low-level features, and on top of them we built a visual vocabulary
specific to patent images using a Gaussian mixture model (GMM). This allowed us to
represent images with Fisher Vectors and to train linear one-versus-all
classifiers. As the results show, we obtain very good classification performance.
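To make the image representation concrete, the following is a minimal sketch of a
Fisher Vector restricted to the gradients with respect to the GMM means, assuming
diagonal covariances. It is not the implementation used for the runs: the function
names, the toy GMM parameters and the restriction to mean gradients are assumptions
made for illustration, and a practical version would work in the log domain to
avoid numerical underflow.

```python
import math
from typing import List, Sequence


def gaussian_diag(x: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    density = 1.0
    for xi, mi, si in zip(x, mu, sigma):
        density *= math.exp(-0.5 * ((xi - mi) / si) ** 2) / (si * math.sqrt(2.0 * math.pi))
    return density


def fisher_vector_means(
    descriptors: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    sigmas: Sequence[Sequence[float]],
) -> List[float]:
    """Mean-gradient part of a Fisher Vector (hypothetical helper).

    For each Gaussian k the block is
        1 / (N * sqrt(w_k)) * sum_n gamma_n(k) * (x_n - mu_k) / sigma_k
    where gamma_n(k) is the posterior of Gaussian k given descriptor x_n.
    The output has K * D values (K Gaussians, D descriptor dimensions).
    """
    n = len(descriptors)
    k = len(weights)
    d = len(means[0])
    blocks = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        likelihoods = [weights[j] * gaussian_diag(x, means[j], sigmas[j]) for j in range(k)]
        total = sum(likelihoods)
        for j in range(k):
            gamma = likelihoods[j] / total  # soft assignment to Gaussian j
            for i in range(d):
                blocks[j][i] += gamma * (x[i] - means[j][i]) / sigmas[j][i]
    return [v / (n * math.sqrt(weights[j])) for j in range(k) for v in blocks[j]]


# Toy example: 3 two-dimensional descriptors and a 2-component GMM (made-up values).
toy_descriptors = [[0.1, 0.2], [0.9, 1.1], [1.0, 0.8]]
toy_fv = fisher_vector_means(
    toy_descriptors,
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [1.0, 1.0]],
    sigmas=[[1.0, 1.0], [1.0, 1.0]],
)
```

The resulting fixed-length vector is what a linear classifier or a dot product
would consume, regardless of how many local descriptors the image produced.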
For the Image-based Patent Retrieval task, we kept the same image representation
as for the Image Classification task and used the dot product as the similarity
measure. However, the aim here was to rank patents based on
patent similarities, which in the case of pure image-based retrieval requires
comparing one set of images against another set of images. We therefore investigated
different strategies, such as averaging the Fisher Vector representations of an image set or
taking the maximum similarity between pairs of images. Finally, we also built
runs in which the predicted image classes were taken into account in the retrieval process.
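The two set-to-set strategies mentioned above can be sketched as follows. This is
an illustrative sketch only: the function names and the toy two-dimensional vectors
are assumptions, standing in for the Fisher Vectors of the images of two patents.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise average of a non-empty set of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def similarity_of_means(set_a: Sequence[Vector], set_b: Sequence[Vector]) -> float:
    """Average each image set into one vector, then take the dot product."""
    return dot(mean_vector(set_a), mean_vector(set_b))


def max_pairwise_similarity(set_a: Sequence[Vector], set_b: Sequence[Vector]) -> float:
    """Score two patents by their single most similar pair of images."""
    return max(dot(u, v) for u in set_a for v in set_b)


# Toy image sets for two patents (made-up vectors).
query_patent = [[1.0, 0.0], [0.0, 1.0]]
candidate_patent = [[1.0, 0.0], [0.0, 0.0]]

mean_score = similarity_of_means(query_patent, candidate_patent)
max_score = max_pairwise_similarity(query_patent, candidate_patent)
```

Averaging needs only one dot product per patent pair, whereas the maximum requires
comparing every image of one patent with every image of the other, but it rewards a
single strongly matching drawing.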
For the text-based patent retrieval, we simply assigned different weights to the
fields of the patent, giving more weight to some of them, before concatenating
the fields. In the monolingual setting, we then applied the tf-idf weighting scheme
and used the standard cosine measure to compute the similarity between the query
and the documents of the collection. To handle the multilingual aspect, we either
used late fusion of monolingual similarities (French / English / German) or translated
non-English fields into English (and then computed simple monolingual similarities).
In addition to these standard textual similarities, we also computed similarities between
patents based on the IPC-categories they share and similarities based on the
patent citation graph; we used late fusion to merge these new similarities with the
former ones.
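The monolingual text similarity described above can be sketched as follows. The
field names, the field weights, the whitespace tokenizer and the smoothed idf
formula are all assumptions chosen for illustration; the abstract does not specify
them. For simplicity the query patent is vectorized together with the collection.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical field weights; the actual values are not given in the text.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 1.0}


def weighted_term_counts(patent: Dict[str, str]) -> Counter:
    """Concatenate fields, counting each term with the weight of its field."""
    counts: Counter = Counter()
    for field, text in patent.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for token in text.lower().split():
            counts[token] += weight
    return counts


def tfidf_vectors(patents: List[Dict[str, str]]) -> List[Dict[str, float]]:
    """Sparse tf-idf vectors using a smoothed idf (one common variant)."""
    counts = [weighted_term_counts(p) for p in patents]
    n = len(counts)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    numerator = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return numerator / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy collection (made-up text): a query patent and two candidates.
query = {"title": "image retrieval", "abstract": "fisher vector image"}
related = {"title": "image classification", "abstract": "fisher vector"}
unrelated = {"title": "engine piston", "abstract": "combustion chamber"}

q_vec, related_vec, unrelated_vec = tfidf_vectors([query, related, unrelated])
related_score = cosine(q_vec, related_vec)
unrelated_score = cosine(q_vec, unrelated_vec)
```

Multiplying term counts by a field weight before building a single vector has the
same effect as repeating the text of the more important fields in the concatenation.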
Finally, to combine the image-based and the text-based rankings, we normalized the
ranking scores and again used a weighted late fusion strategy. As our expectations for the
visual expert were low, we used a much stronger weight for the textual expert than for
the visual one. We show that, while the visual expert alone indeed performed poorly,
the multi-modal system that combines it with the text experts outperformed the corresponding
text-only retrieval system.
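A weighted late fusion of two rankings can be sketched as follows. Min-max
normalization and the 0.9 / 0.1 weights are assumptions for illustration; the text
only states that the scores were normalized and that the textual expert received a
much stronger weight than the visual one.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; constant or empty score lists map to 0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def late_fusion(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.9,  # hypothetical value favouring the textual expert
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores, returned as a ranked list."""
    text_norm = min_max_normalize(text_scores)
    image_norm = min_max_normalize(image_scores)
    documents = set(text_norm) | set(image_norm)
    fused = {
        doc: text_weight * text_norm.get(doc, 0.0)
        + (1.0 - text_weight) * image_norm.get(doc, 0.0)
        for doc in documents
    }
    # Sort by decreasing fused score; ties are broken by document identifier.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy scores for three candidate patents (made-up values).
text_expert = {"A": 0.8, "B": 0.7, "C": 0.1}
visual_expert = {"A": 0.1, "B": 0.9, "C": 0.5}

ranking = late_fusion(text_expert, visual_expert)
```

A document missing from one expert's list is treated here as having a normalized
score of zero for that expert, which is one possible convention among several.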