Boris Chidlovskii, Stéphane Clinchant, Gabriela Csurka
22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, California, August 13-17, 2016.
Full paper available on the KDD 2016 website.
The overwhelming majority of existing domain adaptation methods
assume that source domain data are freely available.
Equal access to both source and target data makes it possible
to measure the discrepancy between their distributions and to build
representations common to both target and source domains. In reality,
such a simplifying assumption rarely holds, since source data
are routinely subject to legal and contractual constraints between
data owners and data customers. When source domain data cannot
be accessed, decision-making procedures are nevertheless often available for
adaptation. These procedures typically take
the form of classification, identification, ranking, or similar rules trained
on source data and made ready for direct deployment and later
reuse. In other cases, the owner of the source data is allowed to share
a few representative examples, such as class means.
In this paper we address the domain adaptation problem in real-world
applications, where the reuse of source domain data is limited
to classification rules or a few representative examples. We extend
recent techniques based on feature corruption and its marginalization,
in both supervised and unsupervised settings. We test and
compare them on private and publicly available source datasets and
show that significant performance gains can be achieved despite the
absence of source data and a shortage of labelled target data.
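To make "feature corruption and its marginalization" concrete, here is a minimal sketch of the commonly described closed-form marginalized denoising autoencoder: each feature is dropped with probability p, and the linear map that best reconstructs the clean features is computed in expectation over that corruption rather than by sampling. This is only the baseline idea, not the extension proposed in the paper; the function name `marginalized_denoiser`, the `reg` ridge term, and the (features x samples) layout are assumptions made for illustration.

```python
import numpy as np


def marginalized_denoiser(X, p, reg=1e-5):
    """Closed-form linear denoiser under feature dropout (illustrative sketch).

    X   : array of shape (d, n), one column per sample (assumed layout).
    p   : probability that a feature is zeroed out by the corruption.
    reg : small ridge term for numerical stability (an assumption).
    Returns W of shape (d, d + 1); the last column acts on a constant bias.
    """
    d, n = X.shape
    # Append a constant bias feature that is never corrupted.
    Xb = np.vstack([X, np.ones((1, n))])
    q = np.full(d + 1, 1.0 - p)  # survival probability of each feature
    q[-1] = 1.0

    S = Xb @ Xb.T  # scatter matrix of the uncorrupted data

    # Expected scatter of the corrupted data:
    # off-diagonal entries scale by q_i * q_j, diagonal entries by q_i.
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))

    # Expected cross-scatter between clean features and corrupted inputs.
    P = S[:d, :] * q[np.newaxis, :]

    # W = P Q^{-1}; Q is symmetric, so solve (Q + reg I) W^T = P^T.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    return W
```

With p = 0 there is no corruption and W reduces (up to the ridge term) to the identity on the original features; larger p forces each feature to be reconstructed from the others, which is what makes the learned representation useful for adaptation.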