Ergun Bicici, Marc Dymetman
CICLing Conference on Intelligent Text Processing and Computational Linguistics, Haifa, Israel, February 17-23, 2008.
Professional translators of technical documents often use translation memory (TM) systems in order to capitalize on the frequent repetitions observed in these documents. TM systems typically exploit not only full matches between the source sentence to be translated and some previously translated sentence, but also so-called fuzzy matches, where the source sentence has some substantial intersection with a previously translated sentence. These fuzzy matches can be very worthwhile as a starting point for the human translator, but the translator then needs to manually edit the associated TM translation in order to account for the differences with the actual source sentence to be translated. If part of this process could be automated, the cost of human translation could be significantly reduced. This paper proposes a way to perform this automation: the TM fuzzy match is combined with an SMT system trained on a bilingual corpus in the same domain, by biasing the candidate translations proposed by the SMT system towards a translation compatible with the fuzzy match. We report experiments that show a significant improvement in terms of BLEU and NIST scores over both the translation produced by the stand-alone SMT system and the fuzzy-match translation proposed by the stand-alone TM system.