2015/001 - Statistical machine translation adaptation through terminological enrichment based on virtual phrase generation
- Christophe Servan, Marc Dymetman
TALN Conference, Caen, France, June 22-25, 2015.
We propose a technique for adding bilingual terms to a phrase-based SMT system that not only adds the individual words but also induces phrasal contexts around them. We first generate these contexts by generalizing patterns observed for similar words in a bilingual corpus, then filter out the contexts that fall below a confidence threshold, using an original phrase-pair selection process inspired by existing sentence selection techniques.
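The generate-then-filter idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the token-substitution rule used to "generalize" patterns, the bigram-coverage score standing in for the confidence measure, and the sample data. The paper's actual phrase-pair selection process is not described in this abstract.

```python
from typing import Callable, List, Set, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def generate_virtual_pairs(
    phrase_table: List[PhrasePair],
    similar_term: PhrasePair,
    new_term: PhrasePair,
) -> List[PhrasePair]:
    """Hypothetical generation step: copy each phrase pair that contains a
    known, similar term on both sides, substituting the new term for it."""
    src_known, tgt_known = similar_term
    src_new, tgt_new = new_term
    virtual = []
    for src, tgt in phrase_table:
        src_toks, tgt_toks = src.split(), tgt.split()
        if src_known in src_toks and tgt_known in tgt_toks:
            new_src = " ".join(src_new if t == src_known else t for t in src_toks)
            new_tgt = " ".join(tgt_new if t == tgt_known else t for t in tgt_toks)
            virtual.append((new_src, new_tgt))
    return virtual


def bigram_coverage(pair: PhrasePair, seen_bigrams: Set[Tuple[str, str]]) -> float:
    """Placeholder confidence score (an assumption, not the paper's measure):
    fraction of target-side bigrams already seen in in-domain text."""
    toks = pair[1].split()
    bigrams = list(zip(toks, toks[1:]))
    if not bigrams:
        return 0.0
    return sum(b in seen_bigrams for b in bigrams) / len(bigrams)


def filter_pairs(
    pairs: List[PhrasePair],
    score: Callable[[PhrasePair], float],
    threshold: float,
) -> List[PhrasePair]:
    """Keep only the virtual pairs whose score reaches the threshold."""
    return [p for p in pairs if score(p) >= threshold]


# Toy data, invented for this example.
phrase_table = [
    ("the red car", "la voiture rouge"),
    ("a car", "une voiture"),
    ("the house", "la maison"),
]
seen = {("la", "camionnette"), ("camionnette", "rouge")}

virtual = generate_virtual_pairs(
    phrase_table, ("car", "voiture"), ("van", "camionnette")
)
kept = filter_pairs(virtual, lambda p: bigram_coverage(p, seen), threshold=0.5)
```

In this toy run, two virtual pairs are generated from the contexts of "car", and only the one whose target side is fully covered by the in-domain bigrams survives the threshold. A real system would also need to assign translation-model feature values to the retained pairs, which this sketch leaves out.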