Caroline Brun, Shachar Mirkin, Scott Nowson, Julien Perez
EMNLP, Lisboa, Portugal, September 17-21, 2015.
Language use is known to be influenced
by personality traits as well as by sociodemographic
characteristics such as age or
mother tongue. As a result, it is possible to automatically
identify these traits of the author
from her texts. It has recently been shown that
knowledge of such dimensions can improve
performance in NLP tasks such as topic and
sentiment modeling. We posit that machine
translation is another application that should
be personalized. To motivate this, we
explore whether translation preserves demographic
and psychometric traits. We show that,
largely, both translating the source training
data into the target language and translating the target test
data into the source language have a detrimental
effect on the accuracy of predicting author
traits. We argue that this supports the need for
personal and personality-aware machine translation models.