Farhad Nooralahzadeh, Caroline Brun, Claude Roux
Coling, Dublin, Ireland, August 23-29, 2014.
In the context of Social Media Analytics, Natural Language Processing tools face new challenges
on online conversational text, such as microblogs, chat, or text messages, because of the
specificity of the language used in these channels. This work addresses the problem of
Part-Of-Speech tagging (initially for French but also for English) on noisy language from
popular social media services such as Twitter, Facebook and forums. We employ a linear-chain conditional
random field (CRF) model, enriched with several morphological, orthographic, lexical
and large-scale word clustering features. We trained the model under different feature
configurations. With these features, we achieved higher tagging performance than the
baseline results on the French Social Media Bank. Moreover, experiments on English social media
content show that our model improves over previous work on these data.
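
To make the feature families named in the abstract more concrete, the following is a minimal sketch of per-token feature extraction for a linear-chain CRF tagger. It is not the authors' feature set: the function names (token_features, sentence_features), the feature names, the prefix lengths and the clusters mapping (word to hierarchical cluster bit string, as produced by Brown-style clustering) are all assumptions made for illustration.

```python
import re

# Assumption: a simple pattern is enough to flag URLs in tokenized social media text.
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def token_features(tokens, i, clusters=None):
    """Return an illustrative feature dict for tokens[i].

    clusters is a hypothetical dict mapping a lowercased word to its
    cluster bit string (e.g. from Brown-style clustering).
    """
    clusters = clusters or {}
    word = tokens[i]
    lower = word.lower()
    feats = {
        "bias": 1.0,
        # Lexical feature: the normalized word form.
        "lower": lower,
        # Morphological features: short prefix and suffix.
        "prefix2": lower[:2],
        "suffix3": lower[-3:],
        # Orthographic features typical of noisy social media text.
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(c.isdigit() for c in word),
        "is_hashtag": word.startswith("#") and len(word) > 1,
        "is_mention": word.startswith("@") and len(word) > 1,
        "is_url": bool(URL_RE.match(word)),
        # Three or more identical characters in a row, as in "looool".
        "has_repeated_char": bool(re.search(r"(.)\1\1", lower)),
    }
    # Word clustering features: prefixes of the cluster bit string,
    # so that coarser and finer cluster levels are both visible.
    path = clusters.get(lower)
    if path is not None:
        for n in (4, 8):
            feats[f"cluster_prefix{n}"] = path[:n]
    # Context features: neighbouring word forms with boundary markers.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens, clusters=None):
    """Build one feature dict per token of a tokenized message."""
    return [token_features(tokens, i, clusters) for i in range(len(tokens))]
```

A sequence of such dictionaries, one per token, is the kind of input a CRF toolkit would be trained on together with the gold tag sequence. Different feature configurations can then be compared by dropping groups of keys before training.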