Predicting user demographics and personality from text: the online world, the offline world and science fiction
Lucie Flekova, doctoral candidate at Technische Universität Darmstadt, Darmstadt, Germany
Abstract: Recent progress in natural language processing (NLP) has given rise to the field of user profiling: the automated classification of an individual's personality and demographic traits based on their written, verbal and multimodal behavior. This research builds on findings from classical personality psychology and has applications in a wide range of areas, from recommendation systems and targeted advertising to medicine and security. In this talk, I will introduce our research on predicting user traits across various data sets. We explore novel ways of obtaining ground-truth data for supervised machine learning, including a study that uses fictional characters to predict personality. We test multiple classification models and find that features based on sense-level semantic concepts from lexical-semantic resources yield promising improvements in personality prediction. Additionally, our experiments on online data show that users' stylistic choices depend not only on variables such as age, gender, income and personality traits, but also on temporal factors such as working hours. Our annotation studies further suggest that even annotators' own traits may influence how accurately they judge the traits of others.