Applying ML-based text classification in domains such as e-discovery, sensitivity review and compliance monitoring often runs into a significant obstacle: the target classes are not well defined. The “relevant” positive class is only vaguely specified, can have multiple facets that cannot be identified in advance and, in practice, evolves and drifts as more documents are explored and reviewed. Moreover, labelled data is usually scarce and the “relevant” class is rare.
For this family of complex tasks, the internship will focus on developing and implementing interactive learning methods and graphical user interface components that broaden the ways in which human annotators can take part in the loop. In particular, the internship will investigate new types of feedback (e.g. partial feedback, directional feedback, feedback on passages, terms or other linguistically derived features, clues expressed in natural language, …) and new ways of exploring the corpus and its task-specific structure.

Ideal Candidate:

- Master's student in Computer Science or, preferably, Ph.D. student.

- Experience in Machine Learning and Natural Language Processing.

- Experience in the design and implementation of graphical user interfaces will be considered a plus.

- Very good programming skills (preferably in Python, Julia or MATLAB/Octave).

Start Date:

June / July 2017

Duration:

4-6 months

Application instructions:

Please send your application to Jean-Michel Renders.