CAP 2015: 5 July 2015, Lille, France

Xavier Carreras: Low-rank Matrix Learning for Compositional Objects, Strings and Trees

Abstract: Compositional structures abound in NLP, ranging from bilexical relations and entities in context to sequences and trees. This talk focuses on latent-variable compositional models for structured objects that (1) induce a latent n-dimensional representation for each element of the structure and (2) learn operators for composing such elements into structures. I will present a framework that formulates the learning problem as low-rank matrix learning. The main ingredient of the framework is what we call the Hankel matrix: it collects the necessary statistics of our model, and its factorization mimics the compositional nature of the model. We use Hankel matrices to reduce the problem of learning compositional models to that of estimating low-rank Hankel matrices. I will illustrate some convex formulations for different applications, from classification of entities in context, to learning sequence taggers, to unsupervised induction of context-free grammars.
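
As a rough illustration of the Hankel matrix idea for the string case, the sketch below assumes the compositional model is a weighted finite automaton with n latent states. That choice is an assumption made for illustration; the abstract does not fix a model class. The weights in `alpha`, `beta` and `transitions` are made-up toy values, and all function names are hypothetical. The code builds a finite block of the Hankel matrix, whose entry for prefix p and suffix s is the model's value on the concatenation p + s, and checks that the block factors into a prefix matrix times a suffix matrix with inner dimension n. It does not show the convex estimation step mentioned in the abstract.

```python
import itertools

import numpy as np


def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    return [
        "".join(chars)
        for length in range(max_len + 1)
        for chars in itertools.product(alphabet, repeat=length)
    ]


def prefix_vector(alpha, transitions, string):
    """Latent row vector alpha^T A_{x1} ... A_{xk} reached after reading `string`."""
    state = alpha
    for symbol in string:
        state = state @ transitions[symbol]
    return state


def suffix_vector(transitions, beta, string):
    """Latent column vector A_{x1} ... A_{xk} beta that scores `string` as a suffix."""
    state = beta
    for symbol in reversed(string):
        state = transitions[symbol] @ state
    return state


def automaton_value(alpha, transitions, beta, string):
    """f(x) = alpha^T A_{x1} ... A_{xk} beta."""
    return float(prefix_vector(alpha, transitions, string) @ beta)


# Toy 2-state automaton over the alphabet {a, b}; all weights are made up.
alpha = np.array([1.0, 0.0])
beta = np.array([0.2, 0.6])
transitions = {
    "a": np.array([[0.3, 0.2], [0.0, 0.1]]),
    "b": np.array([[0.1, 0.2], [0.3, 0.0]]),
}

# Prefixes and suffixes that index the rows and columns of the Hankel block.
basis = strings_up_to("ab", 2)

# Hankel block: entry (p, s) is the value of the model on the string p + s.
H = np.array(
    [[automaton_value(alpha, transitions, beta, p + s) for s in basis] for p in basis]
)

# Factorization that mirrors the composition: one latent vector per prefix
# and one per suffix, so that H = P @ S.T with inner dimension n = 2.
P = np.stack([prefix_vector(alpha, transitions, p) for p in basis])
S = np.stack([suffix_vector(transitions, beta, s) for s in basis])

rank = np.linalg.matrix_rank(H, tol=1e-10)
print("Hankel block shape:", H.shape)
print("numerical rank:", rank)
print("H equals P @ S.T:", np.allclose(P @ S.T, H))
```

In this toy setting the rank of the block cannot exceed the number of latent states, which is the property that makes low-rank estimation of the Hankel matrix a plausible route to recovering a compositional model from observed statistics.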