Salah Ait-Mokhtar, Jean-Pierre Chanod
Proceedings of the ACL Workshop on Automatic Information Extraction and Building of Lexical Semantic
Resources for NLP Applications. 1997, Madrid
We describe and evaluate an approach for fast automatic recognition and extraction of subject and object
dependency relations from large French corpora, using a sequence of finite-state transducers. The extraction
is performed in three major steps: POS disambiguation; incremental finite-state parsing; extraction of
subject/verb and object/verb relations. Our incremental and cautious approach during the second phase allows
the system to deal successfully with complex phenomena such as embeddings, coordination of VPs and NPs
or non-standard word order. The extraction requires no subcategorisation information and relies on POS
information only. After describing the three steps, we give the results of an evaluation on various types of
unrestricted corpora. Precision is around 90-97% for subjects (84-88% for objects) and recall around 86-92%
for subjects (80-90% for objects). The global speed is about 150 words per second on a SPARC 10 machine,
including preprocessing (POS tagging). We also provide some error analysis; in particular, we evaluate the
impact of POS tagging errors on subject/object dependency extraction.
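
The three steps above can be illustrated with a deliberately small sketch. It is not the authors' system: regular expressions over a string of POS tags stand in for the cascade of finite-state transducers, POS disambiguation is assumed to have been done already, and the one-letter tagset, the chunk patterns, and the function names (chunk, head, extract) are all hypothetical. The "nearest NP to the left, first NP to the right" rule is a simplification and ignores embeddings, coordination and non-standard word order.

```python
import re

# Hypothetical one-letter tagset (one character per token, so string
# offsets equal token indices): D=determiner, A=adjective, N=noun,
# V=verb, P=preposition.
NP = re.compile(r"D?A*NA*")   # simple noun phrase, adjectives on either side
PP = re.compile(r"PD?A*NA*")  # preposition followed by a simple noun phrase


def chunk(tagged):
    """Step 2 (simplified): label NP, PP and V spans over the tag sequence."""
    tags = "".join(tag for _, tag in tagged)
    chunks, i = [], 0
    while i < len(tags):
        m = PP.match(tags, i)
        label = "PP"
        if m is None:
            m = NP.match(tags, i)
            label = "NP"
        if m is not None:
            chunks.append((label, i, m.end()))
            i = m.end()
        elif tags[i] == "V":
            chunks.append(("V", i, i + 1))
            i += 1
        else:
            i += 1  # leave unrecognised tokens unchunked
    return chunks


def head(tagged, span):
    """Return the noun of an NP span (the patterns allow exactly one)."""
    _, start, end = span
    for word, tag in tagged[start:end]:
        if tag == "N":
            return word
    return None


def extract(tagged):
    """Step 3 (simplified): nearest NP left of a verb is its subject,
    first NP to its right is its object; PPs are skipped."""
    chunks = chunk(tagged)
    relations = []
    for k, (label, start, _) in enumerate(chunks):
        if label != "V":
            continue
        verb = tagged[start][0]
        left = [c for c in chunks[:k] if c[0] == "NP"]
        right = [c for c in chunks[k + 1:] if c[0] == "NP"]
        if left:
            relations.append(("SUBJ", head(tagged, left[-1]), verb))
        if right:
            relations.append(("OBJ", head(tagged, right[0]), verb))
    return relations


# Step 1 is assumed done: the input is already POS-disambiguated.
sentence = [("le", "D"), ("chat", "N"), ("de", "P"), ("la", "D"),
            ("voisine", "N"), ("mange", "V"), ("la", "D"), ("souris", "N")]
relations = extract(sentence)
```

Here the noun inside the prepositional phrase ("voisine") is skipped because it is chunked as part of a PP, so "chat" is picked as the subject of "mange" and "souris" as its object, using POS tags alone.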