Jean-Pierre Chanod, Pasi Tapanainen
MLTT Technical reports (Jan 96)
This report describes the rule system of a robust finite-state parser implemented for French.
The parser attaches syntactic tags to each word, as well as part-of-speech and morphological tags,
and determines clause boundaries. It is a reductionist parser, i.e., it removes readings from the originally
ambiguous text. The underlying parser is based on finite state networks and their intersection.
We describe the essential elements of the rule-writing system and show how it is applied to handle
various phenomena, such as argument uniqueness and the argument-or-apposition ambiguity. We present results that
indicate that, even at its current stage, the parser can parse technical manuals with high accuracy
(in a test sample, 95% of part-of-speech and functional tags were correct). The average number of parses per
sentence is very low: more than 90% of sentences produce fewer than 4 parses, including the correct one.
A test on very long sentences from newspaper corpora and a discussion of errors provide some examples
of parses as well as indications of how to run the grammar.
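The reductionist idea above (start from every possible reading of each word, then discard readings that violate rules) can be illustrated with a small sketch. It is not the parser described in this report: the tag names, the three rules, and the functions `parse` and `surviving_readings` are all hypothetical. Each rule accepts or rejects a whole sequence of readings, so keeping only the sequences that every rule accepts is the set-level counterpart of intersecting the sentence network with the rule networks. The "subject precedes verb" rule is a deliberate simplification, since French also allows subject inversion.

```python
from itertools import product

# Hypothetical toy tags "POS/FUNCTION"; not the report's actual tag set.
# Each word carries every reading it could have before disambiguation.
SENTENCE = [
    ("Jean", ["PROPN/SUBJ", "PROPN/OBJ"]),
    ("aime", ["VERB/MAIN"]),
    ("Marie", ["PROPN/SUBJ", "PROPN/OBJ"]),
]


def unique_subject(path):
    """Argument uniqueness: at most one subject (toy scope: whole sentence)."""
    return sum(r.endswith("/SUBJ") for r in path) <= 1


def unique_object(path):
    """Argument uniqueness: at most one object (toy scope: whole sentence)."""
    return sum(r.endswith("/OBJ") for r in path) <= 1


def subject_before_verb(path):
    """Hypothetical word-order rule: every subject precedes the first verb."""
    verbs = [i for i, r in enumerate(path) if r.startswith("VERB/")]
    if not verbs:
        return True
    return all(i < verbs[0] for i, r in enumerate(path) if r.endswith("/SUBJ"))


CONSTRAINTS = [unique_subject, unique_object, subject_before_verb]


def parse(sentence, constraints):
    """Keep only the reading sequences accepted by every constraint.

    Each constraint defines a set of acceptable tag sequences; the surviving
    paths are the intersection of those sets with the ambiguous input.
    """
    reading_sets = [readings for _, readings in sentence]
    return [path for path in product(*reading_sets)
            if all(rule(path) for rule in constraints)]


def surviving_readings(sentence, parses):
    """Per word, the readings that still occur in at least one parse."""
    return [(word, sorted({p[i] for p in parses}))
            for i, (word, _) in enumerate(sentence)]


if __name__ == "__main__":
    parses = parse(SENTENCE, CONSTRAINTS)
    print(len(parses))  # 1
    for word, readings in surviving_readings(SENTENCE, parses):
        print(word, readings)
```

Without rules the toy sentence has four readings; argument uniqueness alone removes one, and the word-order rule removes the rest except `Jean`=subject, `Marie`=object. Explicit enumeration grows exponentially with sentence length; representing the input and rules as finite-state networks and intersecting them, as the report describes, avoids listing every path.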