Anette Frank
Proceedings of the LFG 2000 Conference, University of California at Berkeley, 19-20 July (to appear)
We describe a method that automatically induces LFG f-structures from treebank tree representations, given a
set of f-structure annotation principles that define partial, modular c- to f-structure correspondences in a
linguistically informed, principle-based way. These principles are applied to treebank tree representations,
using an existing term rewriting system. Because the input trees are already disambiguated, the resulting f-structures
require only minimal manual disambiguation. The annotation principles define partial, characteristic c- to f-structure
correspondences that abstract away from irrelevant c-structure contexts, and therefore apply to previously
unseen tree configurations. The method is fully automated and inherently robust: it yields partial, unconnected
f-structures where annotation rules are missing. We describe the results of a first experiment where we
apply this method to the Susanne treebank, and extend the model to selective ambiguity filtering, using lexical
subcategorization information. Finally, we address some conceptual issues, such as changes to treebank
encodings and which types of encoding should be exploited for different applications: the construction of
f-structure banks, as opposed to more far-reaching goals, including rapid, corpus-based LFG grammar
development and robust parsing architectures.
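The principle-based annotation summarized above can be illustrated with a minimal sketch. It is not the paper's implementation, which applies the principles through an existing term rewriting system. Here plain Python functions stand in for rewrite rules, and the category labels (S, NP, VP, PP, ...), the rule set, the `Tree` encoding and the `SUBCAT` lexicon are all hypothetical assumptions, not Susanne annotations. Each principle inspects only a local mother/daughter pair. A daughter with no applicable principle is kept as a separate, unconnected f-structure fragment. `filter_candidates` shows one simple way that lexical subcategorization frames could prune alternative f-structures.

```python
from dataclasses import dataclass, field
from typing import Optional

GOVERNABLE = ("SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP")

@dataclass
class Tree:
    label: str
    children: list["Tree"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminals (hypothetical encoding)

def principle(mother: str, daughter: str, index: int) -> Optional[str]:
    """Hypothetical annotation principles. Each looks only at a local
    mother/daughter pair, so it also fires inside unseen larger trees."""
    if mother == "S" and daughter == "NP" and index == 0:
        return "SUBJ"                      # (^ SUBJ) = !
    if mother in ("VP", "PP") and daughter == "NP":
        return "OBJ"                       # (^ OBJ) = !
    if (mother, daughter) in {("S", "VP"), ("VP", "V"), ("NP", "N"), ("PP", "P")}:
        return "HEAD"                      # ^ = !
    return None                            # no principle applies

def build(tree: Tree, fragments: list) -> dict:
    """Build the f-structure of `tree`; unannotated daughters go to `fragments`."""
    if tree.word is not None:
        return {"PRED": tree.word}
    f: dict = {}
    for i, child in enumerate(tree.children):
        g = build(child, fragments)
        fn = principle(tree.label, child.label, i)
        if fn == "HEAD":
            f.update(g)           # toy stand-in for unification (no clash check)
        elif fn is not None:
            f[fn] = g
        else:
            fragments.append(g)   # kept as a partial, unconnected f-structure
    return f

def induce(tree: Tree) -> tuple[dict, list]:
    fragments: list = []
    root = build(tree, fragments)
    return root, fragments

# Hypothetical subcategorization lexicon (illustrative, not from the paper).
SUBCAT = {"saw": {"SUBJ", "OBJ"}, "slept": {"SUBJ"}}

def subcat_ok(f: dict) -> bool:
    """Require that each PRED listed in SUBCAT governs exactly its listed functions."""
    pred = f.get("PRED")
    if pred in SUBCAT and {k for k in GOVERNABLE if k in f} != SUBCAT[pred]:
        return False
    return all(subcat_ok(v) for v in f.values() if isinstance(v, dict))

def filter_candidates(candidates: list) -> list:
    """Keep only candidate f-structures consistent with SUBCAT."""
    return [c for c in candidates if subcat_ok(c)]

# Example tree: "Kim saw Lee in Paris", with the PP attached directly under S.
example = Tree("S", [
    Tree("NP", [Tree("N", word="Kim")]),
    Tree("VP", [Tree("V", word="saw"), Tree("NP", [Tree("N", word="Lee")])]),
    Tree("PP", [Tree("P", word="in"), Tree("NP", [Tree("N", word="Paris")])]),
])
```

In the example, no principle covers a PP daughter of S, so `induce(example)` returns the PP's f-structure as an unconnected fragment rather than failing. This mirrors the robustness property described in the abstract. The `f.update` merge is a simplification: real f-structure unification would detect clashing values.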