Greg Grefenstette, Anne Schiller, Salah Ait-Mokhtar
F. Van Eynde, D. Gibbon (eds.): Lexicon Development for Speech and Language Processing. Kluwer Academic Publishers. 2000.
For most natural language processing tasks, the complexity and richness of the lexicon determine the
ultimate performance of the system. In this chapter we present a number of low-level natural language
processing techniques for recognizing lexical structures in a domain-specific corpus, concentrating on
techniques that precede a manual construction of the lexicon, or that can serve as a basis for an automatic
creation of a lexicon. Recognizing things in text is easier for a computer than recognizing things in images,
but in both domains recognizing means abstracting away surface differences in order to identify two variants of
the same object. The computational linguistics community has developed a number of techniques for
abstracting away surface differences in text: tokenization, lemmatization, part-of-speech tagging, and finite-state
pattern recognition. This chapter presents an overview of these techniques.
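
As a rough illustration of what abstracting away surface differences can look like in practice, the sketch below chains a regular-expression tokenizer with a lookup-based lemmatizer so that two surface variants reduce to the same form. It is a minimal sketch, not the chapter's method: the names TOKEN_RE, TOY_LEMMAS, tokenize, lemmatize and normalize are hypothetical, and the tiny lemma table stands in for a real morphological analyzer.

```python
import re

# Hypothetical toy table mapping inflected forms to lemmas (illustrative only;
# a real system would use a morphological analyzer instead).
TOY_LEMMAS = {
    "lexicons": "lexicon",
    "lexica": "lexicon",
    "techniques": "technique",
}

# A word (optionally hyphenated), a run of digits, or a single punctuation mark.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*|\d+|[^\w\s]")


def tokenize(text):
    """Split raw text into word, number and punctuation tokens."""
    return TOKEN_RE.findall(text)


def lemmatize(token):
    """Lowercase the token and map it to its lemma if the toy table knows it."""
    lower = token.lower()
    return TOY_LEMMAS.get(lower, lower)


def normalize(text):
    """Tokenize, then lemmatize each token."""
    return [lemmatize(tok) for tok in tokenize(text)]
```

With this sketch, normalize("Lexicons, lexica.") maps both "Lexicons" and "lexica" to the single form "lexicon", while a hyphenated word such as "finite-state" stays one token.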