Greg Grefenstette
Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. Editor: Maria Teresa Pazienza. Springer-Verlag, 1997.
The usual approach to finding information on the WWW via existing Web browsers is to use a one- or two-word
query. Browsers return a number of documents containing these words, and the user examines those
documents, or their abstracts, sees how the word or words in their query are being used and alters their initial
query accordingly. This contrasts markedly with the Information Retrieval models explored by researchers over
the past thirty-five years. These models were designed for longer queries and do not provide an adequate
response to user needs. On the other hand, recent advances in natural language processing permit the
extraction of typed information that is centered on one or two words. We review a selection of this typed
information and describe how it could be used to present an intermediate structure for the user fitting between
their short queries and the documents found in a heterogeneous text collection such as the WWW.