The Document Content Models research explored formalisms and techniques for specifying, manipulating, and exploiting the semantic structure of documents, viewed as global, cohesive objects. Document representations focus on high-level communicative goals; they are specified through constraint mechanisms, which may involve interaction with external knowledge bases. Applications include controlled authoring, interactive generation, natural language interfaces, global document content analysis, and document normalization.

Multilingual Document Authoring

The MDA (Multilingual Document Authoring) project provides interactive tools, such as context-aware menus, that help monolingual writers produce multilingual documents. These tools extend conventional syntax-driven SGML or XML editors so that choices can be made down to the word level when authoring the document content. In addition, dependencies between two distant parts of the document can be specified so that a change in one part is immediately reflected in the other.

The author's choices have language-independent meanings (for example, choosing between a solution and an emulsion in a drug description document), which are automatically rendered in any of the languages known to the system, along with their grammatical consequences for the surrounding text. Although the author does not explicitly follow any standard, the text produced by the system is implicitly controlled in two ways:


  • Syntactically: the system controls the choice of the standard term for expressing a given notion, as well as the choice between grammatical variants (such as active or passive sentences) for expressing a given piece of information;
  • Semantically: the consequences of a choice made in one place are reflected across the whole document; the author cannot forget to provide information that the system requires; and dependencies between semantic parameters, such as gender and pregnancy, can be described.
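
The two kinds of control described above can be illustrated with a minimal Python sketch. It is not the MDA implementation: the lexicon, the sentence templates, the `Content` fields, the extra "tablet" entry, and the function names are all hypothetical, chosen only to show a language-independent choice being rendered in two languages with its grammatical consequences (article selection) and a semantic dependency between gender and pregnancy being checked.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon: each language-independent concept maps, per language,
# to the indefinite article it requires and the standard term for it.
LEXICON = {
    "solution": {"en": ("a", "solution"), "fr": ("une", "solution")},
    "emulsion": {"en": ("an", "emulsion"), "fr": ("une", "émulsion")},
    "tablet": {"en": ("a", "tablet"), "fr": ("un", "comprimé")},
}

# Hypothetical sentence templates; the article slot is filled by the system,
# so the author never types it.
TEMPLATES = {
    "en": "This drug is {article} {noun}.",
    "fr": "Ce médicament est {article} {noun}.",
}


@dataclass
class Content:
    """Language-independent choices made by the author (illustrative fields)."""
    form: Optional[str] = None       # e.g. "solution" or "emulsion"
    gender: Optional[str] = None     # "female" or "male"
    pregnant: Optional[bool] = None  # only meaningful when gender == "female"


def problems(c: Content) -> list:
    """Return the semantic problems that block rendering."""
    found = []
    if c.form is None:
        found.append("form is required")
    elif c.form not in LEXICON:
        found.append(f"unknown form: {c.form}")
    if c.gender == "female" and c.pregnant is None:
        found.append("pregnancy status is required for female patients")
    if c.gender == "male" and c.pregnant:
        found.append("pregnancy conflicts with gender 'male'")
    return found


def render(c: Content, lang: str) -> str:
    """Render the same content in one of the known languages."""
    issues = problems(c)
    if issues:
        raise ValueError("; ".join(issues))
    article, noun = LEXICON[c.form][lang]
    return TEMPLATES[lang].format(article=article, noun=noun)


doc = Content(form="emulsion", gender="female", pregnant=False)
print(render(doc, "en"))  # This drug is an emulsion.
print(render(doc, "fr"))  # Ce médicament est une émulsion.
```

Changing `form` to "solution" changes the English article from "an" to "a" without any action by the author, while a `Content` with no `form`, or with `gender="male"` and `pregnant=True`, is rejected before any text is produced. A real system would presumably express such constraints in a grammar formalism rather than in hand-written checks.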

Document Normalization

Document Normalization is the interactive process of analyzing a legacy document into a well-defined, controlled document content model and generating a corresponding normalized document.