Julien Ah-Pine, Guillaume Jacquet
EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics. Athens, Greece, March 30 - April 3, 2009
We propose a system that builds, in an unsupervised manner, a resource for annotating rare or corpus-specific named entities (NEs). The system is based on a distributional approach that uses syntactic dependencies to measure similarities between named entities. The specificity of the presented method, however, is that it combines a clique-based approach with a clustering technique, which amounts to a soft clustering method well adapted to the task we want to tackle. Our experiments show that the resource constructed with this clique-based clustering system improves different named entity recognition (NER) systems.
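As a rough illustration of why cliques lead to a soft (overlapping) grouping, the sketch below builds a similarity graph over a few named entities and enumerates its maximal cliques. Everything in it is an assumption made for illustration: the toy context counts, the context labels, the cosine measure, the 0.5 threshold, and the plain Bron-Kerbosch enumeration are stand-ins, not the similarity measure or the clustering step used in the paper.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each named entity maps to counts of the syntactic
# contexts it occurs in (labels and counts are invented for this sketch).
contexts = {
    "Washington": {"subj_of:sign": 2, "in:live": 2},
    "Lincoln": {"subj_of:sign": 3, "subj_of:speak": 1},
    "Athens": {"in:live": 3, "to:travel": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def similarity_graph(contexts, threshold):
    """Link two entities when their similarity reaches the threshold."""
    graph = {entity: set() for entity in contexts}
    for a, b in combinations(contexts, 2):
        if cosine(contexts[a], contexts[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques

graph = similarity_graph(contexts, threshold=0.5)
cliques = maximal_cliques(graph)
for clique in cliques:
    print(sorted(clique))
```

In this toy graph the ambiguous entity "Washington" ends up in two cliques, one shared with a person-like entity and one with a place-like entity, which is the kind of overlapping membership that a hard partition could not express.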
The main goals of our system are, first, to annotate rare or corpus-specific NEs and, second, to correctly annotate ambiguous NEs.
We present the global architecture of our system in section 2 and give details about each of its steps in §2.1 to §2.6. In section 3, we present the evaluation of our approach when it is combined with other classic NER systems. We show that the resulting hybrid systems perform better in terms of F-measure; in the best case, the F-measure increases by 4.84 points. Furthermore, we give examples of successful cases of NE disambiguation.
We discuss related work in section 4. Finally, we sum up the main points of this paper and outline some extensions of this work in section 5.