Julien Ah-Pine
The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining - PAKDD 2010, 21-24 June, 2010 - Hyderabad, India
Measuring similarity between objects is a fundamental issue for numerous applications in data mining and machine learning. In this paper, we are interested in kernels. We particularly focus on kernel normalization methods that aim at designing proximity measures that better fit the definition and the intuition of a similarity index. To this end, we introduce a new family of normalization techniques which extends the cosine normalization. Our approach aims at refining the cosine measure between vectors in the feature space by considering another geometry-based score, namely the ratio of the norms of the mapped vectors. We show that the designed normalized kernels satisfy the basic axioms of a similarity index, unlike most unnormalized kernels. Furthermore, we prove that the proposed normalized kernels are also kernels. Finally, we assess these different similarity measures in the context of clustering tasks by using a kernel PCA based clustering approach. Our experiments on several real-world datasets show the potential benefits of the proposed normalized kernels over the cosine normalization and the Gaussian RBF kernel.
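To make the two geometric quantities mentioned above concrete, the following minimal sketch computes the cosine normalization of a kernel and a norm ratio in the feature space. It is an illustration only, not the family of normalized kernels introduced in the paper: the function names are hypothetical, the linear kernel is chosen for simplicity, and writing the norm ratio as min/max of the two norms is an assumption, since the abstract does not state how the ratio is defined or how it is combined with the cosine.

```python
import math


def linear_kernel(x, y):
    """Plain dot product, used here as a simple example kernel."""
    return sum(a * b for a, b in zip(x, y))


def cosine_normalized(kernel, x, y):
    """Cosine normalization: K(x, y) / sqrt(K(x, x) * K(y, y))."""
    return kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))


def norm_ratio(kernel, x, y):
    """Ratio of the feature-space norms, smaller over larger.

    Assumption: the min/max form is chosen only so the score lies in (0, 1];
    the abstract does not specify the exact definition.
    """
    norm_x = math.sqrt(kernel(x, x))
    norm_y = math.sqrt(kernel(y, y))
    return min(norm_x, norm_y) / max(norm_x, norm_y)


# Two collinear vectors with different norms.
x = [1.0, 2.0]
y = [2.0, 4.0]

cos_xy = cosine_normalized(linear_kernel, x, y)
ratio_xy = norm_ratio(linear_kernel, x, y)
```

In this example the cosine is 1 even though `x` and `y` are distinct, while the norm ratio is 0.5. This shows the kind of information the cosine alone discards and that a norm-based score can recover.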