Guillaume Bouchard, Sameer Singh, Théo Trouillon
AAAI Spring Symposium on KRR, Stanford University, CA, March 23-25, 2015
In relational databases, relations between objects, represented
by binary matrices or tensors, may be arbitrarily complex. In
practice however, there are recurring relational patterns such
as transitive, permutation, and sequential relationships, that
have a regular structure which is not captured by the classical
notion of matrix rank or tensor rank. In this paper, we
show that factorizing the relational tensor using a logistic or
hinge loss instead of the more standard squared loss is more
appropriate because it can accurately model many common
relations with a fixed-size embedding (depends sub-linearly
on the number of entities in the knowledge base). We illustrate
this fact empirically by being able to efficiently predict
missing links in several synthetic and real-world experiments.
Further, we provide theoretical justification for logistic loss
by studying its connection to a complexity measure from the
field of information complexity called sign rank. Sign rank is a
more appropriate complexity measure as it is low for transitive,
permutation, or sequential relationships, while being suitably
large, with a high probability, for uniformly sampled binary matrices/tensors.
Report number: