# RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space

## Abstract

In this paper, we present a new approach for knowledge graph embedding called RotatE, which is able to model and infer various relation patterns, including `symmetry/antisymmetry, inversion, and composition`. Specifically, the RotatE model defines each relation as a rotation from the source entity to the target entity in the `complex vector space`. In addition, we propose a novel `self-adversarial negative sampling` technique for efficiently and effectively training the RotatE model.

## Introduction

The general intuition of these methods is to model and infer the connectivity patterns in knowledge graphs according to the observed knowledge facts. For example, some relations are symmetric (e.g., marriage) while others are antisymmetric (e.g., filiation); some relations are the inverse of other relations (e.g., hypernym and hyponym); and some relations may be composed of others (e.g., my mother’s husband is my father). It is critical to find ways to model and infer these patterns, i.e., `symmetry/antisymmetry, inversion, and composition`, from the observed facts in order to predict missing links.

For example, the TransE model (Bordes et al., 2013), which represents relations as translations, aims to `model the inversion and composition patterns`; the DistMult model (Yang et al., 2014), which models the three-way interactions between head entities, relations, and tail entities, `aims to model the symmetry pattern`. However, none of the existing models is capable of modeling and inferring all of the above patterns.

In this paper, we propose such an approach, called RotatE, for knowledge graph embedding. Our motivation comes from Euler’s identity $e^{i\theta} = \cos\theta + i\sin\theta$, which indicates that a unitary complex number can be regarded as a rotation in the complex plane.

Given a triplet $(h, r, t)$, we expect that $t = h \circ r$, where $h, r, t \in \mathbb{C}^k$ are the embeddings, the modulus $|r_i| = 1$, and $\circ$ denotes the Hadamard (element-wise) product. Specifically, for each dimension in the complex space, we expect that

$$t_i = h_i r_i, \text{ where } h_i, t_i, r_i \in \mathbb{C} \text{ and } |r_i| = 1$$

It turns out that such a simple operation can effectively model all three relation patterns: symmetry/antisymmetry, inversion, and composition.

- For example, a relation $r$ is symmetric if and only if each element of its embedding, i.e. $r_i$, satisfies $r_i = e^{0}$ or $e^{i\pi}$, i.e. $r_i = \pm 1$;
- two relations $r_1$ and $r_2$ are inverse if and only if their embeddings are conjugates: $r_2 = \bar{r}_1$;
- a relation $r_3 = e^{i\theta_3}$ is a combination of two other relations $r_1 = e^{i\theta_1}$ and $r_2 = e^{i\theta_2}$ if and only if $r_3 = r_1 \circ r_2$ (i.e., $\theta_3 = \theta_1 + \theta_2$).
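These three properties can be checked numerically with plain NumPy. A minimal sketch (random unit-modulus embeddings; variable names are illustrative, not from the paper):

```python
import numpy as np

k = 4  # embedding dimension
rng = np.random.default_rng(0)

def unit_relation(k, rng):
    """Random relation embedding with |r_i| = 1 (pure phases)."""
    theta = rng.uniform(0, 2 * np.pi, k)
    return np.exp(1j * theta)

# Symmetry: r_i = ±1 implies r ∘ r = 1, so applying r twice returns h.
r_sym = rng.choice([-1.0, 1.0], k).astype(complex)
assert np.allclose(r_sym * r_sym, 1)

# Inversion: r2 = conj(r1) rotates by -θ and undoes r1.
r1 = unit_relation(k, rng)
r2 = np.conj(r1)
h = rng.normal(size=k) + 1j * rng.normal(size=k)
t = h * r1                      # t = h ∘ r1
assert np.allclose(t * r2, h)   # applying r2 to t recovers h

# Composition: r3 = r1 ∘ r2 adds the rotation angles.
r2b = unit_relation(k, rng)
r3 = r1 * r2b
assert np.allclose(h * r1 * r2b, h * r3)
```

Each assertion holds exactly because multiplying unit complex numbers adds their phases.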

To effectively optimize the RotatE model, we further propose a novel self-adversarial negative sampling technique, `which generates negative samples according to the current entity and relation embeddings`.

## Relation rotation in complex vector space

### Modeling and inferring relation patterns

- Definition 1. A relation $r$ is symmetric (antisymmetric) if $\forall x, y$:

$$r(x, y) \Rightarrow r(y, x) \quad \big( r(x, y) \Rightarrow \neg r(y, x) \big)$$

A clause with such form is a symmetry (antisymmetry) pattern.

- Definition 2. Relation $r_1$ is inverse to relation $r_2$ if $\forall x, y$:

$$r_2(x, y) \Rightarrow r_1(y, x)$$

A clause with such form is an inversion pattern.

- Definition 3. Relation $r_1$ is composed of relation $r_2$ and relation $r_3$ if $\forall x, y, z$:

$$r_2(x, y) \wedge r_3(y, z) \Rightarrow r_1(x, z)$$

A clause with such form is a composition pattern.
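As a concrete illustration, these definitions can be checked over a small set of observed triples (the facts and relation names below are invented for the example):

```python
# Toy knowledge graph: a set of (head, relation, tail) triples.
facts = {
    ("alice", "married_to", "bob"), ("bob", "married_to", "alice"),     # symmetric
    ("cat", "hyponym_of", "animal"), ("animal", "hypernym_of", "cat"),  # inverse pair
    ("alice", "mother_of", "carol"), ("bob", "father_of", "carol"),
}

def is_symmetric(r, facts):
    """Definition 1: every observed r(x, y) has a matching r(y, x)."""
    return all((y, r, x) in facts for (x, rel, y) in facts if rel == r)

def is_inverse(r1, r2, facts):
    """Definition 2: every observed r2(x, y) has a matching r1(y, x)."""
    return all((y, r1, x) in facts for (x, rel, y) in facts if rel == r2)

print(is_symmetric("married_to", facts))               # True
print(is_symmetric("mother_of", facts))                # False
print(is_inverse("hypernym_of", "hyponym_of", facts))  # True
```

A model that captures these patterns should be able to infer, say, the missing `married_to` edge from its reverse even when only one direction is observed.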

### Modeling Relations as Rotations in Complex Vector Space

Given a triplet $(h, r, t)$, we expect that $t = h \circ r$, where $h, r, t \in \mathbb{C}^k$ are the embeddings, the modulus $|r_i| = 1$, and $\circ$ denotes the Hadamard (element-wise) product. Specifically, for each dimension in the complex space, we expect that $t_i = h_i r_i$, where $h_i, t_i, r_i \in \mathbb{C}$ and $|r_i| = 1$.

By doing this, $r_i$ is of the form $e^{i\theta_{r,i}}$, `which corresponds to a counterclockwise rotation` by $\theta_{r,i}$ radians about the origin of the complex plane, and only affects the phases of the entity embeddings in the complex vector space. We refer to the proposed model as RotatE due to its rotational nature. According to the above definition, for each triple $(h, r, t)$, we define the distance function of RotatE as:

$$d_r(h, t) = \|h \circ r - t\|$$
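The distance function is a few lines of NumPy. This is a sketch, not the authors' implementation; shapes and variable names are assumptions:

```python
import numpy as np

def rotate_distance(h, r_phase, t):
    """RotatE distance d_r(h, t) = ||h ∘ r − t||, with r_i = exp(i·θ_{r,i})."""
    r = np.exp(1j * r_phase)           # unit-modulus relation embedding
    return np.linalg.norm(h * r - t)   # norm of the complex residual

# A tail produced by exactly rotating the head scores a distance of zero.
rng = np.random.default_rng(42)
k = 8
h = rng.normal(size=k) + 1j * rng.normal(size=k)
theta = rng.uniform(0, 2 * np.pi, k)
t = h * np.exp(1j * theta)
print(rotate_distance(h, theta, t))  # 0.0 for the matching tail
```

Because the relation only rotates phases, entity norms are preserved, so a corrupted tail of a different phase or magnitude yields a strictly positive distance.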

### Optimization

Negative sampling has been proved quite effective for both learning knowledge graph embeddings (Trouillon et al., 2016) and word embeddings (Mikolov et al., 2013). Here we use a loss function similar to the negative sampling loss (Mikolov et al., 2013) for effectively optimizing distance-based models:

$$L = -\log \sigma(\gamma - d_r(h, t)) - \sum_{i=1}^{n} \frac{1}{k} \log \sigma(d_r(h_i', t_i') - \gamma)$$

where $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, and $(h_i', r, t_i')$ is the $i$-th negative triplet.

We also propose a new approach for drawing negative samples. The negative sampling loss above samples negative triplets uniformly. Such uniform negative sampling suffers from inefficiency: as training goes on, many samples are obviously false and do not provide any meaningful information. Therefore, `we propose an approach called self-adversarial negative sampling, which samples negative triples according to the current embedding model`.

Specifically, we sample negative triples from the following distribution:

$$p(h_j', r, t_j' \mid \{(h_i, r_i, t_i)\}) = \frac{\exp\big(\alpha f_r(h_j', t_j')\big)}{\sum_i \exp\big(\alpha f_r(h_i', t_i')\big)}$$

where $\alpha$ is the temperature of sampling. Moreover, since the sampling procedure may be costly, we treat the above probability as the weight of the negative sample. Therefore, the final negative sampling loss with self-adversarial training takes the following form:

$$L = -\log \sigma(\gamma - d_r(h, t)) - \sum_{i=1}^{n} p(h_i', r, t_i') \log \sigma(d_r(h_i', t_i') - \gamma)$$
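Putting the two formulas together, one possible sketch of the self-adversarial loss, assuming the score is the negative distance, $f_r = -d_r$, and treating the softmax weights as constants (the paper's weighting trick). Function names and shapes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(d_pos, d_neg, gamma=6.0, alpha=1.0):
    """d_pos: distance of the true triple (scalar); d_neg: distances of the
    n negative triples (array). Harder negatives (smaller distance, higher
    score) receive larger softmax weights, focusing training on them."""
    # p(h'_i, r, t'_i) ∝ exp(α · f_r) with f_r = −d_r, computed as a
    # numerically stable softmax over the negatives.
    logits = alpha * (-d_neg)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    pos_term = -np.log(sigmoid(gamma - d_pos))
    neg_term = -(w * np.log(sigmoid(d_neg - gamma))).sum()
    return pos_term + neg_term
```

As a sanity check, a well-placed positive with far-away negatives (e.g. `self_adversarial_loss(0.1, np.array([10.0, 12.0]))`) yields a much smaller loss than the reverse configuration, which is the behavior the margin-based formulation is designed to enforce.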