## KGE

Knowledge graphs are symbolic and logical, so numerical machine learning methods can hardly be applied to them directly. This disadvantage is one of the most important challenges for the use of knowledge graphs.
For each relation type r, we compute the average number $a_h$ of heads h per pair (r, t) and the average number $a_t$ of tails t per pair (h, r). If $a_h < 1.5$ and $a_t < 1.5$, then r is labeled 1-1. If $a_h \ge 1.5$ and $a_t < 1.5$, then r is labeled M-1. If $a_h < 1.5$ and $a_t \ge 1.5$, then r is labeled 1-M. If $a_h \ge 1.5$ and $a_t \ge 1.5$, then r is labeled M-M.
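The labeling rule above can be sketched directly in code. The triples and relation names below are made-up toy data, only to show the bookkeeping:

```python
# Label each relation 1-1 / 1-M / M-1 / M-M from averaged head/tail counts.
from collections import defaultdict

def label_relations(triples):
    heads = defaultdict(set)   # (r, t) -> set of heads h
    tails = defaultdict(set)   # (h, r) -> set of tails t
    rels = set()
    for h, r, t in triples:
        heads[(r, t)].add(h)
        tails[(h, r)].add(t)
        rels.add(r)
    labels = {}
    for r in rels:
        # a_h: averaged number of heads per (r, t); a_t: tails per (h, r)
        a_h = sum(len(s) for (rr, _), s in heads.items() if rr == r) / \
              sum(1 for (rr, _) in heads if rr == r)
        a_t = sum(len(s) for (_, rr), s in tails.items() if rr == r) / \
              sum(1 for (_, rr) in tails if rr == r)
        if a_h < 1.5 and a_t < 1.5:
            labels[r] = "1-1"
        elif a_h >= 1.5 and a_t < 1.5:
            labels[r] = "M-1"
        elif a_h < 1.5 and a_t >= 1.5:
            labels[r] = "1-M"
        else:
            labels[r] = "M-M"
    return labels

triples = [("a", "capital_of", "x"), ("b", "born_in", "x"),
           ("c", "born_in", "x"), ("d", "born_in", "y")]
print(label_relations(triples))  # born_in has many heads per tail -> M-1
```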


# The Trans Series

## TransE

### Introduction

TransE is effective and applies well to irreflexive and one-to-one relations.

### Score Function

$$f_r(h,t) = ||h+r-t||^2_2$$
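A minimal numpy sketch of this score; the embeddings below are toy values, not trained ones:

```python
# TransE dissimilarity score ||h + r - t||_2^2: lower = more plausible.
import numpy as np

def transe_score(h, r, t):
    d = h + r - t
    return float(np.dot(d, d))  # squared L2 norm

h = np.array([1.0, 2.0])
r = np.array([0.5, 0.5])
t = np.array([1.5, 2.5])
print(transe_score(h, r, t))  # 0.0 here, since h + r == t exactly
```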

### Evaluation protocol

For each test triplet (h, r, t), we replace the tail t by every entity e in the knowledge graph and compute the dissimilarity score (according to the scoring function $f_r$) of the corrupted triple (h, r, e).

• Ranking the scores in ascending order, we obtain the rank of the original correct triplet. This whole procedure is then repeated while replacing the head instead of the tail. We report the mean of those predicted ranks (Mean Rank) and Hits@10 (%), the proportion of correct entities ranked in the top 10.
• These metrics are indicative but can be flawed when some corrupted triplets turn out to be valid ones, from the training set for instance. Notice that if a corrupted triplet exists in the knowledge graph, it is also correct, so ranking it before the original triplet is not wrong. To avoid such misleading behavior, we remove from the list of corrupted triplets all triplets that appear in the training, validation, or test set (except the test triplet of interest).
The original (possibly flawed) protocol is termed raw, while we refer to the newer one as filtered (or filt.).

A lower Mean Rank is better, while a higher Hits@10 is better.
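The raw vs. filtered protocol can be sketched with a toy 1-D "embedding" and scoring function (all values below are made up for illustration):

```python
# Rank the true tail among all candidate tails, raw or filtered.
def rank_tail(h, r, t, entities, all_true, score, filtered):
    true_score = score(h, r, t)
    rank = 1
    for e in entities:
        if e == t:
            continue
        if filtered and (h, r, e) in all_true:
            continue  # drop corrupted triples that are actually valid
        if score(h, r, e) < true_score:
            rank += 1
    return rank

emb = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 5.0}
score = lambda h, r, e: abs(emb[h] + 1.0 - emb[e])  # toy TransE-style score
truth = {("a", "succ", "b"), ("a", "succ", "c")}    # both triples are valid
filt = rank_tail("a", "succ", "c", emb, truth, score, filtered=True)
raw = rank_tail("a", "succ", "c", emb, truth, score, filtered=False)
print(raw, filt)  # raw rank 2 (beaten by the valid tail "b"), filtered rank 1
```

Averaging these ranks over all test triples gives Mean Rank; the fraction of ranks ≤ 10 gives Hits@10.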

## TransH

### Contribution

In this paper, we start by analyzing the problems of TransE on reflexive/one-to-many/many-to-one/many-to-many relations. Accordingly, we propose a method named translation on hyperplanes (TransH), which interprets a relation as a translating operation on a hyperplane. In TransH, each relation is characterized by two vectors: the norm vector ($w_r$) of the hyperplane and the translation vector ($d_r$) on the hyperplane.
Basically, we set different probabilities for replacing the head or tail entity when corrupting a triplet, depending on the mapping property of the relation, i.e., one-to-many, many-to-one, or many-to-many. We tend to give more chance to replacing the head entity if the relation is one-to-many, and more chance to replacing the tail entity if the relation is many-to-one. In this way, the chance of generating false negative labels is reduced.

we use “unif” to denote the traditional way of replacing head or tail with equal probability, and use “bern.” to denote reducing false negative labels by replacing head or tail with different probabilities.
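In the TransH paper the "bern" head-replacement probability is tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail for the relation. A sketch with toy triples and a hypothetical relation name:

```python
# "bern" corruption: corrupt the head with probability tph / (tph + hpt).
from collections import defaultdict

def bern_head_prob(triples, r):
    tails_per_head = defaultdict(set)
    heads_per_tail = defaultdict(set)
    for h, rr, t in triples:
        if rr != r:
            continue
        tails_per_head[h].add(t)
        heads_per_tail[t].add(h)
    tph = sum(len(s) for s in tails_per_head.values()) / len(tails_per_head)
    hpt = sum(len(s) for s in heads_per_tail.values()) / len(heads_per_tail)
    return tph / (tph + hpt)   # probability of replacing the head

triples = [("a", "has_part", "x"), ("a", "has_part", "y"),
           ("a", "has_part", "z"), ("b", "has_part", "w")]
p = bern_head_prob(triples, "has_part")
print(round(p, 3))  # 0.667: a 1-M relation, so corrupt the head more often
```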

### Score Function

$$f_r(h,t) = ||(h-w_r^Thw_r) + d_r - (t-w_r^Ttw_r)||^2_2$$
where $w_r$ is the norm vector of the hyperplane and $d_r$ is the translation vector on the hyperplane. The constraints $||w_r||_2=1$ and $|w_r^Td_r|/||d_r||_2 \le \epsilon$

are enforced to ensure that $d_r$, $h_{\perp}$, and $t_{\perp}$ are on the hyperplane.
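A numpy sketch of this score: project h and t onto the hyperplane with unit normal $w_r$, then translate by $d_r$ (all values below are made up):

```python
# TransH: score triples after projecting entities onto the relation hyperplane.
import numpy as np

def transh_score(h, t, w_r, d_r):
    w_r = w_r / np.linalg.norm(w_r)        # enforce ||w_r||_2 = 1
    h_perp = h - np.dot(w_r, h) * w_r      # projection onto the hyperplane
    t_perp = t - np.dot(w_r, t) * w_r
    return float(np.sum((h_perp + d_r - t_perp) ** 2))

h = np.array([1.0, 0.0]); t = np.array([0.0, 0.0])
w_r = np.array([1.0, 0.0]); d_r = np.array([0.0, 0.5])
# Projection removes the w_r component, so h_perp == t_perp == 0 here
# and the score reduces to ||d_r||^2.
print(transh_score(h, t, w_r, d_r))  # 0.25
```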

## TransD

### Definition

In TransD, we define two vectors for each entity and relation. The first vector represents the meaning of the entity or relation; the other (called the projection vector) represents how to project an entity embedding into the relation vector space, and is used to construct the mapping matrices.
TransD has no matrix-by-vector operations; they can all be replaced by vector operations.

### Contributions

We propose a novel model TransD, which constructs a dynamic mapping matrix for each entity-relation pair by considering the diversity of entities and relations simultaneously. It provides a flexible style to project entity representations to relation vector space.

## KG2E

### Inspired

TransE, TransH, TransR: although these models are proved effective in many scenarios, we notice that different entities and relations often share the same margin when separating a positive triplet from its corresponding negative triplet, and the (un)certainties of entities and relations in KGs are totally neglected.
In addition, TransE, TransH, and TransR have difficulty learning valid representations for reflexive relations (where (h, r, t) and (t, r, h) both hold) because they use the same operation for head and tail entities. These issues often exist in "point"-based embedding models, in which "point" vectors are typically compared by dot products, cosine distance, or L1/L2 norms, all of which provide a symmetric comparison between instances.

### Definition

This paper switches to density-based (energy-based) embedding and proposes KG2E, which explicitly models the certainty of entities and relations by learning representations of KGs in the space of multi-dimensional Gaussian distributions.
We employ the KL-divergence for scoring triplets, a naturally asymmetric function that effectively models multiple types of relations.

KL divergence measures the difference between the true and predicted probability distributions.
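KG2E scores a triple by the KL divergence between the Gaussian of the entity difference h − t and the Gaussian of r. A sketch of the KL divergence between two diagonal Gaussians (the values below are toy numbers, not trained parameters):

```python
# KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for diagonal Gaussians.
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    k = mu0.shape[0]
    return 0.5 * float(np.sum(var0 / var1)
                       + np.sum((mu1 - mu0) ** 2 / var1)
                       - k
                       + np.sum(np.log(var1) - np.log(var0)))

mu_e = np.array([0.2, -0.1]); var_e = np.array([0.5, 0.5])   # for h - t
mu_r = np.array([0.2, -0.1]); var_r = np.array([0.5, 0.5])   # for relation r
print(kl_diag_gauss(mu_e, var_e, mu_r, var_r))  # 0.0 for identical Gaussians
# KL(P||Q) != KL(Q||P) in general: this asymmetry suits asymmetric relations.
```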

### Contribution

• We specially consider the (un)certainties of entities and relations in a KG.
• We model each entity/relation as a Gaussian distribution. In particular, the mean vector denotes its position, and the covariance matrix (currently diagonal for computational efficiency) describes the (un)certainty of the entity or relation.
• We find that the asymmetric method (KL-divergence between two Gaussian distributions) is more suitable for learning the representation of a KG with Gaussian embeddings.
• The experimental results demonstrate that our proposed method can effectively model the (un)certainty of entities and relations in a KG.

### Conclusion

This is different from previous translation models, which focus on point-based embedding. Point-based embedding methods ignore that different entities and relations may contain different (un)certainties.
1) The covariance of a Gaussian embedding can effectively model the (un)certainty of a relation;
2) relations with complex semantics (e.g., many-to-one (M-1) and many-to-many (M-N) relations) have larger uncertainty; and
3) the more unbalanced the head and tail entities, the larger the uncertainty.


In KG2E, each entity and relation is represented by a Gaussian distribution with a mean vector and a covariance matrix (currently diagonal for computational efficiency). The model aims to capture the uncertainty of entities and relations in the KG.

## TransG

### Contribution

• We identify a new issue in knowledge graph embedding, multiple relation semantics: a relation in a knowledge graph may have different meanings revealed by the associated entity pairs, which has never been studied previously.
• We propose a novel Bayesian non-parametric infinite mixture embedding model, TransG. The model can automatically discover the semantic clusters of a relation and leverage a mixture of multiple relation components for translating an entity pair.

The semantic components of each relation are generated by a Chinese Restaurant Process (CRP).

## TransA

### Inspired

However, the loss metric in translation-based models is oversimplified. This flaw makes current embedding methods incompetent at modeling the various and complex entities/relations in a knowledge base.
Firstly, due to the inflexibility of the loss metric, current translation-based methods apply spherical equipotential hyper-surfaces with different plausibilities, where the nearer to the centre, the more plausible the triple is. Spherical equipotential hyper-surfaces are not flexible enough to characterize these topologies, making current translation-based methods incompetent for this task.

### View

We have good reason to conjecture that a relation could be affected by only a few specific dimensions, while the other, unrelated dimensions are noisy. Treating all dimensions identically introduces much noise and degrades performance.

### Contribution

we propose TransA, an embedding method by utilizing an adaptive and flexible metric.

• First, TransA applies elliptical surfaces instead of spherical surfaces. By this means, the complex embedding topologies induced by complex relations can be represented better.
• Then, as analysed in "Adaptive Metric Approach", TransA can be treated as weighting transformed feature dimensions. Thus, noise from unrelated dimensions is suppressed.
• TransA replaces the inflexible Euclidean distance with an adaptive Mahalanobis distance of absolute loss, because the Mahalanobis distance is more flexible and adaptive.

### Score Function

$$f_r(h,t) = (|h+r-t|)^TW_r(|h+r-t|)$$
where $W_r$ is a relation-specific symmetric non-negative weight matrix that corresponds to the adaptive metric.
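A numpy sketch of this adaptive metric; the embeddings and weight matrix below are made-up toy values:

```python
# TransA: Mahalanobis-style score |h+r-t|^T W_r |h+r-t| with element-wise abs.
import numpy as np

def transa_score(h, r, t, W_r):
    d = np.abs(h + r - t)          # absolute loss, element-wise
    return float(d @ W_r @ d)

h = np.array([1.0, 0.0]); r = np.array([0.0, 1.0]); t = np.array([0.0, 0.0])
W_r = np.array([[2.0, 0.0],       # weight relation-related dimensions more,
                [0.0, 0.5]])      # down-weight noisy ones
print(transa_score(h, r, t, W_r))  # 2*1 + 0.5*1 = 2.5
```

With $W_r$ equal to the identity this reduces to the ordinary squared Euclidean loss; the learned weights are what make the metric adaptive.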

## TranSparse

### Inspired

All previous work including Trans(E, H, R, and D) ignore the heterogeneity (some relations link many entity pairs and others do not) and the imbalance (the number of head entities and that of tail entities in a relation could be different) of knowledge graphs.
Specifically, (1) some relations link many entity pairs (called complex relations) and others do not (called simple relations); (2) some relations link many head (tail) entities and fewer tail (head) entities.
Heterogeneity may lead to overfitting on simple relations or underfitting on complex relations. Meanwhile, the imbalance of the two sides (head and tail) indicates that it is unreasonable to treat them equally.

### Definition

In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations.
The fraction of zero elements over the total number of elements in a matrix is called the sparse degree (denoted by θ). We use M(θ) to denote a matrix M with sparse degree θ.

### Contribution

We introduce a model named TranSparse that embeds knowledge graphs into continuous vector spaces with adaptive sparse transfer matrices for their completion. It considers the heterogeneity and imbalance of the data.

### Score Function

$$f_r(h,t) = || h_p + r - t_p||^2_{l1/l2}$$
where $h_p = M_r(\theta_r)h$ and $t_p=M_r(\theta_r)t$, and $\theta_r=1-(1-\theta_{min})N_r/N_{r^*}$.
We set a sparse transfer matrix $M_r(\theta_r)$ and a translation vector r for each relation r. $N_r$ denotes the number of entity pairs linked by relation r, and $N_{r^*}$ denotes the maximum such number (relation $r^*$ links the most entity pairs). We set a minimum sparse degree $\theta_{min}$ ($0 \le \theta_{min} \le 1$) for the matrix $M_{r^*}$.
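The sparse-degree formula can be computed from pair counts alone; the relation names and counts below are hypothetical:

```python
# TranSparse: theta_r = 1 - (1 - theta_min) * N_r / N_{r*}.
def sparse_degrees(pair_counts, theta_min=0.0):
    n_max = max(pair_counts.values())     # N_{r*}: the most-linked relation
    return {r: 1 - (1 - theta_min) * n / n_max
            for r, n in pair_counts.items()}

counts = {"born_in": 1000, "capital_of": 100, "spouse_of": 500}
print(sparse_degrees(counts))
# The most frequent relation gets a dense matrix (theta = 0);
# rarer relations get sparser transfer matrices (theta closer to 1).
```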

## PTransE

### Inspired

It is known that there are also substantial multiple-step relation paths between entities indicating their semantic relationships. The relation paths reflect complicated inference patterns among relations in KBs.

### Definition

In this paper, we propose a path-constraint resource allocation algorithm to measure the reliability of relation paths. We then select the reliable relation paths for representation learning.
To take relation paths into consideration, relation paths should also be represented in a low-dimensional space.

### Contribution

This paper presents PTransE, a novel representation learning method for KBs, which encodes relation paths to embed both entities and relations in a low-dimensional space. To take advantages of relation paths, we propose path-constraint resource allocation to measure relation path reliability, and employ semantic composition of relations to represent paths for optimization.

### Score Function

$$G(h,r,t)=E(h,r,t)+E(h,P,t)$$
where E(h, r, t) is the same as in TransE, and E(h, P, t) models the inference correlations between relations via multiple-step relation path triples:
$$E(h,P,t)= \frac{1}{Z}\sum_{p\in P(h,t)}R(p|h,t)E(h,p,t)$$
where R(p|h, t) indicates the reliability of the relation path p given the entity pair (h, t), $Z=\sum_{p\in P(h,t)}R(p|h,t)$ is a normalization factor, and E(h, p, t) is the energy function of the triple (h, p, t).

## RTransE

### Definition

In this paper, we propose an extension of TransE that focuses on improving its representation of the underlying graph of multi-relational data by learning compositions of relationships as sequences of translations in the embedding space.
In our approach, called RTransE, the training set is augmented with relevant examples of such compositions by performing constrained walks in the knowledge graph, and training so that sequences of translations lead to the desired result.

However, we show that there is a natural way to compose relationships by simply adding translation vectors without requiring additional parameters, which makes the approach especially appealing because of its scalability.

## ManifoldE

### Definition

we propose a manifold-based embedding principle (ManifoldE) which could be treated as a well-posed algebraic system that expands point-wise modeling in current models to manifold-wise modeling.
The issue of precise link prediction is caused by two reasons: the ill-posed algebraic system and the over-strict geometric form.
Mathematically, an ill-posed algebraic system would commonly make the solutions imprecise and unstable.
Second, from the geometric perspective, the position of the golden facts in existing methods is almost one point, which is too strict for all relations and is more insufficient for complex relations such as many-to-many relations.

### Contribution

To summarize, our contributions are two-fold: (1) we address the issue of precise link prediction and uncover its two causes, the ill-posed algebraic system and the over-strict geometric form; to the best of our knowledge, this is the first time this issue has been addressed formally. (2) We propose a manifold-based principle to alleviate this issue and design a new model, ManifoldE, which achieves remarkable improvements over the state-of-the-art baselines in experiments, particularly for precise link prediction.
For instance, all tail entities of a 1-N relation could lie on a sphere with h + r as the center and $D_r$ as the radius; this is more suitable in a manifold setting than in a point setting.

### Score Function

When a head entity and a relation are given, the tail entities lie on a high-dimensional manifold.
$$f_r(h,t)=||M(h,r,t)-D_r^2||^2$$
where $D_r$ is a relation-specific manifold parameter. $M: E\times L \times E \rightarrow R$ is the manifold function, where E, L are the entity set and relation set and R is the real number field.

## FTransE

### Definition

Instead of strictly enforcing h + r = t, we only constrain that the direction of h + r (or t − r) is the same as that of t (or h), but allow a flexible magnitude of the resulting vectors. Therefore, unlike previous methods that assume h + r ≈ t if (h, r, t) holds, FT takes h + r ≈ αt (or t − r ≈ αh), where the flexibility is reflected in α.

### Score Function

$$f_r(h,t)=(h+r)^Tt+h^T(t-r)$$

## TransA

Locally Adaptive Translation for Knowledge Graph Embedding

### Inspired

On one hand, existing embedding methods over one knowledge graph determine the optimal form of loss function during experiments. A critical problem is that the determination is made over a limited number of candidates.
On the other hand, existing embedding methods over two different knowledge graphs find their individual optimal loss functions over the same set of candidates.

### Contribution

We experimentally prove that knowledge graphs with different localities may correspond to different optimal loss functions, which differ in the setting of margins. Then we study how margin affects the performance of embedding by deducing a relation between them.
TransA finds the optimal loss function by adaptively determining the margins over different knowledge graphs.

### Loss Function

$$\displaystyle\sum_{(h,r,t)\in \Delta}\displaystyle\sum_{(h',r,t')\in \Delta'} \max(0,f_r(h,t)+M_{opt}-f_r(h',t'))$$
where max(x, y) returns the maximum of x and y, $M_{opt} = \mu M_{ent} + (1 - \mu)M_{rel}$ is the optimal margin, $\Delta$ is the set of correct triples, and $\Delta'$ is the set of incorrect triples.
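A sketch of this margin-based ranking loss over toy positive/negative scores; the score values and the choice of $\mu$, $M_{ent}$, $M_{rel}$ below are made up:

```python
# Margin ranking loss with an adaptively determined margin M_opt.
def margin_loss(pos_scores, neg_scores, m_ent, m_rel, mu=0.5):
    m_opt = mu * m_ent + (1 - mu) * m_rel
    return sum(max(0.0, fp + m_opt - fn)
               for fp in pos_scores for fn in neg_scores)

pos = [0.2, 0.5]   # f_r(h, t) for correct triples (lower is better)
neg = [1.5, 0.9]   # scores for corrupted triples
print(margin_loss(pos, neg, m_ent=1.0, m_rel=0.5))
```

Only pairs where a positive score is not at least $M_{opt}$ below a negative score contribute to the loss.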

## lppTransE

### Inspired

The embedding spaces generated by existing translation-based embeddings do not represent transitive and symmetric relations precisely, because they ignore the role of entities in triples. Thus, we introduce a role-specific projection which maps an entity to distinct vectors according to its role in a triple.
TransE, TransH, TransR, and TransD all ignore the logical properties of relations. That is, transitive and symmetric relations lose their transitivity or symmetry in the vector spaces generated by translation-based models.

### Definition

The problems caused by the wrong expression of transitive and symmetric relations are two-fold:
one is that relations with logical properties are common in knowledge bases;
the other is that transitive or symmetric relations affect not only the entities directly connected by those relations, but also other entities connected to them through non-transitive and non-symmetric relations.

### Score Function

$$f_r^{lppE}(h,t)=||M_hh+r-M_tt ||_{l1/2}$$

## STransE

### Introduction

The primary contribution of this paper is to show that two very simple relation-prediction models, SE and TransE, can be combined into a single model, which we call STransE.

### Score Function

$$f_r(h,t)=||W_{r,1}h+r-W_{r,2}t||_{l1/2}$$

## puTransE

### Inspired

In this paper, we propose a novel extension of the translational embedding model to solve three main problems of the current models.
Firstly, translational models are highly sensitive to hyperparameters such as margin and learning rate.
Secondly, the translation principle only allows one spot in vector space for each golden triplet. Thus, congestion of entities and relations in vector space may reduce precision.
Lastly, the current models are not able to handle dynamic data especially the introduction of new unseen entities/relations or removal of triplets.

### Contribution

In this paper, we propose puTransE (Parallel Universe TransE), an online and robust adaptation of TransE that solves the above-mentioned problems. Our proposed approach explicitly generates multiple embedding spaces via a semantically and structurally aware triplet selection scheme, and non-parametrically estimates the energy score of a triplet.

## ITransF

### Inspired

In TransR and STransE, however, not all relations have abundant data to estimate the relation-specific matrices, as most training samples are associated with only a few relations, leading to a data sparsity problem for rare relations.

### Definition

Concretely, since the projection spaces are unique to each relation, projection matrices associated with rare relations can only be exposed to very few facts during training, resulting in poor generalization.
In this paper, we propose an interpretable knowledge transfer model (ITransF), which encourages the sharing of statistic regularities between the projection matrices of relations and alleviates the data sparsity problem.
In summary, the contributions of this work are: (i) proposing a novel knowledge embedding model which enables knowledge transfer by learning to discover shared regularities; (ii) introducing a learning algorithm to directly optimize a sparse representation from which the knowledge transferring procedure is interpretable.

### Score Function

$$f_r(h,t)=||\alpha_r^H \cdot D \cdot h + r - \alpha_r^T \cdot D \cdot t||_{\ell}$$

We stack all the concept projection matrices into a 3-dimensional tensor $D \in \mathbb{R}^{m\times n\times n}$, where m is the pre-specified number of concept projection matrices and n is the dimensionality of the entity and relation embeddings.

Here $\alpha_r^H, \alpha_r^T \in [0,1]^m$, satisfying $\sum_i\alpha_{r,i}^H=\sum_i\alpha_{r,i}^T=1$, are normalized attention vectors used to compose all concept projection matrices in D by a convex combination.
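A numpy sketch of the attention-composed projection; the tensor, attention vectors, and embeddings below are toy values:

```python
# ITransF: projections are convex combinations of m shared concept matrices.
import numpy as np

def itransf_score(h, r, t, D, alpha_h, alpha_t, p=1):
    P_h = np.tensordot(alpha_h, D, axes=1)   # sum_i alpha_i * D[i]
    P_t = np.tensordot(alpha_t, D, axes=1)
    return float(np.linalg.norm(P_h @ h + r - P_t @ t, ord=p))

n = 2
D = np.stack([np.eye(n), np.zeros((n, n))])  # two shared concept matrices
alpha = np.array([1.0, 0.0])                 # attention picks the first one
h = np.array([1.0, 0.0]); r = np.array([0.0, 1.0]); t = np.array([1.0, 1.0])
print(itransf_score(h, r, t, D, alpha, alpha))  # identity projection -> 0.0
```

Because D is shared across relations, rare relations reuse the statistical regularities learned from frequent ones.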

## TransE-RS

### Inspired

With the margin-based ranking loss, the score of a positive triplet (h, r, t) is lower by at least γ than that of the corresponding negative triplet; meanwhile, a low score for the correct triplet should be expected. However, we notice that with the margin-based ranking loss it is still possible that the score of the correct triplet is not small enough to hold (h, r, t), and h + r may even be far away from t.
That is, the loss function cannot ensure that the scores of correct triplets are low enough to fulfill the translation.

### Contribution

In order to enhance the expectation of low scoring for positive triplets, this paper presents an upper limit score of positive triplets by a limit-based scoring loss, and adds the limit-based scoring loss item into common loss function as new loss evaluation for optimizations.

### Loss Function

$$L_{RS}=\displaystyle\sum_{(h,r,t)\in \Delta}\displaystyle\sum_{(h',r,t')\in \Delta'} [\gamma_1+f_r(h,t)-f_r(h',t')]_+ + \lambda[f_r(h,t)-\gamma_2]_+$$

where $\gamma_2$ is the upper limit of the score of the correct triplet (h, r, t).
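A sketch of the combined loss for a single positive/negative pair; the scores and hyper-parameters below are toy values:

```python
# TransE-RS: margin term plus a limit term [f_r(h,t) - gamma2]_+ that
# pushes correct-triple scores below the upper limit gamma2.
def limit_loss(f_pos, f_neg, gamma1, gamma2, lam):
    relu = lambda x: max(0.0, x)
    return relu(gamma1 + f_pos - f_neg) + lam * relu(f_pos - gamma2)

# A positive triple that beats the negative by the margin but still
# scores above gamma2 is now penalized by the second term:
print(limit_loss(f_pos=2.0, f_neg=4.0, gamma1=1.0, gamma2=0.5, lam=1.0))  # 1.5
```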

## CombinE

### Inspired

Many existing knowledge graph embedding models either focus on learning rich features from entities but fail to extract good features of relations, or employ sophisticated models that have rather high time and memory-space complexities.

### Introduction

In this paper, we propose a novel knowledge graph embedding model, CombinE. It exploits entity features from two complementary perspectives via the plus and minus combinations. We start with the plus combination, where we use shared features of entity pairs participating in a relation to convey its relation features. To also allow differences between the entity pairs participating in a relation, we use the minus combination, where we concentrate on individual entity features and regard relations as a channel to offset the divergence and preserve the prominence between head and tail entities.

### Contribution

We identify the significance of capturing relation features that is largely overlooked by existing literature. In this direction, to appropriately model an abstract relation, we propose to leverage shared features of entity pairs having the relation to express the relation features.
CombinE comprises two complementary aspects: in the plus combination, a relation is an abstraction of its related entity pairs; in the minus combination, entity-specific features are used to offset the divergence and preserve the prominence between head and tail entities.
The plus and minus combinations thus complement each other on shared features and individual features, respectively.

### Score Function

$$f(e_h,e_t,r;\Theta)=||h_p+t_p-r_p||^2_{l1/l2}+||h_m-t_m-r_m||^2_{l1/l2}$$

where $\Theta$ is the set of parameters.The subscripts p and m denote plus and minus combination, respectively.
Hence, if we can derive all the features shared by these entity pairs, they can be used to convey the unique features exhibited by relation r, which are unlikely to be possessed by entity pairs not having relation r. Mathematically, $r_p \approx h_p + t_p$, where $r_p$ represents the abstract relation features that all relevant entity pairs jointly have.
The embedding $r_m$ of relation r depicts the discrepancies between $e_h$ and $e_t$ by taking the difference between them. Mathematically, $r_m \approx h_m - t_m$; in this case, we expect $r_m + t_m$ to be close to $h_m$.
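The two combinations above can be sketched together; the plus/minus embeddings below are toy values chosen so both parts fit exactly:

```python
# CombinE: ||h_p + t_p - r_p||^2 + ||h_m - t_m - r_m||^2 (squared L2 here).
import numpy as np

def combine_score(hp, tp, rp, hm, tm, rm):
    return float(np.sum((hp + tp - rp) ** 2) + np.sum((hm - tm - rm) ** 2))

hp = np.array([0.5, 0.5]); tp = np.array([0.5, -0.5]); rp = hp + tp  # shared
hm = np.array([1.0, 0.0]); tm = np.array([0.0, 1.0]); rm = hm - tm   # diff
print(combine_score(hp, tp, rp, hm, tm, rm))  # 0.0 when both parts fit exactly
```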

## TorusE

Knowledge Graph Embedding on a Lie Group

### Inspired

TransE employs the principle that the differences between entity embeddings represent their relations. The principle seems very simple, but it can effectively capture the rules of a knowledge graph. However, TransE has a problem with its regularization. TransE forces entity embeddings to be on a sphere in the embedding vector space. This regularization warps the embeddings and makes it difficult for them to fulfill the abovementioned principle.

### Introduction

This paper proposes a novel embedding model, TorusE, to solve the regularization problem. The principle of TransE can be defined on any Lie group. A torus, which is one of the compact Lie groups, can be chosen for the embedding space to avoid regularization.

### Contribution

TorusE embeds entities and relations on a torus without any regularization.
In this paper, we propose a model that does not require any regularization but has the same principle as TransE, by embedding entities and relations in another embedding space, a torus. By choosing a compact Lie group as the embedding space, embeddings never diverge unlimitedly, and regularization is no longer required.

### TransE flaw

The principle and the regularization conflict during training, because for each $e \in E$ and $r \in R$, $e + r \notin S^{n-1}$ almost always holds, where $S^{n-1}$ is an (n−1)-dimensional sphere.
Hence, the principle h + r = t is rarely realized, as shown in Figure 1. To avoid the regularization problem shown in Figure 1, we need to change the embedding space from $\mathbb{R}^n$, which is an open manifold, to a compact space, because any real-valued continuous function on a compact space is bounded. This means embeddings never diverge unlimitedly, because the scoring function is also bounded. This allows us to avoid regularization and solve the conflict between the principle and the regularization during training.

### Lie Group

A Lie group is a group that is also a finite-dimensional smooth manifold, in which the group operations of multiplication and inversion are smooth maps. A Lie group is called an Abelian Lie group when the operation of multiplication is commutative.
Although ComplEx is a bilinear model and TorusE is a translation-based model, they have a strong similarity. Bilinear models are trained to maximize the scores of triples, while translation-based models are trained to minimize them.

## TransAt

Translating Embeddings for Knowledge Graph Completion with Relation Attention Mechanism

### Contribution

We point out that previous approaches cannot learn a true attention mechanism, and we analyze the reasons.
In this paper, we propose a knowledge graph embedding method with an attention mechanism, named TransAt (Translation with Attention). The basic idea of TransAt is illustrated in Figure 2. We hold the view that judging whether the relationship described by a triple is true is a two-stage process. In the first stage, we check whether the categories of the head and tail entities make sense for the given relation. Then, for the possible compositions, we consider whether the relationship holds based on the relation-related dimensions (attributes). We also propose an asymmetric mechanism which can further improve performance.

### Score Function

$$f_r(h,t)=\begin{cases} ||P_r(h)+r-P_r(t)|| & h \in H_r,t \in T_r \\ \infty & \text{otherwise} \\ \end{cases}$$
where $H_r$ ($T_r$) is the set of capable head (tail) candidates for relation r, and $P_r$ is a projection that keeps only the dimensions related to r.

## TransC

Differentiating Concepts and Instances for Knowledge Graph Embedding

### Inspired

However, all these methods fail to distinguish between concepts and instances, regarding both as entities for simplification. This common simplification leads to two drawbacks:
insufficient concept representation, and loss of the transitivity of the two isA relations, instanceOf and subClassOf (generally known as isA), which are special relations in a knowledge graph.

### Definition

Concepts, which represent groups of different instances sharing common properties, are essential information in knowledge representation.

### Contribution

In this paper, we propose a novel knowledge graph embedding model named TransC that differentiates concepts and instances. Specifically, TransC encodes each concept in the knowledge graph as a sphere and each instance as a vector in the same semantic space, and relative positions are employed to model the relations between concepts and instances.

## RotatE

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space

## TransGate

TransGate: Knowledge Graph Embedding with Shared Gate Structure

### Inspired

Current models continue to improve embedding by focusing on discriminating relation-specific information from entities with increasingly complex feature engineering. We note that they ignore the inherent relevance between relations and try to learn a unique discriminative parameter set for each relation.
Current discriminative models potentially suffer from the following three problems:
First, they have large parameter sizes.
Second, they explore increasingly complex feature engineering to improve embedding, resulting in high time complexity.
Third, due to the large parameter size and complex design, they have to introduce more hyper-parameters and adopt pre-training to prevent overfitting.

### Contribution

In this paper, we follow the idea of parameter sharing to simultaneously learn more expressive features, reduce parameters, and avoid complex feature engineering. Based on the gate structure from LSTM, we propose a novel model, TransGate, and develop a shared discriminate mechanism, resulting in almost the same space complexity as indiscriminate models.
TransGate is a self-contained model and does not need any extra hyper-parameters to prevent overfitting.

• We identify the significance of inherent relevance between relations, which is largely overlooked by existing literature.
• Based on gate structure from LSTM (Hochreiter and Schmidhuber 1997), we propose TransGate and develop shared discriminate mechanism to avoid large discriminate parameters.
• To develop a more effective and scalable model, we reconstruct the standard gate with weight vectors.

### Score Function

$$h_r = h \odot \sigma(v_h \odot h + v_{rh} \odot r + b_h)$$
where $\sigma$ is the sigmoid gate built from weight vectors, $\odot$ denotes the element-wise product, the tail representation $t_r$ is gated analogously, and the score is $f_r(h,t)=||h_r + r - t_r||_{l1/l2}$.

## TransMS

Knowledge Graph Embedding for Complex Relations by Multidirectional Semantics
Our model has merely one more parameter, $\alpha$, than the simplest pioneering model (i.e., TransE).

### Issues

Although these methods (TransE, H, R, and D) show improvements over previous methods, they have difficulty with link prediction for complex relations, which we attribute to the following reasons:
i) They transform the entity embeddings only by transmitting semantics from relations to entities, without simultaneously transmitting semantics from head/tail entities to tail/head entities.
ii) They do not transform the relation embedding by transmitting semantics from entities to relations either.
iii) They scale poorly to large knowledge graphs, having many more parameters than TransE.
iv) They transmit semantics by linear transformations, which might be limited and could be replaced by nonlinear transformations for better semantic translation.
v) From a mathematical point of view, when the head/tail entity vector and relation vector are fixed, the left/right side of $h_\perp + r_\perp = t_\perp$ is fixed; no matter how we linearly transform to obtain the tail/head entity vector, the resulting tail/head entity vectors are distributed around one center, so they end up close to each other whenever the head/tail entity vector and relation vector are fixed.

## CrossE

Interaction Embeddings for Prediction and Explanation in Knowledge Graphs

Crossover interactions, bi-directional effects between entities and relations, help select related information when predicting a new triple, but have not been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding that explicitly simulates crossover interactions.
Crossover interactions are bi-directional effects between entities and relations, including interactions from relations to entities and from entities to relations. They are quite common and helpful for selecting related information, which is necessary when predicting new triples because KGs contain varied information about each entity and relation.

## HAKE

Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

# Bilinear Models

Bilinear models have more redundancy than translation-based models and therefore overfit easily. Hence, their embedding spaces are restricted to low dimensions.

## DistMult

### Contribution

We present a general framework for multi-relational learning that unifies most multi-relational embedding models developed in the past, including NTN and TransE.
We propose and evaluate a novel approach that uses the learned embeddings to mine logical rules such as $BornInCity(a,b) \wedge CityOfCountry(b,c) \Rightarrow Nationality(a,c)$.

### Introduction

The embeddings can be learned via a neural network. The first layer projects a pair of input entities to low dimensional vectors, and the second layer combines these two vectors to a scalar for comparison via a scoring function with relation-specific parameters.
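Concretely, DistMult restricts the relation-specific parameters of this second layer to a diagonal matrix, so the score reduces to a trilinear product. A minimal sketch (random embeddings, purely illustrative):

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult: f(h, r, t) = <h, diag(r), t> = sum_i h_i * r_i * t_i."""
    return np.sum(h * r * t)

rng = np.random.default_rng(1)
h, r, t = (rng.normal(size=10) for _ in range(3))

# The product is symmetric in h and t, so DistMult assigns the same score
# to (h, r, t) and (t, r, h) -- it cannot model antisymmetric relations.
print(distmult_score(h, r, t) == distmult_score(t, r, h))  # True
```

This symmetry is exactly the limitation that motivates ComplEx below.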

## ComplEx

### Introduction

Binary relations in KBs exhibit various types of patterns: hierarchies and compositions like FatherOf, OlderThan or IsPartOf—with partial/total, strict/non-strict orders—and equivalence relations like IsSimilarTo. A relational model should be able to learn all combinations of these properties, namely reflexivity/irreflexivity, symmetry/antisymmetry and transitivity.
Dot products of embeddings scale well and can naturally handle both symmetry and (ir-)reflexivity of relations; using an appropriate loss function even enables transitivity (Bouchard et al., 2015). However, dealing with antisymmetric relations has so far almost always implied an explosion of the number of parameters making models prone to overfitting.
Complex-valued embeddings resolve this: complex vectors can effectively capture antisymmetric relations while retaining the efficiency benefits of the dot product, that is, linearity in both space and time complexity.

### Score Function

$$\phi(r,s,o;\Theta)=\mathrm{Re}(\langle \omega_r, e_s, \overline{e_o}\rangle)=\mathrm{Re}\Big(\displaystyle\sum_{k=1}^{K}\omega_{rk}\,e_{sk}\,\overline{e_{ok}}\Big)$$
$$=\langle \mathrm{Re}(\omega_r),\mathrm{Re}(e_s),\mathrm{Re}(e_o)\rangle+\langle \mathrm{Re}(\omega_r),\mathrm{Im}(e_s),\mathrm{Im}(e_o)\rangle+\langle \mathrm{Im}(\omega_r),\mathrm{Re}(e_s),\mathrm{Im}(e_o)\rangle-\langle \mathrm{Im}(\omega_r),\mathrm{Im}(e_s),\mathrm{Re}(e_o)\rangle$$
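A short sketch of this score using complex arithmetic (the random embeddings are only illustrative): with a nonzero imaginary part in $\omega_r$ the score changes when subject and object are swapped (antisymmetry), while a purely real $\omega_r$ makes it symmetric, recovering DistMult-like behavior.

```python
import numpy as np

def complex_score(w_r, e_s, e_o):
    """ComplEx: Re(sum_k w_rk * e_sk * conj(e_ok)) with complex embeddings."""
    return np.real(np.sum(w_r * e_s * np.conj(e_o)))

rng = np.random.default_rng(2)
k = 10
w_r = rng.normal(size=k) + 1j * rng.normal(size=k)
e_s = rng.normal(size=k) + 1j * rng.normal(size=k)
e_o = rng.normal(size=k) + 1j * rng.normal(size=k)

# Complex w_r: swapping subject and object changes the score (antisymmetry).
print(complex_score(w_r, e_s, e_o), complex_score(w_r, e_o, e_s))

# Purely real w_r: the score becomes symmetric in subject and object.
w_real = np.real(w_r)
print(complex_score(w_real, e_s, e_o) == complex_score(w_real, e_o, e_s))
```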

# Neural Network Embedding Models

Representative models include NTN, among others.

## ProjE

### Definition

In this work, we present a shared-variable neural network model called ProjE that fills in missing information in a knowledge graph by learning joint embeddings of the knowledge graph's entities and edges, and through subtle, but important, changes to the standard loss function.

### Contribution

ProjE has four parts that distinguish it from the related work:

1. Instead of measuring the distance between an input triple $\langle h, r, ? \rangle$ and entity candidates on a unified or a relationship-specific plane, we choose to project the entity candidates onto a target vector representing the input data.
2. Unlike existing models that use transformation matrices, we combine the embedding vectors representing the input data into a target vector using a learnable combination operator. This avoids adding a large number of transformation matrices by reusing the entity embeddings.
3. Rather than optimizing a margin-based pairwise ranking loss, we optimize a ranking loss over the list of candidate entities (or relationships) collectively. We further use candidate sampling to handle very large data sets.
4. Unlike many related models that require pre-trained data from prerequisite models or explore expensive multi-hop paths through the knowledge graph, ProjE is a self-contained model over length-1 edges.

### Function

The combination operator is therefore defined as:
$$e\oplus r=D_ee+D_rr+b_c$$
where $D_e$ and $D_r$ are k × k diagonal matrices which serve as global entity and relationship weights respectively, and $b_c ∈ R^k$ is the combination bias.
Using this combination operator, we can define the embedding projection function as:
$$h(e,r)=g(W^cf(e \oplus r)+b_p)$$
where f and g are activation functions that we define later, $W^c \in R^{s \times k}$ is the candidate-entity matrix, $b_p$ is the projection bias, and s is the number of candidate entities. h(e, r) is the ranking score vector, where each element gives the similarity between a candidate entity in $W^c$ and the combined input embedding $e \oplus r$.
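A minimal sketch of the combination and projection steps, representing the diagonal matrices $D_e$ and $D_r$ as vectors; the choices f = tanh and g = sigmoid are assumptions of this sketch (the paper defines f and g separately):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def proje_scores(e, r, D_e, D_r, b_c, W_c, b_p):
    """ProjE: combine e and r with diagonal global weights, then project onto
    the candidate-entity matrix W_c (shape s x k) to get s ranking scores."""
    combined = D_e * e + D_r * r + b_c   # e (+) r, diagonals stored as vectors
    hidden = np.tanh(combined)           # f = tanh (assumed here)
    return sigmoid(W_c @ hidden + b_p)   # g = sigmoid over the s candidates

rng = np.random.default_rng(3)
k, s = 8, 5                              # embedding dim, number of candidates
e, r = rng.normal(size=k), rng.normal(size=k)
D_e, D_r, b_c = rng.normal(size=k), rng.normal(size=k), rng.normal(size=k)
W_c, b_p = rng.normal(size=(s, k)), rng.normal(size=s)

scores = proje_scores(e, r, D_e, D_r, b_c, W_c, b_p)
print(scores.shape)                      # (5,): one score per candidate entity
```

Storing $D_e$ and $D_r$ as vectors rather than full $k \times k$ matrices is what keeps the parameter count close to TransE's.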

#### Loss Function

• ProjE_pointwise
$$\zeta(e,r,y)=-\displaystyle\sum_{i \epsilon (i | y_i=1)} log(h(e,r)_i)$$

$$-\displaystyle\sum_{m} E_{j \sim P_y} log(1-h(e,r)j)$$
where e and r are the input embedding vectors of a training instance in S, $\mathbf{y} \in R^s$ is a binary label vector where $y_i = 1$ means candidate i is a positive label, and m is the number of negative samples drawn from the negative candidate distribution $P_y$ (described in the next section).
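A sketch of this pointwise loss with hand-picked scores; summing over m sampled negative indices stands in for the expectation term $\mathbb{E}_{j\sim P_y}$:

```python
import numpy as np

def proje_pointwise_loss(scores, y, neg_idx):
    """Pointwise ProjE loss: -sum of log h_i over positive candidates,
    minus sum of log(1 - h_j) over the m sampled negative candidates."""
    pos = -np.sum(np.log(scores[y == 1]))
    neg = -np.sum(np.log(1.0 - scores[neg_idx]))
    return pos + neg

scores = np.array([0.9, 0.2, 0.1, 0.8, 0.3])  # h(e, r): per-candidate scores
y = np.array([1, 0, 0, 1, 0])                 # binary label vector
neg_idx = np.array([1, 2])                    # m = 2 sampled negatives

loss = proje_pointwise_loss(scores, y, neg_idx)
print(loss)   # small when positives score high and sampled negatives score low
```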

# External information

## RESCAL

RESCAL is a tensor factorization method: each relation is represented as a full matrix $M_r$ and a triple is scored bilinearly as $h^\top M_r t$.