Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings
https://arxiv.org/abs/2002.05969
Abstract
Task: answering complex logical queries on large-scale incomplete knowledge graphs (KGs)
Motivation:
Prior work models queries as single points in the vector space: problematic because a complex query represents a large set of its answer entities
Prior work only handles queries with conjunctions and existential quantifiers, not disjunctions
Key insight: embed queries as boxes --> a set of points inside the box corresponds to a set of answer entities of the query
Able to: handles queries that use conjunctions, existential quantifiers, and logical disjunctions in massive and incomplete KGs
Introduction
First-order logical queries: can be represented as DAGs
Con
Limited scalability: computational complexity of subgraph matching (exp in query size)
Correctness: cannot correctly answer queries with missing relations
Imputation: make the graph denser, and limit scalability
Alternative: embed logical queries and KG entities to low-dim vector space
Pro
Robustly handle missing relations
Orders of magnitude faster
Con: single point in vector space
Answer logical query requires modeling a set of active entities while traversing the KG
Unnatural to define logical operators of points in the vector space
Can only handle conjunctive queries
This paper: embedding-based framework for reasoning over KGs that is capable of handling EPFO logical queries in scalable manner
Model a set of entities using a closed region instead of a single point (i.e. box)
Box naturally model sets of entities they enclose
Logical operators (e.g. set intersection) can be similarly defined over boxes
Operations are closed
EPFO --> DNF
Represent any EPFO query as a set of individual boxes, where each box is obtained for each conjunctive query in the DNF
Return NN to any of the boxes as the answers to the query
Then take the union of the answer entities
Query2Box
Objective function
Learn embeddings of entities in the KG
Learn parameterized geometric logical operators over boxes
Query answer process: arbitrary EPFO query q
Identify its computation graph
embed the query by executing a set of geometric operators over boxes
entities that are enclosed in the final box embedding are returned as answers to the query
3.1 Knowledge graphs and conjunctive queries
Conjunctive queries: subclass of first-order logical queries that uses existential and conjunction operations
Dependency graph
Nodes: variable or non-variable entities in q
Edges: relations in q
In order for the query to be valid, the corresponding dependency graph needs to be a DAG
Anchor entities: source nodes
Query target: unique sink node
Computation graph: projection, intersection
3.2 Reasoning over sets of entities using box embeddings
Intuitions: decompose complex query into a sequence of logical operations, and then execute these operations in the vector space
This way we will obtain the embedding of the query
Answers to the query will be entities that are enclosed in the final query embedding box
Two methodological advances
The use of box embeddings to model and reason over set of entities
How to tractably handle disjunction operators
Box embeddings
p = (Cen(p), Off(p))
Cen(p) is the center of the box
Off(p) is the positive offset of the box, modeling the size of the box
Initial boxes for source nodes: (v, 0), where v is the anchor entity vector and 0 is a d-dim all-zero vector
geometric projection operator
relation embedding: r = (Cen(r), Off(r))
Given input box embedding p, projection is modeled by p + r: sum the centers and sum the offsets
translated center, larger offset
geometric intersection operator
p(inter) = (Cen(p(inter)), Off(p(inter)))
Calculated by: performing attention over the box centers, and shrinking the box offset using the sigmoid function
Entity-to-box distance
Training objective
Learn entity embeddings and geometric projection and intersection operators
3.3 Tractable handling of disjunction using disjunctive normal form
Can define union, but union operations over boxes is not closed
Key idea: transform a given EPFO query into a Disjunctive Normal Form (DNF) (i.e. disjunction of conjunctive queries, and union operations only appear at the last steps)
Transformation to DNF
See the paper (calculate computation graphs and combine)
Aggregation
Distance between the given EPFO query q and an entity v
N > 1: minimum distance to the closest box as the distance to an entity
Computational complexity
Complexity of answering an EPFO query = answering the N conjunctive queries
N is not large, and the computation can be parallelized
Answering each conjunctive query: execute a sequence of box operations, and then perform a range search
Experiment
KGs and Query Generation
E.x. FB15k, FB15k-237, NELL995
'p': projection
'i': intersection
'u': union
Take-away
Workloads
Originally: more like structured embeddings, model individual entities and study pairwise relations
Here (box embeddings): model sets of entities and reason over those sets
Last updated
Was this helpful?