Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings

https://arxiv.org/abs/2002.05969

Task: answering complex logical queries on large-scale incomplete knowledge graphs (KGs)
Motivation:
- Prior work models queries as single points in the vector space: problematic because a complex query represents a large set of its answer entities
- Prior work only handles queries with conjunctions and existential quantifiers, not disjunctions
Key insight: embed queries as boxes --> a set of points inside the box corresponds to a set of answer entities of the query
Able to: handles queries that use conjunctions, existential quantifiers, and logical disjunctions in massive and incomplete KGs

First-order logical queries: can be represented as DAGs
- Con
  - Limited scalability: computational complexity of subgraph matching (exp in query size)
  - Correctness: cannot correctly answer queries with missing relations
    Imputation: make the graph denser, and limit scalability
Alternative: embed logical queries and KG entities to low-dim vector space
- Pro
  - Robustly handle missing relations
  - Orders of magnitude faster
- Con: single point in vector space
  - Answer logical query requires modeling a set of active entities while traversing the KG
  - Unnatural to define logical operators of points in the vector space
  - Can only handle conjunctive queries
This paper: embedding-based framework for reasoning over KGs that is capable of handling EPFO logical queries in scalable manner
- Model a set of entities using a closed region instead of a single point (i.e. box)
  - Box naturally model sets of entities they enclose
  - Logical operators (e.g. set intersection) can be similarly defined over boxes
  - Operations are closed
- EPFO --> DNF
  - Represent any EPFO query as a set of individual boxes, where each box is obtained for each conjunctive query in the DNF
  - Return NN to any of the boxes as the answers to the query
  - Then take the union of the answer entities

Objective function
- Learn embeddings of entities in the KG
- Learn parameterized geometric logical operators over boxes
Query answer process: arbitrary EPFO query q
- Identify its computation graph
- embed the query by executing a set of geometric operators over boxes
- entities that are enclosed in the final box embedding are returned as answers to the query

Conjunctive queries: subclass of first-order logical queries that uses existential and conjunction operations
Dependency graph
- Nodes: variable or non-variable entities in q
- Edges: relations in q
- In order for the query to be valid, the corresponding dependency graph needs to be a DAG
  - Anchor entities: source nodes
  - Query target: unique sink node
Computation graph: projection, intersection

Intuitions: decompose complex query into a sequence of logical operations, and then execute these operations in the vector space
- This way we will obtain the embedding of the query
- Answers to the query will be entities that are enclosed in the final query embedding box
Two methodological advances
- The use of box embeddings to model and reason over set of entities
- How to tractably handle disjunction operators
Box embeddings
- p = (Cen(p), Off(p))
  - Cen(p) is the center of the box
  - Off(p) is the positive offset of the box, modeling the size of the box
- Initial boxes for source nodes: (v, 0), where v is the anchor entity vector and 0 is a d-dim all-zero vector
- geometric projection operator
  - relation embedding: r = (Cen(r), Off(r))
  - Given input box embedding p, projection is modeled by p + r: sum the centers and sum the offsets
    translated center, larger offset
- geometric intersection operator
  - p(inter) = (Cen(p(inter)), Off(p(inter)))
  - Calculated by: performing attention over the box centers, and shrinking the box offset using the sigmoid function
- Entity-to-box distance
- Training objective
  - Learn entity embeddings and geometric projection and intersection operators

Can define union, but union operations over boxes is not closed
Key idea: transform a given EPFO query into a Disjunctive Normal Form (DNF) (i.e. disjunction of conjunctive queries, and union operations only appear at the last steps)
Transformation to DNF
- See the paper (calculate computation graphs and combine)
Aggregation
- Distance between the given EPFO query q and an entity v
  - N > 1: minimum distance to the closest box as the distance to an entity
Computational complexity
- Complexity of answering an EPFO query = answering the N conjunctive queries
  - N is not large, and the computation can be parallelized
  - Answering each conjunctive query: execute a sequence of box operations, and then perform a range search

Workloads
- Originally: more like structured embeddings, model individual entities and study pairwise relations
- Here (box embeddings): model sets of entities and reason over those sets

Last updated 3 years ago

Was this helpful?