# Query2box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings

### Abstract&#x20;

* Task: answering **complex** **logical queries** on large-scale incomplete knowledge graphs (KGs)&#x20;
* Motivation:&#x20;
  * Prior work models queries as single points in the vector space: problematic because a complex query represents a large set of its answer entities&#x20;
  * Prior work only handles queries with conjunctions and existential quantifiers, not disjunctions&#x20;
* Key insight: embed queries as boxes --> a set of points inside the box corresponds to a set of answer entities of the query&#x20;
* Able to: handles queries that use conjunctions, existential quantifiers, and logical disjunctions in massive and incomplete KGs&#x20;

### Introduction&#x20;

* First-order logical queries: can be represented as DAGs&#x20;
  * Con&#x20;
    * *Limited scalability*: computational complexity of subgraph matching (exp in query size)
    * *Correctness*: cannot correctly answer queries with missing relations&#x20;
      * Imputation: make the graph denser, and limit scalability&#x20;
* Alternative: embed logical queries and KG entities to low-dim vector space&#x20;
  * Pro&#x20;
    * Robustly handle missing relations&#x20;
    * Orders of magnitude faster&#x20;
  * Con: single point in vector space
    * Answer logical query requires modeling a set of active entities while traversing the KG&#x20;
    * Unnatural to define logical operators of points in the vector space&#x20;
    * Can only handle conjunctive queries&#x20;
* This paper: embedding-based framework for reasoning over KGs that is capable of handling **EPFO logical queries** in **scalable** manner&#x20;
  * Model a set of entities using a closed region instead of a single point (i.e. box)
    * Box naturally model sets of entities they enclose&#x20;
    * Logical operators (e.g. set intersection) can be similarly defined over boxes&#x20;
    * Operations are closed&#x20;
  * EPFO --> DNF&#x20;
    * Represent any EPFO query as a set of individual boxes, where each box is obtained for each conjunctive query in the DNF&#x20;
    * Return NN to any of the boxes as the answers to the query&#x20;
    * Then take the union of the answer entities&#x20;

### Query2Box

* Objective function&#x20;
  * Learn embeddings of entities in the KG
  * Learn parameterized geometric logical operators over boxes&#x20;
* Query answer process: arbitrary EPFO query q&#x20;
  * Identify its computation graph
  * embed the query by executing a set of geometric operators over boxes&#x20;
  * entities that are enclosed in the final box embedding are returned as answers to the query&#x20;

![Conjunctive queries: Where did Canadian citizens with Turing Award Graduate?](https://lh6.googleusercontent.com/QkUBr4kAHYOd4qZjiIUgWL_SmkvxzUfi-t8NWeL4RuteF5C4eNKLB6yAF295HVQLIzu673kM52Ymk3Jo8IQ9IP6z8yycHoWiwI8Pwfj42U5UBZi_eKviXfGESCVPSDHp7d75V-IPtuk)

#### 3.1 Knowledge graphs and conjunctive queries&#x20;

* Conjunctive queries: subclass of first-order logical queries that uses existential and conjunction operations&#x20;
* Dependency graph
  * Nodes: variable or non-variable entities in q&#x20;
  * Edges: relations in q&#x20;
  * In order for the query to be valid, the corresponding dependency graph needs to be a DAG&#x20;
    * Anchor entities: source nodes&#x20;
    * Query target: unique sink node&#x20;
* Computation graph: projection, intersection&#x20;

#### 3.2 Reasoning over sets of entities using box embeddings&#x20;

* Intuitions: decompose complex query into a sequence of logical operations, and then execute these operations in the vector space&#x20;
  * This way we will obtain the embedding of the query
  * Answers to the query will be entities that are enclosed in the final query embedding box&#x20;
* Two methodological advances&#x20;
  * The use of box embeddings to model and reason over set of entities&#x20;
  * How to tractably handle disjunction operators&#x20;
* **Box embeddings**&#x20;
  * p = (Cen(p), Off(p))
    * Cen(p) is the center of the box&#x20;
    * Off(p) is the positive offset of the box, modeling the size of the box&#x20;
  * *Initial boxes for source nodes:* (v, 0), where v is the anchor entity vector and 0 is a d-dim all-zero vector&#x20;
  * *geometric projection* operator
    * relation embedding: r = (Cen(r), Off(r))&#x20;
    * Given input box embedding p, projection is modeled by  p + r: sum the centers and sum the offsets&#x20;
      * translated center, larger offset&#x20;
  * *geometric intersection operator*&#x20;
    * p(inter) = (Cen(p(inter)), Off(p(inter)))
    * Calculated by: performing attention over the box centers, and shrinking the box offset using the sigmoid function&#x20;
  * *Entity-to-box distance*&#x20;
  * Training objective
    * Learn entity embeddings and geometric projection and intersection operators&#x20;

![Entity to box distance ](/files/-MgYLm4cBZOWONmViXXB)

![Entity to box distance (2) ](/files/-MgYLvak7kKBpFieaVgF)

#### 3.3 Tractable handling of disjunction using disjunctive normal form&#x20;

* Can define union, but union operations over boxes is not closed&#x20;
* Key idea: transform a given EPFO query into a Disjunctive Normal Form (DNF) (i.e. disjunction of conjunctive queries, and union operations only appear at the last steps)&#x20;
* **Transformation to DNF**&#x20;
  * See the paper (calculate computation graphs and combine)&#x20;
* **Aggregation**&#x20;
  * Distance between the given EPFO query q and an entity v&#x20;
    * N > 1: minimum distance to the closest box as the distance to an entity&#x20;
* **Computational complexity**&#x20;
  * Complexity of answering an EPFO query = answering the N conjunctive queries&#x20;
    * N is not large, and the computation can be parallelized&#x20;
    * Answering each conjunctive query: execute a sequence of box operations, and then perform a range search&#x20;

### Experiment&#x20;

#### KGs and Query Generation&#x20;

* E.x. FB15k, FB15k-237, NELL995

![Query structures considered in the experiments ](/files/-MgcJR8ekwZqYULcxSoQ)

* 'p': projection&#x20;
* 'i': intersection&#x20;
* 'u': union&#x20;

### Take-away

* Workloads
  * Originally: more like structured embeddings, model individual entities and study pairwise relations&#x20;
  * Here (box embeddings): model sets of entities and reason over those sets &#x20;


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://sliu583.gitbook.io/blog/specific-work/shivarams-group/embeddings/query2box-reasoning-over-knowledge-graphs-in-vector-space-using-box-embeddings.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
