# Accelerating Graph Sampling for Graph Machine Learning using GPUs

#### Requirements for GPU performance

* *thread*: the fundamental unit of computation in a GPU
* *thread block*: threads are statically grouped into thread blocks, and each thread is assigned a unique ID within its block
* *streaming multiprocessors* (SMs): a GPU contains multiple SMs, each of which executes one or more thread blocks
* Types of memory
  * *shared memory*: each SM's private memory, which is only available to the thread blocks assigned to that SM
  * *global memory*: the GPU-wide memory, which is accessible to all SMs
  * The access latency of global memory is much higher than that of shared memory
* To run a thread block, an SM schedules a subset of threads from the thread block, known as a *warp*
  * A warp typically consists of 32 threads with consecutive thread IDs
  * GPUs employ the Single Instruction, Multiple Threads (SIMT) execution model
    * All threads in a warp run the same instruction in lock-step
    * Consequence
      * Two threads in the same warp cannot execute the two sides of a branch concurrently
      * Warp divergence: when the threads in a warp encounter a branch, the subset of threads that do not take the branch must wait for the other threads to complete the branch
    * Goal: **minimize warp divergence**
* Another goal: **balance resource usage across thread blocks**
* The GPU can provide high-bandwidth access to global memory by coalescing several memory accesses from the same warp
  * This is only possible when concurrent memory accesses from threads in the same warp access consecutive memory segments.
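
The two warp-level effects above, divergence and coalescing, can be illustrated with a small toy model. This is only a sketch in plain Python, not GPU code and not taken from the paper. The function names `divergence_passes` and `memory_transactions` are hypothetical, and the 128-byte segment size is an assumption, since the real transaction size depends on the GPU architecture. The model counts how many serialized passes a warp needs at a two-way branch and how many memory segments a warp touches for a given access pattern.

```python
WARP_SIZE = 32        # threads per warp, as stated in the notes above
SEGMENT_BYTES = 128   # ASSUMPTION: size of one memory segment; varies by GPU


def divergence_passes(branch_taken):
    """Count serialized passes each warp needs for a two-way branch.

    branch_taken[i] is True if thread i takes the branch. A warp whose
    threads all agree needs one pass. A warp with both outcomes needs two,
    because the two sides of the branch run one after the other.
    """
    passes = []
    for start in range(0, len(branch_taken), WARP_SIZE):
        warp = branch_taken[start:start + WARP_SIZE]
        passes.append(len(set(warp)))
    return passes


def memory_transactions(byte_addresses):
    """Count the distinct memory segments touched by each warp."""
    counts = []
    for start in range(0, len(byte_addresses), WARP_SIZE):
        warp = byte_addresses[start:start + WARP_SIZE]
        counts.append(len({addr // SEGMENT_BYTES for addr in warp}))
    return counts


# 64 threads form 2 warps.
uniform = [tid < 32 for tid in range(64)]    # branch aligned to the warp boundary
mixed = [tid % 2 == 0 for tid in range(64)]  # even/odd split inside every warp

# One warp of 32 threads, each reading a 4-byte word.
coalesced = [4 * tid for tid in range(32)]             # consecutive words
strided = [SEGMENT_BYTES * tid for tid in range(32)]   # one word per segment

print(divergence_passes(uniform), divergence_passes(mixed))
print(memory_transactions(coalesced), memory_transactions(strided))
```

In this model, aligning the branch condition with warp boundaries avoids divergence entirely, while an even/odd split forces every warp to execute both sides. Likewise, consecutive addresses fall into a single segment, while the strided pattern touches a separate segment for every thread in the warp.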

#### Presentation

{% embed url="<https://www.youtube.com/watch?v=GsffY0j6tVE>" %}

