
# Accelerating Graph Sampling for Graph Machine Learning using GPUs

#### Requirements for GPU performance

* *thread*: the fundamental unit of computation in a GPU
* *thread block*: threads are statically grouped into thread blocks, and each thread is assigned a unique ID within its block
* *streaming multiprocessors* (SMs): the GPU consists of multiple SMs, each of which executes one or more thread blocks
* Types of memory
  * *shared memory*: each SM's private memory, which is only available to the thread blocks assigned to that SM
  * *global memory*: memory that is accessible to all SMs
  * Access latency of global memory is much higher than that of shared memory
* To run a thread block, an SM schedules a subset of threads from the thread block, known as a *warp*
  * A warp typically consists of 32 threads with consecutive thread IDs
  * The GPU employs the Single Instruction Multiple Threads (SIMT) execution model
    * All threads in a warp run the same instruction in lock-step
    * Consequence
      * Two threads in the same warp cannot execute the two sides of a branch concurrently
      * Warp divergence: when the threads in a warp encounter a branch, the subset of threads that do not take the branch must wait for the other threads to complete the branch
    * Goal: **minimize warp divergence**
* Another goal: **balance resource usage across thread blocks**
* The GPU can provide high-bandwidth access to global memory by coalescing several memory accesses from the same warp
  * Coalescing is only possible when concurrent memory accesses from threads in the same warp access consecutive memory segments
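
The two warp-level effects above (divergence and coalescing) can be illustrated with a small host-side cost model. This is only a sketch in plain Python, not GPU code and not taken from the paper: the function names `warp_issue_slots` and `memory_transactions` are hypothetical, the 128-byte segment size is an assumption, and real hardware scheduling is more nuanced than this model.

```python
WARP_SIZE = 32  # typical warp width, as noted in the list above


def warp_issue_slots(takes_branch, then_cost, else_cost):
    """Instruction slots one warp spends on an if/else in a simplified SIMT model.

    takes_branch holds one boolean per thread in the warp. A path costs its
    full length whenever at least one thread takes it, because the threads
    on the other path sit idle while it executes.
    """
    assert len(takes_branch) == WARP_SIZE
    slots = 0
    if any(takes_branch):       # some thread runs the "then" side
        slots += then_cost
    if not all(takes_branch):   # some thread runs the "else" side
        slots += else_cost
    return slots


def memory_transactions(addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by one warp's loads.

    segment_bytes=128 is an assumed segment size used for illustration only.
    """
    return len({addr // segment_bytes for addr in addresses})


# Divergence: every thread agrees vs. even/odd threads disagree.
uniform = [True] * WARP_SIZE
divergent = [tid % 2 == 0 for tid in range(WARP_SIZE)]
uniform_slots = warp_issue_slots(uniform, then_cost=10, else_cost=10)
divergent_slots = warp_issue_slots(divergent, then_cost=10, else_cost=10)

# Coalescing: 4-byte elements read consecutively vs. with a 128-byte stride.
consecutive = [4 * tid for tid in range(WARP_SIZE)]
strided = [128 * tid for tid in range(WARP_SIZE)]

print("uniform branch slots:  ", uniform_slots)
print("divergent branch slots:", divergent_slots)
print("consecutive segments:  ", memory_transactions(consecutive))
print("strided segments:      ", memory_transactions(strided))
```

Under these assumptions, the divergent warp pays for both sides of the branch while the uniform warp pays for one, and the strided access pattern touches one segment per thread while the consecutive pattern fits in a single segment.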

#### Presentation

{% embed url="<https://www.youtube.com/watch?v=GsffY0j6tVE>" %}
