# MOSAIC: Processing a Trillion-Edge Graph on a Single Machine

* Limit by disk size not memory&#x20;

* Out-of-core&#x20;

* Trillion-edge graph (8TB overall)&#x20;

* Large-scale graph processing is ubiquitous&#x20;
  * Social network, genome analysis, graphs enable machine learning&#x20;

* Powerful, heterogeneous machines&#x20;
  * Terabytes of RAM on multiple sockets&#x20;
  * Powerful many-core coprocessors (Xeon Phi processor)&#x20;
    * Specific? Or applicable to GPU as well?&#x20;
  * Fast, large-capacity non-volatile memory&#x20;
  * Goal: take advantages of heterogeneous machine to process tera-scale graphs&#x20;

### Graph Processing: Sample Application&#x20;

* Community Detection
* Find Common Friends
* Find Shortest Paths
* Estimate Impact of Vertices (webpages, users, ...)&#x20;
* ...

### Design&#x20;

* Design space&#x20;
  * Single Machine&#x20;
    * Out-of-core: cheap, but potentially slow&#x20;
    * In memory: fast, but limited graph size&#x20;
  * Cluster&#x20;
    * Out-of-core: large graphs, but expensive & slow &#x20;
    * In memory: large graphs, fast, but expensive&#x20;
* Single machine, out-of-core: cost-effective
  * 10? higher memory more expensive? What is the ratio? Any evaluation?&#x20;
  * Validate this&#x20;
* Goal: good performance and large graphs&#x20;
* Goal: Run algorithms on large graphs on a single machine using coprocessors&#x20;
  * Enabled by&#x20;
    * Common, familiar API (vertex / edge-centric)
    * Encoding: Lossless compression&#x20;
    * Cache locality&#x20;
    * Processing on isolated subgraphs&#x20;
      * Divided into small subgraphs&#x20;
      * Do a massively parallel using cores and hyper-threads&#x20;

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-MbqyuHmMrVWhtIP6ehU%2Fimage.png?alt=media\&token=3407d7f9-7f8a-421d-bc26-ff4dd0e7fe8f)

* Xeon: host CPU&#x20;
  * 8 GB of high bandwidth memory&#x20;
* Access pattern&#x20;

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr-9DjG4CRZXdvMzxz%2Fimage.png?alt=media\&token=714979c8-8acf-4da0-828d-cc274b43953d)

* Subgraphs: tiles&#x20;
  * At most 2^16 vertices&#x20;
  * Local identifier: 2 bytes + level of indirection&#x20;
* Cache locality&#x20;
  * Inside subgraphs: sort by access order&#x20;
  * Between subgraphs: overlap vertex sets&#x20;
* Produce the tiles&#x20;

### Evaluation&#x20;

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr2GhBNsEragBDKw4I%2Fimage.png?alt=media\&token=289c8561-6595-4cbf-a9f1-3787e51c79de)

* Re-run the preprocessing step&#x20;
* Generate incrementally?&#x20;
  * No

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr3174Riup_Ew-CAzg%2Fimage.png?alt=media\&token=c61ccaac-8e69-411a-81f3-93c550ec918f)

* Not supported multiple combines&#x20;

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr3l1TFKg-1aChlhcT%2Fimage.png?alt=media\&token=3f90a57e-b228-47f9-81b0-f5613e5d3557)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr4mAMHsY4MB1n7ivD%2Fimage.png?alt=media\&token=f0a946f3-bf84-4c2c-bac5-9783fd57a571)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr4xRGxGUncJZSOjKW%2Fimage.png?alt=media\&token=43252a97-e663-49c0-945a-79d97ca7d7bf)

* Execution model (when they place the tiles, inter-core communication? Or independent?)&#x20;

### Conclusion&#x20;

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-Mbqrh1miNj-DsddWzCi%2F-Mbr5jkhx97XmGOha3yF%2Fimage.png?alt=media\&token=c0b47712-84fc-4ad6-a870-af85878a65a5)

* Same gain to put in the multi-core CPUs?&#x20;
  * Use the principle&#x20;
  * Spread computation&#x20;
  * Hybrid systems?&#x20;
    * Taking benefits on tiling&#x20;
    * Along with offloading to GPUs&#x20;

Think about:

* Workloads, technologies, trends. Design something that benefits the problem they have.&#x20;
