Support sparse computations in ML

Saman Amarasinghe (MIT)

History of Computing

  • The Gilded Era of Computing

    • Driven by Moore's Law

    • Performance was free

    • Software Engineering Ruled

      • High level languages, abstraction, layering

      • Innovation and complex software systems, but coupled with excess bloat

  • The Trustfund Era of Computing

    • Moore's Law is dead

    • How bad is the bloat?

  • The Resurgent Era of Computing

    • Why sparsity matters in machine learning

      • Existing ML problems are sparse

        • Replacing dense solvers with sparse ones can yield a huge benefit (see the sketch after this list)

      • Next-generation ML problems can be sparse

        • GNNs

      • Training ML models

        • RW
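
A minimal sketch (not from the talk; the size and density are hypothetical) of the point above about replacing dense solvers with sparse ones: the sparse representation only does work proportional to the number of nonzeros.

    import numpy as np
    import scipy.sparse as sp

    n = 2_000
    density = 0.001                       # 0.1% of entries are nonzero

    A_sparse = sp.random(n, n, density=density, format="csr", dtype=np.float64)
    A_dense = A_sparse.toarray()          # the same matrix, stored densely
    x = np.random.rand(n)

    y_dense = A_dense @ x                 # ~2*n*n flops, regardless of sparsity
    y_sparse = A_sparse @ x               # ~2*nnz flops, touching only nonzeros

    assert np.allclose(y_dense, y_sparse)
    print("dense flops :", 2 * n * n)          # 8,000,000
    print("sparse flops:", 2 * A_sparse.nnz)   # ~8,000, about 1000x less work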

Today

  • The world is built for dense computation

Hardware utilization

Dense

  • Peak Performance (GEMM)

    • 70-80% of peak on CPUs

    • 80-90% of peak on GPUs

  • Optimization

    • Prefetching, branch prediction, TLB, caches

Sparse

  • Peak performance

    • <10% of peak (see the rough measurement sketch below)
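
A rough measurement sketch of the gap above (not from the talk; sizes and density are assumed, and the achieved numbers are machine dependent): it compares the floating-point throughput reached by dense GEMM (BLAS through NumPy) with sparse matrix-vector multiply (SciPy CSR). Dividing each by the machine's theoretical peak gives utilization figures of roughly the kind quoted above.

    import time
    import numpy as np
    import scipy.sparse as sp

    def gflops(flops, seconds):
        return flops / seconds / 1e9

    # Dense GEMM: C = A @ B, about 2*n^3 flops, runs through an optimized BLAS.
    n = 2048
    A, B = np.random.rand(n, n), np.random.rand(n, n)
    t0 = time.perf_counter(); C = A @ B; t_gemm = time.perf_counter() - t0

    # Sparse SpMV: y = M @ x, about 2*nnz flops, dominated by irregular memory access.
    m = 100_000
    M = sp.random(m, m, density=1e-4, format="csr", dtype=np.float64)
    x = np.random.rand(m)
    t0 = time.perf_counter(); y = M @ x; t_spmv = time.perf_counter() - t0

    print(f"dense GEMM : {gflops(2 * n**3, t_gemm):8.1f} GFLOP/s")
    print(f"sparse SpMV: {gflops(2 * M.nnz, t_spmv):8.1f} GFLOP/s")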

Programming System

Dense

  • Abstractions that work across different algorithms (dense linear algebra, image processing, deep learning, ...)

    • BLAS, Halide, TensorFlow

    • Optimizing Compilers

Sparse

  • What abstractions?

Tensor

  • Too many tensor kernels for a fixed-function library

  • Many compound (more-than-binary) expressions must be computed in a single kernel rather than composed from binary ones

    • Sampled Dense-Dense Matrix Multiplication (SDDMM); see the sketch below
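
A sketch of why SDDMM, A(i,j) = B(i,j) * sum_k C(i,k) * D(k,j) with B sparse, wants a single fused kernel (shapes and density below are hypothetical): composed from binary library calls it materializes the full dense product C @ D, while a fused loop computes each dot product only where B has a nonzero.

    import numpy as np
    import scipy.sparse as sp

    m, n, k = 1000, 1200, 64
    B = sp.random(m, n, density=0.001, format="csr", dtype=np.float64)
    C, D = np.random.rand(m, k), np.random.rand(k, n)

    # Composed from binary kernels: O(m*n*k) work plus an m x n dense temporary.
    A_composed = B.toarray() * (C @ D)

    # Fused SDDMM kernel: O(nnz(B) * k) work, no dense temporary.
    A_fused = B.copy()
    for i in range(m):
        for p in range(B.indptr[i], B.indptr[i + 1]):   # nonzeros of row i
            j = B.indices[p]
            A_fused.data[p] = B.data[p] * (C[i, :] @ D[:, j])

    assert np.allclose(A_composed, A_fused.toarray())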

The Tensor Algebra Compiler (Taco)

  • Challenges

    • Irregular data structures

      • Hierarchical Storage Abstractions (see the first sketch after this list)

    • Sparse iteration space with limited O(1) access

      • Sparse iteration graphs

      • Code generation from iteration graph

    • Avoid wasted work and iterations

      • Co-iteration code generation (see the second sketch after this list)

    • Optimize Parallelism and Locality

      • Scheduling language
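
A Python sketch of the hierarchical storage abstraction and of the kind of loop nest Taco generates for y(i) = A(i,j) * x(j) when A's row dimension is stored dense and its column dimension compressed, i.e. CSR (the matrix here is made up, and Taco itself emits C code, not Python):

    import numpy as np

    # CSR seen as two per-dimension levels:
    #   row level:    dense, iterated as range(num_rows)
    #   column level: compressed, a pos array delimits each row's segment in the
    #                 crd (coordinate) and vals arrays.
    A = np.array([[0., 2., 0.],
                  [1., 0., 3.],
                  [0., 0., 4.]])
    num_rows, num_cols = A.shape
    pos, crd, vals = [0], [], []
    for i in range(num_rows):
        for j in range(num_cols):
            if A[i, j] != 0.0:
                crd.append(j)
                vals.append(A[i, j])
        pos.append(len(crd))

    x = np.array([1., 2., 3.])
    y = np.zeros(num_rows)

    # Generated-style kernel: the dense level becomes a plain counted loop, the
    # compressed level becomes a loop over its pos[i]..pos[i+1] segment.
    for i in range(num_rows):
        for p in range(pos[i], pos[i + 1]):
            j = crd[p]
            y[i] += vals[p] * x[j]

    assert np.allclose(y, A @ x)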
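
A second sketch, this time of co-iteration for z(i) = a(i) * b(i) with both operands sparse (the input vectors are made up): a zero in either operand zeroes the result, so the generated loop merges the two coordinate lists and visits only their intersection instead of scanning the full dimension, which is how wasted work is avoided.

    a_crd, a_val = [1, 4, 7, 9], [1.0, 2.0, 3.0, 4.0]   # sparse vector a
    b_crd, b_val = [0, 4, 9],    [5.0, 6.0, 7.0]        # sparse vector b

    z_crd, z_val = [], []
    pa, pb = 0, 0
    while pa < len(a_crd) and pb < len(b_crd):
        ia, ib = a_crd[pa], b_crd[pb]
        if ia == ib:                  # both operands nonzero: emit an output entry
            z_crd.append(ia)
            z_val.append(a_val[pa] * b_val[pb])
            pa += 1
            pb += 1
        elif ia < ib:                 # a lags: advance a only
            pa += 1
        else:                         # b lags: advance b only
            pb += 1

    print(z_crd, z_val)   # [4, 9] [12.0, 28.0]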
