Fluid: Resource-aware Hyperparameter Tuning Engine

https://proceedings.mlsys.org/paper/2021/file/9f61408e3afb633e50cdf1b20de6f466-Paper.pdf

  • Successive Halving

    • Workers can end up underutilized, with only one job left to run (toy sketch below)
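
A toy sketch (mine, not from the paper) of where the idle workers come from: synchronous Successive Halving keeps only a fraction of trials at each rung, so late rungs can have fewer running trials than available workers.

```python
# Toy illustration (not Fluid's code): rung sizes in synchronous Successive
# Halving. Once the number of surviving trials drops below the number of
# workers, the extra workers sit idle until the tuning job finishes.

def rung_sizes(n_trials: int, eta: int, n_rungs: int) -> list[int]:
    """Number of surviving trials at each rung, shrinking by a factor of eta."""
    sizes, n = [], n_trials
    for _ in range(n_rungs):
        sizes.append(n)
        n = max(1, n // eta)
    return sizes

n_workers = 8
for rung, n in enumerate(rung_sizes(n_trials=27, eta=3, n_rungs=4)):
    idle = max(0, n_workers - n)
    print(f"rung {rung}: {n} trial(s) running, {idle}/{n_workers} workers idle")
```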

  • More efficient, but there are also some problems

  • Goal: keep resources both utilized and doing useful work

  • Resource-aware hyperparameter tuning

    • Previous work: HyperSched also does resource management, but it is more specialized to the tuning algorithm itself

  • Intra-GPU sharing (pack multiple jobs on a single GPU)

    • Current scheduler: a FIFO queue manages the jobs (contrast sketched below)
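
A minimal sketch of the contrast, assuming a FIFO dispatcher that assigns one trial per GPU versus a packer that co-locates a couple of trials per GPU; the function and variable names are hypothetical, not Fluid's (or Ray Tune's) API.

```python
from collections import deque

# Hypothetical illustration (not Fluid's API): FIFO assigns at most one trial
# per GPU and leaves the rest waiting; intra-GPU packing places up to
# `pack_limit` trials on the same GPU so small trials share hardware.

def fifo_schedule(trials: deque, n_gpus: int) -> dict[int, list[str]]:
    placement = {g: [] for g in range(n_gpus)}
    for g in range(n_gpus):
        if trials:
            placement[g].append(trials.popleft())
    return placement

def packed_schedule(trials: deque, n_gpus: int, pack_limit: int = 2) -> dict[int, list[str]]:
    placement = {g: [] for g in range(n_gpus)}
    g = 0
    while trials and len(placement[g]) < pack_limit:
        placement[g].append(trials.popleft())
        g = (g + 1) % n_gpus          # round-robin across GPUs
    return placement

trials = [f"trial-{i}" for i in range(6)]
print(fifo_schedule(deque(trials), n_gpus=2))    # 2 trials placed, 4 still queued
print(packed_schedule(deque(trials), n_gpus=2))  # 4 trials placed, 2 still queued
```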

Design and Algorithms

  • Intra-GPU sharing: packing several jobs on a single GPU

  • Very helpful in communicating the ideas when people read the paper

  • What is the overhead of packing?

  • Packing incurs placement overhead

    • Limit the number of packed trials per GPU (toy cost model below)
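
A back-of-the-envelope model (my own assumption, not a measurement from the paper) of why the packing degree needs a cap: a single trial may leave the GPU underutilized, so packing helps at first, but each co-located trial adds sharing overhead and the gain eventually reverses.

```python
# Toy cost model (made-up numbers): a lone trial drives only `solo_util` of
# the GPU; packing k trials raises aggregate throughput, but every extra
# co-located trial pays a sharing/placement overhead, so the benefit peaks
# at a small k -- hence the limit on how many trials are packed per GPU.

def aggregate_throughput(k: int, solo_util: float = 0.4, overhead: float = 0.1) -> float:
    per_trial = min(solo_util, 1.0 / k) * (1.0 - overhead) ** (k - 1)
    return k * per_trial

for k in range(1, 6):
    print(f"pack {k} trial(s)/GPU -> relative throughput {aggregate_throughput(k):.2f}")
```

With these made-up numbers the throughput peaks around three packed trials, which is the kind of curve that motivates a hard limit.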

  • Assumes the model fits on a single worker

  • Evaluation metric: makespan of all the trials
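
For concreteness, makespan here is the end-to-end wall-clock time for the whole trial set; a minimal computation with made-up numbers:

```python
# Makespan = latest finish time minus earliest start time over all trials.
trial_times = [(0.0, 5.0), (1.0, 9.0), (2.0, 4.0)]   # (start, end) in hours, made up
makespan = max(end for _, end in trial_times) - min(start for start, _ in trial_times)
print(f"makespan = {makespan} h")                     # 9.0 h
```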

  • The improvement is more prominent for the asynchronous version (ASHA)

  • Intuition:

    • If the runtime scales well when a trial is given more resources

  • Parameters they are tuning (illustrative search space after this list)

    • Learning rate, dropout rate

    • Number of layers

    • Batch size
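
An illustrative search space covering the hyperparameters listed above; the ranges and values are made up, not the paper's actual configuration.

```python
import random

# Illustrative only: one way to express the search space over the
# hyperparameters mentioned above (ranges are invented for the example).
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),   # log-uniform
    "dropout_rate":  lambda: random.uniform(0.1, 0.5),
    "num_layers":    lambda: random.randint(2, 6),
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
}

def sample_config() -> dict:
    """Draw one random trial configuration from the space."""
    return {name: sample() for name, sample in search_space.items()}

print(sample_config())
```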

  • Failures

    • Need to know reasonable ranges for the hyperparameters

  • Multiple hyperparameter jobs

    • Multiple trial groups (rough sketch below)

    • But is there extra parallelism across groups?

    • Different things in the same trial group
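
A rough sketch of the trial-group idea as discussed above, assuming each tuning job submits groups of trials that may run concurrently and the engine decides how to interleave groups from multiple jobs on the shared GPUs; the class and field names are hypothetical, not Fluid's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch (names are mine, not Fluid's API): a tuning job hands
# the execution engine a group of trials that are allowed to run together;
# with several jobs there are several trial groups pending at once, and any
# extra parallelism comes from how the engine interleaves them on the GPUs.

@dataclass
class Trial:
    trial_id: str
    config: dict

@dataclass
class TrialGroup:
    trials: list[Trial] = field(default_factory=list)

group_a = TrialGroup([Trial(f"a{i}", {"lr": 10 ** -i}) for i in range(1, 4)])
group_b = TrialGroup([Trial(f"b{i}", {"batch_size": 32 * 2 ** i}) for i in range(3)])
pending_groups = [group_a, group_b]   # two jobs -> two groups sharing one cluster
print([len(g.trials) for g in pending_groups])   # [3, 3]
```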

  • Variability

    • Some

    • More?

  • Space sharing & Parallelism

    • Parallelism is added automatically; nothing is needed from the hyperparameter-tuning algorithm (sketch below)

    • It is still a queueing problem, but in a tighter space
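
A toy allocation sketch in this spirit (my own simplification, not the paper's actual algorithm): spread the GPU pool evenly over the pending trials, scaling a trial across several GPUs when GPUs are plentiful and packing trials onto shared GPUs when they are not, with caps on both. Nothing here requires changing the hyperparameter-tuning algorithm itself.

```python
import math

# Toy allocation (simplified, not the paper's algorithm): compute GPUs per
# trial by dividing the pool evenly. share >= 1 -> scale the trial out over
# several GPUs (parallelism); share < 1 -> give it a fractional GPU, i.e.
# pack trials together (space sharing), down to a minimum share.

def gpus_per_trial(n_trials: int, n_gpus: int, max_pack: int = 2, max_scale: int = 4) -> float:
    if n_trials == 0:
        return 0.0
    share = n_gpus / n_trials
    if share >= 1:
        return float(min(max_scale, math.floor(share)))   # scale out
    return max(1.0 / max_pack, share)                      # pack, capped at max_pack per GPU

for n_trials in (2, 8, 16, 32):
    print(f"{n_trials:2d} trials on 8 GPUs -> {gpus_per_trial(n_trials, 8):.2f} GPUs per trial")
```

When there are more trials than the packing cap can fit onto the GPUs, the remainder still waits in a queue, which is the tighter-space queueing problem noted above.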
