Fluid: Resource-aware Hyperparameter Tuning Engine
https://proceedings.mlsys.org/paper/2021/file/9f61408e3afb633e50cdf1b20de6f466-Paper.pdf



Successive Halving
Workers are underutilized in later rungs, when only a few trials are left to run

More efficient, but there are also some problems
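A minimal sketch of the idle-worker effect (my own simplified version, not the paper's code): each rung keeps only the top 1/eta trials, so later rungs have fewer runnable trials than workers in a fixed-size pool.

```python
import math
import random


def run_trial(config, budget):
    """Stand-in for training `config` for `budget` epochs; returns a score."""
    return random.random() * math.log(budget + 1)


def successive_halving(configs, num_workers, min_budget=1, eta=3):
    rung, budget = 0, min_budget
    while len(configs) > 1:
        # With a fixed worker pool, workers go idle once there are fewer
        # runnable trials than workers in the current rung.
        idle = max(0, num_workers - len(configs))
        print(f"rung {rung}: {len(configs)} trials, {idle} workers idle")

        scores = [(run_trial(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0], reverse=True)

        # Promote the top 1/eta trials to the next rung with eta x budget.
        configs = [c for _, c in scores[: max(1, len(scores) // eta)]]
        budget *= eta
        rung += 1
    return configs[0]


best = successive_halving([{"lr": 10 ** -i} for i in range(9)], num_workers=8)
```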


Goal: keep workers both utilized and doing useful work
Resource-aware hyperparameter tuning
Previous work: HyperSched also does resource management, but is more specialized to its own algorithm

Intra-GPU sharing (pack jobs onto a single GPU)
Current schedulers: a FIFO queue to manage jobs
Design and Algorithms

Intra: packing several jobs onto a single GPU
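Not Fluid's own mechanism, but a rough way to see intra-GPU packing with off-the-shelf Ray Tune: requesting a fractional GPU per trial lets two trials share one physical device. Fluid chooses the packing degree automatically; here the 0.5-GPU share is hard-coded by hand as an assumption, and a GPU is assumed to be available.

```python
from ray import tune


def trainable(config):
    # Stand-in training loop; report a dummy score per epoch
    # (classic function-API style reporting).
    for epoch in range(10):
        tune.report(accuracy=config["lr"] * epoch)


tune.run(
    trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=8,
    # Requesting 0.5 GPU per trial lets Ray schedule two trials on one GPU.
    resources_per_trial={"cpu": 1, "gpu": 0.5},
)
```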



Very helpful in communicating the ideas when people read the paper
Overhead of packing?

There is overhead in doing the placement when packing; limit the number of packed trials


Assumes the model fits on a single worker

Objective: the makespan of all the trials

The improvement is more prominent when applied to the asynchronous version (ASHA)
Intuition:
Depends on how well trial runtimes scale with the resources given to them
Hyperparameters they are tuning:
Learning rate, dropout rate
Number of layers
Batch size
Failures
Practitioners know reasonable ranges for these
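A hedged sketch of a search space over the hyperparameters listed above, written with Ray Tune's sampling primitives; the ranges are illustrative "reasonable ranges", not values taken from the paper.

```python
from ray import tune

search_space = {
    "lr": tune.loguniform(1e-5, 1e-1),          # learning rate
    "dropout": tune.uniform(0.0, 0.5),           # dropout rate
    "num_layers": tune.randint(2, 8),            # number of layers
    "batch_size": tune.choice([32, 64, 128, 256]),
}
```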

Multiple hyperparameter tuning jobs
Multiple trial groups
But is there extra parallelism there?
Different things running in the same trial group
Variability across trials: some more, some less?
Space sharing & Parallelism
Parallelism is applied automatically; nothing is needed from the hyperparameter tuning algorithm
A queueing problem, but in a tight space
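A simplified sketch of the space-sharing idea, assuming a plain even split of workers across the trials of a trial group so the tuning algorithm never reasons about resources; this is not Fluid's actual allocation algorithm, just the shape of the decision.

```python
def allocate(num_trials, num_workers):
    """Return each trial's worker share under simple space sharing."""
    if num_trials == 0:
        return []
    if num_trials <= num_workers:
        # Fewer trials than workers: widen trials across workers
        # (e.g. data parallelism), spreading the remainder.
        base, extra = divmod(num_workers, num_trials)
        return [base + (1 if i < extra else 0) for i in range(num_trials)]
    # More trials than workers: pack trials, each getting a fractional GPU.
    return [num_workers / num_trials] * num_trials


print(allocate(num_trials=3, num_workers=8))   # -> [3, 3, 2]
print(allocate(num_trials=16, num_workers=8))  # -> [0.5, 0.5, ...]
```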