Fluid: Resource-aware Hyperparameter Tuning Engine
https://proceedings.mlsys.org/paper/2021/file/9f61408e3afb633e50cdf1b20de6f466-Paper.pdf
Successive Halving
Workers become underutilized when only one job is left to run
More efficient, but there are also some problems
Goal: keep workers utilized and doing useful work
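A minimal sketch of successive halving to illustrate the underutilization (illustrative only, not Fluid's code; `train_and_eval` is a stand-in objective):

```python
# Minimal sketch of successive halving. In later rungs only a few trials
# survive, so most workers sit idle even though resources stay reserved.
import random

def train_and_eval(cfg, epochs):
    # Stand-in objective; a real trial would train the model for `epochs`.
    return random.random() / epochs

def successive_halving(configs, num_workers, min_budget=1, eta=2):
    trials, budget = list(configs), min_budget
    while len(trials) > 1:
        scores = [train_and_eval(cfg, budget) for cfg in trials]
        idle = max(0, num_workers - len(trials))
        print(f"rung: {len(trials)} trials, {idle} of {num_workers} workers idle")
        k = max(1, len(trials) // eta)          # keep the best 1/eta of the trials
        ranked = sorted(zip(scores, range(len(trials))))
        trials = [trials[i] for _, i in ranked[:k]]
        budget *= eta                            # survivors get eta x more budget
    return trials[0]

best = successive_halving([{"lr": 10 ** -i} for i in range(8)], num_workers=8)
print(best)
```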
Resource-aware hyperparameter tuning
Previous work: HyperSched also does resource management, but it is more specialized to the tuning algorithm itself
Intra-GPU sharing (pack multiple jobs on a single GPU)
Current schedulers: a FIFO queue to manage the jobs
Intra: packing several jobs on a single GPU
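For contrast with packing, a rough sketch of the FIFO baseline, one whole GPU per trial (the names here are assumptions, not the actual scheduler code):

```python
# Sketch of the baseline scheduler: a FIFO queue hands one whole GPU to
# each trial in arrival order; everything else waits in the queue.
from collections import deque

def fifo_schedule(trial_queue, free_gpus):
    placements = []
    while trial_queue and free_gpus:
        placements.append((trial_queue.popleft(), free_gpus.pop()))
    return placements  # remaining trials keep waiting in the queue

print(fifo_schedule(deque(["t1", "t2", "t3"]), ["gpu0", "gpu1"]))
```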
Very helpful in communicating the ideas when people read the paper
Overhead of packing?
There is packing overhead from doing the placement
Limit the number of packed trials
Packing assumes the model fits on a single worker
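A rough sketch of how the packing degree could be capped (the 3-trial cap and the memory check are assumptions to illustrate the point, not Fluid's actual rule):

```python
# Cap the intra-GPU packing degree: only co-locate trials whose models fit
# together in one GPU's memory, and bound how many share a GPU to keep
# the packing/placement overhead limited.
MAX_PACK = 3  # assumed cap on trials packed per GPU

def pack_trials(trials, gpu_mem_gb):
    # Greedily group trials onto GPUs, respecting memory and the cap.
    groups, current, used = [], [], 0.0
    for t in trials:
        if current and (used + t["mem_gb"] > gpu_mem_gb or len(current) >= MAX_PACK):
            groups.append(current)
            current, used = [], 0.0
        current.append(t)
        used += t["mem_gb"]
    if current:
        groups.append(current)
    return groups  # each group runs packed on one GPU

print(pack_trials([{"mem_gb": 3}, {"mem_gb": 4}, {"mem_gb": 6}], gpu_mem_gb=12))
```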
Metric: makespan of all the trials (time until the last trial finishes)
The improvement is more prominent when applied to the asynchronous version
Intuition:
If the runtime scales well with the extra resources
Hyperparameters they are tuning:
Learning rate, dropout rate
Number of layers
Batch size
Failures
Know reasonable ranges
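A small sketch of what such a search space could look like (the ranges here are assumptions, not from the paper):

```python
# Minimal search space over the hyperparameters listed above; the specific
# ranges only illustrate what "reasonable ranges" might look like.
import random

SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),   # log-uniform
    "dropout_rate":  lambda: random.uniform(0.0, 0.5),
    "num_layers":    lambda: random.randint(2, 8),
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
}

def sample_config():
    return {name: draw() for name, draw in SEARCH_SPACE.items()}

print(sample_config())
```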
Multiple hyperparameter jobs
Multiple trial groups
But is there extra parallelism there?
Different trials within the same trial group
Variability: some, but could there be more?
Space sharing & Parallelism
Parallelism is applied automatically; nothing extra is needed from the hyperparameter algorithm
It is still a queueing problem, but in a tighter space
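A simplified sketch of the combined idea (illustrative only, not the paper's actual allocation algorithm; the packing cap and the GPU-share representation are assumptions):

```python
# Space sharing + parallelism: with more trials than GPUs, pack several
# trials per GPU; with fewer trials than GPUs, give each trial several
# GPUs so no worker sits idle. Shares below 1 mean a packed (shared) GPU.
import math

def allocate(num_trials, num_gpus, max_pack=3):
    if num_trials == 0 or num_gpus == 0:
        return []
    if num_trials >= num_gpus:
        per_gpu = min(max_pack, math.ceil(num_trials / num_gpus))
        return [1.0 / per_gpu] * num_trials      # space sharing
    per_trial, extra = divmod(num_gpus, num_trials)
    return [per_trial + (1 if i < extra else 0)   # data-parallel scaling
            for i in range(num_trials)]

print(allocate(num_trials=8, num_gpus=4))  # packed: [0.5, 0.5, ...]
print(allocate(num_trials=3, num_gpus=8))  # scaled: [3, 3, 2]
```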