On the Use of ML for Blackbox System Performance Prediction
https://www.usenix.org/conference/nsdi21/presentation/fu
Performance prediction is increasingly important
Optimization, capacity planning, SLO-aware scheduling
F(parameters) --> performance
Challenges
Accurate: precise predictions
Simple / easy-to-use: in-depth understanding of the systems not required
General: works across a spectrum of workloads and applications
Can ML provide an accurate, general, and simple performance predictor?
This paper: a systematic and broad study on performance prediction
ML for system perf. prediction?
Start with the best-case scenario
The best-case (BC) test
Given the configuration parameters, learn the mapping F(parameters) --> performance
ML assumptions
One-feature-at-a-time: e.g., vary P2, keeping P1, P3, ..., Pk fixed
Seen-config: the configurations to be predicted have already been seen during training
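A minimal sketch of what the one-feature-at-a-time sampling could look like: starting from a fixed base configuration, only one parameter is swept while the others stay at their base values (the parameter names, values, and helper below are hypothetical, for illustration only):

```python
def one_feature_at_a_time(base_config, sweeps):
    """Yield configurations that differ from base_config in exactly one parameter.

    base_config: dict mapping parameter name -> fixed value
    sweeps:      dict mapping parameter name -> list of values to sweep for that parameter
    """
    for param, values in sweeps.items():
        for value in values:
            config = dict(base_config)   # all other parameters stay fixed
            config[param] = value
            yield config

# e.g., vary P2 while P1 and P3 stay at their base values
configs = list(one_feature_at_a_time(
    {"P1": 4, "P2": 100, "P3": "default"},
    {"P2": [50, 100, 200, 400]},
))
```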
System assumptions
No-contention: dedicated EC2 instances, isolated experiments
Identical-inputs: same input data for a given input dataset size
Applications and models
ML models: NN, LR, RF, SVM
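As a rough illustration of this setup, each candidate model family could be trained on the same (configuration, performance) samples with scikit-learn; the specific estimators and hyperparameters below are placeholders, not the paper's exact choices:

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

def train_candidates(X_train, y_train):
    """Fit one instance of each candidate model family on the same training data."""
    models = {
        "LR": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=100),
        "SVM": SVR(),
        "NN": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models
```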
Metrics and predictors
Accuracy metric: rMSRE
ML predictors --> best-of-model / BoM-err
rMSRE of the most accurate model
Oracle predictor --> O-err
Allow Oracle to peek at both the error function and test data
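A sketch of how these metrics could be computed, assuming rMSRE denotes root mean squared relative error and that the oracle, for each configuration, emits the single value minimizing that error over the observed test runs (the function names and the closed-form oracle are illustrative, not taken from the paper):

```python
import numpy as np

def rmsre(y_pred, y_true):
    """Root mean squared relative error over (prediction, ground-truth) pairs."""
    rel_err = (np.asarray(y_pred) - np.asarray(y_true)) / np.asarray(y_true)
    return np.sqrt(np.mean(rel_err ** 2))

def best_of_model_error(models, X_test, y_test):
    """BoM-err: rMSRE of whichever fitted model turns out to be most accurate."""
    # e.g., best_of_model_error(train_candidates(X_tr, y_tr).values(), X_te, y_te)
    return min(rmsre(model.predict(X_test), y_test) for model in models)

def oracle_error(y_runs):
    """O-err for one configuration: the oracle peeks at the observed runs and emits
    the single prediction p minimizing sum(((p - y_i) / y_i)^2), which has the
    closed form p = sum(1/y_i) / sum(1/y_i^2)."""
    y = np.asarray(y_runs, dtype=float)
    p = np.sum(1.0 / y) / np.sum(1.0 / y ** 2)
    return rmsre(np.full_like(y, p), y)
```

Because the oracle is scored against repeated runs of the same configuration, any run-to-run spread leaves a non-zero floor; that floor is the "irreducible error" referred to in the conclusion.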
Best case test results
High oracle error even under our best-case setup!
Methodology
Identify the root cause of the error for each of the applications
Fix?
With system modifications
For all applications, oracle error is now well within 10%!
Best-of-model error likewise
Trade-off between predictability and other design goals!
E.g., disabling an optimization can lead to higher prediction accuracy but degraded performance
These fixes require in-depth understanding of the app. and reasoning about the trade-offs!
Embrace variability: probabilistic predictions
Idea: predicting a mixture distribution instead of a single value
Then, use the "modes" of each distribution as the "top-k" prediction values
ML: mixture density networks and probabilistic random forests (see the sketch below)
Significant decrease in BoM-err with top-3 (k=3) predictions!
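As a rough approximation of the probabilistic top-k idea (not the paper's exact mixture-density-network or probabilistic-random-forest construction), one could treat the per-tree outputs of a random forest as samples from the predicted distribution, cluster them into k candidate "modes", and score each test point by the best of its k candidates:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.cluster import KMeans

def top_k_modes(forest, X, k=3):
    """For each row of X, cluster the per-tree predictions of a fitted
    RandomForestRegressor and return k cluster centers as candidate modes."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_], axis=1)
    modes = []
    for row in per_tree:
        n_clusters = min(k, len(np.unique(row)))
        centers = KMeans(n_clusters=n_clusters, n_init=10).fit(
            row.reshape(-1, 1)).cluster_centers_.ravel()
        while len(centers) < k:           # pad if fewer than k distinct clusters
            centers = np.append(centers, row.mean())
        modes.append(np.sort(centers)[:k])
    return np.array(modes)                # shape: (n_samples, k)

def top_k_rmsre(modes, y_true):
    """Best-of-k error: keep, per point, the candidate with the smallest relative
    error, then aggregate as rMSRE."""
    rel_err = (modes - y_true[:, None]) / y_true[:, None]
    best = np.min(np.abs(rel_err), axis=1)
    return np.sqrt(np.mean(best ** 2))
```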
So far, best-case setup only
Go beyond?
Prediction errors can remain high if the underlying performance trend is difficult to learn
Conclusion
Taken "out of the box", many apps exhibit a surprisingly high degree of irreducible error
We can significantly improve the accuracy if we accept the loss of simplicity and / or generality
Modify applications
Modify predictions
... but they don't work in all cases
Need a more nuanced methodology for applying ML