On the Use of ML for Blackbox System Performance Prediction
https://www.usenix.org/conference/nsdi21/presentation/fu
Performance prediction is increasingly important
Optimization, capacity planning, SLO-aware scheduling
F(parameters) --> performance
Challenges
Accurate: precise predictions
Simple / easy-to-use: in-depth understanding of the systems not required
General: works across a spectrum of workloads and applications
Can ML provide an accurate, general, and simple performance predictor?
This paper: a systematic and broad study on performance prediction
ML for system perf. prediction?
Start with the best-case scenario
The best-case (BC) test
Given the configuration parameters, learn the mapping F(parameters) --> performance
ML assumptions
One-feature-at-a-time: e.g., vary P2, keeping P1, P3, ..., Pk fixed
Seen-config: the configurations to be predicted have already been seen during training
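A minimal sketch of what the one-feature-at-a-time sampling could look like: starting from a fixed base configuration, only one parameter is swept while the others stay at their base values (the parameter names, values, and helper below are hypothetical, for illustration only):

```python
def one_feature_at_a_time(base_config, sweeps):
    """Yield configurations that differ from base_config in exactly one parameter.

    base_config: dict mapping parameter name -> fixed value
    sweeps:      dict mapping parameter name -> list of values to sweep for that parameter
    """
    for param, values in sweeps.items():
        for value in values:
            config = dict(base_config)   # all other parameters stay fixed
            config[param] = value
            yield config

# e.g., vary P2 while P1 and P3 stay at their base values
configs = list(one_feature_at_a_time(
    {"P1": 4, "P2": 100, "P3": "default"},
    {"P2": [50, 100, 200, 400]},
))
```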
System assumptions
No-contention: dedicated EC2 instances, isolated experiments
Identical-inputs: same input data for a given input dataset size
Applications and models
ML models: NN, LR, RF, SVM
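As a rough illustration of this setup, each candidate model family could be trained on the same (configuration, performance) samples with scikit-learn; the specific estimators and hyperparameters below are placeholders, not the paper's exact choices:

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

def train_candidates(X_train, y_train):
    """Fit one instance of each candidate model family on the same training data."""
    models = {
        "LR": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=100),
        "SVM": SVR(),
        "NN": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models
```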
Metrics and predictors
Accuracy metric: rMSRE
ML predictors --> best-of-model / BoM-err
rMSRE of the most accurate model
Oracle predictor --> O-err
Allow Oracle to peek at both the error function and test data
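A sketch of how these metrics could be computed, assuming rMSRE denotes root mean squared relative error and that the oracle, for each configuration, emits the single value minimizing that error over the observed test runs (the function names and the closed-form oracle are illustrative, not taken from the paper):

```python
import numpy as np

def rmsre(y_pred, y_true):
    """Root mean squared relative error over (prediction, ground-truth) pairs."""
    rel_err = (np.asarray(y_pred) - np.asarray(y_true)) / np.asarray(y_true)
    return np.sqrt(np.mean(rel_err ** 2))

def best_of_model_error(models, X_test, y_test):
    """BoM-err: rMSRE of whichever fitted model turns out to be most accurate."""
    # e.g., best_of_model_error(train_candidates(X_tr, y_tr).values(), X_te, y_te)
    return min(rmsre(model.predict(X_test), y_test) for model in models)

def oracle_error(y_runs):
    """O-err for one configuration: the oracle peeks at the observed runs and emits
    the single prediction p minimizing sum(((p - y_i) / y_i)^2), which has the
    closed form p = sum(1/y_i) / sum(1/y_i^2)."""
    y = np.asarray(y_runs, dtype=float)
    p = np.sum(1.0 / y) / np.sum(1.0 / y ** 2)
    return rmsre(np.full_like(y, p), y)
```

Because the oracle is scored against repeated runs of the same configuration, any run-to-run spread leaves a non-zero floor; that floor is the "irreducible error" referred to in the conclusion.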
Best case test results
High oracle error even under our best-case setup!
Methodology
Identify the root cause of the error for each of the applications
Fix?
With system modifications
For all applications, oracle error is now well within 10%!
Best-of-model error likewise
Trade-off between predictability and other design goals!
E.g., disabling an optimization can lead to higher prediction accuracy but degraded performance
These fixes require in-depth understanding of the app. and reasoning about the trade-offs!
Embrace variability: probabilistic predictions
Idea: predicting a mixture distribution instead of a single value
Then, use the "modes" of each distribution as the "top-k" prediction values
ML: mixture density networks and probabilistic random forests (see the sketch below)
Significant decrease in BoM-err with top-3 (k=3) predictions!
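As a rough approximation of the probabilistic top-k idea (not the paper's exact mixture-density-network or probabilistic-random-forest construction), one could treat the per-tree outputs of a random forest as samples from the predicted distribution, cluster them into k candidate "modes", and score each test point by the best of its k candidates:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.cluster import KMeans

def top_k_modes(forest, X, k=3):
    """For each row of X, cluster the per-tree predictions of a fitted
    RandomForestRegressor and return k cluster centers as candidate modes."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_], axis=1)
    modes = []
    for row in per_tree:
        n_clusters = min(k, len(np.unique(row)))
        centers = KMeans(n_clusters=n_clusters, n_init=10).fit(
            row.reshape(-1, 1)).cluster_centers_.ravel()
        while len(centers) < k:           # pad if fewer than k distinct clusters
            centers = np.append(centers, row.mean())
        modes.append(np.sort(centers)[:k])
    return np.array(modes)                # shape: (n_samples, k)

def top_k_rmsre(modes, y_true):
    """Best-of-k error: keep, per point, the candidate with the smallest relative
    error, then aggregate as rMSRE."""
    rel_err = (modes - y_true[:, None]) / y_true[:, None]
    best = np.min(np.abs(rel_err), axis=1)
    return np.sqrt(np.mean(best ** 2))
```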
So far, best-case setup only
Go beyond?
Prediction errors can remain high if the underlying performance trend is difficult to learn
Conclusion
Taken "out of the box", many apps exhibit a surprisingly high degree of irreducible error
We can significantly improve the accuracy if we accept the loss of simplicity and / or generality
Modify applications
Modify predictions
... but they don't work in all cases
Need a more nuanced methodology for applying ML