Enabling Hyperscale Web Services

http://today.wisc.edu/events/view/157998

Web services will not grow
Radical shift in hyperscale computing
- Urgent need: redesign computer systems to enable futuristic web services
- Now: target individual stack layers
- Software stacks
  - Web service application --> fragmented into small units
  - OS & Software Stacks
- Hardware
  - Custom hardware (e.g., GPU)
  - Smart NIC
- Disconnected Research
Bridge the software and hardware work to enable future web services
- How to redesign SW to be aware of the HW constraints?
- How to re-architect HW for new SW paradigms post-Moore?

OSDI '18
User request
- Front-end microservice
- Hashing microservice
- Ads microservice
- Caching microservice
- Ranking microservice
- End-to-end response latency
  - Interacting in an extremely complicated way
  - Microsecond-scale system overheads significantly affect microservices
Microsecond overheads: accessing OS/NW
- Urgent need to understand SW threading interactions with OS/NW
Software Threading Dimensions -- taxonomy of threading models
- Block vs. poll
- In-line vs. dispatch
- Synchronous vs. asynchronous
Latency tradeoffs across threading models
- Poll better than block
- In-line poll faces contention; dispatch poll with one poller is best
  - Support increase in load
- Dispatch block is best at high load as it does not waste CPU cycles
- No single threading model works best at all loads!
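
The block vs. poll dimension above can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names `recv_block`, `recv_poll`, and `serve_one` are hypothetical, and a `queue.Queue` stands in for the network receive path. The blocking variant parks the thread until a request arrives (no wasted cycles, but a wake-up cost), while the polling variant spins (no wake-up cost, but it burns CPU while idle).

```python
import queue
import threading
import time


def recv_block(q):
    """Blocking receive: the thread sleeps until a request arrives."""
    return q.get()  # parks the thread; waking it involves the scheduler


def recv_poll(q):
    """Polling receive: the thread spins, repeatedly checking the queue."""
    while True:
        try:
            return q.get_nowait()  # never sleeps, so no wake-up delay
        except queue.Empty:
            pass  # busy-wait: consumes CPU cycles while idle


def serve_one(recv):
    """Hypothetical helper: run one receiver thread and hand it one request."""
    q = queue.Queue()
    result = []
    worker = threading.Thread(target=lambda: result.append(recv(q)))
    worker.start()
    time.sleep(0.01)  # the receiver waits (blocked or spinning) meanwhile
    q.put("request")
    worker.join()
    return result[0]
```

Both receivers return the same request; they differ only in what the thread does while waiting, which is where the latency and CPU-usage trade-off comes from.
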
Automatic Load Adaptation
- Exploit trade-offs among threading models at run-time
μTune [OSDI '18]
- Design challenges: synchronization, when to switch, how to interact with NW, thread hops, how to scale thread pools
  - Abstracts threading design from service code to seamlessly manage threading
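
A minimal sketch of the "when to switch" question, assuming a simple load-threshold policy with hysteresis. This is not the μTune implementation: the class `ThreadingModelSelector`, the two model labels, and the threshold values are hypothetical and only illustrate moving from a polling model at low load to a blocking model at high load.

```python
class ThreadingModelSelector:
    """Picks a threading model from observed load (illustrative policy only)."""

    def __init__(self, switch_up_qps, switch_down_qps):
        # Two thresholds give hysteresis, so load hovering near a single
        # boundary does not cause the model to flap back and forth.
        if switch_down_qps > switch_up_qps:
            raise ValueError("switch_down_qps must not exceed switch_up_qps")
        self.up = switch_up_qps
        self.down = switch_down_qps
        self.model = "dispatch-poll"  # assumed starting model for low load

    def update(self, observed_qps):
        """Return the model to use given the latest load measurement."""
        if self.model == "dispatch-poll" and observed_qps > self.up:
            self.model = "dispatch-block"  # high load: stop spinning
        elif self.model == "dispatch-block" and observed_qps < self.down:
            self.model = "dispatch-poll"  # load dropped: poll for low latency
        return self.model
```

For example, with `ThreadingModelSelector(1000, 800)`, a load of 1200 QPS switches to `dispatch-block`, and the selector stays there until load falls below 800 QPS.
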

Diverse accelerators will break the bank
- Customized platforms are expensive
- Hardware homogeneity
- Avoids testing overhead
Facebook
- Web
- Feed1, Feed2
- Ads1, Ads2
- Cache1, Cache2 (similar to key-value store)
Question
- HW overheads?
- Which SW overheads are worth building HW for?
Observation (HW)
- Great diversity in overheads across microservices
- New surprisingly dominant HW overheads
  - Code footprints (in code cache misses, TLB misses)
Observation (SW)
- Orchestration: I/O processing, compression, encryption
- Orchestration overheads are significant & common across microservices

Tune coarse HW & OS knobs on commodity HW
SoftSKUs achieve performance efficiency on cheap commodity HW
- Serves 2.7B users
- Saved cost
- Reduced footprint

Accelerating the "encryption" orchestration logic
- HW vendors only improve the accelerators, ignoring the end-to-end overhead
  - SW interaction overhead
  - Open question
    - Accelerator (PCIe)
    - Offload and context-switch overheads
HW acceleration
- Design, Test, Deploy
- But performance is .. due to perf. bounds from software interactions with hardware
Analytical Model for HW Acceleration
- Simplicity
- Accelerometer: Analytical Model [ASPLOS '20]
  - HW accelerator
  - SW threading design
  - Metrics
  - Offload transfer via interface
- E.g., synchronous offload
  - Queuing delay, offload transfer latency...
- E.g., asynchronous offload
  - Accelerator cycles do not critically affect speedup
  - But context switch penalty
- Other
  - Waits for offload ack?
  - Offload size?
  - Offload preparation time?
  - Many threads? Context switches?
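
The synchronous and asynchronous cases above can be made concrete with a simplified, Amdahl-style sketch. It is not the published Accelerometer model: the functions `sync_speedup` and `async_speedup`, their parameters, and the assumption of two context switches per asynchronous offload are illustrative choices. All costs are in host cycles.

```python
def sync_speedup(total_cycles, kernel_fraction, accel_factor,
                 n_offloads, prep, transfer, queue_delay):
    """Speedup when the host thread waits for each offload to finish."""
    host = total_cycles * (1 - kernel_fraction)           # non-offloaded work
    accel = total_cycles * kernel_fraction / accel_factor  # accelerated kernel
    # Each offload pays preparation, transfer, and queuing costs.
    overhead = n_offloads * (prep + transfer + queue_delay)
    return total_cycles / (host + accel + overhead)


def async_speedup(total_cycles, kernel_fraction,
                  n_offloads, prep, ctx_switch):
    """Speedup when the host switches to other work during each offload.

    Accelerator cycles are off the host's critical path here, so they do
    not appear; the host instead pays an assumed two context switches per
    offload (switch away, switch back).
    """
    host = total_cycles * (1 - kernel_fraction)
    overhead = n_offloads * (prep + 2 * ctx_switch)
    return total_cycles / (host + overhead)
```

Under these assumptions, many small offloads can push the speedup below 1 even with a fast accelerator, which is why offload size, preparation time, and context switches appear in the list of questions above.
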
Accelerometer: estimates
- TPU, GPU?
  - Type of accelerator we can build
  - Opportunities
    Stage of the HW pipeline
    Type of the accelerator
    Richer model to consider paradigm

Rethink I/O interactions in light of emerging HW and SW paradigms
- I/O path efficient event notification mitigating microsecond stalls
Rethink SW for emerging HW technologies: non-volatile memory, accelerators
New app paradigms & domains
- Serverless: locality issues
- AR/VR, IoT: data mgmt. issues
ML for SW-HW design, design space exploration, resource management
- scheduling, mgmt
Intersectionality, equity, fairness: first-order HW-SW design metrics
- Avoid discriminating based on demographics or ethnicity

Verify implementation hard?
