Enabling Hyperscale Web Services

http://today.wisc.edu/events/view/157998

  • Web services will not grow

  • Radical shift in hyperscale computing

    • Urgent need: redesign computer systems to enable futuristic web services

    • Now: target individual stack layers

    • Software stacks

      • Web service application --> fragmented into small units

      • OS & Software Stacks

    • Hardware

      • Custom hardware (e.g., GPU)

      • Smart NIC

    • Disconnected Research

  • Bridge the software and hardware work to enable future web services

    • How to redesign SW to be aware of the HW constraints?

    • How to re-architect HW for new SW paradigms post-Moore?

Research Approach

  • Characterization: understand service behavior, overheads, emerging trends

  • Novel solutions: self-navigate design space based on characterization insights

  • Real systems

HW-aware SW threading

  • OSDI '18

  • User request

    • Front-end microservice

    • Hashing microservice

    • Ads microservice

    • Caching microservice

    • Ranking microservice

    • End-to-end response latency

      • Microservices interact in an extremely complicated way

      • Microsecond-scale system overheads significantly affect microservices

  • Microsecond overheads: accessing OS/NW

    • Urgent need to understand SW threading interactions with OS/NW

  • Software Threading Dimensions -- taxonomy of threading models

    • Block vs. poll

    • In-line vs. dispatch

    • Synchronous vs. asynchronous

  • Latency tradeoffs across threading models

    • Poll better than block

    • In-line poll faces contention; dispatch poll with one poller is best

      • Support increase in load

    • Dispatch block is best at high load as it does not waste CPU cycles

    • No single threading model works best at all loads!

  • Automatic Load Adaptation

    • Exploit trade-offs among threading models at run-time

  • μTune [OSDI '18]

    • Design challenges: synchronization, when to switch, NW interaction, thread hops, how to scale thread pools?

      • Abstracts threading design from service code to seamlessly manage threading
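
The threading taxonomy above can be made concrete with a small sketch. This is an illustration written for these notes, not code from the talk: a `queue.Queue` stands in for the network socket, and `handle`, `serve_inline`, and `serve_dispatch` are hypothetical names. It shows block vs. poll reception and in-line vs. dispatch processing; it does not measure latency.

```python
import queue
import threading

def recv_block(q):
    """Block: sleep until a request arrives (no wasted CPU, pays a wakeup)."""
    return q.get()

def recv_poll(q):
    """Poll: spin on the queue until a request arrives (burns CPU, no wakeup)."""
    while True:
        try:
            return q.get_nowait()
        except queue.Empty:
            pass

def handle(req):
    """Hypothetical request handler standing in for the service logic."""
    return req * 2

def serve_inline(q, recv, n, results):
    """In-line: the thread that receives a request also processes it."""
    for _ in range(n):
        results.append(handle(recv(q)))

def serve_dispatch(q, recv, n, results, workers=2):
    """Dispatch: one receiving thread hands requests to a worker pool."""
    work = queue.Queue()

    def worker():
        while True:
            req = work.get()
            if req is None:  # sentinel: no more work
                return
            results.append(handle(req))

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for t in pool:
        t.start()
    for _ in range(n):
        work.put(recv(q))  # a single poller or blocker feeds the workers
    for _ in pool:
        work.put(None)
    for t in pool:
        t.join()
```

Passing `recv_poll` or `recv_block` to either server gives the four combinations (in-line poll, in-line block, dispatch poll, dispatch block) that the latency trade-offs above compare.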

μTune System Design

  • Piecewise linear model

    • Input?

      • Load, and other features

      • Heuristics: queuing patterns and other

    • Ecosystem changes?

      • Will need to retrain
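
A minimal sketch of how a piecewise linear model could drive the choice of threading model from the offered load. Everything here is an assumption for illustration: the breakpoints in `CURVES` are made-up numbers (not measurements from the talk or the paper), and `interp` and `pick_model` are hypothetical helper names. The made-up curves only mirror the qualitative trend noted earlier: polling wins at low load, blocking wins at high load.

```python
from bisect import bisect_right

# Hypothetical (load in queries/s, tail latency in microseconds) breakpoints,
# as might come from offline characterization. Values are illustrative only.
CURVES = {
    "dispatch_poll": [(0, 100), (1000, 120), (10000, 900)],
    "dispatch_block": [(0, 180), (1000, 200), (10000, 400)],
}

def interp(points, x):
    """Evaluate a piecewise linear curve given sorted (x, y) breakpoints."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # first breakpoint strictly to the right of x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def pick_model(load, curves=CURVES):
    """Return the threading model with the lowest predicted latency."""
    return min(curves, key=lambda name: interp(curves[name], load))
```

If the ecosystem changes (new hardware, new service code), the breakpoints would need to be re-derived, which matches the retraining note above.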

Hardware

  • Diverse accelerators will break the bank

    • Customized platforms are expensive

    • Hardware homogeneity

    • Avoids testing overhead

  • Facebook

    • Web

    • Feed1, Feed2

    • Ads1, Ads2

    • Cache1, Cache2 (similar to key-value store)

  • Question

    • HW overheads?

    • Which SW overheads are worth building HW for?

  • Observation (HW)

    • Great diversity in overheads across microservices

    • New surprisingly dominant HW overheads

      • Large code footprints (instruction cache misses, TLB misses)

  • Observation (SW)

    • Orchestration: I/O processing, compression, encryption

    • Orchestration overheads are significant & common across microservices

How to mitigate microservice overheads?

  • Great diversity in HW overheads across microservices

  • Microservice orchestration logic is a significant, common SW overhead

Approach 1: Make better use of existing hardware

SoftSKU [ISCA '19]

  • Tune coarse HW & OS knobs on commodity HW

  • SoftSKU achieves performance efficiency on cheap commodity HW

    • Serves 2.7B users

    • Saved cost

    • Reduced footprint
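
The idea of tuning coarse knobs can be sketched as a search over configurations. This is a brute-force illustration, not the method from the paper: the knob names and settings in `KNOBS` are hypothetical, and `fake_measure` is a stand-in with made-up numbers where a real system would run an A/B measurement on production traffic.

```python
from itertools import product

# Hypothetical knob names and settings, for illustration only.
KNOBS = {
    "core_freq_ghz": [1.8, 2.2],
    "transparent_huge_pages": ["never", "always"],
    "hw_prefetcher": [False, True],
}

def sweep(knobs, measure):
    """Try every knob combination and return the highest-scoring one."""
    names = list(knobs)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(knobs[n] for n in names)):
        cfg = dict(zip(names, values))
        score = measure(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def fake_measure(cfg):
    """Stand-in for a real throughput measurement; numbers are made up."""
    score = 100.0 * cfg["core_freq_ghz"]
    if cfg["transparent_huge_pages"] == "always":
        score += 15.0
    if cfg["hw_prefetcher"]:
        score -= 5.0
    return score

best_cfg, best_score = sweep(KNOBS, fake_measure)
```

An exhaustive sweep grows exponentially with the number of knobs, so a real deployment would more likely sweep knobs one at a time or prune the space.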

Approach 2: Design and deploy new hardware architectures - accelerators

  • Accelerating the "encryption" orchestration logic

    • HW vendors only improve the accelerator itself, ignoring the end-to-end overhead

      • SW interaction overhead

      • ? Question

        • Accelerator (PCIE)

          • Offload, and context switches overhead

  • HW acceleration

    • Design, Test, Deploy

    • But performance is .. due to perf. bounds from software interactions with hardware

  • Analytical Model for HW Acceleration

    • Simplicity

    • Accelerometer: Analytical Model [ASPLOS '20]

      • HW accelerator

      • SW threading design

      • Metrics

      • Offload transfer via interface

    • E.g., synchronous offload

      • Queuing delay, offload transfer latency...

    • E.g., asynchronous offload

      • Accelerator cycles do not critically affect speedup

      • But context switch penalty

    • Other

      • Waits for offload ack?

      • Offload size?

      • Offload preparation time?

      • Many threads? Context switches?

  • Accelerometer: estimates

    • TPU, GPU?

      • Type of accelerator we can build

      • Opportunities

        • Stage of the HW pipeline

          • Type of the accelerator

          • Richer model to consider paradigm
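
To make the synchronous vs. asynchronous offload discussion above concrete, here is a simplified Amdahl-style sketch. It is not the set of equations from the Accelerometer paper; the formulas and all parameter names (`total_cycles`, `offload_frac`, `accel_factor`, and so on) are assumptions chosen to reflect the terms listed in these notes: offload preparation time, transfer latency, queuing delay, and context-switch penalty.

```python
def speedup_sync(total_cycles, offload_frac, accel_factor,
                 n_offloads, prep, transfer, queuing):
    """Synchronous offload: the host thread waits for the accelerator,
    so accelerator cycles and per-offload costs sit on the critical path."""
    new_cycles = ((1 - offload_frac) * total_cycles
                  + n_offloads * (prep + transfer + queuing)
                  + offload_frac * total_cycles / accel_factor)
    return total_cycles / new_cycles

def speedup_async(total_cycles, offload_frac, n_offloads, prep, ctx_switch):
    """Asynchronous offload: the host moves on to other work, so accelerator
    cycles leave the host's critical path, but each offload pays a
    context-switch penalty."""
    new_cycles = ((1 - offload_frac) * total_cycles
                  + n_offloads * (prep + ctx_switch))
    return total_cycles / new_cycles

# Illustrative numbers only (cycles); not measurements.
sync = speedup_sync(1_000_000, 0.5, 10, 100, prep=100, transfer=200, queuing=0)
asyn = speedup_async(1_000_000, 0.5, 100, prep=100, ctx_switch=1000)
```

With many small offloads the per-offload terms dominate, and the estimated speedup can drop below 1 even with a fast accelerator, which is the kind of end-to-end effect that looking at the accelerator alone would miss.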

Data movement

  • Rethink I/O interactions in light of emerging HW and SW paradigms

    • Efficient event notification on the I/O path to mitigate microsecond-scale stalls

  • Rethink SW for emerging HW technologies: non-volatile memory, accelerators

  • New app paradigms & domains

    • Serverless: locality issues

    • AR/VR, IoT: data mgmt. issues

  • ML for SW-HW design, design space exploration, resource management

    • scheduling, mgmt

  • Intersectionality, equity, fairness: first-order HW-SW design metrics

    • discriminate based on demographics or ethnicity

Is the implementation hard to verify?

  • slice and dice in a way that still makes sense
