Workshop

https://www.youtube.com/watch?v=gfGTd7PXK54&list=PLMPUUgLIYH1ZJVEXTTZT82ipTprSu1hn8&index=1

Learning at the Wireless Edge

  • Vince Poor (Princeton University)

  • Two aspects

    • Using ML to optimize communication networks

    • Learning on mobile devices (the focus of today's talk)

  • Today's talk: focus on federated learning

    • Motivation

    • Federated learning over wireless channels (scheduling)

    • Privacy protection in federated learning (differential privacy)

    • Some research issues

  • ML

    • Tremendous progress in recent years (more data, increased computational power)

    • Standard ML: implemented in a centralized manner (data center, cloud) with full access to the data

    • SOTA models:

      • Standard software tools, specialized hardware

  • Wireless edge

    • Centralized ML is not suitable for many emerging applications

      • Self-driving cars, first responder networks, healthcare networks

    • What makes these applications different:

      • Data is born at the edge (phone, IoT devices)

      • Limited capacity uplinks

      • Low latency & high reliability

      • Data privacy / security

      • Scalability & locality

    • These constraints motivate moving learning closer to the network edge

  • Federated learning over wireless channels (scheduling)

    • Wireless: communication to the AP needs to go through wireless channels

      • Shared, resource-constrained

        • Only a limited number of devices can be selected in each update round

        • Transmissions are not reliable due to interference

      • Questions

        • How should we schedule devices to update the trained weights?

        • How does the interference affect the training?

    • Scheduling mechanisms (a minimal sketch follows this list)

      • Random scheduling: the aggregator selects N out of K users at random

      • Round Robin: divide users into groups and cycle through them

      • Proportional Fair: select the users with the strongest SNRs
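
A minimal sketch of the three scheduling policies above, assuming each device reports an SNR to the aggregator; the function names and data shapes are illustrative, not from the talk.

```python
import random

def random_scheduling(device_ids, n):
    """Aggregator selects n of the K devices uniformly at random."""
    return random.sample(list(device_ids), n)

def round_robin(device_ids, n, round_idx):
    """Split devices into fixed groups and rotate through them each round."""
    devices = sorted(device_ids)
    groups = [devices[i:i + n] for i in range(0, len(devices), n)]
    return groups[round_idx % len(groups)]

def proportional_fair(snrs, n):
    """Pick the n devices that reported the strongest SNRs."""
    return sorted(snrs, key=snrs.get, reverse=True)[:n]

# Example round: schedule 3 of 8 devices under each policy.
snrs = {f"dev{i}": random.uniform(0, 30) for i in range(8)}
print(random_scheduling(snrs, 3))
print(round_robin(snrs, 3, round_idx=0))
print(proportional_fair(snrs, 3))
```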

  • Design metric: age of information (AoI); a small sketch follows this list

    • Age-based scheduling scheme for federated learning in mobile edge networks

    • Optimization algorithm in each iteration round

    • Wireless round robin
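
A small sketch of age-of-information based selection, assuming a device's age is the number of rounds since its update was last aggregated. The cited paper's policy is more elaborate; this only illustrates prioritizing stale devices.

```python
def aoi_schedule(ages, n):
    """Select the n devices whose local information is the most stale."""
    return sorted(ages, key=ages.get, reverse=True)[:n]

def advance_round(ages, selected):
    """Selected devices just contributed, so their age resets; others grow older."""
    for dev in ages:
        ages[dev] = 0 if dev in selected else ages[dev] + 1

ages = {f"dev{i}": 0 for i in range(8)}
for _ in range(5):
    chosen = aoi_schedule(ages, n=3)
    advance_round(ages, set(chosen))
```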

  • Privacy in federated learning

    • "privacy preserving": data remains on end-user devices

    • But end-user data can be inferred from the parameter (or gradient) updates

    • Approach: use differential privacy to protect end-user data (a noising sketch follows this list)

      • Refers to a guarantee that two otherwise-identical datasets, one containing an individual's private information and one without it, cannot be distinguished (with high probability) from the answers to statistical queries

    • Trade-off between privacy and accuracy
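
The talk does not specify a mechanism, so the sketch below uses the common clip-and-add-Gaussian-noise recipe for protecting a client's update; the clipping bound and noise multiplier are illustrative placeholders, and tuning them against a target (epsilon, delta) is exactly the privacy/accuracy trade-off noted above.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update's L2 norm, then add Gaussian noise before sending it."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

local_gradient = np.random.default_rng(0).normal(size=10)
noisy_gradient = privatize_update(local_gradient)
```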

  • Other issues

    • Model efficiency

      • Resources on end-user devices are limited (e.g., energy, storage, computational power)

      • Trade-offs between # of layers, # of neurons per layer, accuracy

    • Communication efficiency

    • Limited data at the edge

      • Local data is sparse

      • Incorporating domain and physical knowledge

    • Security & Privacy

      • Robustness to malicious end-user devices & adversarial training examples

      • Other approaches to end-user privacy

Challenges in ML and the way forward

  • Ariela Zeira, Intel Labs

  • Challenges in DL

    • Compute efficiency

    • Memory overhead

    • Data efficiency

    • Online learning

    • Robustness

    • Knowledge Representation

  • Hyperdimensional Computing

    • New paradigm for energy-efficient, noise-robust and fast alternatives to standard ML

Adventures in Learning-Based Rate Control

  • Brighten Godfrey, UIUC

  • TCP protocols: point solutions designed for specific environments, and far from optimal

  • Why does traditional CC architecture struggle?

    • TCP Reno, CUBIC, FAST, Scalable, HTCP

    • "Hardwired" control actions

      • Underlying conditions --> ideal control action

      • Not enough information about what's happening in the network

  • What is the right rate to send?

    • The network is a black box, but we can send at some rate and see what happens

      • Collect observations: throughput, loss rate, latency. We can summarize these in a utility function

  • A change in perspective

    • Traditional perspective: simple network model + well-crafted rules --> predictable results

    • "Black-box" perspective: the world is complex. Quantifying the goal and observing the effect of actions yields good decisions

    • A fit for learning!

      • Diverse, opaque environments

      • Only a trickle of information

      • Infer good action at millisecond timescales

  • Software components

    • Paper: PCC

    • Control algorithm: heuristic hill-climbing (a small sketch follows this list)

      • Noise in measurements? Use randomized controlled trials

    • Where is the congestion control?

      • Equilibrium depends on utility function

      • Selfish utility-maximizing decision --> non-cooperative game
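
A minimal sketch of the PCC loop described above: probe a sending rate, summarize the observations with a utility function, and hill-climb toward higher utility. The utility form, weights, and the `send_and_measure` hook are illustrative, not PCC's exact definitions.

```python
def utility(throughput, loss_rate, latency):
    """Illustrative utility: reward throughput, penalize loss and delay."""
    return throughput * (1 - loss_rate) - 10.0 * throughput * loss_rate - 0.1 * latency

def hill_climb(rate, step, send_and_measure):
    """Probe slightly higher and lower rates; move toward the higher utility.
    In PCC proper these probes are randomized controlled trials, repeated
    enough times to average out measurement noise."""
    up = utility(*send_and_measure(rate + step))
    down = utility(*send_and_measure(rate - step))
    return rate + step if up >= down else rate - step
```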

  • Promising performance

  • Upgrade: PCC Vivace (NSDI 2018)

    • Leveraging powerful tools from online learning theory

    • New utility function framework

      • Latency-awareness

      • Strictly concave --> equilibrium guarantee

      • Weighted fairness among senders

    • New control algorithm

      • Gradient ascent --> better convergence speed / stability (a gradient-step sketch follows this list)

      • Deals with measurement noise

    • Performance: great improvement in latency & responsiveness, but still suboptimal in extremely dynamic networks (e.g., wireless)
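
A gradient-step sketch in the spirit of Vivace, assuming a latency-aware utility of the general shape described in the talk (a concave throughput term minus penalties on RTT growth and loss); the constants and the `measure` hook are illustrative.

```python
def vivace_like_utility(rate, rtt_gradient, loss_rate, t=0.9, b=900.0, c=11.35):
    """Concave in rate, penalizing growing RTTs and loss (illustrative constants)."""
    return rate ** t - b * rate * rtt_gradient - c * rate * loss_rate

def gradient_step(rate, measure, delta=0.05, lr=0.01):
    """Estimate dU/d(rate) from two probes and take one ascent step."""
    hi, lo = rate * (1 + delta), rate * (1 - delta)
    u_hi = vivace_like_utility(hi, *measure(hi))
    u_lo = vivace_like_utility(lo, *measure(lo))
    grad = (u_hi - u_lo) / (hi - lo)
    return max(rate + lr * grad, 0.01)
```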

  • Deep RL on congestion control: Aurora (ICML 2019)

  • Scale-free values to aid robustness

  • History length: what lengths work well

  • Training

    • Simulated environment

      • Order of magnitude faster than emulation

      • Each episode chooses link parameters from a range

    • Setting appropriate discount factor

      • Maximize the expected cumulative discounted return (see the sketch below)
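
A minimal sketch of the training setup: each episode a simulated link is drawn from a range of parameters and the agent is trained to maximize the expected cumulative discounted return. `env` and `agent` are illustrative placeholders, not the actual implementation from the paper.

```python
import random

def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of an episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

def run_episode(env, agent, gamma=0.99):
    # Sample link parameters from a range at the start of each episode.
    obs = env.reset(bandwidth_mbps=random.uniform(1, 100),
                    latency_s=random.uniform(0.01, 0.5),
                    loss=random.uniform(0.0, 0.05),
                    queue_pkts=random.randint(2, 1000))
    rewards, done = [], False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        rewards.append(reward)
    agent.update(discounted_returns(rewards, gamma))
```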

  • Future

    • Multi-agent scenarios: training, competition

    • Online training: challenge is to improve outcomes with limited additional training data

  • New uses

    • Scavenger transport

      • SIGCOMM 2020: Proteus: Scavenger Transport and Beyond

      • Different applications: software updates, CDN warmup, cloud storage replication, online video, real-time streaming, search

        • Elastic timing, inelastic timing

      • Scavenger design goals

        • Yielding: minimally impact primary flows

        • Performance: high utilization, low latency when only scavengers exist

        • Flexibility: dynamically switch, avoid separate implementation

      • Utility functions

        • Primary, Scavenger, Hybrid

        • RTT deviation as a competition indicator (a small sketch follows below)

          • Definition: standard deviation of observed RTT samples

          • Intuition: an earlier signal of buffer-occupancy dynamics during flow competition

          • Proteus: yields more effectively

        • Cross-layer design

          • Dynamic threshold based on the application requirement (buffer occupancy)

          • Hybrid mode: scavengers can claim bandwidth when the application needs it

          • Improving QoE (rebuffering ratio)
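
A small sketch of the RTT-deviation signal mentioned above: track the standard deviation of recent RTT samples and yield when it crosses a threshold. The window size and threshold are illustrative.

```python
import statistics

def competition_detected(rtt_samples_ms, threshold_ms=5.0, window=32):
    """Rising RTT deviation is an early sign that a competing flow is starting
    to fill the bottleneck buffer, so the scavenger should yield."""
    recent = rtt_samples_ms[-window:]
    return len(recent) >= 2 and statistics.stdev(recent) > threshold_ms
```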

  • Rate control robustness (HotNets 2019)

    • Keeping up with rapid change

      • Recent acceleration of innovation in rate control

      • Approach: ML as an adversary

        • It gets rewarded when it finds environment parameters that cause the algorithm under test to perform poorly (looking for the hard cases)

          • This must be done carefully: the adversary is rewarded for suboptimal performance of the algorithm, and smoothing is applied so the results are more useful to inspect

        • Reward = -1 * protocol score + optimal score - smoothing penalty (see the sketch below)

        • Implemented for: ABR video, congestion control
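
The reward above, written out as a small helper; the smoothing penalty is left abstract here.

```python
def adversary_reward(protocol_score, optimal_score, smoothing_penalty):
    """The adversary (which picks environment parameters) is rewarded when the
    protocol under test falls short of optimal, minus a smoothing penalty."""
    return -1.0 * protocol_score + optimal_score - smoothing_penalty
```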

  • Lessons learned

    • What worked

      • Modular architecture

        • New control algorithms

        • New utility functions open new issues

      • Learning-based control can improve performance over traditional protocols

        • Industry implementations

    • Open challenges & opportunities

      • Performance: fast decisions vs. careful decisions

        • Even one RTT can be a long time - especially in fluctuating wireless environments

      • Understanding protocol robustness

      • Existing opportunities in system design

        • Complexity in systems, environments, and application needs leads to opportunities for inference

        • Restructure systems for a learning mindset

(Figure: Aurora agent architecture)