Reducto: On-Camera Filtering for Resource Efficient Real-Time Video Analytics

https://dl.acm.org/doi/10.1145/3387514.3405874

  • Video analytics trends

    • More cameras and video data

    • Greater ability to extract information from video

  • Video analytics pipelines

    • Goals

      • Accuracy target (e.g., 90%)

      • Real-time (e.g., 30 fps)

  • Challenge: resource intensive

    • Network bandwidth and compute cost

    • 1 video at 1080p: 2 Mbps

    • Faster R-CNN: 6 sec to process 1 sec of video on a $6000 GPU

  • Frame Filtering

    • Remove frames that do not meaningfully contribute to the final result

    • Focus, OSDI '18 (approximate model): run a cheap approximate model (e.g., Tiny YOLO) and filter based on its confidence

    • NoScope, VLDB '17 (binary classifier) --> does the frame contain a car at all?

    • Glimpse, SenSys '15 (pixel-level differences) --> is the frame-to-frame change above a threshold? (see the sketch below)
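
A minimal sketch of the pixel-differencing idea behind Glimpse (my reconstruction, not the paper's code; the threshold values and the input file are assumptions): ship a frame to the server only if the fraction of changed pixels since the last shipped frame crosses a static threshold.

```python
import cv2
import numpy as np

PIXEL_EPS = 30       # per-pixel intensity delta that counts as "changed" (assumed value)
FRAME_THRESH = 0.01  # fraction of changed pixels needed to ship a frame (assumed value)

def changed_fraction(prev_gray: np.ndarray, cur_gray: np.ndarray) -> float:
    """Fraction of pixels whose grayscale intensity moved by more than PIXEL_EPS."""
    delta = cv2.absdiff(prev_gray, cur_gray)
    return np.count_nonzero(delta > PIXEL_EPS) / delta.size

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input file
ok, frame = cap.read()
if not ok:
    raise SystemExit("could not read video")
last_sent = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if changed_fraction(last_sent, gray) > FRAME_THRESH:
        # ship `frame` to the server-side DNN here
        last_sent = gray  # diff against the last frame actually shipped
```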

  • Key observation: filtering benefits increase the closer it happens to the video source

    • Can we filter frames directly on the camera itself?

      • What computational resources are available on existing cameras?

      • How do existing approaches fare?

  • Camera Market Study

    • Existing deployments

    • Existing Filtering Approaches

      • Approximate models: too slow on the camera (Tiny YOLO: 0.6 fps)

      • Binary classification: misses 45% of filtering opportunities

      • Accuracy of pixel-level differences

        • With a 90% accuracy target on a counting query, static thresholds struggle to meet the target

        • Using frame differencing effectively

          • Dynamic threshold to deal with rapid changes

          • Expand beyond pixel comparison

            • Low-level features can be more effective than raw pixels, e.g., the area of motion; raw pixel differences do not capture that contrast

            • Candidate differencing features beyond pixels: e.g., edges, area of motion (sketched below)
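
A hedged sketch of what "expanding beyond pixel comparison" can look like. The three features below (pixel, edge, area of motion) mirror the kind of cheap low-level features Reducto considers, but their exact definitions here are my assumptions built on standard OpenCV calls.

```python
import cv2
import numpy as np

def pixel_diff(prev_gray, cur_gray, eps=30):
    """Fraction of pixels whose intensity changed by more than eps."""
    d = cv2.absdiff(prev_gray, cur_gray)
    return np.count_nonzero(d > eps) / d.size

def edge_diff(prev_gray, cur_gray, lo=100, hi=200):
    """Fraction of pixels whose Canny edge response changed; tracks object
    outlines rather than lighting noise."""
    e_prev = cv2.Canny(prev_gray, lo, hi)
    e_cur = cv2.Canny(cur_gray, lo, hi)
    return np.count_nonzero(e_prev != e_cur) / e_prev.size

def area_diff(prev_gray, cur_gray, eps=30):
    """Fraction of the frame covered by bounding boxes around changed regions:
    captures *where* motion happens, a contrast that raw pixel counts miss."""
    mask = (cv2.absdiff(prev_gray, cur_gray) > eps).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    covered = sum(w * h for (_, _, w, h) in map(cv2.boundingRect, contours))
    return min(covered / mask.size, 1.0)
```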

  • Reducto Overview

    • Questions

      • #1: which filtering threshold to use?

      • #2: which differencing feature to use?

    • Main idea: wimpy cameras can use cheap differencing techniques to filter frames effectively, with guidance from a server (see the sketch at the end of this section)

    • #1: threshold

      • Building the threshold table is expensive --> run on the server

      • Looking up the table is cheap --> run on the camera

    • #2: differencing feature

      • Calculating the best feature is expensive --> run on the server

      • Best feature changes between query types but not between videos

        • One feature per query type

      • Q: how are the candidate features selected? The experiments only show three.

      • Q: what is the intuition for why the best feature does not change across videos?

      • Q: what are the query types?
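
To make the server/camera split concrete, here is a sketch of both halves. All names (`profile_threshold`, `camera_loop`, `send`), the segment-indexed table layout, and the recall-style scoring are my assumptions, not the paper's implementation; Reducto actually scores thresholds by query accuracy against the full DNN's results.

```python
import cv2

def profile_threshold(diffs, needed, acc_target=0.90):
    """Server side (expensive): given each frame's feature-diff value and
    whether the full DNN says the frame matters for the query result, return
    the largest threshold that still keeps acc_target of the needed frames."""
    best = 0.0
    needed_diffs = [d for d, n in zip(diffs, needed) if n]
    for cand in sorted(set(diffs)):
        kept = sum(1 for d in needed_diffs if d > cand)
        if needed_diffs and kept / len(needed_diffs) >= acc_target:
            best = max(best, cand)
    return best

def camera_loop(cap, feature_diff, threshold_table, send, seg_len=120):
    """Camera side (cheap): one feature diff + one dict lookup per frame.
    If the current segment has no server-provided threshold yet, send every
    frame unfiltered (the bootstrap phase described below)."""
    ok, frame = cap.read()
    if not ok:
        return
    send(frame)  # always send the first frame
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    idx = 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cur = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        thresh = threshold_table.get(idx // seg_len)  # cheap hash-table lookup
        if thresh is None or feature_diff(prev, cur) > thresh:
            send(frame)   # no guidance yet, or enough change: keep the frame
            prev = cur    # diff against the last frame actually sent
        idx += 1
```

With the earlier feature sketch, `camera_loop(cap, edge_diff, table, send)` would filter on edge differences; the per-frame cost is one diff plus one dict lookup, which is what keeps the camera-side work feasible.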

  • Putting it together

    • New content? The camera sends unfiltered frames for a short period (100% accuracy, but no filtering) while the server profiles the new content

    • Evaluation: three query types (detection, counting, tagging), 8 traffic videos, server-side DNN (YOLOv3), camera: Raspberry Pi Zero or a VM with matching resources

      • #1: filtering effectiveness vs. the offline optimal

        • Offline optimal: the maximum number of frames that can be eliminated while still meeting the accuracy target

        • Reducto filters 36-51% of frames while meeting the accuracy target

      • #2: on-camera filtering overhead (per-stage throughput; see the back-of-envelope check after this list)

        • Extract frame features: 99.7 fps

        • Calculate frame difference: 129.5 fps

        • Hash-table lookups: 318.6 fps

      • #3: end-to-end resource savings

        • Network: Reducto saves an average of 22% of bandwidth

        • Compute: Reducto doubles backend processing speed

        • End-to-end latency: Reducto reduces median response time by 22-26% (within 13% of the offline optimal)
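
A quick back-of-envelope check on the #2 stage throughputs (assuming, on my part, that all three stages run sequentially on every frame, so their per-frame costs add):

$$
\left(\frac{1}{99.7} + \frac{1}{129.5} + \frac{1}{318.6}\right)^{-1} \approx 48\ \text{fps} > 30\ \text{fps}
$$

Even in that worst case, the on-camera pipeline clears the 30 fps real-time target.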

    • Conclusion

      • Reducto: on-camera filtering using cheap frame differences

        • The camera's filtering decisions are guided by the server

        • Filters over 50% of frames and lowers latency by 22% while consistently meeting the accuracy target
