Reducto: On-Camera Filtering for Resource Efficient Real-Time Video Analytics
https://dl.acm.org/doi/10.1145/3387514.3405874
Video analytics trends
More cameras and video data
Greater ability to extract information from video
Video analytics pipelines
Goals
Accuracy target (e.g., 90%)
Real-time (e.g., 30 fps)
Challenging: resource intensive
Network bandwidth and compute cost
1 video at 1080p: 2 Mbps
Faster R-CNN: 6 sec to process 1 sec of video on a $6000 GPU
Frame Filtering
Remove frames that do not meaningfully contribute to the final result
Focus, OSDI '18 (approximate model, e.g., Tiny YOLO) --> is detection confidence low?
NoScope, VLDB '17 (binary classifier) --> does the frame contain a car?
Glimpse, SenSys '15 (pixel-level differences) --> is the frame-to-frame change above a threshold? (sketch below)
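A minimal sketch of Glimpse-style pixel-level filtering, assuming OpenCV; the video file name, per-pixel delta, and changed-fraction threshold are illustrative assumptions, not values from any of these papers:

```python
import cv2
import numpy as np

PIXEL_DELTA = 25        # per-pixel intensity change counted as "changed" (assumed)
SEND_THRESHOLD = 0.01   # fraction of changed pixels needed to keep a frame (assumed)

def should_send(prev_gray: np.ndarray, cur_gray: np.ndarray) -> bool:
    """Keep the frame only if enough pixels changed since the last frame."""
    diff = cv2.absdiff(prev_gray, cur_gray)
    changed_fraction = np.count_nonzero(diff > PIXEL_DELTA) / diff.size
    return changed_fraction > SEND_THRESHOLD

cap = cv2.VideoCapture("traffic.mp4")   # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if should_send(prev_gray, gray):
        pass  # forward the frame to the server-side DNN
    prev_gray = gray
```

The static SEND_THRESHOLD in this sketch is exactly what the camera study below finds fragile.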
Key insight: filtering benefits increase the closer the filter sits to the video source
Can we filter frames directly on the camera itself?
What computational resources are available on existing cameras?
How do existing approaches fare?
Camera Market Study
Existing deployments
Existing Filtering Approaches
Approximate models: too slow on camera (Tiny YOLO: 0.6 fps)
Binary classifiers: miss 45% of filtering opportunities
Accuracy of pixel-level differences
Accuracy target: 90%, query: counting --> hard to meet accuracy targets with a static threshold, since the right threshold shifts as video content changes
Using frame differencing effectively
Dynamic threshold to deal with rapid changes
Expand beyond pixel comparison
Some low-level differences are more effective, e.g., area of motion; raw pixel differences do not capture that contrast (see the sketch after this list)
Different features
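A hedged sketch contrasting a raw pixel feature with an area-of-motion feature (assumes OpenCV 4; the thresholds are illustrative, not the paper's):

```python
import cv2
import numpy as np

def pixel_feature(prev_gray, cur_gray):
    # Fraction of pixels whose intensity changed noticeably.
    diff = cv2.absdiff(prev_gray, cur_gray)
    return np.count_nonzero(diff > 25) / diff.size

def area_feature(prev_gray, cur_gray):
    # Fraction of the frame covered by contiguous moving regions: two frames
    # with the same raw pixel-level change can differ a lot in how much scene
    # area actually moved, which is the contrast pixels alone miss.
    diff = cv2.absdiff(prev_gray, cur_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return sum(cv2.contourArea(c) for c in contours) / mask.size
```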
Reducto Overview
Questions
#1: which filtering threshold to use?
#2: which differencing feature to use?
Main idea: wimpy cameras can use cheap differencing techniques to filter frames effectively with guidance from a server
#1: threshold
Building the table is expensive --> run it on the server
Looking up the table is cheap --> run it on the camera
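A minimal sketch of this server/camera split, assuming a hash table keyed by coarse buckets of recent feature values; the bucketing, candidate thresholds, and profiling callbacks are assumptions, not Reducto's exact design:

```python
# Server side (expensive, offline): for each profiling segment, find the most
# aggressive filtering threshold that still meets the accuracy target, and
# index it by a coarse bucket of the segment's feature values.
def build_threshold_table(segments, bucket_of, accuracy_of,
                          target=0.9, candidates=(0.01, 0.02, 0.05, 0.1, 0.2)):
    # bucket_of(segment) and accuracy_of(segment, threshold) are hypothetical
    # profiling callbacks backed by the server-side DNN.
    table = {}
    for segment in segments:
        passing = [t for t in candidates if accuracy_of(segment, t) >= target]
        table[bucket_of(segment)] = max(passing) if passing else 0.0
    return table

# Camera side (cheap, online): one hash-table lookup per segment.
def lookup_threshold(table, segment, bucket_of):
    return table.get(bucket_of(segment), 0.0)  # unknown bucket: threshold 0.0, send everything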
#2: differencing feature
Calculating the best feature is expensive --> run it on the server
Best feature changes between query types but not between videos
One feature per query type (see the selection sketch after these questions)
Q: how are the candidate features selected? The experimental results only show 3
Q: what is the intuition for why the best feature does not change across videos?
Q: what are the query types?
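One way the server-side feature selection could look, assuming it simply picks the feature that filters the most frames while meeting the accuracy target on profiling clips; the candidate list and callbacks are assumptions, not the paper's exact procedure:

```python
CANDIDATE_FEATURES = ("pixel", "edge", "area")  # the 3 shown in the results

def choose_feature(profiling_clips, filtered_fraction, meets_accuracy):
    # filtered_fraction(feature, clip) and meets_accuracy(feature, clip) are
    # hypothetical profiling callbacks backed by the server-side DNN.
    best, best_savings = None, -1.0
    for feature in CANDIDATE_FEATURES:
        if all(meets_accuracy(feature, clip) for clip in profiling_clips):
            savings = sum(filtered_fraction(feature, clip) for clip in profiling_clips)
            if savings > best_savings:
                best, best_savings = feature, savings
    return best
```

Because the winner is tied to the query type rather than the video, this selection would run once per query type instead of once per camera.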
Putting it together
New content? Send unfiltered frames for a short period while the server re-profiles (100% accuracy, but no filtering savings)
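Putting the pieces into one camera loop, a minimal sketch (the threshold handling and the send() transport are assumptions):

```python
def camera_loop(frames, threshold, diff_feature, send):
    """threshold: value looked up from the server's table, or None when the
    content is new and no profile exists yet (send everything, filter nothing)."""
    prev = None
    for frame in frames:
        if threshold is None or prev is None:
            send(frame)                              # new content or first frame
        elif diff_feature(prev, frame) > threshold:
            send(frame)                              # enough change to matter
        # else: filtered out on-camera, never leaves the device
        prev = frame
```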
Evaluation: three query types (detection, counting, tagging); 8 traffic videos; server-side DNN: YOLOv3; camera: Raspberry Pi Zero or a VM with matching resources
#1: filtering effectiveness
Offline optimal: the maximum number of frames eliminated while still meeting the accuracy target
Reducto filters 36-51% of frames while meeting the accuracy target
#2: on-camera overhead (all comfortably above the 30 fps real-time requirement)
Extract frame features: 99.7 fps
Calculate frame differences: 129.5 fps
Hash table lookups: 318.6 fps
#3: end-to-end resource savings
Network: Reducto saves an average of 22% of bandwidth
Compute: Reducto doubles backend processing speed
End-to-end latency: median response time drops by 22-26% (within 13% of the offline optimal)
Conclusion
Reducto: on-camera filtering using cheap frame differences
Camera's filtering decisions are guided by server
Filters over 50% of frames and lowers latency by 22% while consistently meeting accuracy targets