# Reducto: On-Camera Filtering for Resource Efficient Real-Time Video Analytics

* Video analytics trends
  * More cameras and video data
  * Greater ability to extract information from video
* Video analytics pipelines
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FIEpgWuDRewGLFQRM0eEN%2Fimage.png?alt=media\&token=05368f10-8ddf-4270-a9ed-3c1c9b80d577)
  * Goals
    * Accuracy target (e.g., 90%)
    * Real-time (e.g., 30 fps)
* Challenge: video analytics is resource intensive
  * Network bandwidth and compute cost
  * One 1080p video stream: 2 Mbps
  * Faster R-CNN: 6 sec to process 1 sec of video on a $6000 GPU
* Frame Filtering
  * Remove frames that do not meaningfully contribute to the final result
  * Focus, OSDI '18: approximate model (e.g., Tiny YOLO) --> is detection confidence low?
  * NoScope, VLDB '17: binary classifier --> does the frame contain a car?
  * Glimpse, SenSys '15: pixel-level differences --> did the frame change above a threshold?
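  * A minimal numpy sketch of Glimpse-style pixel differencing (the tolerance and threshold values are illustrative, not from the paper):

    ```python
    import numpy as np

    def pixel_diff_fraction(prev, curr, tol=10):
        """Fraction of pixels whose intensity changed by more than tol."""
        return (np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > tol).mean()

    def should_send(prev, curr, threshold=0.05):
        """Forward the frame to the server only if enough of the scene changed."""
        return pixel_diff_fraction(prev, curr) > threshold

    prev = np.zeros((120, 160), dtype=np.uint8)   # empty road
    same = prev.copy()                            # nothing happened
    moved = prev.copy()
    moved[40:80, 60:100] = 255                    # an object enters the scene
    print(should_send(prev, same))   # False: frame is filtered out
    print(should_send(prev, moved))  # True: frame is forwarded
    ```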
* Key observation: filtering benefits grow the closer filtering happens to the video source
  * Can we filter frames directly on the camera itself?
    * What computational resources are available on existing cameras?
    * How do existing approaches fare?
* Camera Market Study
  * Existing deployments
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FR8NIn0HpOjv6Tqx5EykT%2Fimage.png?alt=media\&token=9aca5913-b7f0-48af-a843-dee6902414ff)
  * Existing Filtering Approaches
    * Approximate models: too slow on camera hardware (Tiny YOLO: 0.6 fps)
    * Binary classifiers: miss 45% of filtering opportunities
    * Pixel-level differences: accuracy problems
      * With a 90% accuracy target on a counting query, static thresholds struggle to meet the target
      * Using frame differencing effectively requires:
        * A dynamic threshold to cope with rapid scene changes
        * Expanding beyond raw pixel comparison
          * Other low-level differences are more informative, e.g., the area of motion, a contrast that raw pixel counts miss
          * Different features suit different queries
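  * A toy numpy illustration of why a spatial feature can beat raw pixel counts (this "area" definition is a simplification for illustration, not the paper's exact feature):

    ```python
    import numpy as np

    def changed_mask(prev, curr, tol=10):
        return np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > tol

    def pixel_feature(prev, curr):
        """Raw pixel-level difference: fraction of pixels that changed."""
        return changed_mask(prev, curr).mean()

    def area_feature(prev, curr):
        """Area-of-motion style feature: fraction of the frame covered by
        the bounding box of all changed pixels (simplified definition)."""
        mask = changed_mask(prev, curr)
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            return 0.0
        return ((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)) / mask.size

    # Two small objects appear at opposite corners: few changed pixels,
    # but motion spans almost the whole scene.
    prev = np.zeros((100, 100), dtype=np.uint8)
    curr = prev.copy()
    curr[5:10, 5:10] = 255
    curr[90:95, 90:95] = 255
    print(pixel_feature(prev, curr))  # 0.005 -- barely registers
    print(area_feature(prev, curr))   # 0.81  -- large area of motion
    ```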
* Reducto Overview
  * Questions
    * \#1: which filtering threshold to use?
    * \#2: which differencing feature to use?
  * Main idea: wimpy cameras can use cheap differencing techniques to filter frames effectively, with guidance from a server
  * \#1: threshold
    * Building the threshold table is expensive --> run on the server
    * Looking up the table is cheap --> run on the camera
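  * A sketch of that split (bucket names, candidate thresholds, and accuracies below are hypothetical): the server profiles recent video with the full DNN to build the table; the camera then does a cheap O(1) lookup:

    ```python
    def build_threshold_table(profiling_results, accuracy_target=0.9):
        """Server side (expensive): for each content bucket, pick the most
        aggressive filtering threshold that still meets the accuracy target."""
        table = {}
        for bucket, candidates in profiling_results.items():
            ok = [t for t, acc in candidates if acc >= accuracy_target]
            table[bucket] = max(ok) if ok else 0.0  # 0.0 => filter nothing
        return table

    def lookup_threshold(table, bucket):
        """Camera side (cheap): O(1) hash-table lookup per video segment."""
        return table.get(bucket, 0.0)  # unseen content => send everything

    # Hypothetical profiling output: (threshold, achieved accuracy) pairs.
    profiling = {
        "low_motion":  [(0.01, 0.99), (0.05, 0.95), (0.10, 0.85)],
        "high_motion": [(0.01, 0.97), (0.05, 0.88)],
    }
    table = build_threshold_table(profiling)
    print(table)  # {'low_motion': 0.05, 'high_motion': 0.01}
    ```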
  * \#2: differencing feature
    * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2F0GLCW6rI5hNYtl6dgmxD%2Fimage.png?alt=media\&token=0744dfa7-2756-4735-8598-a2e0b3555874)
    * Calculating the best feature is expensive --> run on the server
    * The best feature changes between query types but not between videos
      * So: one feature per query type
    * Q: how are the candidate features selected? The experiments show only 3
    * Q: what is the intuition for the best feature not changing across videos?
    * Q: what are the query types?
* Putting it together
  * New content? Send unfiltered frames for a short period while the server profiles it (100% accuracy, but no filtering savings)
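  * A sketch of that fallback in a camera-side loop (function names and the toy diff are illustrative, not the paper's API):

    ```python
    def process_segment(frames, diff, threshold_table, bucket):
        """Filter one segment of frames. If the server has not profiled this
        kind of content yet, send everything (100% accuracy, no savings)."""
        threshold = threshold_table.get(bucket)
        if threshold is None:
            return list(frames)  # new content: no filtering until the server catches up
        sent, prev = [frames[0]], frames[0]
        for f in frames[1:]:
            if diff(prev, f) > threshold:
                sent.append(f)
                prev = f  # compare future frames against the last *sent* frame
        return sent

    # Toy "frames" are just brightness values; diff is their normalized gap.
    diff = lambda a, b: abs(a - b) / 255
    frames = [0, 0, 0, 200, 200]
    print(process_segment(frames, diff, {"traffic": 0.1}, "traffic"))  # [0, 200]
    print(process_segment(frames, diff, {}, "night"))  # all 5 frames: unprofiled content
    ```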
  * Evaluation: three query types (detection, counting, tagging); 8 traffic videos; server-side DNN: YOLOv3; camera: Raspberry Pi Zero (or a VM with matching resources)
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FUfXqHL3SerUp2k5hs8qi%2Fimage.png?alt=media\&token=6ca2952e-3717-40b8-a596-e5dd21945bae)
    * Experiment \#1: filtering effectiveness
      * Offline optimal: the maximum number of frames eliminated while still meeting the accuracy target
      * Reducto filters 36-51% of frames while meeting the accuracy target
    * Experiment \#2: on-camera overheads
      * Frame feature extraction: 99.7 fps
      * Frame difference calculation: 129.5 fps
      * Hash-table lookups: 318.6 fps
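      * Back-of-the-envelope check that these rates sustain real time (assuming the three steps run serially per frame, which is conservative since lookups happen per segment):

        ```python
        # Per-frame latency is the sum of each serial step's per-frame time.
        rates_fps = [99.7, 129.5, 318.6]  # extract, diff, lookup (from the eval)
        effective_fps = 1.0 / sum(1.0 / r for r in rates_fps)
        print(round(effective_fps, 1))  # ~47.9 fps, comfortably above the 30 fps target
        ```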
    * Experiment \#3: end-to-end benefits
      * Network: Reducto saves an average of 22% bandwidth
      * Compute: Reducto doubles backend processing speed
      * End-to-end latency: median response time reduced by 22-26% (within 13% of offline optimal)
  * Conclusion
    * Reducto: on-camera filtering using cheap frame differences
      * The camera's filtering decisions are guided by the server
      * Filters over 50% of frames and lowers latency by 22% while consistently meeting accuracy targets
