Reducto: On-Camera Filtering for Resource Efficient Real-Time Video Analytics

https://dl.acm.org/doi/10.1145/3387514.3405874

  • Video analytics trends

    • More cameras and video data

    • Greater ability to extract information from video

  • Video analytics pipelines

    • Goals

      • Accuracy target (e.g., 90%)

      • Real-time (e.g., 30 fps)

  • Challenge: resource intensive

    • Network bandwidth and compute cost

    • 1 video at 1080p: 2 Mbps

    • Faster R-CNN: 6 sec to process 1 sec of video on a $6000 GPU

  • Frame Filtering

    • Remove frames that do not meaningfully contribute to the final result

    • Focus, OSDI '18 (approximate model): run a cheap approximate model (e.g., Tiny YOLO) and filter based on its confidence

    • NoScope, VLDB '17 (binary classifier) --> does the frame contain a car at all?

    • Glimpse, SenSys '15 (pixel-level differences) --> is the frame-to-frame change above a threshold? (see the sketch below)
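
A minimal sketch of the pixel-differencing idea behind Glimpse (my reconstruction, not the paper's code; the threshold values and the input file are assumptions): ship a frame to the server only if the fraction of changed pixels since the last shipped frame crosses a static threshold.

```python
import cv2
import numpy as np

PIXEL_EPS = 30       # per-pixel intensity delta that counts as "changed" (assumed value)
FRAME_THRESH = 0.01  # fraction of changed pixels needed to ship a frame (assumed value)

def changed_fraction(prev_gray: np.ndarray, cur_gray: np.ndarray) -> float:
    """Fraction of pixels whose grayscale intensity moved by more than PIXEL_EPS."""
    delta = cv2.absdiff(prev_gray, cur_gray)
    return np.count_nonzero(delta > PIXEL_EPS) / delta.size

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input file
ok, frame = cap.read()
if not ok:
    raise SystemExit("could not read video")
last_sent = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if changed_fraction(last_sent, gray) > FRAME_THRESH:
        # ship `frame` to the server-side DNN here
        last_sent = gray  # diff against the last frame actually shipped
```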

  • Key observation: filtering benefits increase the closer it happens to the video source

    • Can we filter frames directly on the camera itself?

      • What computational resources are available on existing cameras?

      • How do existing approaches fare?

  • Camera Market Study

    • Existing deployments

    • Existing Filtering Approaches

      • Approximate models: too slow on the camera (Tiny YOLO: 0.6 fps)

      • Binary classification: misses 45% of filtering opportunities

      • Accuracy of pixel-level differences

        • With a 90% accuracy target on a counting query, static thresholds struggle to meet the target

        • Using frame differencing effectively

          • Dynamic threshold to deal with rapid changes

          • Expand beyond pixel comparison

            • Low-level features can be more effective than raw pixels, e.g., the area of motion; raw pixel differences do not capture that contrast

            • Candidate differencing features beyond pixels: e.g., edges, area of motion (sketched below)
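
A hedged sketch of what "expanding beyond pixel comparison" can look like. The three features below (pixel, edge, area of motion) mirror the kind of cheap low-level features Reducto considers, but their exact definitions here are my assumptions built on standard OpenCV calls.

```python
import cv2
import numpy as np

def pixel_diff(prev_gray, cur_gray, eps=30):
    """Fraction of pixels whose intensity changed by more than eps."""
    d = cv2.absdiff(prev_gray, cur_gray)
    return np.count_nonzero(d > eps) / d.size

def edge_diff(prev_gray, cur_gray, lo=100, hi=200):
    """Fraction of pixels whose Canny edge response changed; tracks object
    outlines rather than lighting noise."""
    e_prev = cv2.Canny(prev_gray, lo, hi)
    e_cur = cv2.Canny(cur_gray, lo, hi)
    return np.count_nonzero(e_prev != e_cur) / e_prev.size

def area_diff(prev_gray, cur_gray, eps=30):
    """Fraction of the frame covered by bounding boxes around changed regions:
    captures *where* motion happens, a contrast that raw pixel counts miss."""
    mask = (cv2.absdiff(prev_gray, cur_gray) > eps).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    covered = sum(w * h for (_, _, w, h) in map(cv2.boundingRect, contours))
    return min(covered / mask.size, 1.0)
```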

  • Reducto Overview

    • Questions

      • #1: which filtering threshold to use?

      • #2: which differencing feature to use?

    • Main idea: wimpy cameras can use cheap differencing techniques to filter frames effectively, with guidance from a server (see the sketch at the end of this section)

    • #1: threshold

      • Building the threshold table is expensive --> run on the server

      • Looking up the table is cheap --> run on the camera

    • #2: differencing feature

      • Calculating the best feature is expensive --> run on the server

      • Best feature changes between query types but not between videos

        • One feature per query type

      • Q: how are the candidate features selected? The experiments only show three.

      • Q: what is the intuition for why the best feature does not change across videos?

      • Q: what are the query types?
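
To make the server/camera split concrete, here is a sketch of both halves. All names (`profile_threshold`, `camera_loop`, `send`), the segment-indexed table layout, and the recall-style scoring are my assumptions, not the paper's implementation; Reducto actually scores thresholds by query accuracy against the full DNN's results.

```python
import cv2

def profile_threshold(diffs, needed, acc_target=0.90):
    """Server side (expensive): given each frame's feature-diff value and
    whether the full DNN says the frame matters for the query result, return
    the largest threshold that still keeps acc_target of the needed frames."""
    best = 0.0
    needed_diffs = [d for d, n in zip(diffs, needed) if n]
    for cand in sorted(set(diffs)):
        kept = sum(1 for d in needed_diffs if d > cand)
        if needed_diffs and kept / len(needed_diffs) >= acc_target:
            best = max(best, cand)
    return best

def camera_loop(cap, feature_diff, threshold_table, send, seg_len=120):
    """Camera side (cheap): one feature diff + one dict lookup per frame.
    If the current segment has no server-provided threshold yet, send every
    frame unfiltered (the bootstrap phase described below)."""
    ok, frame = cap.read()
    if not ok:
        return
    send(frame)  # always send the first frame
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    idx = 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cur = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        thresh = threshold_table.get(idx // seg_len)  # cheap hash-table lookup
        if thresh is None or feature_diff(prev, cur) > thresh:
            send(frame)   # no guidance yet, or enough change: keep the frame
            prev = cur    # diff against the last frame actually sent
        idx += 1
```

With the earlier feature sketch, `camera_loop(cap, edge_diff, table, send)` would filter on edge differences; the per-frame cost is one diff plus one dict lookup, which is what keeps the camera-side work feasible.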

  • Putting it together

    • New content? The camera sends unfiltered frames for a short period (100% accuracy, but no filtering) while the server profiles the new content

    • Evaluation: three query types (detection, counting, tagging), 8 traffic videos, server-side DNN (YOLOv3), camera: Raspberry Pi Zero or a VM with matching resources

      • #1: filtering effectiveness vs. the offline optimal

        • Offline optimal: the maximum number of frames that can be eliminated while still meeting the accuracy target

        • Reducto filters 36-51% of frames while meeting the accuracy target

      • #2: on-camera filtering overhead (per-stage throughput; see the back-of-envelope check after this list)

        • Extract frame features: 99.7 fps

        • Calculate frame difference: 129.5 fps

        • Hash-table lookups: 318.6 fps

      • #3: end-to-end resource savings

        • Network: Reducto saves an average of 22% of bandwidth

        • Compute: Reducto doubles backend processing speed

        • End-to-end latency: Reducto reduces median response time by 22-26% (within 13% of the offline optimal)
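
A quick back-of-envelope check on the #2 stage throughputs (assuming, on my part, that all three stages run sequentially on every frame, so their per-frame costs add):

$$
\left(\frac{1}{99.7} + \frac{1}{129.5} + \frac{1}{318.6}\right)^{-1} \approx 48\ \text{fps} > 30\ \text{fps}
$$

Even in that worst case, the on-camera pipeline clears the 30 fps real-time target.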

    • Conclusion

      • Reducto: on-camera filtering using cheap frame differences

        • The camera's filtering decisions are guided by the server

        • Filters over 50% of frames and lowers latency by 22% while consistently meeting the accuracy target
