Are Machine Learning Cloud APIs Used Correctly?

http://people.cs.uchicago.edu/~cwan/paper/ml_api.pdf

Presentation

  • Machine learning provides effective solutions

  • Software development: problems --> bugs

  • ML cloud API

    • Function as a service

    • Help incorporate learning solutions into software systems

      • Require less domain knowledge

      • No need to design and train neural networks

  • ML APIs raise unique challenges

    • Performing cognitive tasks: how people ask questions greatly affects the result

    • Largely defined by training data: properties might not be known by API users

    • Numeric vector output: high-dim, tricky to interpret

    • Complicated accuracy-performance tradeoffs

  • Corpus

    • Google / Amazon ML cloud API

    • 3 ML domains: vision, language, speech

    • 18 months, size of 2,200 lines

  • Anti-pattern identification methodology

    • Manual examination

    • Design test cases

    • Report bugs

  • Result

    • Most applications: misuses!

    • Pattern

      • Calling the wrong API

        • Subtle semantic differences among cognitive tasks

        • e.g. image classification, object detection. Which one to use?

        • e.g. text-detection, document-text-detection

        • These misuses escape traditional testing

      • Misinterpreting outputs

        • Numeric vector outputs are difficult to interpret

      • Misuse of async APIs

        • Complicated accuracy-performance tradeoffs

      • Unnecessarily high-resolution inputs

        • Higher resolution degrades performance

      • Many other misuses --> what types of impact (reduced functionality, degraded performance, increased cost)

  • Design checkers

    • Three static analysis tools for three misuses

    • API wrappers for four misuses
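
To make the wrapper idea concrete, here is a minimal sketch of two wrapper-style helpers, one for the "misinterpreting outputs" pattern and one for the "unnecessarily high-resolution inputs" pattern. This is not the paper's implementation. The function names, the thresholds, and the 640-pixel limit are all hypothetical and would need tuning per application. The first helper assumes a sentiment API that returns a score in [-1, 1] plus a non-negative magnitude, so a near-zero score can mean either "neutral" or "mixed". The second only computes the dimensions to downscale to; the actual resize and the API call are left out.

```python
from typing import Tuple

# Hypothetical thresholds for illustration only; tune per application.
SCORE_THRESHOLD = 0.25
MAGNITUDE_THRESHOLD = 1.0


def interpret_sentiment(score: float, magnitude: float) -> str:
    """Map a (score, magnitude) pair to a coarse label instead of using score alone."""
    if score >= SCORE_THRESHOLD:
        return "positive"
    if score <= -SCORE_THRESHOLD:
        return "negative"
    # Near-zero score: high magnitude suggests conflicting emotions,
    # low magnitude suggests little emotion at all.
    return "mixed" if magnitude >= MAGNITUDE_THRESHOLD else "neutral"


def target_size(width: int, height: int, max_side: int = 640) -> Tuple[int, int]:
    """Return the size to downscale to before upload, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A caller would run `interpret_sentiment` on the values returned by the sentiment API rather than branching on the raw score, and would call `target_size` before encoding an image for upload.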
