# Efficient and Scalable Thread-Safety Violation Detection

### Presentation&#x20;

* Concurrent programming are everywhere --> bugs&#x20;
* Thread-safety violation (TSV)&#x20;
  * Thread-safety contract&#x20;
  * Thread-safety violation (TSV) occurs if two threads concurrently work on two conflicting methods on the same object at the same time&#x20;
    * Outcome: crashes, etc.&#x20;
* Reasoning about TSVs is difficult&#x20;
  * Too many possibilities of concurrency&#x20;
    * Multithreading, event-driven, message passing&#x20;
  * Too many forms of synchronizations&#x20;
* Data-race analysis on small scale, it's fine&#x20;
  * Program source code --> integration --> static data-race analysis --> pruning&#x20;
  * Or dynamic data-race analysis&#x20;
    * Run the program with test inputs and program binary (integration)
    * Dynamic data-race analysis will slow down by 10x or more&#x20;
* Large-scale: existing techniques fail&#x20;
  * CloudBuild: 100M tests from 4K teams, up to 10K machines / day&#x20;
  * Tens of thousands of extra machines are needed&#x20;
* **Challenges**: integration, overhead, and false positives&#x20;
* **TSVD**: a scalable dynamic analysis tool for TSVs&#x20;

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FyhcmJWK1WxM3hMDP1aTJ%2Fimage.png?alt=media\&token=f845fbc9-1309-43a0-aed7-7034f48d6dfd)

* Design&#x20;
  * How to achieve zero false positive?
    * Report after violation + inject delays to trigger violations&#x20;
      * Question: how to identify potential unsafe calls to insert delays?&#x20;
        * Happen-before analysis (expensive, and require integration effort to annotate all sync primitives)
      * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2F2moj9PaoMxFsrr5iTpPk%2Fimage.png?alt=media\&token=80dd272c-7491-44fa-b5cb-cb9a2c62dc1a)
      * Racing calls&#x20;
        * Two conflict methods, called from different threads, accessing the same object, having close-by physical timestamps&#x20;
        * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2Fv6aMNHJrFHJaIKl5SHWE%2Fimage.png?alt=media\&token=7a4ef053-d840-4d8a-a114-41a68dcbce73)
        * Remove unlikely race calls&#x20;
          * Insights for synchronization inference&#x20;
            * Insights: many ways to implement synchronization, one common effect of all synchronizations: if m1 synchronized before m2 and m1, m2 are nearby, delay to m1 will cause delay to m2&#x20;
            * Transitive delay&#x20;
              * If another thread cannot make progress, ..., infer synchronization&#x20;
    * TSVD inference&#x20;
      * Is there exist a sync?
      * Is the program running in a sequential mode?
      * Is a likely racing call less likely than another?&#x20;
    * TSVD multiple runs&#x20;
      * Earlier runs' knowledge will be inherited by later runs&#x20;
* TSVD review
  * instrumentation: push-button design&#x20;
  * runtime:&#x20;
    * likely racing list: add to list based on the physical timestamp, read the list to inject delay, remove based on the delay feedback, and report bugs&#x20;
    * Remaining entries can be passed to the next run&#x20;
* Evaluation: test in Microsoft&#x20;
  * Compared with Random, data collider, and HB-Tracking&#x20;
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FzvWkXDlNlUSKWypIiJi2%2Fimage.png?alt=media\&token=ffccdc1e-5a51-41ed-9b0e-07f9b1839a1f)
  * ![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FJqINOVumXH6z2W57kqpF%2Fimage.png?alt=media\&token=b22f5356-9394-4e80-a122-656e6a38e5d4)
* Overall: push-button TSV detection tool.&#x20;
  * Infers sync and uses them in the same run&#x20;
    * Easy integration: oblivious to synchronization patterns&#x20;
    * Lightweight&#x20;
    * Accuracy
    * Coverage&#x20;
