# Efficient and Scalable Thread-Safety Violation Detection

### Presentation&#x20;

* Concurrent programming are everywhere --> bugs&#x20;
* Thread-safety violation (TSV)&#x20;
  * Thread-safety contract&#x20;
  * Thread-safety violation (TSV) occurs if two threads concurrently work on two conflicting methods on the same object at the same time&#x20;
    * Outcome: crashes, etc.&#x20;
* Reasoning about TSVs is difficult&#x20;
  * Too many possibilities of concurrency&#x20;
    * Multithreading, event-driven, message passing&#x20;
  * Too many forms of synchronizations&#x20;
* Data-race analysis on small scale, it's fine&#x20;
  * Program source code --> integration --> static data-race analysis --> pruning&#x20;
  * Or dynamic data-race analysis&#x20;
    * Run the program with test inputs and program binary (integration)
    * Dynamic data-race analysis will slow down by 10x or more&#x20;
* Large-scale: existing techniques fail&#x20;
  * CloudBuild: 100M tests from 4K teams, up to 10K machines / day&#x20;
  * Tens of thousands of extra machines are needed&#x20;
* **Challenges**: integration, overhead, and false positives&#x20;
* **TSVD**: a scalable dynamic analysis tool for TSVs&#x20;

![](/files/abNCTOqU34X6GlqIXBWI)

* Design&#x20;
  * How to achieve zero false positive?
    * Report after violation + inject delays to trigger violations&#x20;
      * Question: how to identify potential unsafe calls to insert delays?&#x20;
        * Happen-before analysis (expensive, and require integration effort to annotate all sync primitives)
      * ![](/files/JKIGrCF0ng3MIWLVp31q)
      * Racing calls&#x20;
        * Two conflict methods, called from different threads, accessing the same object, having close-by physical timestamps&#x20;
        * ![](/files/oGHyopsSAO0hata40x6G)
        * Remove unlikely race calls&#x20;
          * Insights for synchronization inference&#x20;
            * Insights: many ways to implement synchronization, one common effect of all synchronizations: if m1 synchronized before m2 and m1, m2 are nearby, delay to m1 will cause delay to m2&#x20;
            * Transitive delay&#x20;
              * If another thread cannot make progress, ..., infer synchronization&#x20;
    * TSVD inference&#x20;
      * Is there exist a sync?
      * Is the program running in a sequential mode?
      * Is a likely racing call less likely than another?&#x20;
    * TSVD multiple runs&#x20;
      * Earlier runs' knowledge will be inherited by later runs&#x20;
* TSVD review
  * instrumentation: push-button design&#x20;
  * runtime:&#x20;
    * likely racing list: add to list based on the physical timestamp, read the list to inject delay, remove based on the delay feedback, and report bugs&#x20;
    * Remaining entries can be passed to the next run&#x20;
* Evaluation: test in Microsoft&#x20;
  * Compared with Random, data collider, and HB-Tracking&#x20;
  * ![](/files/HS2GpvH45mu1qnJplu3Q)
  * ![](/files/GQNRuIiapCKg9yXX9Mlo)
* Overall: push-button TSV detection tool.&#x20;
  * Infers sync and uses them in the same run&#x20;
    * Easy integration: oblivious to synchronization patterns&#x20;
    * Lightweight&#x20;
    * Accuracy
    * Coverage&#x20;


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://sliu583.gitbook.io/blog/conference/index/sosp-21/consistency/efficient-and-scalable-thread-safety-violation-detection.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
