Efficient and Scalable Thread-Safety Violation Detection
SOSP 19: https://dl.acm.org/doi/pdf/10.1145/3341301.3359638?casa_token=tJoAJXALN4MAAAAA:JUd0XcAhj-AXAA8ORPeHmOS2xaLGSI-vwBGVMjVsixOq2KIXpuypu0V9ENOzpswb39bFZoXJs2n-
Presentation
Concurrent programming are everywhere --> bugs
Thread-safety violation (TSV)
Thread-safety contract
Thread-safety violation (TSV) occurs if two threads concurrently work on two conflicting methods on the same object at the same time
Outcome: crashes, etc.
Reasoning about TSVs is difficult
Too many possibilities of concurrency
Multithreading, event-driven, message passing
Too many forms of synchronizations
Data-race analysis on small scale, it's fine
Program source code --> integration --> static data-race analysis --> pruning
Or dynamic data-race analysis
Run the program with test inputs and program binary (integration)
Dynamic data-race analysis will slow down by 10x or more
Large-scale: existing techniques fail
CloudBuild: 100M tests from 4K teams, up to 10K machines / day
Tens of thousands of extra machines are needed
Challenges: integration, overhead, and false positives
TSVD: a scalable dynamic analysis tool for TSVs
Design
How to achieve zero false positive?
Report after violation + inject delays to trigger violations
Question: how to identify potential unsafe calls to insert delays?
Happen-before analysis (expensive, and require integration effort to annotate all sync primitives)
Racing calls
Two conflict methods, called from different threads, accessing the same object, having close-by physical timestamps
Remove unlikely race calls
Insights for synchronization inference
Insights: many ways to implement synchronization, one common effect of all synchronizations: if m1 synchronized before m2 and m1, m2 are nearby, delay to m1 will cause delay to m2
Transitive delay
If another thread cannot make progress, ..., infer synchronization
TSVD inference
Is there exist a sync?
Is the program running in a sequential mode?
Is a likely racing call less likely than another?
TSVD multiple runs
Earlier runs' knowledge will be inherited by later runs
TSVD review
instrumentation: push-button design
runtime:
likely racing list: add to list based on the physical timestamp, read the list to inject delay, remove based on the delay feedback, and report bugs
Remaining entries can be passed to the next run
Evaluation: test in Microsoft
Compared with Random, data collider, and HB-Tracking
Overall: push-button TSV detection tool.
Infers sync and uses them in the same run
Easy integration: oblivious to synchronization patterns
Lightweight
Accuracy
Coverage
Last updated
Was this helpful?