RR: Engineering Record and Replay for Deployability
https://www.usenix.org/conference/atc17/technical-sessions/presentation/ocallahan
Topic: partial record and replay debugging with rr
Debugging nondeterminism
Nondeterminism confuses the output of the system
Difficult to debug
Deterministic hardware
Sources of nondeterminism
Record inputs
Replay execution
Old idea
Nirvana, PinPlay, ReVirt, Jockey, ReSpec, Chronomancer, PANDA, Echo, FlashBack, ...
Easy to deploy: stock hardware (i.e. not customized), commodity OS, no kernel changes
Low overhead
Works on Firefox
Small investment
Idea: for user-space processes running on Linux, record all inputs to those processes (system call results, signals); replaying those inputs yields the same process execution, which can then be replayed and debugged
No code instrumentation
Use modern HW/OS features
Linux API: ptrace
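A minimal sketch of the ptrace mechanism this builds on (not rr's actual code; x86-64 Linux assumed): the tracer stops the tracee at every system-call entry/exit, where a recorder could log the call and its result.

```c
/* Minimal ptrace syscall tracer (sketch): the parent stops the child at every
 * system-call entry/exit and reads the syscall number from its registers. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s prog [args]\n", argv[0]); return 1; }
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* let the parent trace us */
        execvp(argv[1], &argv[1]);              /* run the traced program */
        perror("execvp");
        exit(1);
    }
    int status;
    waitpid(child, &status, 0);                 /* initial stop after exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);   /* run to next syscall stop */
        waitpid(child, &status, 0);
        if (WIFSTOPPED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            /* x86-64: orig_rax holds the syscall number; a recorder would log
             * the result (rax) at the corresponding exit stop. */
            fprintf(stderr, "syscall %lld\n", (long long)regs.orig_rax);
        }
    }
    return 0;
}
```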
Data races: multiple CPUs running at the same time, with one reading and one writing the same location, can lead to nondeterministic results
Shared-memory data races: limit execution to a single core and manage context switches
Asynchronous event timing: HW performance counters
Signals must arrive at the right program state during replay
Idea: count retired conditional branches up to the delivery point, then during replay deliver the signal once the same count is reached (sketched below)
Doing this in hardware means no code instrumentation is needed
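A sketch of the counter mechanism, using the generic branch-instructions event as a stand-in for the precise "retired conditional branches" event rr uses (the exact event is CPU-model specific):

```c
/* Sketch: count retired branch instructions for the current thread via
 * perf_event_open. rr uses a precise, model-specific conditional-branch
 * event; PERF_COUNT_HW_BRANCH_INSTRUCTIONS is a portable stand-in. */
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr.disabled = 1;
    attr.exclude_kernel = 1;          /* count only user-space branches */

    int fd = perf_event_open(&attr, 0 /* this thread */, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    volatile long sum = 0;
    for (int i = 0; i < 1000000; i++) sum += i;   /* some branchy work */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count;
    read(fd, &count, sizeof(count));
    printf("branches retired: %lld (sum=%ld)\n", count, (long)sum);
    close(fd);
    return 0;
}
```

rr records this count at each asynchronous event during recording; during replay it arranges a counter interrupt once the same count is reached and delivers the signal at the same point in execution.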
Trap on a subset of system calls: seccomp-bpf
Two ptrace traps (entry and exit) mean four context switches per system call, so traced system calls are expensive
Shim library: loaded into the traced process as part of both recording and replay; it wraps the common system calls and records their results into a buffer that the supervisor process flushes periodically
Inject a BPF predicate into the kernel so that only the uncommon system calls trap to the supervisor (filter sketched below)
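A sketch of such a filter, keyed on the instruction pointer: system calls issued from the shim's dedicated entry point (the hypothetical untraced_ip below) are allowed through, and everything else returns SECCOMP_RET_TRACE to the ptrace supervisor, which must have set PTRACE_O_TRACESECCOMP. A production filter would also check the architecture field; little-endian layout (as on x86) is assumed for the split 64-bit compare.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Install a seccomp-bpf filter that lets syscalls from untraced_ip through
 * and traps all other syscalls to the ptrace supervisor. */
int install_ip_filter(uint64_t untraced_ip) {
    struct sock_filter filter[] = {
        /* Compare the low 32 bits of the faulting instruction pointer. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, instruction_pointer)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)untraced_ip, 0, 3),
        /* Compare the high 32 bits. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, instruction_pointer) + 4),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)(untraced_ip >> 32), 0, 1),
        /* Issued from the shim's untraced entry point: no trap. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* Everything else: stop and notify the ptrace supervisor. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) { perror("no_new_privs"); return -1; }
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) { perror("seccomp"); return -1; }
    return 0;
}
```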
What happens if the system call blocks?
Schedule another thread to run
DESCHED perf event
Fires every time the thread is taken off the core and scheduled out
The supervisor is notified through this perf event (sketched below)
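A sketch of the desched notification using the context-switch software perf event: when the monitored thread blocks and is scheduled out, the event overflows and a signal is delivered, interrupting the blocked call so a supervisor could take over. This shows the mechanism in-process; it is not rr's exact code.

```c
/* Sketch: arm a "descheduled" notification on the current thread using the
 * context-switch software perf event with signal-on-overflow delivery. */
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static void on_desched(int sig) {
    /* async-signal-safe; a real supervisor would take control here */
    const char msg[] = "thread was descheduled\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
    attr.sample_period = 1;          /* overflow on every deschedule */
    attr.disabled = 1;

    int fd = syscall(SYS_perf_event_open, &attr, 0 /* this thread */, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    signal(SIGIO, on_desched);
    fcntl(fd, F_SETFL, O_ASYNC);     /* deliver a signal on overflow... */
    fcntl(fd, F_SETOWN, getpid());   /* ...to this process */
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    sleep(1);                        /* blocking call: we get descheduled */
    return 0;
}
```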
Other issues
RDTSC (can be made to trap; see the prctl sketch after this list)
RDRAND
XBEGIN/XEND (hardware transactional memory)
CPUID
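A sketch of trapping RDTSC with prctl(PR_SET_TSC, PR_TSC_SIGSEGV) on x86 Linux: the instruction then raises SIGSEGV, which a supervisor can catch, emulate, and record; here the fault is simply caught in-process to show the mechanism.

```c
/* Sketch: make RDTSC trap so its result can be recorded/replayed. */
#include <signal.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

static void on_sigsegv(int sig) {
    const char msg[] = "RDTSC trapped\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(0);   /* a real supervisor would emulate the instruction and resume */
}

int main(void) {
    signal(SIGSEGV, on_sigsegv);
    if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_TSC)");
        return 1;
    }
    unsigned int lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));   /* this now faults */
    printf("not reached: %u %u\n", lo, hi);
    return 0;
}
```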
Benchmarks
cp: recursive directory copy
Octane: JavaScript benchmark
htmltest: Firefox running HTML unit tests
sambatest: Samba test suite
Also: reverse-execution debugging
Replay performance matters
Session-cloning performance matters (checkpointing the current process state)
Cloning processes via fork() seems much cheaper than e.g. cloning VM state (sketched below)
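A sketch of fork()-based checkpointing under the copy-on-write assumption: the forked child is frozen as a snapshot, the parent keeps running and mutating memory, and resuming the child "rolls back" to the snapshot state. Illustrative only, not rr's implementation.

```c
/* Sketch: fork() as a cheap copy-on-write checkpoint of process state. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    size_t len = 256u << 20;                 /* 256 MiB of state to "checkpoint" */
    char *state = malloc(len);
    if (!state) { perror("malloc"); return 1; }
    memset(state, 1, len);

    pid_t snapshot = fork();                 /* cheap: pages are shared COW */
    if (snapshot == 0) {
        raise(SIGSTOP);                      /* freeze: wait to be resumed */
        printf("restored snapshot: state[0]=%d\n", state[0]);  /* still 1 */
        _exit(0);
    }
    int status;
    waitpid(snapshot, &status, WUNTRACED);   /* wait until the snapshot is frozen */

    memset(state, 2, len);                   /* parent keeps mutating its copy */
    printf("parent diverged: state[0]=%d\n", state[0]);

    kill(snapshot, SIGCONT);                 /* "roll back" by resuming the clone */
    waitpid(snapshot, NULL, 0);
    return 0;
}
```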
In-process system-call interception is fragile
Applications make system calls in strange states (bad TLS, insufficient stack, etc.)
In-process interception code could be accidentally or maliciously subverted
Move this part into kernel?
OS design implications
Recording boundary should
Be stable, simple, documented API boundary
Also be a boundary for hardware performance counter measurement
ARM: hard to support (e.g., load-linked/store-conditional atomics retry nondeterministically)
Need hardware support to detect / compensate
Or binary rewriting
Related work
VM-level replay: heavyweight
Kernel-supported replay: hard to maintain
Pure user-space replay: instrumentation, higher overhead
Higher-level replay: more limited scope
Parallel replay: more limited scope, higher overhead
Hardware-supported parallel replay: nonexistent hardware
rr's approach delivers a lot of value
more research is needed for multicore approaches
lots of unexplored applications of record+replay
Running one thread at a time: do some concurrency bugs disappear?
Virtual system calls (vDSO)?
These are patched to make normal system calls
What if the application has undefined behavior?
Fine: replay reproduces the exact recorded execution
How is nondeterministic memory layout handled (e.g., where allocations and mappings land)?
Recording: the observed behavior is simply recorded
The locations of memory maps are recorded, and MAP_FIXED is used during replay to make sure mappings land at the same addresses (sketched below)
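A sketch of the MAP_FIXED idea within a single process; a real replayer reads addresses from the trace, runs in a separate process, and also controls ASLR.

```c
/* Sketch: reproduce a recorded memory mapping at the same address using
 * MAP_FIXED. Addresses/lengths here are illustrative. */
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1 << 20;

    /* "Recording": let the kernel pick an address and remember it. */
    void *recorded = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (recorded == MAP_FAILED) { perror("mmap"); return 1; }
    printf("recorded mapping at %p\n", recorded);
    munmap(recorded, len);

    /* "Replay": demand exactly the same address so recorded pointers stay valid. */
    void *replayed = mmap(recorded, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (replayed != recorded) {
        fprintf(stderr, "could not place mapping at the recorded address\n");
        return 1;
    }
    printf("replayed mapping at %p\n", replayed);
    return 0;
}
```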
What about applications that use randomization, e.g., exponential back-off?
Random numbers come from some source; record them when they are generated and replay the recorded values
Is it difficult to move traces between different machines?
Trace format: rr pack makes a trace self-contained and portable
CPUID results recorded in the trace must be compatible with the replay machine
Non-determinism (debugging)
Tests fail randomly; you don't know why or how often
Different test configurations run (Linux opt/PGO/debug/...)
An orange/red test result may have nothing to do with the change being tested
Deterministic hardware
External sources of non-determinism
Draw the boundary in the middle: record the inputs crossing it
Non-deterministic conditions
Replay execution
Old idea
ODR, PinPlay, ...
RR goals
Easy to deploy
Low overhead
Works on FF
Small investment (other work: binary instrumentation, OS kernel changes, hard to maintain and distribute)
Modern HW/OS features
ptrace: one process monitors what is happening in another (system-call tracing)
Tracer / tracee
A single traced system call costs several context switches (overhead)
e.g., getpid, read (cheap system calls, so the tracing overhead dominates)
Shared memory data races --> limit to single core
Async event timing --> HW performance counters (retired conditional branch)
When the tracee gets to the recorded point, a counter interrupt is raised
Alternative: runtime instrumentation of the instruction stream
e.g., with a JIT
Trap on a subset of system calls
seccomp-bpf
The filter decides which system calls can proceed without trapping to the supervisor, so common calls stay on the fast path in user space
Conditions are checked before any context switch happens
Recording in user-space
What if a system call blocks?
Watch the desched event and record when the thread is scheduled out
Record all the memory
Use the same memory maps at the same locations during replay
Other issues
Instructions that generate nondeterminism in the CPU
RDTSC: tell the kernel to make it trap
Back then: recording and replay had to run on the same CPU
Now: CPUID can be intercepted so the tracer controls what CPU is reported (sketched after this list)
Replay: can be fast (no context switches)
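A sketch of CPUID interception via Linux CPUID faulting (arch_prctl(ARCH_SET_CPUID, 0), Linux 4.12+ on CPUs that support it); whether this is exactly the mechanism rr uses is an assumption here, and only the trap is shown, not the emulation.

```c
/* Sketch: with CPUID faulting enabled, executing CPUID raises SIGSEGV, so a
 * supervisor can report consistent CPU info across recording and replay. */
#include <asm/prctl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static void on_sigsegv(int sig) {
    const char msg[] = "CPUID trapped\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(0);   /* a supervisor would instead emulate CPUID and resume */
}

int main(void) {
    signal(SIGSEGV, on_sigsegv);
    /* glibc has no wrapper for arch_prctl; call it via syscall(2). */
    if (syscall(SYS_arch_prctl, ARCH_SET_CPUID, 0UL) != 0) {
        perror("arch_prctl(ARCH_SET_CPUID)");   /* no CPUID-faulting support */
        return 1;
    }
    unsigned int eax = 0, ebx, ecx, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));  /* now faults */
    printf("not reached: vendor regs %x %x %x\n", ebx, edx, ecx);
    return 0;
}
```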
Another approach: cloning the whole VM state
Capture the evolution of memory?
See what is changing
rr does it at the process level
No need to re-record, just keep track of the changes
GDB: going backward (reverse execution)
During forward replay, take fork() checkpoints of the replay at different points in time
To go backward: set the breakpoint, then go back into one of the fork()ed checkpoints and replay forward
Move this part into kernel
Painful, because all of this is done inside the traced process
Some of the recording could perhaps be done in the kernel
Security
Faster
The kernel could create snapshots, but that is not what the authors describe
Can we apply this kind of technique to distributed systems?
Common bugs, but hard to find
FlyMC (EuroSys '19), SAMC (OSDI '14)
Make the traces more interpretable?
Oathkeeper (OSDI '22)
etcd still gets a constant flow of bug reports
Bottom line: many bugs are reproducible by recreating external conditions (i.e., they do not depend on OS thread timing)
No memory race conditions, etc.
Debugging distributed systems today:
Collect per-machine logs
Virtually unify them
Guess root causes
What if a system emitted machine-readable logs?
Logs --> reproduce the bugs
Reverse-execute the system from the bug location
Like in rr
Root cause analysis in distributed systems: prior work
DEMi (NSDI '16) [minimizing faulty executions of distributed systems]
FlyMC, SAMC
Collect per-node partial event orders
Use DPOR (dynamic partial-order reduction) to recreate a total order
i.e., rr needs an exact recording, but these approaches do not
State-of-the-art debugging tools for distributed systems? Limited to particular types of systems rather than distributed systems in general (you have to define a model for each distributed system you have)
Ray: per-process log files of outputs
Message contents
Annotate each log entry? Or infer it from the message contents/payload?
Merging the log files? (where to merge is itself a problem)
Common goal of distributed systems: a common state machine? That requires an exact ordering of the log. What about network partitions? [data inconsistency --> data corruption]