Kernel Tracing David Ferry, Chris Gill, Brian Kocoloski CSE 422S - Operating Systems Organization Washington University in St. Louis St. Louis, MO 63143
Things Happen: Kernel Oops vs Panic A kernel panic is unrecoverable and results in an instant halt An oops communicates something bad happened but the kernel tries to continue executing An oops means the kernel is not totally broken, but is probably in an inconsistent state An oops in interrupt context, the idle task (pid 0), or the init task (pid 1) results in a panic How to figure out what went wrong? "Kernel-panic" by Kevin http://flickr.com/photos/kevincollins/74279815/ CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization Debugging Can take two forms Correctness debugging e.g., your kernel is panic’ing when you load a kernel module Performance debugging e.g., you parallelized a program to run on 4 cores rather than 1, but did not see 4x speedup … CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization Source: https://opensourceforu.com/2011/01/understanding-a-kernel-oops/ CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization Important Bits BUG: unable to handle kernel NULL pointer dereference You probably have a pointer that was never set to allocated memory e.g: int * ptr = NULL; *ptr = x; Instruction pointer (sometimes called “program counter”) Address of instruction that generated BUG is stored in a special register On x86_64: RIP On ARM: R15 Linux will map this instruction back to the function where this instruction occurred In the previous example: my_oops_init CSE 422S –Operating Systems Organization
Kernel Correctness Debugging Debugging in the kernel is much more complicated than debugging in userspace You don’t have simple to use debuggers (though some like kgdb do exist) You don’t get graceful kill signals like segfaults Small errors can bring down the machine (question: why?) CSE 422S –Operating Systems Organization
Simplest “Tracer”: printk() printk() prints information to the system log Messages stored in circular buffer Can be read with dmesg Eight possible log levels (set with dmesg –n) Example: printk(KERN_ALERT “bad thing %ld”, bad_thing); Uses same format as printf() Note there is no comma after log level (KERN_ALERT) CSE 422S –Operating Systems Organization
Other Debugging Mechanisms int some_function(…) { // <code you know is correct> // <code never reached> } CSE 422S –Operating Systems Organization
Other Debugging Mechanisms int some_function(…) { // <code you know is correct> // <code that must have a bug> // <code never reached> } CSE 422S –Operating Systems Organization
Other Debugging Mechanisms int some_function(…) { printk(“Value of some variable: %d\n”, var); // stop executing code to avoid // triggering bug while (1) {} Sits in a loop forever, but will never relinquish the CPU (Linux generally does not preempt tasks executing kernel code) CSE 422S –Operating Systems Organization
Other Debugging Mechanisms int some_function(…) { printk(“Value of some variable: %d\n”, var); // stop executing code to avoid // triggering bug while (1) { schedule(); } Relinquishes the current CPU, giving it back to Linux to run other tasks CSE 422S –Operating Systems Organization
Performance Debugging The more interesting type of debugging We’ll look at two things today: monitoring system calls executed by a process (fyi: this is something a rootkit or sandboxer might also want to do ) Monitoring preemption and scheduling operations experienced by a process CSE 422S –Operating Systems Organization
System Call Tracing: Strace Allows one userspace process (tracer) to inspect the system calls made by another thread (tracee). Tracer calls ptrace() on tracee Tracee halts at every system call, system call return, and signal (except SIGKILL) Tracer records info, and releases tracee to continue Note: Tracing is per-thread Seriously warps program timing CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization strace ./program CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization strace ./program strace KERNEL CSE 422S –Operating Systems Organization
strace ./program strace fork KERNEL ./program <kernel creates new process> KERNEL ./program CSE 422S –Operating Systems Organization
ptrace (PTRACE_TRACEME) strace ./program strace fork <kernel creates new process> <kernel begins intercepting syscalls of ./program> KERNEL ptrace (PTRACE_TRACEME) ./program CSE 422S –Operating Systems Organization
ptrace (PTRACE_ATTACH) fork strace ./program strace ptrace (PTRACE_ATTACH) fork <kernel begins forwarding syscalls of ./program to strace> <kernel creates new process> <kernel begins intercepting syscalls of ./program> KERNEL ptrace (PTRACE_TRACEME) ./program CSE 422S –Operating Systems Organization
ptrace (PTRACE_ATTACH) fork strace ./program strace ptrace (PTRACE_ATTACH) fork <kernel begins forwarding syscalls of ./program to strace> <kernel creates new process> <kernel stops ./program and wakes up strace> <kernel begins intercepting syscalls of ./program> KERNEL ptrace (PTRACE_TRACEME) open (…) ./program CSE 422S –Operating Systems Organization
ptrace (PTRACE_ATTACH) Inspect open system call strace ./program strace ptrace (PTRACE_ATTACH) Inspect open system call fork <kernel begins forwarding syscalls of ./program to strace> <kernel creates new process> <kernel stops ./program and wakes up strace> <kernel begins intercepting syscalls of ./program> KERNEL ptrace (PTRACE_TRACEME) open (…) ./program CSE 422S –Operating Systems Organization
ptrace (PTRACE_ATTACH) Inspect open system call ptrace (PTRACE_CONT) strace ./program strace ptrace (PTRACE_ATTACH) Inspect open system call ptrace (PTRACE_CONT) fork <kernel begins forwarding syscalls of ./program to strace> <kernel creates new process> <kernel stops ./program and wakes up strace> <kernel stops strace, executes system calls and wakes up ./program> <kernel begins intercepting syscalls of ./program> KERNEL ptrace (PTRACE_TRACEME) open (…) ./program CSE 422S –Operating Systems Organization
ptrace (PTRACE_ATTACH) Inspect open system call ptrace (PTRACE_CONT) strace ./program strace ptrace (PTRACE_ATTACH) Inspect open system call ptrace (PTRACE_CONT) fork <kernel begins forwarding syscalls of ./program to strace> <kernel creates new process> <kernel stops ./program and wakes up strace> <kernel stops strace, executes system calls and wakes up ./program> <kernel begins intercepting syscalls of ./program> KERNEL ptrace (PTRACE_TRACEME) open (…) ./program CSE 422S –Operating Systems Organization
Ftrace – the Function Tracer Tracing beyond system calls; many features: Event tracepoints (scheduler, interrupts, etc.) Trace any kernel function Call graphs Kernel stack size Latency tracing How long interrupts disabled How long preemption disabled Has a user interface called trace-cmd Very nice graphical trace browser called Kernelshark CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization Ftrace Internals When tracing is enabled, the kernel maintains: Per-CPU ring buffer for holding events Per-CPU kernel thread that empties ring buffer If readers can’t keep up, data is lost (dropped) Tracepoints in kernel: Kernel maintains list of tracepoint locations Locations normally converted to no-ops (ftrace_make_nop()) Trace code is runtime-patched into kernel code when activated (ftrace_make_call()) CSE 422S –Operating Systems Organization
CSE 422S –Operating Systems Organization Today’s Studio By the end of the studio, you will understand how to perform basic analysis of program and kernel interaction What system calls are made? Where and when does the kernel schedule processes? You will use strace to inspect system calls made by a program strace ./dense_mm 100 &> dense100trace.txt strace ./dense_mm 300 &> dense300trace.txt You will use ftrace and kernelshark to inspect scheduling behavior CSE 422S –Operating Systems Organization