Time Sources and Timing David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis St. Louis, MO 63143
Questions About the Previous Studio Q: Why does the kernel create an init thread at boot instead of using the original execution context? A: The start_kernel execution context in main.c eventually becomes the main processor idle loop, a.k.a. the swapper or sched process. (Follow the start_kernel function through to cpu_idle_loop() ) CSE 522S – Advanced Operating Systems
Questions About the Previous Studio Q: Do we really need to issue make clean when rebuilding the kernel? A: Not always, but probably a good idea to do so anyway unistd.h #define NR_SYSCALLS 292 baz.c foo.c foo.o NR_SYSCALLS = 388 baz.o NR_SYSCALLS = 292 CSE 522S – Advanced Operating Systems
Today: What is meant by Time? Absolute time since some fixed past event, e.g.: Seconds since start of the Unix Epoch (00:00:00 UTC on 1 January 1970) Seconds since system boot Relative time, E.g.: Seconds between two events Ten seconds into the future (from now) Execution time of a program segment World time, E.g.: February 4th, 10:37 AM An OS must approximate time to provide time-based functions for users. CSE 522S – Advanced Operating Systems
CSE 522S – Advanced Operating Systems Time Sources RTC (Real-Time Clock) Available on most computers (not on RPi 2 or 3 unless you add it) Low precision (as low as 0.5 seconds) Hardware Timers Might be used to generate interrupts, might be queryable Run at a variety of frequencies Programmable Interval Timer (PIT) High-Performance Event Timer (HPET) Programmable Interrupt Controller (PIC) Advanced Programmable Interrupt Controller (APIC) Processor Cycles Timestamp Counter (TSC) on x86, 64-bit Cycle Counter (CCNT) on ARM, 32-bit, 64-cycle divider, not accessible in user mode Potentially very high accuracy Can generate interrupts on x86 with TSC-deadline CSE 522S – Advanced Operating Systems
How does the kernel schedule services? Three approaches: Timer interrupts to regularly provide service e.g.: task scheduling, current time update Kernel timers (hrtimers, legacy timers) Kernel threads (not time dependent) A portable OS must use whatever timer hardware is available to support actions #1 and #2. CSE 522S – Advanced Operating Systems
Basic Timer Interrupt in Linux Historically the OS would run periodically: Fundamental timestep is variable HZ found in /include/asm-generic/param.h OS function tick_periodic() runs each 1/HZ seconds jiffies variable tracks number of ticks since boot xtime variable tracks wall-clock time Current time since boot: jiffies * 1/HZ Current wall time: boot_time + jiffies * 1/HZ Timers are checked for expiry each tick Concerns with this approach: Excessive power consumption (can’t really idle processors) Inappropriate for highly virtualized systems Might be appropriate for real-time systems CSE 522S – Advanced Operating Systems
Modern Timer Interrupt in Linux Two fundamental operating modes: Periodic (CLOCK_EVT_MODE_PERIODIC) One-shot (CLOCK_EVT_MODE_ONESHOT) One-shot mode (CONFIG_NO_HZ): All timer events are independent (not periodic) Next timer event is scheduled for: Next hrtimer expiration Next 1/HZ interval, whichever is sooner Not rescheduled if no runnable tasks jiffies is maintained as though periodic Big power savings in idle CSE 522S – Advanced Operating Systems
Problems with legacy time system Legacy timers could only expire each tick Latency problems: with 10ms tick a timeout might be delivered 9.9ms late Resolution problems: with a 10ms tick all we know was that timeout was 0-9.9ms ago Legacy timer wheel implementation problems: Bad worst case expiration running time (O(N)) 95-99% of timers never actually expire (e.g. http timeouts, error checking) but still incur overhead Thus, not optimized for “millions of network timers” case CSE 522S – Advanced Operating Systems
CSE 522S – Advanced Operating Systems hrtimers subsystem The kernel now provides a high resolution timing subsystem: hrtimers Not bound to jiffies Can use multiple time sources High resolution (nanosecond representation) Implementation: Per-CPU timer list stored as a sorted red-black tree for fast expiration Separate list for fast removal (no tree walking) Expiration handled by dedicated interrupts CSE 522S – Advanced Operating Systems