
Kernel Synchronization in Linux (Chap. 5 in Understanding the Linux Kernel) J. H. Wang Sep. 29, 2011

Outline Kernel Control Paths When Synchronization is not Necessary Synchronization Primitives Synchronizing Accesses to Kernel Data Structures Examples of Race Condition Prevention

Kernel Control Paths The Linux kernel acts like a server that answers requests –Parts of the kernel run in an interleaved way A kernel control path: a sequence of instructions executed in kernel mode on behalf of the current process –Activated by exceptions or interrupts –Lighter-weight than a process (less context to save)

Example Kernel Control Paths Three CPU states are considered –Running a process in User Mode (User) –Running an exception or a system call handler (Excp) –Running an interrupt handler (Intr)

Kernel Preemption Preemptive kernel: a process running in kernel mode can be replaced by another process while in the middle of a kernel function The main motivation for making a kernel preemptive is to reduce the dispatch latency of User Mode processes –The delay between the time they become runnable and the time they actually begin running The kernel can be preempted only when it is executing an exception handler (in particular, a system call service routine) and kernel preemption has not been explicitly disabled

When Synchronization Is Necessary A race condition can occur when the outcome of a computation depends on how two or more interleaved kernel control paths are nested The goal is to identify and protect the critical regions in exception handlers, interrupt handlers, deferrable functions, and kernel threads –On a single CPU, a critical region can be implemented by disabling interrupts while accessing the shared data –If the same data is shared only by the service routines of system calls, a critical region can be implemented by disabling kernel preemption while accessing the shared data Things are more complicated on multiprocessor systems –Different synchronization techniques are necessary

When Synchronization is not Necessary The same interrupt cannot occur again until its handler terminates Interrupt handlers and softirqs are non-preemptable and non-blocking A kernel control path performing interrupt handling cannot be interrupted by a kernel control path executing a deferrable function or a system call service routine Softirqs cannot be interleaved

Synchronization Primitives (technique: description, scope)
–Per-CPU variables: duplicate a data structure among the CPUs (all CPUs)
–Atomic operations: atomic read-modify-write instructions (all CPUs)
–Memory barriers: avoid instruction reordering (local CPU)
–Spin locks: lock with busy wait (all CPUs)
–Semaphores: lock with blocking wait, i.e. sleep (all CPUs)
–Seqlocks: lock based on an access counter (all CPUs)
–Local interrupt disabling: forbid interrupts on a single CPU (local CPU)
–Local softirq disabling: forbid deferrable functions on a single CPU (local CPU)
–Read-copy-update (RCU): lock-free access to shared data structures through pointers (all CPUs)

Per-CPU Variables The simplest and most efficient synchronization technique consists of declaring kernel variables as per-CPU variables –An array of data structures, one element per CPU in the system –A CPU should not access the elements of the array corresponding to other CPUs While per-CPU variables provide protection against concurrent accesses from several CPUs, they provide no protection against accesses from asynchronous functions (interrupt handlers and deferrable functions) Per-CPU variables are also prone to race conditions caused by kernel preemption, both in uniprocessor and multiprocessor systems

Functions and Macros for the Per-CPU Variables
–DEFINE_PER_CPU(type, name): statically allocates a per-CPU array
–per_cpu(name, cpu): selects the element for CPU cpu of the per-CPU array
–__get_cpu_var(name): selects the local CPU's element of the per-CPU array
–get_cpu_var(name): disables kernel preemption, then selects the local CPU's element of the per-CPU array
–put_cpu_var(name): enables kernel preemption (no per-CPU array element is selected)
–alloc_percpu(type): dynamically allocates a per-CPU array
–free_percpu(pointer): releases a dynamically allocated per-CPU array
–per_cpu_ptr(pointer, cpu): returns the address of the element for CPU cpu of the per-CPU array
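
A rough sketch (not part of the original slides) of how a per-CPU counter could be used with the macros above; the packet-counting scenario, the variable name, and the for_each_possible_cpu() iterator are illustrative assumptions:

    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    /* Statically allocate one counter per CPU. */
    static DEFINE_PER_CPU(unsigned long, pkt_count);

    static void count_packet(void)
    {
        /* get_cpu_var() disables kernel preemption, so the current task
         * cannot migrate to another CPU while it updates its own element. */
        get_cpu_var(pkt_count)++;
        put_cpu_var(pkt_count);       /* re-enable kernel preemption */
    }

    static unsigned long total_packets(void)
    {
        unsigned long sum = 0;
        int cpu;

        /* Reading the other CPUs' elements is only approximately consistent,
         * which is acceptable for statistics-like data. */
        for_each_possible_cpu(cpu)
            sum += per_cpu(pkt_count, cpu);
        return sum;
    }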

Atomic Operations Atomic 80x86 instructions –Instructions that make zero or one aligned memory access –Read-modify-write instructions (such as inc or dec) –Read-modify-write instructions whose opcode is prefixed by the lock byte (0xf0) –Assembly instructions whose opcode is prefixed by a rep byte (0xf2, 0xf3) are not atomic

The atomic_t type: a 24-bit atomic counter
Atomic operations in Linux:
–atomic_read(v): return *v
–atomic_set(v,i): set *v to i
–atomic_add(i,v): add i to *v
–atomic_sub(i,v): subtract i from *v
–atomic_sub_and_test(i,v): subtract i from *v and return 1 if the result is 0
–atomic_inc(v): add 1 to *v
–atomic_dec(v): subtract 1 from *v
–atomic_dec_and_test(v): subtract 1 from *v and return 1 if the result is 0
–atomic_inc_and_test(v): add 1 to *v and return 1 if the result is 0
–atomic_add_negative(i,v): add i to *v and return 1 if the result is negative
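
A minimal sketch of the most common use of these operations, a reference counter; the my_object structure and helper names are hypothetical, and the header path assumes a 2.6-era kernel:

    #include <asm/atomic.h>    /* <linux/atomic.h> in later kernels */
    #include <linux/slab.h>

    struct my_object {
        atomic_t refcount;
        /* ... payload ... */
    };

    static struct my_object *my_object_new(void)
    {
        struct my_object *obj = kmalloc(sizeof(*obj), GFP_KERNEL);

        if (obj)
            atomic_set(&obj->refcount, 1);   /* the creator holds one reference */
        return obj;
    }

    static void my_object_get(struct my_object *obj)
    {
        atomic_inc(&obj->refcount);          /* atomically add 1 to the counter */
    }

    static void my_object_put(struct my_object *obj)
    {
        /* atomic_dec_and_test() returns 1 only on the path that brings the
         * counter to zero, so exactly one caller frees the object. */
        if (atomic_dec_and_test(&obj->refcount))
            kfree(obj);
    }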

Atomic Bit Handling Functions
–test_bit(nr, addr): return the nr-th bit of *addr
–set_bit(nr, addr): set the nr-th bit of *addr
–clear_bit(nr, addr): clear the nr-th bit of *addr
–change_bit(nr, addr): invert the nr-th bit of *addr
–test_and_set_bit(nr, addr): set the nr-th bit of *addr and return its old value
–test_and_clear_bit(nr, addr): clear the nr-th bit of *addr and return its old value
–test_and_change_bit(nr, addr): invert the nr-th bit of *addr and return its old value
–atomic_clear_mask(mask, addr): clear all bits of *addr specified by mask
–atomic_set_mask(mask, addr): set all bits of *addr specified by mask

Memory Barriers When dealing with synchronization, instruction reordering must be avoided A memory barrier primitive ensures that the operations placed before the primitive are finished before the operations placed after it are started On the 80x86, the following kinds of instructions act as memory barriers: –All instructions that operate on I/O ports –All instructions prefixed by the lock byte –All instructions that write into control registers, system registers, or debug registers –A few special assembly instructions, e.g. iret –The lfence, sfence, and mfence instructions introduced with the Pentium 4

Memory Barriers in Linux
–mb(): memory barrier for both MP and UP
–rmb(): read memory barrier for both MP and UP
–wmb(): write memory barrier for both MP and UP
–smp_mb(): memory barrier for MP only
–smp_rmb(): read memory barrier for MP only
–smp_wmb(): write memory barrier for MP only
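
A minimal producer/consumer sketch of how the write and read barriers above are paired; the variable names and the <asm/system.h> header path are assumptions based on 2.6-era kernels:

    #include <asm/system.h>    /* mb()/rmb()/wmb() in 2.6-era kernels */

    static int shared_data;
    static int data_ready;

    static void producer(int value)
    {
        shared_data = value;
        wmb();                 /* make the data visible before the flag */
        data_ready = 1;
    }

    static int consumer(void)
    {
        if (data_ready) {
            rmb();             /* do not read the data before the flag */
            return shared_data;
        }
        return -1;             /* nothing published yet */
    }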

Spin Locks Spin locks are a special kind of lock designed to work in a multiprocessor environment –A waiting kernel control path busy-waits (spins) instead of sleeping –Convenient for short critical regions, since no context switch is needed –Represented by the spinlock_t structure: slock (1 means unlocked, <=0 means locked) and break_lock (flag)

Protecting Critical Regions with Several Locks

Spin Lock Macros
–spin_lock_init(): set the spin lock to 1 (unlocked)
–spin_lock(): cycle until the spin lock becomes 1, then set it to 0 (locked)
–spin_unlock(): set the spin lock to 1 (unlocked)
–spin_unlock_wait(): wait until the spin lock becomes 1
–spin_is_locked(): return 0 if the spin lock is set to 1 (unlocked), 1 otherwise
–spin_trylock(): set the spin lock to 0 (locked), and return 1 if the lock was obtained
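
A minimal sketch of the usual lock/unlock pattern built from these macros; the list, the lock name, and the init function are hypothetical:

    #include <linux/spinlock.h>
    #include <linux/list.h>

    static spinlock_t my_lock;
    static LIST_HEAD(my_list);

    static void my_subsystem_init(void)
    {
        spin_lock_init(&my_lock);    /* set the lock to 1 (unlocked) */
    }

    static void add_entry(struct list_head *entry)
    {
        spin_lock(&my_lock);         /* busy-wait until the lock is free */
        list_add(entry, &my_list);   /* short, non-sleeping critical region */
        spin_unlock(&my_lock);       /* set the lock back to 1 */
    }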

Read/Write Spin Locks To increase the amount of concurrency in the kernel –Multiple readers may proceed concurrently; a writer gets exclusive access The rwlock_t structure –lock field (32 bits): a 24-bit counter (bits 0-23): the number of kernel control paths currently reading the protected data (in two's complement) an unlock flag (bit 24) Macros –read_lock() –read_unlock() –write_lock() –write_unlock()

Read/Write Spin Locks
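
A minimal sketch of the read/write spin lock macros above; the table names are hypothetical, and rwlock_init() is an assumed initializer in addition to the macros listed:

    #include <linux/spinlock.h>

    static rwlock_t table_lock;
    static int table_value;

    static void table_init(void)
    {
        rwlock_init(&table_lock);    /* unlocked, no readers */
    }

    static int table_lookup(void)
    {
        int v;

        read_lock(&table_lock);      /* many readers may hold the lock at once */
        v = table_value;
        read_unlock(&table_lock);
        return v;
    }

    static void table_update(int v)
    {
        write_lock(&table_lock);     /* spins until no readers and no writer remain */
        table_value = v;
        write_unlock(&table_lock);
    }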

Seqlock Seqlocks, introduced in Linux 2.6, are similar to read/write spin locks –except that they give a much higher priority to writers –a writer is allowed to proceed even when readers are active
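
A minimal sketch of the seqlock reader retry loop; read_seqbegin()/read_seqretry() and write_seqlock()/write_sequnlock() are the usual Linux primitives (they also appear in the interrupt-aware variants later in this chapter), while the time-keeping scenario, seqlock_init(), and the variable names are illustrative assumptions:

    #include <linux/seqlock.h>
    #include <linux/types.h>

    static seqlock_t time_lock;
    static u64 wall_time;

    static void time_init(void)
    {
        seqlock_init(&time_lock);
    }

    static void writer_tick(void)
    {
        write_seqlock(&time_lock);   /* writers are serialized by an internal spin lock */
        wall_time++;
        write_sequnlock(&time_lock);
    }

    static u64 reader_get_time(void)
    {
        unsigned int seq;
        u64 t;

        do {
            seq = read_seqbegin(&time_lock);       /* sample the sequence counter */
            t = wall_time;
        } while (read_seqretry(&time_lock, seq));  /* retry if a writer ran meanwhile */
        return t;
    }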

Read-Copy Update Read-copy update (RCU): another synchronization technique designed to protect data structures that are mostly accessed for reading by several CPUs –RCU allows many readers and many writers to proceed concurrently –RCU is lock-free Key ideas –Only data structures that are dynamically allocated and referenced via pointers can be protected by RCU –No kernel control path can sleep inside a critical section protected by RCU

RCU macros –rcu_read_lock() –rcu_read_unlock() –call_rcu() RCU in the kernel –New in Linux 2.6 –Used in the networking layer and in the VFS
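
A minimal sketch of an RCU-protected pointer, following the usual Linux pattern; rcu_dereference() and rcu_assign_pointer() are standard RCU primitives not listed on the slide, and the config structure, its field names, and the single-updater assumption are hypothetical:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/kernel.h>

    struct config {
        int value;
        struct rcu_head rcu;     /* used by call_rcu() to defer the free */
    };

    static struct config *cur_config;   /* accessed only through this pointer, as RCU requires */

    static int read_config(void)
    {
        struct config *c;
        int v;

        rcu_read_lock();                  /* reader side: cheap, never blocks */
        c = rcu_dereference(cur_config);
        v = c ? c->value : -1;
        rcu_read_unlock();                /* no sleeping inside the read-side section */
        return v;
    }

    static void free_config_rcu(struct rcu_head *head)
    {
        kfree(container_of(head, struct config, rcu));
    }

    /* Assume a single updater at a time; concurrent updaters would need a lock. */
    static void update_config(int value)
    {
        struct config *new_cfg = kmalloc(sizeof(*new_cfg), GFP_KERNEL);
        struct config *old_cfg;

        if (!new_cfg)
            return;
        new_cfg->value = value;                       /* build the new copy first */
        old_cfg = cur_config;
        rcu_assign_pointer(cur_config, new_cfg);      /* publish the new version */
        if (old_cfg)
            call_rcu(&old_cfg->rcu, free_config_rcu); /* free after a grace period */
    }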

Semaphores Two kinds of semaphores –Kernel semaphores: used by kernel control paths –System V IPC semaphores: used by user processes Kernel semaphores –struct semaphore fields: count, wait, sleepers –down(): to acquire a kernel semaphore (similar to wait) –up(): to release a kernel semaphore (similar to signal)
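
A minimal sketch of a kernel semaphore used as a mutex; DECLARE_MUTEX() and the header path are 2.6-era conventions, and the protected counter is hypothetical:

    #include <asm/semaphore.h>   /* <linux/semaphore.h> in later kernels */

    static DECLARE_MUTEX(my_sem);    /* a semaphore initialized with count = 1 */
    static int protected_counter;

    static void bump_counter(void)
    {
        down(&my_sem);           /* acquire: may sleep, so never call it from interrupt context */
        protected_counter++;
        up(&my_sem);             /* release: wakes up one sleeping waiter, if any */
    }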

Read/Write Semaphores Similar to read/write spin locks –except that waiting processes are suspended instead of spinning struct rw_semaphore fields –count –wait_list –wait_lock init_rwsem(): initializes a read/write semaphore down_read(), down_write(): acquire a read/write semaphore for reading/writing up_read(), up_write(): release a read/write semaphore
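
A minimal sketch mirroring the way mmap_sem is used: several readers may hold the semaphore (and sleep) concurrently, while a writer gets exclusive access; DECLARE_RWSEM() is an assumed static initializer (init_rwsem() could be used instead), and the names are hypothetical:

    #include <linux/rwsem.h>

    static DECLARE_RWSEM(area_sem);
    static int area_count;

    static int count_areas(void)
    {
        int n;

        down_read(&area_sem);    /* several readers can hold this at once; waiters sleep */
        n = area_count;
        up_read(&area_sem);
        return n;
    }

    static void add_area(void)
    {
        down_write(&area_sem);   /* exclusive access; sleeps until readers and writer are gone */
        area_count++;
        up_write(&area_sem);
    }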

Completions Used to solve a subtle race condition in multiprocessor systems –Similar to semaphores struct completion fields –done –wait complete(): corresponds to up() wait_for_completion(): corresponds to down()
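
A minimal sketch of one kernel control path waiting for another to finish a step; DECLARE_COMPLETION() is an assumed static initializer, and the setup scenario is hypothetical:

    #include <linux/completion.h>

    static DECLARE_COMPLETION(setup_done);

    static void worker(void)
    {
        /* ... perform the setup work ... */
        complete(&setup_done);             /* like up(): wake up the waiter */
    }

    static void waiter(void)
    {
        wait_for_completion(&setup_done);  /* like down(): sleep until complete() runs */
        /* ... the setup is now guaranteed to be finished ... */
    }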

Local Interrupt Disabling Interrupts can be disabled on a CPU with the cli instruction –local_irq_disable() macro Interrupts can be enabled with the sti instruction –local_irq_enable() macro
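
A minimal sketch (uniprocessor reasoning only): disabling local interrupts keeps an interrupt handler on the same CPU from observing a half-updated pair of values; the variables and the <asm/system.h> header path are assumptions based on 2.6-era kernels:

    #include <asm/system.h>    /* local_irq_disable()/local_irq_enable() in 2.6-era kernels */

    static unsigned long ticks_high, ticks_low;   /* hypothetical pair shared with an IRQ handler */

    static void update_pair(unsigned long hi, unsigned long lo)
    {
        local_irq_disable();    /* cli on 80x86: no interrupts delivered to this CPU */
        ticks_high = hi;
        ticks_low = lo;
        local_irq_enable();     /* sti: interrupts are delivered again */
    }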

Disabling/Enabling Deferrable Functions ("softirqs") The kernel sometimes needs to disable deferrable functions without disabling interrupts –local_bh_disable() macro –local_bh_enable() macro

Synchronizing Accesses to Kernel Data Structures Rule of thumb for kernel developers: –Always keep the concurrency level as high as possible in the system –Two factors: The number of I/O devices that operate concurrently The number of CPUs that do productive work

A shared data structure consisting of a single integer value can be updated by declaring it as an atomic_t type and by using atomic operations Inserting an element into a shared linked list is never atomic since it consists of at least two pointer assignments

Choosing among Spin Locks, Semaphores, and Interrupt Disabling (kernel control paths accessing the data: UP protection / MP further protection)
–Exceptions: semaphore / none
–Interrupts: local interrupt disabling / spin lock
–Deferrable functions: none / none or spin lock
–Exceptions + interrupts: local interrupt disabling / spin lock
–Exceptions + deferrable functions: local softirq disabling / spin lock
–Interrupts + deferrable functions: local interrupt disabling / spin lock
–Exceptions + interrupts + deferrable functions: local interrupt disabling / spin lock

Interrupt-aware Spin Lock Macros
–spin_lock_irq(l), spin_unlock_irq(l)
–spin_lock_bh(l), spin_unlock_bh(l)
–spin_lock_irqsave(l,f), spin_unlock_irqrestore(l,f)
–read_lock_irq(l), read_unlock_irq(l)
–read_lock_bh(l), read_unlock_bh(l)
–write_lock_irq(l), write_unlock_irq(l)
–write_lock_bh(l), write_unlock_bh(l)
–read_lock_irqsave(l,f), read_unlock_irqrestore(l,f)
–write_lock_irqsave(l,f), write_unlock_irqrestore(l,f)
–read_seqbegin_irqsave(l,f), read_seqretry_irqrestore(l,f)
–write_seqlock_irqsave(l,f), write_sequnlock_irqrestore(l,f)
–write_seqlock_irq(l), write_sequnlock_irq(l)
–write_seqlock_bh(l), write_sequnlock_bh(l)
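
A minimal sketch of spin_lock_irqsave(): data shared with an interrupt handler on the same CPU needs both the spin lock and local interrupt disabling; DEFINE_SPINLOCK() and all names here are illustrative assumptions:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(dev_lock);    /* or a spinlock_t initialized with spin_lock_init() */
    static int dev_pending;

    /* Called from process context (e.g. a system call service routine). */
    static void submit_request(void)
    {
        unsigned long flags;

        spin_lock_irqsave(&dev_lock, flags);       /* save IRQ state, disable IRQs, take the lock */
        dev_pending++;
        spin_unlock_irqrestore(&dev_lock, flags);  /* release the lock, restore the saved IRQ state */
    }

    /* Called from the interrupt handler: local interrupts are already disabled. */
    static void handle_irq(void)
    {
        spin_lock(&dev_lock);
        if (dev_pending)
            dev_pending--;
        spin_unlock(&dev_lock);
    }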

Examples of Race Condition Prevention Reference counters: an atomic_t counter associated with a specific resource The global kernel lock (a.k.a. big kernel lock, or BKL) –lock_kernel(), unlock_kernel() –Widely used in early kernel versions; in Linux 2.6 it is used mainly to protect old code (related to the VFS and several file systems) Memory descriptor read/write semaphore –the mmap_sem field in mm_struct Slab cache list semaphore –the cache_chain_sem semaphore Inode semaphore –the i_sem field

When a program uses two or more semaphores, the potential for deadlock is present because two different paths could end up waiting for each other –Linux has few problems with deadlocks on semaphore requests, since each kernel control path usually acquires just one semaphore at a time –In cases such as the rmdir() and rename() system calls, however, two semaphores must be acquired –To avoid such deadlocks, the semaphore requests are performed in a predefined address order
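
A minimal sketch of the address-ordering rule: when a path must hold two semaphores at once (as in rename()), every path acquires them in increasing address order, so two paths can never wait for each other; the helper and the header path are illustrative assumptions:

    #include <asm/semaphore.h>   /* <linux/semaphore.h> in later kernels */

    static void lock_two(struct semaphore *a, struct semaphore *b)
    {
        if (a == b) {               /* the same object: take it only once */
            down(a);
            return;
        }
        if (a > b) {                /* always acquire the lower address first */
            struct semaphore *tmp = a;
            a = b;
            b = tmp;
        }
        down(a);
        down(b);
    }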

Thanks for Your Attention!