The Linux Kernel: A Challenging Workload for Transactional Memory Hany E. Ramadan Christopher J. Rossbach Emmett Witchel Operating Systems & Architecture.

Slides:



Advertisements
Similar presentations
Threads, SMP, and Microkernels
Advertisements

Chapter 6 Concurrency: Deadlock and Starvation Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee Community.
Compaq Enterprise Technical Symposium 2001 OpenVMS on the Itanium TM Processor Family Clair Grant OpenVMS Engineering Clair Grant OpenVMS Engineering.
WHAT IS AN OPERATING SYSTEM? An interface between users and hardware - an environment "architecture ” Allows convenient usage; hides the tedious stuff.
Bilgisayar Mühendisliği Bölümü GYTE - Bilgisayar Mühendisliği Bölümü Multithreading the SunOS Kernel J. R. Eykholt, S. R. Kleiman, S. Barton, R. Faulkner,
Cpr E 308 Spring 2004 Recap for Midterm Introductory Material What belongs in the OS, what doesn’t? Basic Understanding of Hardware, Memory Hierarchy.
MotoHawk Training Model-Based Design of Embedded Systems.
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Chapter 13 Embedded Systems
Chapter 13 Embedded Systems Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,
1 MetaTM/TxLinux: Transactional Memory For An Operating System Hany E. Ramadan, Christopher J. Rossbach, Donald E. Porter and Owen S. Hofmann Presenter:
Paging and Virtual Memory. Memory management: Review  Fixed partitioning, dynamic partitioning  Problems Internal/external fragmentation A process can.
What Great Research ?s Can RAMP Help Answer? What Are RAMP’s Grand Challenges ?
TxLinux: Using and Managing Hardware Transactional Memory in an Operating System Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan,
Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, and Emmett Witchel - Presentation By Sathish P.
Unbounded Transactional Memory Paper by Ananian et al. of MIT CSAIL Presented by Daniel.
1 Concurrency: Deadlock and Starvation Chapter 6.
CS533 - Concepts of Operating Systems 1 CS533 Concepts of Operating Systems Class 8 Synchronization on Multiprocessors.
Experience with K42, an open- source, Linux-compatible, scalable operation-system kernel IBM SYSTEM JOURNAL, VOL 44 NO 2, 2005 J. Appovoo 、 M. Auslander.
Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
1 Concurrency: Deadlock and Starvation Chapter 6.
Chapter 8 Windows Outline Programming Windows 2000 System structure Processes and threads in Windows 2000 Memory management The Windows 2000 file.
VxWorks & Memory Management
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Jonathan Walpole (based on a slide set from Vidhya Sivasankaran)
CS533 Concepts of Operating Systems Jonathan Walpole.
Three fundamental concepts in computer security: Reference Monitors: An access control concept that refers to an abstract machine that mediates all accesses.
Chapter 4 Threads, SMP, and Microkernels Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design.
Maximum Benefit from a Minimal HTM Owen Hofmann, Chris Rossbach, and Emmett Witchel The University of Texas at Austin.
Virtualization: Not Just For Servers Hollis Blanchard PowerPC kernel hacker.
Cosc 4740 Chapter 6, Part 3 Process Synchronization.
Recall: Three I/O Methods Synchronous: Wait for I/O operation to complete. Asynchronous: Post I/O request and switch to other work. DMA (Direct Memory.
Processes and Threads Processes have two characteristics: – Resource ownership - process includes a virtual address space to hold the process image – Scheduling/execution.
Hardware process When the computer is powered up, it begins to execute fetch-execute cycle for the program that is stored in memory at the boot strap entry.
Threads, SMP, and Microkernels Chapter 4. Process Resource ownership - process is allocated a virtual address space to hold the process image Scheduling/execution-
Operating Systems CSE 411 Multi-processor Operating Systems Multi-processor Operating Systems Dec Lecture 30 Instructor: Bhuvan Urgaonkar.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard et al. Madhura S Rama.
Copyright ©: University of Illinois CS 241 Staff1 Threads Systems Concepts.
Kernel Locking Techniques by Robert Love presented by Scott Price.
A summary by Nick Rayner for PSU CS533, Spring 2006
CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory.
The Performance of Micro-Kernel- Based Systems H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter Presentation by: Tim Hamilton.
Hardware process When the computer is powered up, it begins to execute fetch-execute cycle for the program that is stored in memory at the boot strap entry.
© 2004, D. J. Foreman 1 Device Mgmt. © 2004, D. J. Foreman 2 Device Management Organization  Multiple layers ■ Application ■ Operating System ■ Driver.
Managing Processors Jeff Chase Duke University. The story so far: protected CPU mode user mode kernel mode kernel “top half” kernel “bottom half” (interrupt.
Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University.
1 Why Threads are a Bad Idea (for most purposes) based on a presentation by John Ousterhout Sun Microsystems Laboratories Threads!
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Architectural Features of Transactional Memory Designs for an Operating System Chris Rossbach, Hany Ramadan, Don Porter Advanced Computer Architecture.
Introduction to Operating Systems Concepts
Introduction to threads
Limited Direct Execution
Processes and threads.
Presented by Yoon-Soo Lee
University of Texas at Austin
Mechanism: Limited Direct Execution
CSCI206 - Computer Organization & Programming
Chapter 4: Threads.
Threads, SMP, and Microkernels
Chapter 4: Threads.
Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E
Threads Chapter 4.
Hybrid Transactional Memory
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
- When you approach operating system concepts there might be several confusing terms that may look similar but in fact refer to different concepts:  multiprogramming, multiprocessing, multitasking,
Decomposing Hardware Lock Elision
Chip&Core Architecture
Operating System Overview
CS Introduction to Operating Systems
Presentation transcript:

The Linux Kernel: A Challenging Workload for Transactional Memory Hany E. Ramadan Christopher J. Rossbach Emmett Witchel Operating Systems & Architecture Group University of Texas at Austin

Talk overview Why OSes are interesting workloads (1) Interrupts (2) –Transaction stacking (3) Configurable contention management (1) Other issues considered in the paper (1) Preliminary results (2)

Why OSes are interesting workloads Large concurrent program with interacting subsystems Complex, will benefit from ease of programming and maintainability Lack of OS scalability will harm application performance Diverse primitives for managing concurrency –spinlocks, semaphores, per-CPU variables, RCU, seqlocks, completions, mutexes

Interrupts Cause asynchronous transfer of control Do not cause a thread switch Are more frequent than thread switches May interrupt other interrupt handlers Question: How does a kernel which uses transactional memory handle interrupts?

Using transactions in interrupt handlers 0x10 0x20 0x30 0x40 TX #1 { 0x10 } system_call() { XBEGIN modify 0x10 XEND } intr_handler() { XBEGIN modify 0x30 XEND } No tx in interrupts TX #1 { 0x10 } TX #2 { 0x30 } Interrupts abort active tx TX #1 { 0x10, 0x30 } Nest the transactions TX #1 { 0x10 } TX #2 { 0x30 } Multiple active transactions TX #1 { 0x10 } interrupt

Benefits of multiple active Tx Most flexibility for programmer –Interrupt handlers free to use Tx as necessary Aborts only when necessary –Interrupts are frequent Interrupt handlers stay independent Implies.. –Multiple transactions on a single thread !

Multiple transactions per thread Many transactions may be simultaneously active but at most one is running per thread –They can conflict with each other –Independent (no nesting relation) Stacked transactions –Transactions complete in LIFO order –Each thread has a logical stack of transactions Stacked transactions ideal for interrupts –Stack grows and shrinks as interrupts occur and complete

Multiple Tx Per Thread - Open questions What are the roles of HW and SW –ISA changes for managing multiple transactions –Efficient HW implementation Contention management must know about stacking –Stacked transactions can livelock Identifying other scenarios where this is useful –Non-interrupt cases? –Forms other than stacking? Program stack issues

Configurable Contention Management Contention can be heavy within OS –Transactions most effective when contention is rare OS contains programmer hints for contention management –RCU (read-change-update) favor readers –Seqlocks favor writers Hardware TM should accept programmer hints –XBEGIN takes contention mgmt parameter

Other issues considered in the paper Primitives for which transactional memory might not be suitable –Per-CPU data structures –Blocking operations I/O in transactions –Big issue for Linux I/O is frequently performed while spinlock held –May be possible to just allow it TLB shootdown

Implementation Implemented HTM as extensions to x86 –With multiple active transactions Modified many spinlocks in Linux kernel ( ) to use transactional memory Simulation environment –Simics machine simulator –16KB L1 ; 4MB L2 ; 256MB RAM –1 cycle/instruction, 200 cycle/memory miss

Preliminary Results We are booting Linux –Transactions speed up boot by ~2%

Fin