March 12, 2001 Kperfmon-MP Multiprocessor Kernel Performance Profiling Alex Mirgorodskii Computer Sciences Department University of Wisconsin.


Kperfmon-MP: Multiprocessor Kernel Performance Profiling
Alex Mirgorodskii
Computer Sciences Department, University of Wisconsin
1210 W. Dayton Street, Madison, WI, USA
March 12, 2001

– 2 – Kperfmon: Overview
Specify a resource
  – Almost any function or basic block in the kernel
Apply a metric to the resource:
  – Number of entries to a function or basic block
  – Wall clock time, CPU time (virtual time)
  – All SPARC hardware counters: cache misses, branch mispredictions, instructions per cycle, ...
Visualize the metric data in real time

– 3 – Kperfmon-MP: Goals
Modify uniprocessor Kperfmon to provide:
Safe operation on SMP machines
  – Thread safety
  – Migration safety
New feature: per-CPU performance data
  – More detailed performance data
  – Reduce cache coherence traffic caused by the tool

– 4 – Kperfmon: Technology
No need for kernel recompilation
  – Works with stock SPARC Solaris 7 kernels
  – Supports both 32-bit and 64-bit kernels
No need for rebooting
  – Important for 24 x 7 systems
Use the KernInst framework to:
  – Insert measurement code in the kernel at run time
  – Periodically sample accumulated metric values from user space

– 5 – Kperfmon System
[Diagram labels: Visis; Kperfmon; Kerninstd; instrumentation request; sampling request; ioctl(); /dev/kerninst; Kernel Space containing the Patch Heap and the Data Heap]

– 6 – Kperfmon instrumentation
Counter primitive
  – Number of entries to a function or a basic block
Wall clock timer primitive
  – Real time spent in a function
CPU timer primitive
  – Excludes time while the thread was switched out
  – Can count more than just timer ticks
All HW-counter metrics use this mechanism

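– 7 – Non-MP Counter primitive
Atomic, thread-safe update
Lightweight
No register save/restore required
[Diagram labels: tcp_lookup() (entry); Code Patch Area; Data Area (cnt); Relocated instruction]
Code in the patch area:
    sethi hi(&cnt), r0          ! upper bits of the address of cnt
    or    r0, lo(&cnt), r0      ! r0 = &cnt
    ldx   [r0], r1              ! r1 = current counter value
    retry: add r1, 1, r2        ! r2 = r1 + 1
    casx  [r0], r1, r2          ! if [r0] == r1, store r2; r2 receives the old [r0]
    cmp   r1, r2                ! equal means the swap succeeded
    bne   retry                 ! otherwise try again
    mov   r2, r1                ! (delay slot) r1 = value last seen in memory
    nop
    ba,a  tcp_lookup+4          ! branch back into tcp_lookup()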
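For readers less familiar with SPARC assembly, the casx retry loop above can be approximated in portable C11. This is only a sketch of the same compare-and-swap pattern, not the code Kperfmon generates; the names counter_hit and counter_read are hypothetical, and the variable cnt merely stands in for the slide's counter in the data area.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the slide's `cnt` cell in the data area (illustrative only). */
static _Atomic uint64_t cnt;

/* Increment cnt with a compare-and-swap retry loop, like the casx loop. */
static void counter_hit(void)
{
    /* Corresponds to "ldx [r0], r1": read the current value once. */
    uint64_t old = atomic_load_explicit(&cnt, memory_order_relaxed);

    /* On failure the call refreshes `old` with the value found in memory,
       which plays the role of "mov r2, r1" before the retry. */
    while (!atomic_compare_exchange_weak_explicit(&cnt, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* retry with the refreshed value */
    }
}

/* Read the counter, as a sampling client might do periodically. */
static uint64_t counter_read(void)
{
    return atomic_load_explicit(&cnt, memory_order_relaxed);
}
```

Calling counter_hit() once per function entry and counter_read() from the sampling side mirrors the split between instrumentation and sampling described earlier in the talk.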

– 8 – Non-MP Wall clock timer primitive
Inclusive (includes time in callees)
Keeps accumulating if switched out
[Diagram labels: tcp_lookup() (entry) start timer, (exit) stop timer; Code Patch Area; Data Area (timer)]

– 9 – Non-MP CPU timer primitive
Exclude the time spent while switched out
  – Instrument context switch routines
HW counter metrics are based on this mechanism
[Diagram labels: tcp_lookup() (entry) start timer, (exit) stop timer; context switch: switch-out pause timer, switch-in restart timer; list of paused timers (head, free)]
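The pause/restart mechanism on this slide can be illustrated with a small uniprocessor model in C. Everything here is an assumption made for illustration: the struct layouts, the function names, the fixed-size per-thread list, and the idea of passing the current tick value in as `now` instead of reading a hardware register. The real tool keeps paused timers in a list allocated from the data area.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer record; field names are illustrative. */
struct cpu_timer {
    uint64_t total;    /* accumulated on-CPU ticks */
    uint64_t start;    /* tick value at the last start or restart */
    bool     running;
};

#define MAX_PAUSED 8

/* Hypothetical per-thread list of timers paused at switch-out. */
struct thread_ctx {
    struct cpu_timer *paused[MAX_PAUSED];
    int n_paused;
};

/* Entry instrumentation: start timer. */
static void timer_start(struct cpu_timer *t, uint64_t now)
{
    t->start = now;
    t->running = true;
}

/* Exit instrumentation: stop timer and accumulate the elapsed ticks. */
static void timer_stop(struct cpu_timer *t, uint64_t now)
{
    if (t->running) {
        t->total += now - t->start;
        t->running = false;
    }
}

/* Switch-out hook: pause a running timer and remember it for this thread. */
static void on_switch_out(struct thread_ctx *th, struct cpu_timer *t,
                          uint64_t now)
{
    if (t->running && th->n_paused < MAX_PAUSED) {
        timer_stop(t, now);
        th->paused[th->n_paused++] = t;
    }
}

/* Switch-in hook: restart every timer that was paused for this thread. */
static void on_switch_in(struct thread_ctx *th, uint64_t now)
{
    while (th->n_paused > 0)
        timer_start(th->paused[--th->n_paused], now);
}
```

With a start at tick 100, a switch-out at 130, a switch-in at 200 and a stop at 220, the timer accumulates 30 + 20 = 50 ticks, so the 70 ticks spent switched out are excluded.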


– 17 – Kperfmon-MP: Goals
Modify uniprocessor Kperfmon to provide:
Safe operation on SMP machines
  – Thread safety
  – Migration safety
New feature: per-CPU performance data
  – More detailed performance data
  – Reduce cache coherence traffic caused by the tool

– 18 – Thread Safety
Non-MP timer allocation routine:
    ld  [head], R1
    add R1, 4, R1
    st  R1, [head]
Used on switch-out to save the paused timers
Context switch is serial on uniprocessors
  – No thread safety problems there
Context switches may be concurrent on SMPs!
  – Multiple threads are being scheduled simultaneously
  – The allocation code is no longer safe
[Diagram labels: head; list slots free, tmr, free]

– 19 – Thread Safety
Context switches may be concurrent on SMPs
Use the atomic cas instruction to ensure safety
MP timer allocation routine:
    alloc: ld  [head], R1
           add R1, 4, R2
           cas [head], R1, R2
           cmp R1, R2
           bne alloc
[Diagram labels: head; list slots free, tmr, free]
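The cas-based allocation routine can be sketched in C11 as a bump allocator over a fixed array. This is an illustration under stated assumptions, not the tool's code: the array `slots`, the index-based `head` (the slide advances a pointer by 4 bytes instead), the bound N_SLOTS and the NULL return on exhaustion are all hypothetical additions.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define N_SLOTS 64                /* assumed capacity; not from the slides */

static uint32_t slots[N_SLOTS];   /* hypothetical 4-byte entries of the list */
static _Atomic size_t head;       /* index of the next free entry */

/* Claim one entry. Safe when several CPUs run their switch-out code at once,
   because only one compare-and-swap from a given `old` value can succeed. */
static uint32_t *alloc_slot(void)
{
    size_t old = atomic_load(&head);
    do {
        if (old >= N_SLOTS)
            return NULL;          /* list full (a check the slide omits) */
        /* A failed compare-and-swap reloads `old` and the loop retries,
           like "bne alloc" on the slide. */
    } while (!atomic_compare_exchange_weak(&head, &old, old + 1));
    return &slots[old];
}
```

The non-MP version corresponds to a plain load, add and store of `head`; two CPUs interleaving those three steps could both receive the same entry, which is the hazard the cas loop removes.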

– 20 – Per-CPU performance data
Instrumentation code is shared by all CPUs
Per-CPU copies of the primitive's data
  – Two copies are never placed in the same cache line
Code in the patch area (abbreviated on the slide):
    …
    rd   cpu#, r0
    ldx  cnt[r0], r1
    add  r1, 1, r2
    casx r2, cnt[r0]
    …
[Diagram labels: tcp_lookup() (entry); Code Patch Area; Data Area with cnt-cpu0, cnt-cpu1, cnt-cpu31]
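One way to picture the "never in the same cache line" rule is a C11 array whose elements are each aligned to a cache line. The sketch below rests on assumptions: a 64-byte line size, 32 CPUs (taken from the cnt-cpu0 to cnt-cpu31 labels), and a CPU id passed in as an argument where the slide reads it with `rd cpu#`. The update is kept atomic, as on the slide; a plausible reason is that a thread could be preempted or migrated between reading the CPU id and updating the counter.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */
#define MAX_CPUS   32   /* matches the cnt-cpu0 .. cnt-cpu31 labels */

/* One counter per CPU. The alignment pads each element to a full cache
   line, so two CPUs never write to the same line. */
struct percpu_cnt {
    alignas(CACHE_LINE) _Atomic uint64_t value;
};

static struct percpu_cnt cnt_percpu[MAX_CPUS];

/* Shared instrumentation code: every CPU runs this, indexing by its own id. */
static void percpu_hit(unsigned cpu)
{
    atomic_fetch_add_explicit(&cnt_percpu[cpu % MAX_CPUS].value, 1,
                              memory_order_relaxed);
}

/* Sampling side: sum the per-CPU copies to get the machine-wide count. */
static uint64_t percpu_total(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < MAX_CPUS; i++)
        sum += atomic_load_explicit(&cnt_percpu[i].value,
                                    memory_order_relaxed);
    return sum;
}
```

Keeping the copies separate gives both per-CPU detail and a cheap total, at the cost of one cache line of memory per CPU per counter.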

– 21 – Migration Between Primitives
Wall timer started on CPU0, stopped on CPU1
Counters and CPU timers are not affected
[Diagram labels: tcp_lookup() (entry) start timer, (exit) stop timer; context switch: switch-out, switch-in; CPU0; Data Area with timer-cpu0, timer-cpu1]



– 32 – Solution: virtualization
Implement wall timers on top of CPU timers!
[Diagram labels: tcp_lookup() (entry) start timer, (exit) stop timer; context switch: switch-out on CPU0 (pause timer, record curr. time), switch-in on CPU1 (add time switched-out, restart timer); Data Area with timer-cpu0, timer-cpu1]
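The four steps named on this slide (pause timer, record current time, add time switched out, restart timer) can be modeled in a few lines of C. This is a simplified, single-threaded model under assumptions: the names are hypothetical, time is passed in as `now`, updates are not atomic, and the switched-out interval is charged to the CPU on which the thread resumes, which is one reading of the diagram.

```c
#include <stdint.h>

#define NCPU 2

/* timer-cpu0 and timer-cpu1: accumulated ticks per CPU. */
static uint64_t timer_cpu[NCPU];

/* Hypothetical per-thread state carried across a migration. */
struct vthread {
    uint64_t start;         /* when the timer was last started or restarted */
    uint64_t switched_out;  /* time recorded at switch-out */
};

/* Entry instrumentation: start timer. */
static void vt_start(struct vthread *th, uint64_t now)
{
    th->start = now;
}

/* Switch-out hook: pause the timer, then record the current time. */
static void vt_switch_out(struct vthread *th, unsigned cpu, uint64_t now)
{
    timer_cpu[cpu] += now - th->start;
    th->switched_out = now;
}

/* Switch-in hook, possibly on another CPU: add the time spent switched
   out, then restart the timer. */
static void vt_switch_in(struct vthread *th, unsigned cpu, uint64_t now)
{
    timer_cpu[cpu] += now - th->switched_out;
    th->start = now;
}

/* Exit instrumentation: stop timer on whichever CPU the thread is on. */
static void vt_stop(struct vthread *th, unsigned cpu, uint64_t now)
{
    timer_cpu[cpu] += now - th->start;
}
```

For a start at tick 100 on CPU0, a switch-out at 130, a switch-in at 200 on CPU1 and a stop at 220, timer-cpu0 holds 30 and timer-cpu1 holds 90, and their sum equals the wall clock interval of 120 ticks even though the thread migrated.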

– 38 – Conclusion
Techniques for correct MP profiling:
  – Atomic memory updates to ensure thread safety
  – Virtualized timers to handle thread migration
Per-CPU data collection is important
  – Provides detailed performance information
  – Introduces fewer cache coherence misses

– 39 – Future Work
New metrics
  – Locality of CPU assignments
  – Per-thread performance data
Formal verification of instrumentation code for migration/preemption problems
Ports to other architectures and OSes

– 41 – The Big Picture
Demo: Wednesday, Room 6372
Available for download on request
  – mailto:
  – Public release in April