Perfctr-Xen: A framework for Performance Counter Virtualization

Presentation transcript:

Ruslan Nikolaev and Godmar Back
Virginia Polytechnic Institute, Blacksburg

Overview
- IaaS providers widely use virtual machine monitors (VMMs). Type 1 hypervisors: Xen, KVM, ESX, ...
- Commonly used performance analysis tools (e.g., PAPI) cannot be used because existing VMMs and guests do not provide the necessary per-thread virtualization support for hardware event counters.
- Our contribution: Perfctr-Xen
  - Framework for performance counter virtualization
  - Software-compatible with the widely used perfctr library
  - Techniques for collaboration of guest and hypervisor
  - Experimental validation

Existing Performance Counter Virtualization Solutions
- XenoProf
  - Extension of the Oprofile system-wide profiler
  - Does not provide a per-domain abstraction of hardware counter facilities (supports only 1 domain at a time)
- VPMU driver
  - Treats PMU registers like ordinary registers (saved/restored by the VMM)
  - Requires use of hardware-assisted virtualization mode
  - Supports a limited number of architecture generations, since the VMM must contain architecture-specific code
  - Not compatible with all architectures

Perfctr-Xen
[Diagram: software stacks of Perfctr (Native) and Perfctr-Xen, side by side]
- Profilers: PerfExplorer, HPCToolkit, etc. (both stacks)
- High-level performance counters: PAPI (both stacks)
- Low-level performance counters: Perfctr Lib (both stacks)
- Native stack: Kernel with Perfctr Driver
- Perfctr-Xen stack: Guest Kernel and Xen Hypervisor, with Perfctr Guest Driver and Perfctr-Xen Driver
- The two stacks are software compatible.
Software engineering points: minor changes to the perfctr lib, and complete reuse of architecture-dependent code in both the guest kernel and the hypervisor driver.

Per-thread PMU Virtualization
- The logical per-thread value includes only events incurred during the thread's execution.
- [Diagram: Thread 0 and Thread 1 in Domain 0, separated by an intradomain switch; an interdomain switch to Domain 1]
- Intra-domain and inter-domain switches both affect how events must be counted. Because the counters are global, both types of context switch must be taken into account to implement per-thread virtualization correctly.

Perfctr Library
- Modes of operation
  - A-mode: an event count in some region of a program
  - I-mode: an interrupt after a certain number of events has occurred
- Access through the Perfctr library API or /dev/perfctr
- User-level threads have direct access: low overhead, no system call required.
- This requires that the kernel update and expose its data structures to user-level threads through a read-only memory mapping.
- [Diagram: user level above kernel level; each thread has its own Sum_thread, Start_thread; hardware: CPU1, CPU2]

Perfctr: A-mode counters
- Sum_thread records the accumulated event count.
- At t1 (thread resumed), the kernel records the physical value: Start_thread = Phys(t1)
- At t2, the thread samples the physical value Phys(t2) and computes the logical value: Log_thread = Sum_thread + (Phys(t2) - Start_thread)
- At t3 (thread suspended), the kernel increments: Sum_thread = Sum_thread + (Phys(t3) - Start_thread)
- Advantage: no expensive writes to PMU registers
- [Timeline diagram: Thread 0 runs from t1 to t3, then Thread 1; marked intervals Phys(t2) - Start_thread and Phys(t3) - Start_thread are added to Sum_thread]
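
The a-mode bookkeeping above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakePMU`, `ThreadState`, `resume`, `suspend`, and `read_logical` are hypothetical names invented for illustration and are not part of the perfctr API. The free-running counter is never written, which is the advantage the slide points out.

```python
from dataclasses import dataclass


class FakePMU:
    """Hypothetical stand-in for a free-running hardware event counter."""

    def __init__(self) -> None:
        self.value = 0

    def run(self, events: int) -> None:
        # Events happen; the physical counter advances no matter who runs.
        self.value += events

    def read(self) -> int:
        return self.value


@dataclass
class ThreadState:
    total: int = 0  # Sum_thread: events from completed time slices
    start: int = 0  # Start_thread: physical value sampled at last resume


def resume(t: ThreadState, pmu: FakePMU) -> None:
    # Kernel side (t1): Start_thread = Phys(t1)
    t.start = pmu.read()


def suspend(t: ThreadState, pmu: FakePMU) -> None:
    # Kernel side (t3): Sum_thread += Phys(t3) - Start_thread
    t.total += pmu.read() - t.start


def read_logical(t: ThreadState, pmu: FakePMU) -> int:
    # User side (t2): Log_thread = Sum_thread + (Phys(t2) - Start_thread)
    return t.total + (pmu.read() - t.start)


pmu = FakePMU()
thread0, thread1 = ThreadState(), ThreadState()

resume(thread0, pmu)                     # t1
pmu.run(100)
mid_value = read_logical(thread0, pmu)   # t2, read without stopping
pmu.run(50)
suspend(thread0, pmu)                    # t3, switch to thread 1

resume(thread1, pmu)
pmu.run(70)                              # must not be charged to thread 0
suspend(thread1, pmu)

resume(thread0, pmu)
pmu.run(30)
final_value = read_logical(thread0, pmu)
```

In this run thread 0 is charged only for its own 100 + 50 + 30 events, even though the physical counter also advanced while thread 1 was running.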

Perfctr: I-mode counters
- PMU registers trigger an interrupt on overflow to zero.
- The physical register is initialized to the negated sample period.
- This requires that the physical value be saved and restored on each context switch.
- The logical accumulated value is computed similarly to a-mode.
- [Diagram: events over time; the counter repeatedly counts up from -5000 to -1, 0 and is re-armed]
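
The overflow mechanism can be sketched as follows. This is an illustrative model, not the perfctr driver: `IModeThread` and `run_slice` are hypothetical names, and the re-arming of the counter inside the loop stands in for what an interrupt handler would do. The sample period of 5000 matches the value shown in the slide's diagram.

```python
from dataclasses import dataclass


@dataclass
class IModeThread:
    period: int        # sample period: one interrupt per `period` events
    saved: int = 0     # physical register value saved at context switch
    samples: int = 0   # number of overflow interrupts delivered so far

    def __post_init__(self) -> None:
        # The register starts at the negated sample period.
        self.saved = -self.period


def run_slice(thread: IModeThread, events: int) -> None:
    """Run one time slice: restore the register, count, then save it."""
    value = thread.saved                 # restore on switch-in
    for _ in range(events):
        value += 1
        if value == 0:                   # overflow to zero raises the interrupt
            thread.samples += 1
            value = -thread.period       # handler re-arms the counter
    thread.saved = value                 # save on switch-out


a = IModeThread(period=5000)
b = IModeThread(period=5000)

run_slice(a, 3000)   # a is 2000 events short of its first sample
run_slice(b, 6000)   # b overflows once, then counts 1000 more
run_slice(a, 2000)   # a reaches exactly 5000 events and overflows
```

Because the partially counted value is saved and restored per thread, thread `a` receives its sample after exactly 5000 of its own events, regardless of the 6000 events thread `b` generated in between.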

Perfctr-Xen: A-mode counters
- Requires cooperation of guest kernel and hypervisor:
  - Guest: maintains per-thread state: Sum_thread, Start_thread
  - Hypervisor: maintains per-VCPU (virtual CPU) state: Sum_vcpu, Start_vcpu
- The guest kernel makes the per-VCPU state available to user threads through a read-only mapping.
- [Diagram: guest threads, each with Sum_thread, Start_thread; hypervisor VCPUs, each with Sum_vcpu, Start_vcpu; CPU hardware below]

Perfctr-Xen: A-mode counters
[Timeline diagram: Thread 0 and Thread 1 in Domain 0, followed by Domain 1; Start_vcpu = Phys(t1) at t1; at t2 the labeled quantities are Sum_thread, Sum_vcpu, and Phys(t2) - Start_vcpu]

Perfctr-Xen: A-mode counters
- t1, hypercall to activate the configuration. Hypervisor: Sum_vcpu = 0, Start_vcpu = Phys(t1)
- t2. Guest: Start*_thread = Sum_vcpu + (Phys(t2) - Start_vcpu)
- t3. Hypervisor: Sum_vcpu = Sum_vcpu + (Phys(t3) - Start_vcpu)
- t4. Hypervisor: Start_vcpu = Phys(t4)
- t5. Log_thread = Sum_thread + Sum_vcpu + (Phys(t5) - Start_vcpu) - Start*_thread
- Additional benefits: hypercalls can be batched, a hypercall is not always necessary, and overhead can be accounted for.
- [Timeline diagram: Thread 0 and Thread 1 in Domain 0, with Domain 1 running between t3 and t4; labeled quantities Sum_thread, Sum_vcpu, and Phys(t5) - Start_vcpu]
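
The t1 to t5 sequence above can be traced with a small simulation. This is a sketch under stated assumptions: all class and function names (`FakePMU`, `VcpuState`, `GuestThreadState`, `hv_activate`, and so on) are hypothetical and are not the Perfctr-Xen interfaces. The helper `vcpu_value` simply names the recurring term Sum_vcpu + (Phys(t) - Start_vcpu).

```python
from dataclasses import dataclass


class FakePMU:
    """Hypothetical stand-in for a free-running physical counter."""

    def __init__(self, initial: int = 0) -> None:
        self.value = initial

    def run(self, events: int) -> None:
        self.value += events

    def read(self) -> int:
        return self.value


@dataclass
class VcpuState:            # maintained by the hypervisor
    sum_vcpu: int = 0
    start_vcpu: int = 0


@dataclass
class GuestThreadState:     # maintained by the guest kernel
    sum_thread: int = 0
    start_star: int = 0     # Start*_thread


def hv_activate(v: VcpuState, pmu: FakePMU) -> None:
    v.sum_vcpu = 0                            # t1
    v.start_vcpu = pmu.read()


def hv_deschedule(v: VcpuState, pmu: FakePMU) -> None:
    v.sum_vcpu += pmu.read() - v.start_vcpu   # t3


def hv_reschedule(v: VcpuState, pmu: FakePMU) -> None:
    v.start_vcpu = pmu.read()                 # t4


def vcpu_value(v: VcpuState, pmu: FakePMU) -> int:
    # Events seen by this VCPU so far: Sum_vcpu + (Phys(t) - Start_vcpu)
    return v.sum_vcpu + (pmu.read() - v.start_vcpu)


def guest_resume(t: GuestThreadState, v: VcpuState, pmu: FakePMU) -> None:
    t.start_star = vcpu_value(v, pmu)         # t2


def read_logical(t: GuestThreadState, v: VcpuState, pmu: FakePMU) -> int:
    # t5: Log_thread = Sum_thread + Sum_vcpu + (Phys - Start_vcpu) - Start*_thread
    return t.sum_thread + vcpu_value(v, pmu) - t.start_star


pmu = FakePMU(initial=1000)        # counter already non-zero at activation
vcpu, thread0 = VcpuState(), GuestThreadState()

hv_activate(vcpu, pmu)             # t1
pmu.run(10)                        # guest activity before thread 0 starts
guest_resume(thread0, vcpu, pmu)   # t2
pmu.run(40)                        # thread 0 runs
hv_deschedule(vcpu, pmu)           # t3: another domain gets the CPU
pmu.run(500)                       # events belonging to the other domain
hv_reschedule(vcpu, pmu)           # t4
pmu.run(25)                        # thread 0 continues
logical = read_logical(thread0, vcpu, pmu)   # t5
```

In this trace the 500 events of the other domain and the 10 events before t2 are excluded, so thread 0 reads 40 + 25 events.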

Perfctr-Xen: I-mode counters
- Suspension hypercall to increment Sum_vcpu and sample Start_vcpu
- Resumption hypercall to restore per-VCPU values
- Log_thread = Sum_vcpu + (Phys(t) - Start_vcpu)
- [Diagram: inter-domain and intra-domain context switches save and restore the physical counter registers, (Sum_vcpu, Start_vcpu), and (Sum_thread, Start_thread)]

Perfctr-Xen: Interrupt delivery
- The hypervisor delivers overflow interrupts to the guest via VIRQ_PERFCTR virtual interrupts.
- Upon receipt, the guest kernel signals the user thread.
- Virtual interrupts are delivered asynchronously (as soft interrupts).
- The guest must ensure that the overflow interrupt is delivered to the correct thread by rechecking the overflow status.
- If the thread causing the overflow is suspended before the virtual interrupt arrives at the guest, the interrupt is marked as pending and delivered on the next resume.
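
The pending-interrupt rule can be sketched as a tiny state machine. This is one possible arrangement, assumed for illustration only: `GuestThread`, `Guest`, `deliver`, and the `overflowed` flag are hypothetical and do not come from the Perfctr-Xen source. In particular, marking the interrupt as pending at suspend time is an assumption about where the check happens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuestThread:
    name: str
    overflowed: bool = False   # overflow status recorded for this thread
    pending: bool = False      # overflow seen, signal not yet delivered
    signals: int = 0           # signals delivered to the user thread


def deliver(t: GuestThread) -> None:
    t.signals += 1
    t.overflowed = False
    t.pending = False


class Guest:
    def __init__(self) -> None:
        self.current: Optional[GuestThread] = None

    def resume(self, t: GuestThread) -> None:
        self.current = t
        if t.pending:              # deliver what was missed while suspended
            deliver(t)

    def suspend(self) -> None:
        t = self.current
        if t is not None and t.overflowed:
            t.pending = True       # virtual interrupt has not arrived yet
        self.current = None

    def on_virq(self) -> None:
        """Asynchronous virtual interrupt: recheck before signalling."""
        t = self.current
        if t is not None and t.overflowed:
            deliver(t)
        # Otherwise the overflow belongs to a thread that was already
        # suspended and is marked pending; the current thread is left alone.


guest = Guest()
a, b = GuestThread("a"), GuestThread("b")

guest.resume(a)
a.overflowed = True    # counter overflows while a is running
guest.suspend()        # a is switched out before the virtual interrupt arrives
guest.resume(b)
guest.on_virq()        # late interrupt arrives while b is running
guest.suspend()
guest.resume(a)        # pending signal is delivered here
```

The recheck in `on_virq` is what keeps thread `b` from being signalled for an overflow that thread `a` caused.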

Experimental Results
- Baseline: native execution
- Exercise multiple VCPU/PCPU scenarios
- Exercise multiple virtualization modes
  - Paravirtualization
  - Hardware-assisted virtualization (HVM)
  - Hybrid mode (HVM + guest enhancement)
- Correctness of implementation and accuracy of results
  - Microbenchmarks for a-mode, PAPI test for i-mode
  - Macrobenchmarks: SPEC CPU 2006
- Verify profiling (HPCToolkit)

Microbenchmarks
- Each domain on 2 dedicated PCPUs; each thread on a dedicated VCPU.
- Each domain on a dedicated PCPU; all threads in a domain on a shared VCPU.
- All domains on a shared PCPU; all threads on a shared VCPU.
- Random migration across PCPUs and VCPUs.

SPEC CPU2006: L2 Cache Misses
[Figure panels:]
- Native mode
- Fully-virtualized Dom1 and Dom2, each on a dedicated core
- Fully-virtualized Dom1 and Dom2 on the same core
- Paravirtualized Dom0 and Dom1, each on a dedicated core
- Paravirtualized Dom0 and Dom1 on the same core

SPEC CPU2006: L2 Cache References
[Figure panels:]
- Native mode
- Fully-virtualized Dom1 and Dom2, each on a dedicated core
- Fully-virtualized Dom1 and Dom2 on the same core
- Paravirtualized Dom0 and Dom1, each on a dedicated core
- Paravirtualized Dom0 and Dom1 on the same core

Related Work
- Performance counter support for VMMs
  - XenoProf [Menon 2005]
  - Counter Virtualization for KVM [Du 2010, 2011]
  - VTSS++ system [Bratanov 2009]
- Performance counters in non-virtualized systems
  - perf_counter, Perfmon [Eranian 2006], Intel VTune, AMD Code Analyst
- Higher-level libraries: PAPI [Browne 1999]

Conclusion
- Perfctr-Xen
  - Efficient and accurate per-thread virtualization of hardware event counters
  - Supports all commonly used virtualization modes
  - Plug-in compatibility with PAPI, HPCToolkit, etc.
- Techniques extend to other Type I hypervisors and low-level virtualization libraries.
- Available at http://people.cs.vt.edu/~rnikola/ (LGPL license)