Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores
Jeongseob Ahn*, Chang Hyun Park‡, Taekyung Heo‡, Jaehyuk Huh‡
* Ajou University   ‡ KAIST

Challenge of Server Consolidation
[Figure: CDF of server utilization — fraction of time vs. utilization]
Consolidation improves system utilization; however, resources are contended.


So, What Can Happen?
Virtual time discontinuity
[Figure: vCPU 0 and vCPU 1 time-share pCPU 0 — each vCPU's virtual time advances only while it is running and stalls while it waits between VMEXIT and VMENTER, even though physical time keeps advancing]

So, What Can Happen?
Virtual time discontinuity:

➊ Spinlock waiting time (gmake)
Kernel component    Avg. waiting time (μsec)
                    solo      co-run*
Page reclaim        1.03      420.13
Page allocator      3.42      1,053.26
Dentry              2.93      1,298.87
Runqueue            1.22      256.07

➋ TLB synchronization latency (μsec)
                    Avg.      Min.      Max.
dedup  solo         28        5         1,927
       co-run*      6,354     7         74,915
vips   solo         55        –         2,052
       co-run*      14,928    17        121,548

➌ I/O latency & throughput (iPerf)
                    Jitter (ms)    Throughput (Mbits/sec)
solo                0.0043         936.3
mixed co-run*       9.2507         435.6

* Concurrently running with Swaptions of PARSEC
Processing time is amplified.

How about Shortening Time Slice?
[Figure: vCPU 0–2 time-share pCPU 0 — with time slice T, a preempted vCPU waits for the other vCPUs' slices; a reduced slice shortens that waiting time]
Waiting time = (# active vCPUs − 1) × time slice
Shortening the time slice is very simple and powerful, but the overhead of frequent context switches is significant.
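To make the trade-off concrete, here is the slide's formula in LaTeX together with a worked instance using the slice lengths quoted on the experimental-setup slide (30 ms Xen default vs. 0.1 ms micro-slice); the three-vCPU count is an illustrative assumption:

```latex
% Worst-case waiting time of a preempted vCPU under round-robin time sharing
\[
  T_\text{wait} = (N_\text{active vCPUs} - 1) \times T_\text{slice}
\]
% Example: 3 active vCPUs sharing one pCPU
\[
  (3 - 1) \times 30\,\text{ms} = 60\,\text{ms}
  \qquad\text{vs.}\qquad
  (3 - 1) \times 0.1\,\text{ms} = 0.2\,\text{ms}
\]
```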

Approach: Dividing CPUs into Two Pools
➊ Normal pool (e.g., pCPU 0, regular time slice): serves the main work of applications
➋ Micro-sliced pool (e.g., pCPU 3, shortened time slice): schedules vCPUs quickly but briefly, serving critical OS services to minimize the waiting time
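A minimal C sketch of the two-pool idea (the struct, names, and pCPU assignment are hypothetical; the 30 ms and 0.1 ms slices match the pool configuration on the experimental-setup slide):

```c
#include <stdint.h>

/* Hypothetical description of a CPU pool: which pCPUs belong to it and
 * how long a vCPU may run on them before being preempted. */
struct cpu_pool {
    const char *name;
    uint64_t    time_slice_us;  /* scheduling quantum for pCPUs in this pool */
    uint64_t    pcpu_mask;      /* bitmap of physical CPUs in this pool */
};

/* Normal pool: long slices, runs the applications' main work. */
static struct cpu_pool normal_pool = { "normal", 30000, 0x7 /* pCPUs 0-2 */ };

/* Micro-sliced pool: very short slices, so vCPUs stuck in critical OS
 * services are scheduled quickly but briefly. */
static struct cpu_pool micro_pool  = { "micro",    100, 0x8 /* pCPU 3 */ };
```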

Challenges in Serving Critical OS Services on Micro-sliced Cores
1. Precise detection of urgent tasks
2. Guest OS transparency
3. Dynamic adjustment of micro-sliced cores

Detecting Critical OS Services
➊ Examining the instruction pointer (a.k.a. PC, e.g., 0x8106ed62) whenever a vCPU yields its pCPU

# of yields
Workload   solo      co-run*
gmake      79,440    295,262,662
exim       157,023   24,102,495
dedup      290,406   164,578,839
vips       644,643   57,650,538

* Concurrently running with Swaptions of PARSEC
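A hedged C sketch of this detection step; guest_rip_of, mark_vcpu_urgent, and the yield hook are hypothetical placeholders rather than Xen's actual interfaces:

```c
#include <stdbool.h>
#include <stdint.h>

struct vcpu;                                 /* opaque hypervisor vCPU handle  */
uint64_t guest_rip_of(struct vcpu *v);       /* hypothetical: saved guest IP   */
void     mark_vcpu_urgent(struct vcpu *v);   /* hypothetical: tag for boosting */

/* Address ranges of guest-kernel code that should not stay preempted,
 * built offline from the guest kernel's symbol table (see next slide). */
struct code_range { uint64_t start, end; };
extern struct code_range critical_regions[];
extern int num_critical_regions;

static bool ip_in_critical_region(uint64_t rip)
{
    for (int i = 0; i < num_critical_regions; i++)
        if (rip >= critical_regions[i].start && rip < critical_regions[i].end)
            return true;
    return false;
}

/* Invoked whenever a vCPU yields its pCPU (e.g., spinning on a preempted
 * lock holder or waiting for a TLB-shootdown acknowledgement). */
void on_vcpu_yield(struct vcpu *v)
{
    if (ip_in_critical_region(guest_rip_of(v)))
        mark_vcpu_urgent(v);   /* candidate for the micro-sliced pool */
}
```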

Profiling Virtual CPU Scheduling Logs
Investigating frequently preempted regions: vCPU scheduling traces (with instruction pointers) are matched against the guest kernel's symbol tables to identify critical guest OS components.

Module     Operation
sched      scheduler_ipi(), resched_curr(), …
mm         flush_tlb_all(), get_page_from_freelist()
irq        irq_enter(), irq_exit()
spinlock   __raw_spin_unlock(), __raw_spin_unlock_irq()

The full table is given in the paper. Instruction pointers and kernel symbols make it possible to precisely detect vCPUs preempted while executing critical OS services, without modifying the guest OS.
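The mapping from instruction pointers to kernel components can be sketched as a symbol-table lookup; the sorted-array representation below is an assumption for illustration (e.g., entries parsed offline from the guest's System.map):

```c
#include <stdint.h>

/* One entry of the guest kernel's symbol table, sorted by start address. */
struct ksym { uint64_t addr; const char *name; };

extern struct ksym ksyms[];   /* e.g., parsed offline from System.map */
extern int num_ksyms;

/* Map a preempted vCPU's instruction pointer to the enclosing kernel symbol,
 * e.g., scheduler_ipi(), flush_tlb_all(), or __raw_spin_unlock_irq(). */
static const char *ksym_lookup(uint64_t rip)
{
    int lo = 0, hi = num_ksyms - 1;

    if (num_ksyms == 0 || rip < ksyms[0].addr)
        return NULL;

    while (lo < hi) {                      /* binary search: last addr <= rip */
        int mid = (lo + hi + 1) / 2;
        if (ksyms[mid].addr <= rip)
            lo = mid;
        else
            hi = mid - 1;
    }
    return ksyms[lo].name;
}
```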

Accelerating Critical Sections
[Figure: four pCPUs P0–P3]
➊ A yield occurs
➋ The preempted vCPUs are investigated
➌ The selected vCPU is scheduled on the micro-sliced pool
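A minimal sketch of the three steps above, with all helper names hypothetical (in the real system this logic lives inside the hypervisor's scheduler):

```c
struct vcpu;
struct vm;
struct cpu_pool;

struct vm   *vm_of(struct vcpu *v);                          /* hypothetical */
struct vcpu *find_preempted_critical_vcpu(struct vm *vm);    /* uses the IP/symbol check above */
void         migrate_to_pool(struct vcpu *v, struct cpu_pool *p);
extern struct cpu_pool micro_pool;

/* ➊ A running vCPU yields, e.g., because the lock it spins on is held by a
 *    sibling vCPU that has been preempted on another pCPU. */
void handle_yield(struct vcpu *yielder)
{
    /* ➋ Scan the preempted, runnable vCPUs of the same VM for one whose saved
     *    instruction pointer falls in a critical guest-OS region (a lock
     *    holder descheduled in the middle of a critical section). */
    struct vcpu *urgent = find_preempted_critical_vcpu(vm_of(yielder));

    /* ➌ Move that vCPU to the micro-sliced pool so it runs almost immediately,
     *    finishes the critical section, and releases the lock. */
    if (urgent)
        migrate_to_pool(urgent, &micro_pool);
}
```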

Accelerating Critical TLB Synchronizations
[Figure: four pCPUs P0–P3]
➊ A yield occurs
➋ The preempted vCPUs are investigated
➌ The selected vCPU is scheduled on the micro-sliced pool
➍ The micro-sliced cores are dynamically adjusted based on profiling
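Step ➍ can be sketched as a periodic feedback loop; the epoch bookkeeping, thresholds, and helpers below are illustrative assumptions, not values from the paper:

```c
struct cpu_pool;
extern struct cpu_pool normal_pool, micro_pool;

/* Per-epoch statistics collected by the hypervisor scheduler (hypothetical). */
struct pool_stats {
    unsigned long urgent_yields;   /* yields whose IP hit a critical region */
    unsigned long epoch_ms;        /* length of the profiling epoch (> 0)   */
};

#define GROW_THRESHOLD    100      /* illustrative: urgent yields per ms */
#define SHRINK_THRESHOLD   10
#define MAX_MICRO_CPUS      3

unsigned int micro_pool_size(void);                            /* hypothetical */
void move_pcpu(struct cpu_pool *from, struct cpu_pool *to);    /* hypothetical */

/* ➍ Called once per epoch: grow the micro-sliced pool when critical OS
 *    services are being starved, shrink it when the pressure disappears. */
void adjust_micro_pool(struct pool_stats *s)
{
    unsigned long rate = s->urgent_yields / s->epoch_ms;

    if (rate > GROW_THRESHOLD && micro_pool_size() < MAX_MICRO_CPUS)
        move_pcpu(&normal_pool, &micro_pool);
    else if (rate < SHRINK_THRESHOLD && micro_pool_size() > 0)
        move_pcpu(&micro_pool, &normal_pool);

    s->urgent_yields = 0;          /* start a new epoch */
}
```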

Detecting Critical I/O Events
[Figure: interrupt chain — ➊ pIRQ → ➋ vIRQ → ➌ vIPI]
I/O handling consists of a chain of operations involving potentially multiple vCPUs.
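A hedged sketch of how the chain can be tracked: whenever a virtual interrupt or virtual IPI is delivered to a vCPU that is not currently running, that vCPU is treated as urgent (all names are hypothetical placeholders):

```c
#include <stdbool.h>

struct vcpu;
struct cpu_pool;

bool vcpu_is_running(struct vcpu *v);                        /* hypothetical */
void migrate_to_pool(struct vcpu *v, struct cpu_pool *p);    /* hypothetical */
extern struct cpu_pool micro_pool;

/* ➊ A physical interrupt (pIRQ) arrives for a guest device; the hypervisor
 * injects ➋ a virtual interrupt (vIRQ) into the vCPU that owns the device,
 * which may in turn send ➌ a virtual IPI (vIPI) to the vCPU running the
 * thread actually waiting for the I/O.  If any target in this chain is
 * preempted, the whole I/O operation stalls for its waiting time. */
void on_virtual_event_injected(struct vcpu *target)
{
    if (!vcpu_is_running(target))
        migrate_to_pool(target, &micro_pool);   /* run it briefly, right away */
}
```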

Experimental Environments
Testbed: Intel Xeon with 12 HW threads; 2 VMs with 12 vCPUs each (2-to-1 consolidation ratio); Xen hypervisor 4.7
Benchmark workloads: dedup and vips from PARSEC; exim and gmake from MOSBENCH
Pool configuration: Normal 30 ms time slice (Xen default); Micro-sliced 0.1 ms

Performance of Micro-sliced Cores
[Figure: performance with our schemes; bar annotations: 1, 3, 3, 3, 1, 3]

Performance of Micro-sliced Cores
[Figure: performance with our schemes; bar annotations: 1, 3, 3, 3, 1, 3; an 8% gap is highlighted]

I/O Performance
[Figure: workload setup — iPerf running in VM-1, lookbusy running in VM-2]

Conclusions
We introduced a new approach to mitigate the virtual time discontinuity problem.
Three distinct contributions:
1. Precise detection of urgent tasks
2. Guest OS transparency
3. Dynamic adjustment of the micro-sliced cores
Overhead is very low for applications that do not frequently use OS services.

Thank You!
jsahn@ajou.ac.kr
Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores
Jeongseob Ahn*, Chang Hyun Park‡, Taekyung Heo‡, Jaehyuk Huh‡
* Ajou University   ‡ KAIST