Shared Resource Monitoring and Throughput Optimization in Cloud-Computing Datacenters
By Jaideep Moses, Ravi Iyer, Ramesh Illikkal and Sadagopan Srinivasan

Abstract
Datacenters employ server consolidation to maximize the efficiency of platform resource usage, but consolidation introduces contention that impacts application performance. Focus: use shared resource monitoring to understand resource usage, collect resource usage and performance data, and migrate VMs that are resource-constrained. Result: improved overall datacenter throughput and improved Quality of Service (QoS).

Focus
Monitor and address shared cache contention. Propose a new optimization metric that captures the priority of each VM and the overall weighted throughput of the datacenter. Conduct detailed experiments emulating datacenter scenarios, including on-line transaction processing workloads. Results: monitoring shared resource contention is highly beneficial for better managing throughput and QoS in a cloud-computing datacenter environment.

Keywords
Benchmarks: TPCC, SPECjAppServer, SPECjbb, PARSEC. Virtualization. LLC: Last-Level Cache. Shared cache. CMP: Chip Multiprocessing. Cache contention. Virtual Platform Architecture (VPA). MPI: Misses Per Instruction. IPC: Instructions Per Cycle.
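The two ratio metrics listed above (MPI and IPC) can be illustrated with a tiny sketch; the counter values below are made-up sample numbers for illustration, not measurements from the paper:

```python
# Illustrative helpers for the MPI and IPC metrics defined in the keywords.
# The counter values in the example calls are hypothetical sample numbers.

def mpi(llc_misses, instructions):
    """Misses Per Instruction: LLC misses divided by retired instructions."""
    return llc_misses / instructions

def ipc(instructions, cycles):
    """Instructions Per Cycle: retired instructions divided by core cycles."""
    return instructions / cycles

print(mpi(2_000_000, 400_000_000))    # 0.005 misses per instruction
print(ipc(400_000_000, 500_000_000))  # 0.8 IPC
```

In practice these counters would come from hardware performance-monitoring events; here they are plain numbers so the ratios themselves are clear.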

Outline
Introduction. Background and Motivation. Proposed Approach. Simulation. Related Work. Summary and Conclusions.

Introduction
The evolved data center runs a large number of heterogeneous applications within virtual machines on each platform (e.g., under vSphere), governed by Service Level Agreements (SLAs). Key aspects: shared resource monitoring, VM migration, and QoS and datacenter throughput.

Contribution
A simple methodology that uses cache occupancy monitoring in a shared-cache environment. A new optimization metric that captures QoS as part of the datacenter throughput measure. Detailed experiments emulating datacenter scenarios, resulting in improvements in QoS and throughput. The work is unique in that it addresses application/VM scheduling in the context of SLAs, manages shared cache occupancy, and focuses, via LLC monitoring, on shared cache contention, which has a first-order impact on performance.

Typical Datacenter Platform and VM Usage

Background and Motivation
Cloud-computing virtualized datacenters of the future will have machines based on CMP architectures, with multiple cores sharing the same LLC. We measured the performance of Intel's latest Core 2 Duo platform when running all 26 applications (in Windows XP) from the SPEC CPU2000 benchmark suite, individually and in pair-wise mode.

Impact of Cache/Memory Contention

Cache sensitivity of Server Workloads

TPCC performance while co-running with other workloads on the same shared LLC

Proposed MIMe Approach
Key components: a mechanism to monitor VM resource usage and identify VMs that suffer from resource contention; techniques to identify candidate VMs for migration, based on priorities and behavior, to achieve improved weighted throughput and determinism across priorities; and a metric that quantifies the goodness/efficiency of the datacenter as a weighted throughput measure.

MIMe Key components to improve the efficiency of datacenter weighted throughput

Monitoring resource usage: VPA architecture and VPAID

IPC sensitivity for TPCC

Identifying VM candidates for migration
Two key factors: the VM's priority, as agreed upon in an SLA, and its behavior, e.g., cache sensitivity. Example scenarios show that an application like TPCC can exhibit a huge variation in performance depending on the co-scheduled application.

The basic algorithm to identify a candidate VM for migration
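The deck shows this algorithm only as a figure. The selection idea described above (favor high-priority VMs that run far below their uncontended performance) might be sketched as follows; all VM names, fields, and numbers are hypothetical illustrations, not the paper's exact algorithm:

```python
# Hypothetical sketch of candidate selection: among high-priority VMs, pick
# the one suffering most from cache contention, i.e., the one whose measured
# IPC has degraded furthest below its solo-run (uncontended) baseline IPC.

def pick_migration_candidate(vms):
    """vms: list of dicts with 'name', 'priority', 'ipc', 'solo_ipc'."""
    high_prio = [v for v in vms if v["priority"] == "high"]
    # Degradation = fraction of baseline performance lost to contention.
    worst = max(high_prio, key=lambda v: 1 - v["ipc"] / v["solo_ipc"])
    return worst["name"]

vms = [
    {"name": "tpcc",   "priority": "high", "ipc": 0.5, "solo_ipc": 1.0},
    {"name": "sjbb",   "priority": "high", "ipc": 0.9, "solo_ipc": 1.0},
    {"name": "parsec", "priority": "low",  "ipc": 0.4, "solo_ipc": 0.8},
]
print(pick_migration_candidate(vms))  # tpcc (lost 50% of baseline IPC)
```

The chosen VM would then be migrated next to less cache-hungry co-runners; because the process is cyclic, the selection repeats as workload phases change.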

Goal
After the migrations, no VM of interest with a higher priority should run less efficiently than a VM of lower priority. The whole process is cyclic, which ensures that workload phase changes, or changes in SLAs with customers, can be addressed with ease.

Metric to quantify the efficiency of a datacenter
The baseline measure is total system IPC. Benchmarking efforts such as vConsolidate propose using weights associated with workload performance, giving a weighted normalized performance metric. Our new metric incorporates the QoS value as part of the throughput measure: a QoS-weighted throughput performance metric.
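As a rough illustration of such a QoS-weighted throughput metric, one plausible form sums each VM's normalized IPC scaled by its QoS weight; the weights and IPC values below are invented for illustration and are not the paper's exact formula or numbers:

```python
# Hypothetical sketch: each VM's IPC is normalized to a reference
# (e.g., solo-run) IPC and scaled by its QoS weight, so degradation of a
# high-priority VM costs the datacenter score more than that of a low one.

def qos_weighted_throughput(vms):
    """vms: list of dicts with 'weight' (QoS), 'ipc', 'ref_ipc'."""
    return sum(v["weight"] * v["ipc"] / v["ref_ipc"] for v in vms)

vms = [
    {"name": "tpcc",   "weight": 2.0, "ipc": 0.8, "ref_ipc": 1.0},  # high QoS
    {"name": "parsec", "weight": 1.0, "ipc": 0.6, "ref_ipc": 0.8},  # low QoS
]
print(qos_weighted_throughput(vms))  # roughly 2.35 (= 2.0*0.8 + 1.0*0.75)
```

Under a plain total-IPC measure both VMs would count equally; the weighting makes migrations that restore high-priority IPC show up as a larger score gain.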

RESULTS AND ANALYSIS
Simulation-based methodology using CMPSched$im, a parallel multi-core performance simulator. It utilizes the Pin binary instrumentation system to evaluate the performance of single-threaded, multi-threaded, and multi-programmed workloads on single- and multi-core processors, dynamically feeding instructions and memory references to the simulator. Here it was modified to be used as a trace-driven simulator, with server workload traces for TPCC, SPECjbb, SPECjAppServer, an indexing workload, and PARSEC. Result: in the absence of any enforcement mechanism in hardware to control cache occupancy, we must rely only on monitoring information to make scheduling decisions.

TPCC IPC and Occupancy with QoS values

TPCC IPC and occupancy with QoS values after migration

Effect of minimizing contention for HP applications

Mean IPC after VM migration for reducing cache contention for HP applications

Experiment Result
Logically clustering identical machines together and then applying the migration policy, the overall score increases by 8% for TPCC workloads; with SPECjAppServer workloads the increase is 4.5%.

RELATED WORK
Most other studies have focused on a single machine, not on virtualized environments. Recently, a few studies, such as those by Cherkasova and by Enright Jerger, have focused on cache sharing and on better scheduling policies. We show how identical machines can be logically clustered and, based on VPA monitoring, how the higher-priority applications that we care about are always guaranteed more platform resources (cache) than lower-priority applications. We also propose a new metric that incorporates QoS into the throughput measure.

CONCLUSION
Contention in the shared cache is a critical problem in virtualized cloud-computing data centers: high-priority applications can suffer if scheduling at the data-center level is not done with cache contention in mind. We show how the problem can be addressed without waiting for enforcement mechanisms to become available in the shared LLC, using a very simple solution based on a VPA architecture.

Future Work
Incorporating memory bandwidth into the VPA architecture. Scheduling optimizations. Profiling of VMs to inform scheduling decisions. Monitoring and enforcement for cache, memory bandwidth, and also power can be used very efficiently.

THANK YOU !!