Resource Management in Data-Intensive Systems Bernie Acs, Magda Balazinska, John Ford, Karthik Kambatla, Alex Labrinidis, Carlos Maltzahn, Rami Melhem,


Resource Management in Data-Intensive Systems Bernie Acs, Magda Balazinska, John Ford, Karthik Kambatla, Alex Labrinidis, Carlos Maltzahn, Rami Melhem, Paul Nowoczynski, Matthew Woitaszek, Mazin Yousif

Resource Utilization Problem
Resource Management Perspectives
– User:
    Application performance, cost, QoS (deadlines for interactivity)
    Need metering tools, job description language (e.g. JDL - developed in grid computing)
– Provider:
    Power, physical space
    Network bandwidth, memory, CPU power, Disk I/O, space
    Cost of metering

Resource Utilization Problem (cont’d)
Overall Management Goals of Provider
– Most efficient allocation of resources to meet service level agreements
– Pricing model that drives users towards more efficient/predictable usage
– Maintain a certain envelope of resource utilization
– Difference to conventional super computing centers:
    Not only cores but network bandwidth, memory, disk
    Scheduling preference based on data locality
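The "scheduling preference based on data locality" goal can be illustrated with a minimal sketch. It is not taken from the slides: the function `pick_node`, the block and node names, and the tie-breaking rule are all hypothetical, and a real scheduler would also weigh network bandwidth, memory, and disk.

```python
from typing import Dict, Set


def pick_node(task_blocks: Set[str],
              node_blocks: Dict[str, Set[str]],
              free_slots: Dict[str, int]) -> str:
    """Pick the node with a free slot that already stores the most input blocks.

    Hypothetical helper for illustration only.
    """
    # Only nodes that can actually run the task are candidates.
    candidates = [n for n, slots in free_slots.items() if slots > 0]
    if not candidates:
        raise RuntimeError("no node has a free slot")

    def score(node: str):
        local = len(task_blocks & node_blocks.get(node, set()))
        # Prefer more local blocks, then more free slots, then node name
        # (the name only makes the choice deterministic).
        return (local, free_slots[node], node)

    return max(candidates, key=score)


# Example: node "c" holds every block but is full, so the choice falls
# back to the best node that still has capacity.
blocks_on = {"a": {"b1", "b2"}, "b": {"b2", "b3", "b4"}, "c": {"b1", "b2", "b3"}}
chosen = pick_node({"b1", "b2", "b3"}, blocks_on, {"a": 1, "b": 2, "c": 0})
```

The example shows the trade-off named on the slide: the node with the best locality is not always available, so locality is a preference rather than a guarantee.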

Common Challenges
What should be guaranteed?
– Example: SimpleDB returns whatever can be retrieved in 5s. Not applicable for science applications
– Network bandwidth, storage throughput
Management of Resources: Hardware
– 3-4 year cycle, 20%/year
– Resource discovery
– Mapping optimized to user demand:
– Upgrade based mapping history
– Requires workload profiles -> elastic clustering, virtualization essential, application servers
Management of Resources: Centralized Services/Software
– Big databases
– Visualization
– Virtualization: as a packaging and delivery service (Testing/staging environment)
    Licensing,
– Applications (Hadoop, R, …)
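The SimpleDB example (return whatever can be retrieved within a time budget) can be sketched as a deadline-bounded fetch. This is an illustration under assumptions, not SimpleDB's API: `fetch_within`, its parameters, and the injectable `clock` are hypothetical names.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def fetch_within(rows: Iterable[T], budget_s: float,
                 clock: Callable[[], float] = time.monotonic
                 ) -> Tuple[List[T], bool]:
    """Collect rows until the time budget runs out.

    Returns (rows_so_far, complete). `complete` is False whenever the
    deadline was hit before the source was known to be exhausted, which
    is a conservative answer.
    """
    deadline = clock() + budget_s
    it = iter(rows)
    out: List[T] = []
    while clock() < deadline:
        try:
            out.append(next(it))
        except StopIteration:
            return out, True   # source exhausted inside the budget
    return out, False          # budget exhausted, result may be partial


# Usage with the 5 s budget from the slide (range() stands in for a query).
partial, complete = fetch_within(range(1000), budget_s=5.0)
```

The `complete` flag makes the slide's objection concrete: a science application usually cannot proceed on a result set that is flagged as possibly partial.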

Hard Problems
Failure & Recovery Resource Management
– Cannot prevent, but estimate, over-provisioning
– What level of failure protection is adequate?
– Creeping failures
– Real-time triage: extra cost -> often sampling only
– Possible benefit: smaller set of libraries/apps
– Two-tier approach?
– Combined with security and other safety mechanisms
Interactivity (Paradigm shift for batch environment)
– Def: want to see what is happening right now, or in regular intervals
– Intelligent placement of data
– Reserve resources -> over-provisioning/waste
– Different scheduling time scale: seconds to minutes vs ms
SLAs for DIC workloads
– Incorporating Power
– Framework of SLAs for Science different than for commercial
– Not clear whether that’s an agreement or optimization thing
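One way to read "cannot prevent, but estimate, over-provisioning" is as a spare-capacity calculation. The sketch below assumes independent node failures with a fixed per-node probability over the period of interest, which is a strong simplification (creeping and correlated failures violate it). The names `prob_at_least` and `spares_needed` are hypothetical.

```python
from math import comb


def prob_at_least(k_up: int, total: int, p_fail: float) -> float:
    """P(at least k_up of total nodes are up), independent failures assumed."""
    p_up = 1.0 - p_fail
    return sum(comb(total, i) * p_up**i * p_fail**(total - i)
               for i in range(k_up, total + 1))


def spares_needed(required: int, p_fail: float, target: float,
                  max_spares: int = 1000) -> int:
    """Smallest number of spare nodes so that `required` nodes are up
    with probability >= target (binomial model)."""
    for spares in range(max_spares + 1):
        if prob_at_least(required, required + spares, p_fail) >= target:
            return spares
    raise ValueError("target not reachable within max_spares")


# Illustrative numbers only: 100 required nodes, 2% failure chance each,
# 99.9% availability target.
extra = spares_needed(required=100, p_fail=0.02, target=0.999)
```

The result quantifies the waste side of the trade-off: every spare bought for failure protection is capacity that sits idle when nothing fails.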

Hard Problems (cont’d)
Provisioning Framework
– DIC application -> what resources am I going to need?
– Hadoop friendly science applications
– DIC framework configuration to adapt to user & HW profiles
Performance Management
– Granularity of Prediction (if predictable)
– Co-location of workloads for efficiency
– Real-time end-to-end scheduling (sometimes too costly)
Metrics, instrumentation
– Blackbox vs grey vs transparent box alternatives
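"Co-location of workloads for efficiency" can be approximated with a simple packing heuristic. The sketch below uses first-fit decreasing over multi-dimensional demands (for example cores and memory). This is a swapped-in textbook heuristic, not a method from the slides, and the function `colocate`, the workload names, and the numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Demand = Tuple[float, ...]  # one entry per resource, e.g. (cores, memory_gb)


def colocate(workloads: Dict[str, Demand], capacity: Demand) -> List[List[str]]:
    """Pack workloads onto identical hosts with first-fit decreasing."""
    hosts: List[List[str]] = []   # workload names per host
    used: List[List[float]] = []  # resources consumed per host

    def size(name: str) -> float:
        # Size of a workload = its largest demand relative to capacity.
        return max(d / c for d, c in zip(workloads[name], capacity))

    # Largest workloads first; the name breaks ties deterministically.
    for name in sorted(workloads, key=lambda w: (-size(w), w)):
        demand = workloads[name]
        if any(d > c for d, c in zip(demand, capacity)):
            raise ValueError(f"{name} does not fit on an empty host")
        for i, load in enumerate(used):
            if all(l + d <= c for l, d, c in zip(load, demand, capacity)):
                hosts[i].append(name)
                used[i] = [l + d for l, d in zip(load, demand)]
                break
        else:
            # No existing host has room in every dimension: open a new one.
            hosts.append([name])
            used.append(list(demand))
    return hosts


# Hosts with 8 cores and 32 GB; demands are (cores, memory_gb).
plan = colocate({"db": (4, 24), "web": (2, 4), "batch": (6, 8), "viz": (2, 8)},
                capacity=(8, 32))
```

A heuristic like this ignores interference between co-located workloads, which is why the slide ties co-location to prediction granularity and instrumentation.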