Published by Pauline Skinner. Modified over 9 years ago.
1
Resource Management in Data-Intensive Systems
Bernie Acs, Magda Balazinska, John Ford, Karthik Kambatla, Alex Labrinidis, Carlos Maltzahn, Rami Melhem, Paul Nowoczynski, Matthew Woitaszek, Mazin Yousif
2
Resource Utilization Problem
Resource Management Perspectives
– User:
    Application performance, cost, QoS (deadlines for interactivity)
    Need metering tools, job description language (e.g. JDL, developed in grid computing)
– Provider:
    Power, physical space
    Network bandwidth, memory, CPU power, disk I/O, space
    Cost of metering
3
Resource Utilization Problem (cont’d)
Overall Management Goals of Provider
– Most efficient allocation of resources to meet service level agreements
– Pricing model that drives users towards more efficient/predictable usage
– Maintain a certain envelope of resource utilization
– Difference from conventional supercomputing centers:
    Not only cores but also network bandwidth, memory, disk
    Scheduling preference based on data locality
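The "scheduling preference based on data locality" point can be illustrated with a minimal sketch. The `Node` class, the `pick_node` function, and the block IDs are hypothetical names introduced here for illustration; they do not come from the slides or from any particular scheduler. The sketch assumes each node advertises its free task slots and the set of data blocks it stores locally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a cluster node as seen by the scheduler."""
    name: str
    free_slots: int
    blocks: set = field(default_factory=set)  # IDs of data blocks stored locally


def pick_node(nodes, block_id):
    """Return the node to run a task that reads block_id, or None if no slot is free."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    # Prefer nodes that already hold the block; break ties by most free slots.
    return max(candidates, key=lambda n: (block_id in n.blocks, n.free_slots))


nodes = [
    Node("a", free_slots=4),
    Node("b", free_slots=1, blocks={"blk-7"}),
    Node("c", free_slots=0, blocks={"blk-9"}),
]

local = pick_node(nodes, "blk-7")   # node "b" holds the block and has a free slot
remote = pick_node(nodes, "blk-9")  # node "c" is full, so the task runs remotely
```

A real scheduler would typically also weigh rack locality and how long a task is allowed to wait for a local slot; this sketch only captures the basic preference.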
4
Common Challenges
What should be guaranteed?
– Example: SimpleDB returns whatever can be retrieved in 5s. Not applicable for science applications
– Network bandwidth, storage throughput
Management of Resources: Hardware
– 3-4 year cycle, 20%/year
– Resource discovery
– Mapping optimized to user demand:
– Upgrade based mapping history
– Requires workload profiles -> elastic clustering, virtualization essential, application servers
Management of Resources: Centralized Services/Software
– Big databases
– Visualization
– Virtualization: as a packaging and delivery service (Testing/staging environment)
    Licensing
– Applications (Hadoop, R, …)
5
Hard Problems
Failure & Recovery Resource Management
– Cannot prevent failures, but can estimate them and over-provision
– What level of failure protection is adequate?
– Creeping failures
– Real-time triage: extra cost -> often sampling only
– Possible benefit: smaller set of libraries/apps
– Two-tier approach?
– Combined with security and other safety mechanisms
Interactivity (Paradigm shift for batch environment)
– Def: want to see what is happening right now, or at regular intervals
– Intelligent placement of data
– Reserve resources -> over-provisioning/waste
– Different scheduling time scale: seconds to minutes vs ms
SLAs for DIC workloads
– Incorporating Power
– Framework of SLAs for science differs from that for commercial workloads
– Not clear whether that’s an agreement or an optimization issue
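The question "what level of failure protection is adequate?" can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not a method from the slides: it assumes nodes fail independently with the same probability `p_fail`, and the function names `availability` and `nodes_to_provision` are hypothetical. It finds the smallest number of nodes to provision so that at least `n` of them are up with a target probability.

```python
from math import comb


def availability(m, n, p_fail):
    """Probability that at least n of m nodes are up, assuming independent failures."""
    p_up = 1.0 - p_fail
    return sum(comb(m, k) * p_up**k * p_fail**(m - k) for k in range(n, m + 1))


def nodes_to_provision(n, p_fail, target):
    """Smallest m >= n such that at least n nodes are up with probability >= target."""
    if not (0.0 <= p_fail < 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 <= p_fail < 1 and 0 < target < 1")
    m = n
    while availability(m, n, p_fail) < target:
        m += 1
    return m


# Example: a job needs 10 nodes, each node is down 5% of the time,
# and the job should find 10 healthy nodes 99% of the time.
needed = nodes_to_provision(n=10, p_fail=0.05, target=0.99)
extra = needed - 10  # nodes held in reserve, i.e. the over-provisioning cost
```

The independence assumption is the weak point: correlated failures (shared power, network, or software faults) would require more headroom than this estimate suggests.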
6
Hard Problems (cont’d)
Provisioning Framework
– DIC application -> what resources am I going to need?
– Hadoop-friendly science applications
– DIC framework configuration to adapt to user & HW profiles
Performance Management
– Granularity of prediction (if predictable)
– Co-location of workloads for efficiency
– Real-time end-to-end scheduling (sometimes too costly)
Metrics, instrumentation
– Black-box vs grey-box vs transparent-box alternatives
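"Co-location of workloads for efficiency" can be sketched as a packing problem over several resource dimensions. The example below uses a first-fit-decreasing heuristic, which is one common choice and not something the slides prescribe. The `colocate` function, the workload names, and the demand figures are all hypothetical; demands and capacity are tuples in the same order, here (CPU cores, memory in GB).

```python
def colocate(workloads, capacity):
    """Pack workloads onto identical nodes using first-fit decreasing.

    workloads: dict mapping name -> demand tuple, e.g. (cores, memory_gb)
    capacity:  per-node capacity tuple in the same order as the demands
    Returns a list of nodes, each a list of the workload names placed on it.
    """
    def size(demand):
        # Size of a workload = its largest demand relative to node capacity.
        return max(d / c for d, c in zip(demand, capacity))

    nodes = []  # each entry: {"free": remaining capacity, "names": placed workloads}
    ordered = sorted(workloads.items(), key=lambda kv: size(kv[1]), reverse=True)
    for name, demand in ordered:
        if size(demand) > 1:
            raise ValueError(f"{name} does not fit on a single node")
        for node in nodes:
            if all(d <= f for d, f in zip(demand, node["free"])):
                break  # first node where every dimension fits
        else:
            node = {"free": list(capacity), "names": []}
            nodes.append(node)
        node["free"] = [f - d for f, d in zip(node["free"], demand)]
        node["names"].append(name)
    return [node["names"] for node in nodes]


# Hypothetical demands as (cores, memory_gb) on 16-core, 64 GB nodes.
placement = colocate(
    {"hadoop-sort": (8, 16), "r-analysis": (2, 40), "viz": (4, 8), "db": (8, 48)},
    capacity=(16, 64),
)
```

The heuristic only checks that declared demands fit; it does not model interference between co-located workloads (shared disk or network contention), which is where the prediction and instrumentation questions above come in.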