grid & performability Aad van Moorsel aadvanmoorsel.com
page 2, April 2003, Copyright Aad van Moorsel, HP Labs
outline
  to set the stage:
    what is grid?
    what is performability?
  three perspectives on grid performability:
    customer requirements
    system implementation – utility computing
    associated research challenges – focus on stochastic modeling
what is grid? what is performability?
grid
  for me, and in this talk:
  middleware layer, Globus-like
  shares resources
  crosses boundaries
    – administrative domains, user domains, enterprise domains, …
  software-implemented boundaries
    – flexibility in who uses what when
    – flexibility in what is secured against whom when
    – flexibility in who charges for what when
    – …
  makes resources manageable
    – grades of QoS
    – dynamic management of QoS
    – service level agreements, business metrics and penalties
performability
  for me, and in this talk: quality of service (QoS)
  context:
    Meyer: metric P(T<t), where T is some random variable
    my thesis: meaningful quantitative evaluation of a system (definition 2 out of 3)
    others: performance and reliability
  SPN models for system state, rewards or queuing networks for performance/metric
grid & performability
  we accept the claim that grid is software that will facilitate flexible performability management
  the software design still leaves much to be desired
    – automation? autonomous? autonomic?
    – scaling? inter-business? security?
  but the applications will drive it in the right direction
    – utility computing
    – service-centric outsourcing
grid & performability
  customer perspective
business costs of owning and operating IT have gone through the roof
business cost of IT failures
downtime costs per hour
  brokerage operations          $6,450,000
  credit card authorization     $2,600,000
  e-bay (1 outage 22 hours)       $225,000
  amazon.com                      $180,000
  package shipping services       $150,000
  home shopping channel           $113,000
  catalog sales center             $90,000
  airline reservation center       $89,000
  cellular service activation      $41,000
  on-line network fees             $25,000
  ATM service fees                 $14,000
source: Dave Patterson keynote at FAST 02
survey of computer damages in France, 2000
operational complexity: scale
  [figure courtesy of Lisa Spainhower, IBM]
operator faces heterogeneity
  [figure labels: CDN BPR dynamic composition database Utility ZLE, DBMS App server Utility Web server Utility load balancing UDC/QM/SF VMs Storage management RSVP]
operation faces federation needs
customer needs
  business-driven, automated operator tools for systems with increasing scale, heterogeneity and federation challenges
grid & performability
  system perspective (utility computing)
twin UDCs in HP Labs
  built the first large utility data center in Palo Alto (US) and Bristol (UK)
    – learn what it takes to build a solution
    – move HPL IT services to the UDC
  the first virtualized data center
    – from servers, storage and networks to energy management
    – dynamically assigns applications to resources
    – customer sees resources as utility
    – operator sees resources as utility
utility computing from usage perspective
  [figure labels: UDC1, UDC2, Server Cluster, ?]
  reserving resources
  getting resources
  flexing resources
utility computing from operator perspective
  Utility Data Center = programmable pool of data center resources
  UDC GRAM = Globus Gatekeeper + UDC Adapter
  Grid interface (prototype developed at HP Labs, initially gtk2, currently migrated to gtk3)
  [figure labels: UDC/XML Interface, UDC GRAM, UDC GRAM]
configure properties
generate RSL
utility computing for operators
  utility computing has great potential to improve operations:
    better utilization of resources
    better tools for setting up applications
    new business models, better accountability
  but UDC is just one, high-end solution
  need something that is open, extensible, uniform, …
  grid based management backplane
utility computing grid middleware
  [diagram labels: everything is a Grid service; leverage Grid; HP value-add management; OpenView orchestrates IT; OpenView command and control; SLA]
  base Grid: uniform interface, single sign-on, federation, stateful services
  management backplane: monitoring, rich discovery, life-cycle, coordinated act, policy, biz-impact driven adaptation, flexible secure mgmt domains
more automation: flexing resources
  objective: increase asset utilization via resource sharing while providing a desired quality of service for applications
  approach: a statistical multiplexing technique for resource utilities that host business applications
  characteristics of business applications:
    require resources continuously
    changes in number of users and workload mix may result in:
      – time varying demands
      – large peak to mean ratios for demand
      – future demands that are difficult to predict precisely
  customers want assurances they will get resources when needed
    – for example, resource request will be satisfied with a prob. p=0.999
    – i.e. 999 times out of 1000
    – customers don't always need an assurance of p=1.0
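The difference between an assurance of p=1.0 and p=0.999 can be illustrated with a small sketch. It assumes the demand of a single time slot is given as a pmf over the number of CPUs, and it returns the smallest pool size whose cumulative probability reaches p. The function name `capacity_for_assurance` and the example distribution are hypothetical, chosen only for illustration; they are not taken from the talk.

```python
from itertools import accumulate


def capacity_for_assurance(pmf, p):
    """Return the smallest c with P(demand <= c) >= p.

    pmf[k] is the probability that exactly k CPUs are demanded.
    """
    eps = 1e-12  # tolerance for floating-point round-off in the running sum
    for c, cdf in enumerate(accumulate(pmf)):
        if cdf >= p - eps:
            return c
    raise ValueError("pmf sums to less than the requested assurance p")


# Hypothetical demand distribution with a long, thin tail (not measured data).
demand_pmf = [0.10, 0.40, 0.30, 0.15, 0.04, 0.009, 0.001]

peak = capacity_for_assurance(demand_pmf, 1.0)       # size for the worst case
assured = capacity_for_assurance(demand_pmf, 0.999)  # size for p = 0.999
print(peak, assured)
```

In this toy distribution, relaxing the assurance from 1.0 to 0.999 removes one CPU from the required capacity, because the last CPU is needed only one time in a thousand.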
statistical demand profiles
  to guide the development of our techniques we rely on gathered data:
    – 48 servers in an HP data center
    – hosting business applications
    – each with 2 to 8 CPUs
  create a statistical demand profile for each application
    – compact representation of pattern for demand
    – characterize weekdays and weekend days separately; ignore weekends for the purpose of the study
    – characterize a weekday by minute time slots; a probability mass function (pmf) gives the observed distribution for the number of CPUs needed per slot
  the profiles populate a calendar of expected demand for the utility
    – enables admission control
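A minimal sketch of how such a per-slot profile could be built from gathered data. It assumes the observations are available as (slot index, CPUs needed) pairs collected over several weekdays; the slot length, the function name `build_profile` and the sample values are assumptions for illustration, not details from the study.

```python
from collections import Counter, defaultdict


def build_profile(samples):
    """Build a statistical demand profile from (slot, cpus) observations.

    Returns {slot: {cpus: probability}}, i.e. one empirical pmf per time slot.
    """
    counts = defaultdict(Counter)
    for slot, cpus in samples:
        counts[slot][cpus] += 1

    profile = {}
    for slot, counter in counts.items():
        total = sum(counter.values())
        profile[slot] = {cpus: n / total for cpus, n in sorted(counter.items())}
    return profile


# Hypothetical observations: slot 0 and slot 1, each seen on four weekdays.
observations = [(0, 2), (0, 2), (0, 3), (0, 2),
                (1, 4), (1, 6), (1, 4), (1, 4)]
profile = build_profile(observations)
print(profile)
```

Each slot ends up with a compact pmf, so the profile stays small regardless of how many weekdays of data were observed.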
admission control approach
  a new application requests admission to the utility
  assume we admit the new application
  unfold its profile onto the utility's calendar for a capacity planning horizon
    – for example, several months into the future
  characterize the calendar's new per-slot distributions of aggregate demand
  use distributions to estimate required size of resource pool
  admit application if there are sufficient resources
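The steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used at HP Labs: the calendar is a list of aggregate-demand pmfs (one per slot over the horizon), the new application's profile repeats cyclically, application demands are treated as independent, and the names `admit`, `convolve` and `quantile` are hypothetical. The assurance level in the example is deliberately low so the toy numbers stay small.

```python
from itertools import accumulate


def convolve(a, b):
    """pmf of the sum of two independent demands given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def quantile(pmf, p, eps=1e-12):
    """Smallest c with P(demand <= c) >= p."""
    for c, cdf in enumerate(accumulate(pmf)):
        if cdf >= p - eps:
            return c
    raise ValueError("pmf sums to less than p")


def admit(calendar, profile, pool_size, p):
    """Tentatively admit an application and check the required pool size.

    Returns (admitted, calendar_to_keep, required_cpus).
    """
    # Unfold the profile onto the calendar, repeating it over the horizon.
    candidate = [convolve(agg, profile[t % len(profile)])
                 for t, agg in enumerate(calendar)]
    # The pool must cover the most demanding slot at assurance level p.
    required = max(quantile(pmf, p) for pmf in candidate)
    if required <= pool_size:
        return True, candidate, required
    return False, calendar, required


# Hypothetical calendar: four slots, existing load is 2 or 3 CPUs per slot.
calendar = [[0.0, 0.0, 0.5, 0.5] for _ in range(4)]
# Hypothetical two-slot profile of the new application.
new_app = [[0.0, 1.0], [0.0, 0.75, 0.25]]

ok_relaxed, _, need_relaxed = admit(calendar, new_app, pool_size=4, p=0.85)
ok_strict, _, need_strict = admit(calendar, new_app, pool_size=4, p=1.0)
print(ok_relaxed, need_relaxed, ok_strict, need_strict)
```

With a pool of 4 CPUs the application is admitted at the relaxed assurance level but rejected when every request must be satisfied, which is the trade-off the statistical approach exploits.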
demands for a time slot t
  [figure labels: applications, utility]
  - distribution of aggregate demand is approximated by the joint pmf
  - however, we must also consider correlations between application demands
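As a hedged illustration of the aggregate demand for one slot, the sketch below combines per-application pmfs by convolution. This is valid only if application demands are independent, which is exactly the simplification the slide warns about; handling correlations is not shown here. The function names and the two example pmfs are hypothetical.

```python
def convolve(pmf_a, pmf_b):
    """pmf of A + B for independent demands A and B (index = number of CPUs)."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            out[i + j] += pa * pb
    return out


def aggregate_pmf(pmfs):
    """pmf of the total demand of several independent applications."""
    total = [1.0]  # demand of zero applications is 0 CPUs with probability 1
    for pmf in pmfs:
        total = convolve(total, pmf)
    return total


# Hypothetical per-slot demands: app A needs 1 or 2 CPUs, app B needs 2 or 3.
app_a = [0.0, 0.5, 0.5]
app_b = [0.0, 0.0, 0.75, 0.25]

agg = aggregate_pmf([app_a, app_b])
print(agg)  # index k holds P(total demand == k CPUs)
```

If the two applications peaked together (positive correlation), the probability of needing 5 CPUs would be higher than the 0.125 obtained here, so an independence-based estimate can undersize the pool.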
experimental design and results
  how many CPUs are needed if applications:
    – are statically assigned their peak numbers of CPUs?
    – are assigned the peak number of CPUs needed on per-slot basis?
    – are offered assurance p that resource requests will be satisfied?
  about the experiments:
    – include application demand correlations as measured
    – include 60 minute warm-up/warm-down application migration overheads
    – reported estimates verified using trace driven simulation

  resource access mechanism                 number of CPUs required
  static                                    309
  peak per slot (p=1.0)                     275
  statistical multiplexing p= (estimate)
  statistical multiplexing p= (estimate)
grid & performability
  modeling research perspective
modeling issue I: the many perspectives of virtualization
  virtualization enables flexibility in UDC:
    1. storage area networks let applications use any storage device
    2. computing virtualization allows CPUs to be assigned dynamically to customers
    3. virtual LAN creates a secure private network
  virtualization gives the illusion of some traditional functionality (boundaries), but implements it in software
  modeling challenges: different views for different users, dynamic changing of boundaries (performability!), how to utilize the models contained by the software
modeling issue II: on-line algorithms
  on-line algorithms are key to conquering complexity: automated adaptation needs on-line algorithms
  on-line algorithms come in many shapes and forms:
    days: resource scheduling
    seconds: load balancing, admission control, retries
    milliseconds: memory optimization, real-time scheduling
  typical issues:
    speed of the model solution
    choose between statistical and structural models
    obtaining the right on-line data
    plug-in algorithm module
    need data model that fits with operational model
modeling issue III: how to validate large scale systems
  many facets to scale:
    more and more devices
    more and more interconnected (even globally)
    increasing number of users
    multi-party and multi-ownership
    greater differences in scale: smaller devices, bigger data centers
  amount of data collected and analysis done increases with the scale of the systems
  we have no good ways of analyzing large-scale systems: no test beds, no reliable data, no widely accepted modeling approaches
modeling issue IV: how to evaluate for business metrics
  the real metric of interest is euros:
    how much is the total cost of ownership?
    how much am I as customer willing to pay for a service?
    what penalties do I as provider accept in an SLA?
    if I invest x, what is the return on IT investment?
  how do we model the money/QoS correlation?
conclusion
  adaptive/utility/autonomic computing has an intrinsic need for QoS (performability) modeling and analysis
  the grid is believed to be the platform of choice
    – applications are more interesting than the middleware
  challenges for stochastic modeling are larger than ever in this setting:
    – virtualization
    – on-line algorithms
    – large-scale systems
    – business metrics