CompSci 296.2 Self-Managing Systems Shivnath Babu.


1 CompSci 296.2 Self-Managing Systems Shivnath Babu

3 Motivation
Systems are becoming hard to manage
–Increasing size (both software and hardware)

4 Motivation
[Figure labels: WAN, Clients, Web server, Application servers, Database servers]

5 Motivation
[Figure labels: WAN, Clients, Web server, Application servers, Database servers, WAN]

6 Motivation
Systems are becoming hard to manage
–Increasing size (both software and hardware)
–Increasing heterogeneity (e.g., Grid systems)
–24 x 7 operation
–5 nines availability (system is down at most 5 minutes and 15 seconds per year)
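The "5 minutes and 15 seconds" figure follows from simple arithmetic: five nines means 99.999% availability, so the system may be down for at most 0.001% of the year. A minimal sketch of that calculation, assuming a 365-day year (the function names are illustrative, not from the slides):

```python
def max_downtime_seconds_per_year(nines: int, days_per_year: float = 365.0) -> float:
    """Yearly downtime budget, in seconds, for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # 5 nines -> 0.00001, i.e. 0.001%
    return unavailability * days_per_year * 24 * 60 * 60


def format_minutes_seconds(seconds: float) -> str:
    """Render a duration as whole minutes and seconds (rounded to the nearest second)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


if __name__ == "__main__":
    for n in (3, 4, 5):
        budget = max_downtime_seconds_per_year(n)
        print(f"{n} nines: {format_minutes_seconds(budget)} of downtime per year")
```

For five nines this gives roughly 315 seconds, which matches the figure on the slide.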

7 Motivation

8 Downtime Costs (per Hour)
Brokerage operations           $6,450,000
Credit card authorization      $2,600,000
eBay (1 outage, 22 hours)      $225,000
Amazon.com                     $180,000
Package shipping services      $150,000
Home shopping channel          $113,000
Catalog sales center           $90,000
Airline reservation center     $89,000
Cellular service activation    $41,000
On-line network fees           $25,000
ATM service fees               $14,000
Sources: InternetWeek, 4/3/2000; Fibre Channel: A Comprehensive Introduction, R. Kembel, 2000, p. 8: "...based on a survey done by Contingency Planning Research."
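To get a feel for what these hourly rates imply, one can multiply a rate by the length of an outage. The sketch below does this for the 22-hour eBay outage listed in the table. It assumes that cost scales linearly with outage duration, which the slide does not state, and the dictionary and function names are hypothetical:

```python
# Hourly downtime costs in USD, copied from the table above (a subset of the rows).
HOURLY_DOWNTIME_COST_USD = {
    "Brokerage operations": 6_450_000,
    "Credit card authorization": 2_600_000,
    "eBay": 225_000,
    "Amazon.com": 180_000,
    "ATM service fees": 14_000,
}


def outage_cost_usd(business: str, outage_hours: float) -> float:
    """Estimated cost of one outage.

    Assumption (not from the slides): cost grows linearly with outage length.
    """
    return HOURLY_DOWNTIME_COST_USD[business] * outage_hours


if __name__ == "__main__":
    # The table lists one eBay outage lasting 22 hours.
    print(f"${outage_cost_usd('eBay', 22):,.0f}")
```

Under the linear assumption, the single eBay outage comes to just under $5 million.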

9 Motivation
System administration cost is increasing

10 Cost of Storage Administration

11 Motivation
System administration cost is increasing
–Recently, $1 of storage corresponds to $9 of administration cost [Fujitsu]
–Up to 75% of overall database ownership cost is for administration [Aberdeen]
–Up to 80% of Information Technology (IT) budgets is spent on maintenance [McKinsey]

12 Motivation
System administration time & effort is increasing

13 Time Distribution for Database Mgmt.

14 Motivation
System administration time & effort is increasing
– >40% of computer system outages caused by operator/administrator error
[Figure: Causes of system crashes over time (1985-1993), as % of system crashes. Categories: System management, Software failure, Hardware failure, Other. Values shown: 53%, 18%, 10%]

15 Global Storage Service Site Failures
[Figure: failure causes with labels Hardware, Network, Human, Unknown, SW and values 0%, 28%, 22%, 41%, 9%; the label-to-value pairing is not recoverable from the transcript]

16 Motivation
System administration time & effort is increasing
– >40% of computer system outages caused by operator error
  System is too difficult to understand
  Decisions need to be made quickly, under pressure
  Not enough well-trained operators
  Changes are frequent
  –E.g., workload, hardware, people, data

17 The Real Problem
"… The obstacle is complexity … Dealing with it is the single most important challenge facing the IT industry"
Paul Horn, Director of Research, IBM

18 The Solution
Let the system deal with the complexity of management
Computer-science-wide push towards Self-Managing Systems
IBM calls this new field Autonomic Computing

19 Autonomic Computing (IBM)
"Computer systems that can regulate themselves much in the same way as our autonomic nervous system regulates and protects our bodies"
Paul Horn, Director of Research, IBM

20 Autonomic Nervous System

21 Autonomic Nervous System
–Tells your heart how fast to beat, checks your blood’s sugar and oxygen levels, and controls your pupils so the right amount of light reaches your eyes as you read these words
–Monitors your temperature and adjusts your blood flow and skin functions to keep it at 98.6ºF
–Is autonomic: you can make a mad dash for the train without having to calculate how much faster to breathe and pump your heart, or whether you’ll need that little dose of adrenaline to make it through the doors before they close

22 Autonomic Computing (IBM)

23 What will we do in this class?
Read research papers
Listen to guest lectures
Goal of the class: Give structure to this field, e.g.,
–Concretely defining problems that arise in this setting
–Identifying algorithms and techniques useful in this domain
–Proposing guidelines for designers of future systems and software
Semester-long project

24 Outline
Part 1: Motivating Factors, Problems, and Applications
–From Internet services, database management, computational grids, weather analysis and prediction, oil reservoir optimization, and others
Part 2: Algorithms and Techniques
–Control theory, machine learning, performance modeling, stochastic optimization, massive data management, data integration, building blocks in systems, and others
Part 3: Putting everything together, implications, and future work

25 Evaluation
Class participation: 25%
Project: 75%

26 Resources
Google keywords
–Autonomic computing
–Self-managing systems
IBM autonomic computing web page
IBM Journal special issue on autonomic computing
Berkeley ROC project

27 In the next class
Read an overview paper on self-managing systems
Summary of work in this area
Sample projects

