CompSci 296.2 Self-Managing Systems Shivnath Babu.

Presentation transcript:

CompSci Self-Managing Systems Shivnath Babu

2 Today
Some current work in self-managing systems (ideas & resources for projects)
–IBM
–ROC (discussion deferred to next class)
–Our projects at Duke
–HP

3 Project
Group size <= 2
Identify “general topic” by end of January, meet Shivnath
Feb 7: Scope problem and give 15-minute talk
Feb 21: 3-minute talk
March 7: 15-minute talk
March 28: 3-minute talk
April 4/6: 15-minute talk
April 20/24: 15-minute final in-class presentation (+ “demo”)

4 Work on Self-Managing Systems
IBM
–IBM Journal, Volume 42, Number 1, 2003
–Autonomic computing home page
–IBM autonomic home – library, demos
–Autonomic computing toolkit
–IBM Tivoli

5 Work on Self-Managing Systems
Berkeley-Stanford ROC project
–Reading for this class
–Interesting source of project ideas and source code
–Sample project reports/presentations (follow the CS444A/294-4 link)

6 The past: research goals and assumptions of last 15 years Goal #1: Improve performance Goal #2: Improve performance Goal #3: Improve cost-performance

7 New research goals for a New Century: ACME
Availability
Changeability
–support rapid deployment of new software, apps, UI
Maintainability
–reduce burden on system administrators
–provide helpful, forgiving SysAdmin environments
Evolutionary Growth
–allow easy system expansion over time
Also Security/Privacy

8 Recovery-Oriented Computing (ROC) Philosophy
“If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time” (Shimon Peres, “Peres’s Law”)
People/HW/SW failures are facts, not problems
Recovery/repair is how we cope with the above facts
Since a major SysAdmin job is recovery after failure, ROC also helps with maintenance/TCO
ROC focus is on fast repair vs. the old focus on longer time between failures
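
The trade-off between fast repair and longer time between failures can be illustrated with the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR). The sketch below is not from the slides; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical baseline: one failure per 1000 hours, 1 hour to repair.
baseline = availability(1000.0, 1.0)

# Option A: make failures 10x rarer (the "old focus").
longer_mttf = availability(10000.0, 1.0)

# Option B: make repair 10x faster (the ROC focus).
faster_repair = availability(1000.0, 0.1)

print(f"baseline:      {baseline:.6f}")
print(f"longer MTTF:   {longer_mttf:.6f}")
print(f"faster repair: {faster_repair:.6f}")
```

Under this formula, cutting repair time by a factor of ten yields the same availability as stretching time between failures by a factor of ten, which is one way to read the case for concentrating on repair.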

9 An Example Project in ROC
Undo functionality for system administrators (useful for self-managing components as well)
–To recover from human errors
–To recover from failed operations like software upgrades, installs, and configuration updates
An interesting mechanism project for self-healing
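
As a rough illustration of the undo idea, the sketch below keeps an undo log for a key-value configuration store. It is a minimal toy, not the ROC project's design; the class `UndoableConfig` and the setting names are hypothetical.

```python
class UndoableConfig:
    """Hypothetical key-value configuration store with an undo log."""

    def __init__(self, initial=None):
        self.settings = dict(initial or {})
        # Stack of (key, key_existed_before, previous_value) records.
        self._undo_log = []

    def set(self, key, value):
        """Apply a change and remember how to reverse it."""
        existed = key in self.settings
        self._undo_log.append((key, existed, self.settings.get(key)))
        self.settings[key] = value

    def undo(self):
        """Revert the most recent change; return False if nothing to undo."""
        if not self._undo_log:
            return False
        key, existed, previous = self._undo_log.pop()
        if existed:
            self.settings[key] = previous
        else:
            del self.settings[key]
        return True


config = UndoableConfig({"buffer_size_mb": 256})
config.set("buffer_size_mb", 16)   # operator error: meant 1600
config.set("new_flag", True)       # part of a failed update
config.undo()                      # removes new_flag
config.undo()                      # restores buffer_size_mb to 256
print(config.settings)
```

A real undo facility would also have to deal with side effects that escape the store (for example, requests already served under the bad setting); this sketch covers only the state rollback.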

10 Mechanism Projects
Required/useful mechanisms for self-managing systems
Take a goal related to self-managing (e.g., self-optimization, predicting problems) and a system (e.g., a database): what mechanisms are needed? Will current mechanisms suffice?
Ex: Data collection
–nonintrusive, distributed, “active probing”

11 Our Projects at Duke
QueS: Querying Systems (as data)
–Better tools for system administrators and self-managing system components
CoD: Cluster on Demand
–Allocate virtual clusters to applications on demand

12 Querying Systems as Data [Figure: system diagram labeled WAN, Clients, Web server, Application servers, Database servers]

13 Querying Systems as Data [Figure: system diagram labeled WAN, Clients, Web server, Application servers, Database servers, WAN]

14 Querying Systems as Data What are probable causes of the Service-Level-Agreement (SLA) violations rising to 12%? –Root-cause query

15 Queries: What if … Given today’s workload, how will average response time change if my database fails? If I double the memory on my application servers, how will SLA violation rate change?
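
One way to answer a what-if query is to re-evaluate an analytic performance model under the changed conditions. The sketch below assumes an M/M/1 queue, which is an assumption made here for illustration and not a model named in the slides; the rates, the 100 ms SLA threshold, and the effect of the upgrade are all hypothetical.

```python
import math


def mean_response_time(arrival_rate, service_rate):
    """M/M/1 mean response time in seconds: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return math.inf  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


def sla_violation_rate(arrival_rate, service_rate, threshold_s):
    """M/M/1 fraction of requests slower than threshold_s.

    Response time is exponential with rate (mu - lambda), so
    P(T > t) = exp(-(mu - lambda) * t).
    """
    if arrival_rate >= service_rate:
        return 1.0
    return math.exp(-(service_rate - arrival_rate) * threshold_s)


workload = 80.0          # requests/s, "today's workload" (hypothetical)
current_capacity = 100.0  # requests/s the server can process now
upgraded_capacity = 120.0  # assumed capacity after the upgrade

before = sla_violation_rate(workload, current_capacity, 0.1)
after = sla_violation_rate(workload, upgraded_capacity, 0.1)
print(f"violation rate before: {before:.3f}, after: {after:.3f}")
```

The hard part in practice is the mapping this sketch takes for granted: how a change such as doubling memory translates into a new service rate.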

16 Queries: Let me know … Let me know if, with 75% probability, average response time will exceed 5 seconds in next 30 minutes –Prediction –Continuous query
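
A "let me know" query combines a prediction with a standing condition that is re-evaluated as new data arrives. The sketch below assumes that some predictor (not shown) produces, at each tick, a set of sampled scenarios for the average response time over the next 30 minutes; `prediction_stream` and the sample values are hypothetical stand-ins for that output.

```python
def exceedance_probability(predicted_avgs, limit_s):
    """Fraction of predicted scenarios whose average exceeds limit_s."""
    if not predicted_avgs:
        raise ValueError("need at least one predicted scenario")
    over = sum(1 for value in predicted_avgs if value > limit_s)
    return over / len(predicted_avgs)


def continuous_alert(prediction_stream, limit_s=5.0, min_probability=0.75):
    """Yield (tick, probability) whenever the alert condition holds."""
    for tick, predicted_avgs in enumerate(prediction_stream):
        probability = exceedance_probability(predicted_avgs, limit_s)
        if probability >= min_probability:
            yield tick, probability


# Each inner list: sampled predictions (seconds) made at one tick.
stream = [
    [2.1, 3.0, 4.2, 5.5],  # 1 of 4 scenarios above 5 s
    [4.8, 5.2, 6.0, 7.1],  # 3 of 4 scenarios above 5 s
    [1.0, 1.2, 0.9, 1.1],  # none above 5 s
]
alerts = list(continuous_alert(stream))
print(alerts)
```

Writing the alert as a generator mirrors the continuous nature of the query: it consumes predictions as they arrive and emits a result only when the condition is met.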

17 Queries: What should I do? What should I do to reduce SLA violations of requests A to <1%, without increasing violations of other requests? –Root-cause + What-if
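
A "what should I do" query can be read as a search over candidate actions, each evaluated by a what-if prediction and filtered against the stated constraints. The sketch below assumes the what-if predictions are already available as a table; the action names, request classes, and violation rates are hypothetical.

```python
def recommend(what_if, current, target_class, target_rate):
    """Return actions predicted to meet the target without hurting others.

    what_if maps action name -> predicted violation rate per request class.
    current maps request class -> violation rate observed today.
    """
    acceptable = []
    for action, predicted in what_if.items():
        meets_target = predicted[target_class] < target_rate
        no_regression = all(
            predicted[cls] <= current[cls]
            for cls in current
            if cls != target_class
        )
        if meets_target and no_regression:
            acceptable.append(action)
    return sorted(acceptable)


current = {"A": 0.12, "B": 0.02}
what_if = {
    "add_index": {"A": 0.008, "B": 0.02},      # meets target, B unchanged
    "double_memory": {"A": 0.03, "B": 0.01},   # A still above 1%
    "prioritize_A": {"A": 0.005, "B": 0.06},   # B gets worse
}
plan = recommend(what_if, current, "A", 0.01)
print(plan)
```

In this toy, the root-cause step is implicit in how the candidate actions were chosen; a fuller treatment would first rank causes and then generate actions that address the top-ranked ones.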

18 Querying Systems as Data
Instrumented traces, logs
System activity data
Data from active probing
Workload
System configuration data (e.g., buffer size, indexes)
Source code
Models
–Analytic performance models
–Machine learning models
–Rules from system experts
–Simulators
[Figure label: DATA]

19 Querying Systems with QueS (30,000 ft) [Figure: architecture diagram labeled DATA, Query Processor, Data Acquisition, Data Maintenance, Model-driven DB Engine, Queries, Answers, System mgmt. services]

20 Challenges: Query Complexity Support for complex queries –Rank probable causes of SLA violation rising to 12%? –“What should I do” queries Queries are ad-hoc Queries may be acquisitional

21 Challenges: Query Specification Declarative query language –Expressibility of language –Composition Snapshot queries and continuous queries

22 Challenges: Query Processing
Model-based query processing
Many types of data sources
–Structured, semi-structured, and unstructured
Uncertainty in input data
–E.g., legacy systems may have partial/no instrumentation
Imprecise answers
–Answers may include quantification of accuracy
–Ranking

23 Challenges: Run-time Overhead Real-time service for 24x7 systems Tunable data acquisition Active probing

24 Work in Progress
With Piyush Shivam
–Models for answering queries about expected performance given a resource assignment, feasible resource assignments to meet SLA, what-if queries for scientific applications
With Songyun Duan
–Use of Bayesian Networks for performance prediction and root-cause queries
With Wanhong Xu
–What-if queries on configuration-parameter settings
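
To give a flavor of probabilistic root-cause ranking, the sketch below applies Bayes' rule once to rank candidate causes of an observed SLA violation. This is far simpler than a Bayesian network and is not the method used in the work listed above; it assumes the causes are mutually exclusive and exhaustive, and the cause names and probabilities are hypothetical.

```python
def rank_causes(priors, likelihoods):
    """Rank causes by posterior P(cause | violation), highest first.

    priors[c]      = P(cause c is active)
    likelihoods[c] = P(SLA violation observed | cause c is active)
    """
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())  # P(violation) under the assumptions
    posterior = {cause: joint[cause] / evidence for cause in joint}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


priors = {
    "db_lock_contention": 0.2,
    "app_server_memory": 0.3,
    "network": 0.5,
}
likelihoods = {
    "db_lock_contention": 0.9,
    "app_server_memory": 0.4,
    "network": 0.1,
}
ranking = rank_causes(priors, likelihoods)
for cause, probability in ranking:
    print(f"{cause}: {probability:.3f}")
```

Note that the least likely cause a priori ends up ranked first once the observation is taken into account, which is the kind of answer a ranked root-cause query is meant to surface.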

25 Projects at HP Research
Project 1: Predicting performance problems, finding root causes of problems
Project 2: Debugging complex systems
Project 3: Designing adaptive systems