Autonomic Computing
Fabián E. Bustamante, Winter 2006

The vision of autonomic computing, J. Kephart and D. Chess, IEEE Computer, Jan
Also
–A.G. Ganek and T.A. Corbi, “The dawning of the autonomic computing era”, IBM Systems Journal, 42 (1)
–R. Want, T. Pering and D. Tennenhouse, “Comparing autonomic and proactive computing”, IBM Systems Journal, 42 (1)
CS 395/495 Autonomic Computing Systems EECS, Northwestern University

The problem
The main obstacle to further progress in the IT industry
–Not a change in Moore’s law, but
–A looming software complexity crisis
Beyond administering single environments, to integration into intra- and inter-corporate computing systems
“Complexity is the business we are in, and complexity is what limits us.”, Fred Brooks Jr.
Better programming won’t do it
Consider
–~1/3 to ½ of a company’s total IT budget goes to preventing and recovering from crashes
–“For every dollar to purchase storage, you spend $9 to have someone manage it.”, N. Tabellion, CTO Fujitsu Softek
–~40% of computer outages are caused by operator errors
–Average downtime impact for IT ~ $1.4 million in revenue per hour
The answer/hope: Autonomic computing
Autonomic systems can manage themselves given high-level objectives from admins (by analogy with the autonomic nervous system)
An autonomic system
–Knows itself
–Knows its environment & the context surrounding its activity
–(Re)configures itself under varying and unpredictable conditions
–Is always on the lookout to optimize its workings
–Is able to protect and heal itself
–Anticipates the optimized resources needed to meet a user’s information needs
To incorporate these characteristics, it must have the following properties/features …
Self-* properties
Self-configuration
–Current: Data centers made of components from/for multiple vendors/platforms; installation, configuration & integration are time consuming & error prone
–Autonomic: Automated configuration based on high-level policies; the host system adjusts itself automatically and seamlessly
Self-optimization
–Current: Hundreds of manually set, nonlinear tuning knobs
–Autonomic: Components and system continually seek optimization opportunities
Self-healing
–Current: e.g. problem determination can take weeks
–Autonomic: Self-detection, diagnosis, and repair for HW & SW
Self-protection
–Current: Detection & recovery from attacks & cascading failures is manual
–Autonomic: Self-defense using early warning to anticipate & prevent system-wide failures
Autonomic element
Autonomic systems: interactive collections of autonomic elements
Autonomic element
–1+ managed elements + an autonomic manager that controls them
–Functions at many levels, from disk drives to entire enterprises
–Fixed behavior, connections and relationships give way to increased dynamism and flexibility expressed as high-level goals
[Figure: an autonomic manager (Monitor, Analyze, Plan, Execute, sharing Knowledge) on top of a managed element]
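The manager/managed-element split in the figure can be sketched as a control loop. The following is a minimal illustration only, not code from the lecture or from any IBM toolkit: the worker-pool element, the `target_utilization` goal, and the scaling rule are all hypothetical choices made to keep the example small.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Hypothetical managed resource: a worker pool facing some load."""
    workers: int = 2
    load: float = 0.0  # pending requests, set by the environment

    def utilization(self) -> float:
        return self.load / self.workers


@dataclass
class AutonomicManager:
    """One Monitor-Analyze-Plan-Execute pass over shared knowledge."""
    element: ManagedElement
    # High-level goal (assumed unit: pending requests per worker).
    target_utilization: float = 10.0
    # Knowledge: history of observed utilization readings.
    knowledge: list = field(default_factory=list)

    def monitor(self) -> float:
        reading = self.element.utilization()
        self.knowledge.append(reading)
        return reading

    def analyze(self, reading: float) -> bool:
        # A symptom exists when utilization exceeds the goal.
        return reading > self.target_utilization

    def plan(self) -> int:
        # Smallest worker count that brings utilization back to the goal.
        return math.ceil(self.element.load / self.target_utilization)

    def execute(self, workers: int) -> None:
        self.element.workers = workers

    def step(self) -> None:
        reading = self.monitor()
        if self.analyze(reading):
            self.execute(self.plan())


element = ManagedElement(workers=2, load=50.0)
manager = AutonomicManager(element)
manager.step()
print(element.workers)  # 5
```

The administrator only states the goal (`target_utilization`); how many workers to run is left to the manager, which is the sense in which behavior is "expressed as high-level goals".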
Evolution to autonomic systems
Level 1: Basic
–Multiple sources of system-generated data
–Requires extensive, highly skilled IT staff
Level 2: Managed
–Consolidation of data through management tools
–IT staff analyzes and takes actions
–Benefits: greater system awareness; improved productivity
Level 3: Predictive
–System monitors, correlates, and recommends actions
–IT staff approves and initiates actions
–Benefits: reduced dependency on deep skills; faster and better decision making
Level 4: Adaptive
–System monitors, correlates and takes actions
–IT staff manages performance against Service Level Agreements (SLAs)
–Benefits: IT agility and resiliency with minimal human interaction
Level 5: Autonomic
–Integrated components dynamically managed by business rules/policies
–IT staff focuses on enabling business needs
–Benefits: business policy drives IT management; business agility and resilience
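One way to read the five levels is as a shift in who acts on a proposed corrective action. The sketch below encodes only that reading; the `dispatch` function, its return strings, and the `staff_approves` callback are hypothetical names introduced for illustration.

```python
from enum import IntEnum
from typing import Callable


class Level(IntEnum):
    BASIC = 1
    MANAGED = 2
    PREDICTIVE = 3
    ADAPTIVE = 4
    AUTONOMIC = 5


def dispatch(level: Level, action: str,
             staff_approves: Callable[[str], bool]) -> str:
    """Describe who ends up acting on a proposed corrective action."""
    if level <= Level.MANAGED:
        # Levels 1-2: the system only supplies data; staff analyze and act.
        return f"staff analyzes and performs: {action}"
    if level == Level.PREDICTIVE:
        # Level 3: the system recommends; staff must approve first.
        if staff_approves(action):
            return f"staff-approved: {action}"
        return f"rejected by staff: {action}"
    # Levels 4-5: the system takes the action itself.
    return f"system performs: {action}"


print(dispatch(Level.PREDICTIVE, "restart service", lambda a: True))
print(dispatch(Level.ADAPTIVE, "restart service", lambda a: False))
```

At levels 4 and 5 the approval callback is never consulted; staff influence the outcome through SLAs and business policies rather than per-action sign-off.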
Engineering challenges
Design, test and verification
Installation and configuration
Monitoring, problem determination, upgrading
Managing the life cycle
–Autonomic systems will have multiple elements at different stages, handling multiple tasks, … how to handle all?
Relationships among autonomic elements
–Specification of services needed/provided; ways to locate providers; ways to establish SLA; …
Robustness against self-management-based attacks
Goal specification and robustness to wrongly specified goals
Scientific challenges
How to understand, control, and design emergent behavior
–Understanding the mapping from local to global behavior is not enough
Develop a theory of robustness
–Beginning with a definition
Learning and optimization theory
–Machine learning by a single element in a static environment is just the basics; the challenge is multiagent systems in dynamic environments
Negotiation theory
–How should the multiple elements negotiate?
Automated statistical modeling
–Statistical modeling for detection/prediction of performance models; ways to aggregate statistical variables to reduce dimensionality