(Slides are taken from presentations by Alan Ganek, Alfred Spector, and Jeff Kephart of IBM.)

Dream of Pervasive Computing … or Nightmare!
Trillions of heterogeneous computing devices connected to the Internet

Core of the Problem
Complexity in the systems themselves and in their operating environment
–As systems become more interconnected and diverse, architects are less able to anticipate and design interactions among components
–Resolving these interactions is pushed to run time (late binding), e.g., hot-plug, JVM, JIT compilation, service discovery, mobile agents, …
Complexity management requires human intervention, which drives up IT costs

The Need for Complexity Management
But the complexity is beyond what humans can handle
Take the human out of the control loop: make systems autonomic
We are already moving in this direction, but is there a systematic way of addressing this issue?
Autonomic Computing

Complex Heterogeneous Infrastructures Are a Reality!

Industry Trends
Administration of systems is increasingly difficult
–100s of configuration and tuning parameters for DB2
Heterogeneous systems are increasingly connected
–Integration is becoming ever more difficult
Architects can't plan interactions among components
–Increasingly dynamic; frequently with unanticipated components
More burden must be assumed at run time
–But human administrators can't assume the burden
–6:1 cost ratio between storage administration and storage
–40% of outages are due to operator error
Need self-managing computing systems
–Behavior specified by system administrators via high-level policies
–The system and its components figure out how to carry out the policies
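
To make the last point concrete, here is a minimal Python sketch, assuming a hypothetical utilization policy: the administrator states only a goal ("keep each server below a given utilization") and the system derives the concrete setting (how many servers to run). `UtilizationPolicy`, `servers_needed`, and the capacity figures are illustrative assumptions, not part of any IBM product.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UtilizationPolicy:
    """Hypothetical high-level goal stated by an administrator."""
    max_utilization: float  # e.g. 0.5 means "keep each server at most 50% busy"
    min_servers: int = 1


def servers_needed(policy: UtilizationPolicy, load_rps: float,
                   capacity_rps_per_server: float) -> int:
    """Derive a concrete setting (server count) from the goal and observed load."""
    if not 0 < policy.max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    # Each server may only be used up to the policy's utilization ceiling.
    usable_rps = capacity_rps_per_server * policy.max_utilization
    return max(policy.min_servers, math.ceil(load_rps / usable_rps))


print(servers_needed(UtilizationPolicy(max_utilization=0.5), 900, 100))  # 18
```

Each server may use only 50 of its 100 requests/s, so 900 requests/s need 900 / 50 = 18 servers. The administrator never sets the server count directly; if the load changes, the system recomputes it from the same policy.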

Autonomic Computing Vision
“Intelligent” open systems that…
–Manage complexity
–“Know” themselves
–Continuously tune themselves
–Adapt to unpredictable conditions
–Prevent and recover from failures
–Provide a safe environment
Self-management:
–Frees administrators from the details of operations
–Provides peak performance 24/7
–Lets administrators concentrate on high-level decisions and policies

Self-managing Systems That…
Increase Responsiveness: adapt to dynamically changing environments
Business Resiliency: discover, diagnose, and act to prevent disruptions
Operational Efficiency: tune resources and balance workloads to maximize use of IT resources
Secure Information and Resources: anticipate, detect, identify, and protect against attacks
Aware/Proactive

Self-Configuring Example: DB2 Configuration Advisor

Self-Healing Example: IBM Electronic Service Agent

Self-Optimizing: Enterprise Workload Management
Self-tuning, end-to-end performance management
Dynamic allocation of network resources
Workload balancing and routing
Cross-platform reporting
Policy-based for various classes of users and applications
Heterogeneous, distributed components working together

Self-Protecting Example: IBM Tivoli Risk Manager
Rapid, automated analysis of complex situations

Evolving towards Self-management: Today vs. the Autonomic Future
Self-configure
–Today: Corporate data centers are multi-vendor and multi-platform. Installing, configuring, and integrating systems is time-consuming and error-prone.
–Autonomic future: Automated configuration of components and systems according to high-level policies; the rest of the system adjusts automatically. Seamless, like adding a new cell to a body or a new individual to a population.
Self-heal
–Today: Problem determination in large, complex systems can take a team of programmers weeks.
–Autonomic future: Automated detection, diagnosis, and repair of localized software/hardware problems.
Self-optimize
–Today: Software stacks (e.g., DB2) have hundreds of nonlinear tuning parameters, with many new ones in each release.
–Autonomic future: Components and systems continually seek opportunities to improve their own performance and efficiency.
Self-protect
–Today: Manual detection of and recovery from attacks and cascading failures.
–Autonomic future: Automated defense against malicious attacks or cascading failures; early warning is used to anticipate and prevent system-wide failures.

Evolving to Autonomic Computing
From manual (Level 1) to autonomic (Level 5):
Basic (Level 1)
–Characteristics: Multiple sources of system-generated data
–Skills: Requires extensive, highly skilled IT staff
–Benefits: Basic requirements met
Managed (Level 2)
–Characteristics: Consolidation of data and actions through management tools
–Skills: IT staff analyzes and takes actions
–Benefits: Greater system awareness; improved productivity
Predictive (Level 3)
–Characteristics: System monitors, correlates, and recommends actions
–Skills: IT staff approves and initiates actions
–Benefits: Reduced dependency on deep skills; faster, better decision making
Adaptive (Level 4)
–Characteristics: System monitors, correlates, and takes action
–Skills: IT staff manages performance against SLAs
–Benefits: Balanced human/system interaction; IT agility and resiliency
Autonomic (Level 5)
–Characteristics: Integrated components dynamically managed by business rules/policies
–Skills: IT staff focuses on enabling business needs
–Benefits: Business policy drives IT management; business agility and resiliency
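
The levels differ mainly in who closes the loop once a problem is found. A minimal Python sketch, assuming a single recommended corrective action and using two callbacks as stand-ins: `approve` for the IT operator and `business_policy` for a business rule. Both names, and the string results, are hypothetical.

```python
from enum import IntEnum
from typing import Callable, Optional


class Maturity(IntEnum):
    """The five levels named on the slides."""
    BASIC = 1
    MANAGED = 2
    PREDICTIVE = 3
    ADAPTIVE = 4
    AUTONOMIC = 5


def respond(level: Maturity, action: str,
            approve: Callable[[str], bool],
            business_policy: Optional[Callable[[str], bool]] = None) -> str:
    """Decide who carries out a corrective action at a given maturity level.

    `approve` stands in for an IT operator; `business_policy` is a
    hypothetical predicate consulted only at level 5.
    """
    if level <= Maturity.MANAGED:
        # Levels 1-2: data is collected (consolidated at level 2),
        # but IT staff analyze the problem and act themselves.
        return "escalated to IT staff"
    if level == Maturity.PREDICTIVE:
        # Level 3: the system recommends; IT staff approve and initiate.
        if approve(action):
            return f"applied {action} after approval"
        return "recommendation declined"
    if level == Maturity.AUTONOMIC and business_policy is not None:
        # Level 5: business policy, not an operator, gates the action.
        if not business_policy(action):
            return f"{action} blocked by business policy"
    # Levels 4-5: the system takes the action itself.
    return f"applied {action} automatically"


def operator_declines(action: str) -> bool:
    return False


print(respond(Maturity.PREDICTIVE, "restart", approve=operator_declines))  # recommendation declined
print(respond(Maturity.ADAPTIVE, "restart", approve=operator_declines))    # applied restart automatically
```

The same declining operator blocks the action at level 3 but is never consulted at level 4, which mirrors the shift from "IT staff approves and initiates actions" to "system monitors, correlates and takes action".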

IBM’s Architecture Model
Intelligent control loop:
–Implementing self-managing attributes involves an intelligent control loop

Control Loops Delivered in 2 Ways
–Combinations of management tools
–Resource provider

Autonomic Element: Structure
Fundamental atom of the architecture
–Managed element(s): e.g., database, storage
–Autonomic manager
An autonomic element is responsible for:
–Providing its service
–Managing its own behavior in accordance with policies
–Interacting with other autonomic elements
Figure: an autonomic element. The autonomic manager runs a Monitor, Analyze, Plan, Execute loop around shared Knowledge, reads and acts on the managed element through its sensors and effectors, and exposes sensors and effectors of its own.
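
The structure above can be sketched in Python as a minimal monitor-analyze-plan-execute loop over shared knowledge. This is an illustration under stated assumptions, not IBM's interfaces: the managed element's sensor and effector are modeled as hypothetical `sense` and `effect` methods, the policy is a single latency threshold, and `FakeService` is an invented managed element.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ManagedElement(Protocol):
    """Touchpoints of a managed element: a sensor and an effector."""
    def sense(self) -> dict: ...
    def effect(self, action: str) -> None: ...


@dataclass
class Knowledge:
    """Data shared by all four phases: a policy threshold and a history."""
    max_latency_ms: float
    history: list = field(default_factory=list)


class AutonomicManager:
    def __init__(self, element: ManagedElement, knowledge: Knowledge):
        self.element = element
        self.knowledge = knowledge

    def monitor(self) -> dict:
        reading = self.element.sense()          # read through the sensor
        self.knowledge.history.append(reading)  # record it in the knowledge base
        return reading

    def analyze(self, reading: dict) -> bool:
        # Symptom: latency violates the policy threshold.
        return reading["latency_ms"] > self.knowledge.max_latency_ms

    def plan(self, reading: dict) -> list:
        # Deliberately trivial planner: add one replica per violation.
        return ["add_replica"]

    def execute(self, actions: list) -> None:
        for action in actions:
            self.element.effect(action)         # act through the effector

    def run_once(self) -> None:
        reading = self.monitor()
        if self.analyze(reading):
            self.execute(self.plan(reading))


class FakeService:
    """Hypothetical managed element whose latency falls as replicas are added."""
    def __init__(self, load: float):
        self.load = load
        self.replicas = 1

    def sense(self) -> dict:
        return {"latency_ms": self.load / self.replicas, "replicas": self.replicas}

    def effect(self, action: str) -> None:
        if action == "add_replica":
            self.replicas += 1


svc = FakeService(load=500)
manager = AutonomicManager(svc, Knowledge(max_latency_ms=150))
for _ in range(5):
    manager.run_once()
print(svc.replicas)  # 4
```

Latency goes 500, 250, about 167, then 125 ms; once it is under the 150 ms threshold the loop stops acting. A real planner would reason over the history in `Knowledge` rather than react to one reading.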

Autonomic Manager Substructure
Figure: components of the autonomic manager (recoverable labels)
–Interfaces: alerts, events & problem analysis request interface; SLA/policy interface, which interprets SLAs and policies and translates them into "control logic"; sensors and effectors
–Monitor: metric managers, filters, simple correlators
–Analyze and Plan: rules engines, analysis engines, policy interpreter, policy transforms, plan generators, policy validations, policy resolution
–Execute: service dispatcher, distribution engine, scheduler engine, workflow engine
–Knowledge: policy, calendar, topology, recent activity log
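
The Monitor stage's filters and simple correlators can be illustrated with a small Python sketch. It assumes a hypothetical `Event` record with a numeric severity, drops low-severity events, and groups events from the same resource that arrive within a time window into one alert. The field names and the windowing rule are illustrative choices, not IBM's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float      # seconds since some epoch
    resource: str    # e.g. "db", "web"
    severity: int    # hypothetical scale: higher is worse
    message: str


def severity_filter(events, min_severity):
    """Filter: keep only events at or above a severity threshold."""
    return [e for e in events if e.severity >= min_severity]


def correlate(events, window):
    """Simple correlator: an event joins the open alert for its resource
    if it arrives within `window` seconds of that alert's latest event."""
    alerts = []        # each alert is a list of related events
    open_alert = {}    # resource -> index of its most recent alert
    for e in sorted(events, key=lambda ev: ev.time):
        i = open_alert.get(e.resource)
        if i is not None and e.time - alerts[i][-1].time <= window:
            alerts[i].append(e)
        else:
            open_alert[e.resource] = len(alerts)
            alerts.append([e])
    return alerts


raw = [
    Event(0, "db", 3, "slow query"),
    Event(1, "db", 1, "debug trace"),
    Event(2, "db", 4, "timeout"),
    Event(3, "web", 5, "HTTP 500s"),
    Event(30, "db", 4, "timeout"),
]
alerts = correlate(severity_filter(raw, min_severity=3), window=10)
print([len(a) for a in alerts])  # [2, 1, 1]
```

The debug trace is filtered out, the two early database events become one alert, and the late database timeout opens a new alert because it falls outside the 10-second window.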

Autonomic Elements: Interaction
Relationships
–Dynamic, ephemeral
–Formed by agreement; may be negotiated
–Full spectrum: peer-to-peer, hierarchical
–Subject to policies
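
A relationship formed by agreement can be pictured as a tiny negotiation. The Python sketch below assumes a single negotiable quantity (capacity units), providers that counter-offer whatever they currently have free, and a consumer-side policy expressed as a minimum acceptable amount. `Provider`, `negotiate`, and `Agreement` are hypothetical names; releasing an agreement returns the capacity, reflecting that relationships are ephemeral.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Agreement:
    provider: str
    consumer: str
    capacity: int


class Provider:
    def __init__(self, name: str, free_capacity: int):
        self.name = name
        self.free = free_capacity

    def offer(self, requested: int) -> int:
        """Counter-offer: as much of the request as is currently free."""
        return min(requested, self.free)

    def commit(self, consumer: str, amount: int) -> Agreement:
        self.free -= amount
        return Agreement(self.name, consumer, amount)

    def release(self, agreement: Agreement) -> None:
        """Relationships are ephemeral: ending one returns the capacity."""
        self.free += agreement.capacity


def negotiate(consumer: str, wanted: int, minimum: int,
              providers: Iterable[Provider]) -> Optional[Agreement]:
    """Ask providers in turn; accept the first offer meeting the consumer's policy."""
    for p in providers:
        offered = p.offer(wanted)
        if offered >= minimum:
            return p.commit(consumer, offered)
    return None


small, large = Provider("storageA", 20), Provider("storageB", 100)
deal = negotiate("db", wanted=50, minimum=40, providers=[small, large])
print(deal)  # Agreement(provider='storageB', consumer='db', capacity=50)
```

`storageA` can offer only 20 units, below the consumer's minimum of 40, so the agreement is formed with `storageB`. The same pattern works peer-to-peer or inside a hierarchy, since nothing here assumes which element is in charge.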

Multiple Contexts for Autonomic Behavior
System elements (intra-element self-management): servers, storage, network devices, middleware, database, applications
Groups of elements (inter-element self-management): server farm, enterprise network, storage pool
Business solutions (business policies, processes, contracts): customer relationship management, enterprise resource planning

Levels of Maturity

Autonomic Computing Requires Core Technologies
Figure: core technologies and the capabilities they enable. Labels: administrative console; policy infrastructure; data collection (logging/tracing) infrastructure; provisioning; install/dependency management; heterogeneous workload management; solution management; policy-based management; end-to-end problem determination; automated root cause analysis; auto-update; identity/security management; auto-detection; dynamic provisioning.

Integrated Solutions Console for Common System Administration
Value:
–One consistent interface across the product portfolio
–Common runtime infrastructure and development tools based on industry standards and component reuse
–Provides a presentation framework for other autonomic core technologies
Customer pain point: complexity of operations
Standards-based: J2EE, JSR 168

Log and Trace Tool for Problem Determination
Value:
–Introduces standard interfaces and formats for logging and tracing
–Central point of interaction with multiple data sources
–Correlated views of data
–Reduced time spent in problem analysis
Customer pain point: difficulty of analyzing problems in multi-component systems
Standards-based: JSR 47, Apache
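
The idea of a standard format with correlated views can be sketched in Python: adapters convert each source's native log lines into one common record, and a view merges the time-ordered streams and keeps only the records for one request. The two native formats, the `req=` convention, and the record fields are hypothetical; this is not the JSR 47 (`java.util.logging`) API or the Apache tooling named on the slide.

```python
import heapq
import re
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LogRecord:
    """Hypothetical common format; ordering compares timestamp first."""
    timestamp: float
    source: str
    request_id: str
    message: str


# Two hypothetical native formats:
#   app: "1.0 req=42 checkout started"
#   db:  "[42] 2.0 slow query"
APP_RE = re.compile(r"(?P<ts>\d+(?:\.\d+)?) req=(?P<rid>\S+) (?P<msg>.*)")
DB_RE = re.compile(r"\[(?P<rid>[^\]]+)\] (?P<ts>\d+(?:\.\d+)?) (?P<msg>.*)")


def parse(lines, source, pattern):
    """Adapter: convert one source's native lines into LogRecords."""
    for line in lines:
        m = pattern.fullmatch(line)
        if m:  # silently skip lines this adapter does not understand
            yield LogRecord(float(m["ts"]), source, m["rid"], m["msg"])


def correlated_view(request_id, *streams):
    """Merge record streams (each already in time order) and keep one request."""
    return [r for r in heapq.merge(*streams) if r.request_id == request_id]


app_log = ["1.0 req=42 checkout started", "3.0 req=43 login", "4.0 req=42 checkout done"]
db_log = ["[42] 2.0 slow query", "corrupted line"]
for r in correlated_view("42", parse(app_log, "app", APP_RE), parse(db_log, "db", DB_RE)):
    print(r.timestamp, r.source, r.message)
```

`heapq.merge` assumes each input stream is already sorted, which holds for logs written in time order; it then interleaves the sources lazily without loading everything into memory.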

Install/Config Package for New Solutions
Value:
–One consistent software installation technology across all products
–Consistent, up-to-date configuration and dependency data, key to building self-configuring autonomic systems
–Reduced deployment time with fewer errors
–Reduced software maintenance time; improved analysis of failed system components
–Component-based install for IBM and non-IBM products
Customer pain point: difficulty of deployment in complex systems
Standards-based: OGSA, Web Services
Partnering with InstallShield
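
Dependency data is what lets an installer decide what to install first. A minimal Python sketch, assuming dependencies are given as a mapping from each component to the set of components it requires (the component names are hypothetical), uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive an install order and to reject circular dependencies:

```python
from graphlib import CycleError, TopologicalSorter


def install_order(dependencies):
    """Order components so that each one follows everything it depends on.

    `dependencies` maps a component to the set of components it requires.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # graphlib reports the detected cycle as the second element of args.
        raise ValueError(f"circular dependency: {err.args[1]}") from None


# Hypothetical stack: an application needing a database and an app server.
stack = {"app": {"db", "appserver"}, "appserver": {"jvm"}}
print(install_order(stack))  # dependencies always precede the components that need them
```

Components that appear only as dependencies (`db`, `jvm`) are included automatically. The relative order of independent components is not fixed, which is why the comment states the guarantee rather than one specific list.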

Policy Tools for Policy-based Management
Value:
–Uniform cross-product policy definition and management infrastructure, needed for delivering system-wide self-management capabilities
–Simplifies management of multiple products; reduced TCO
–Easier to dynamically change configuration in an on-demand environment
Customer pain point: complexity of product and systems management
Figure: policy lifecycle. Definition, validation, and storage in a local repository; distribution (push or pull) to enforcement points, which activate and implement policies on resources; a monitor collects facts from the resources for analysis and adaptation.
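
The lifecycle in the figure can be approximated in a few lines of Python. This is a sketch under several assumptions: a policy is a named, versioned condition/action pair; validation only checks that versions increase; distribution is either pushed to subscribed enforcement points or pulled by them. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    name: str
    version: int
    condition: Callable[[dict], bool]  # evaluated against monitored facts
    action: str


class EnforcementPoint:
    def __init__(self) -> None:
        self.active: Dict[str, Policy] = {}

    def install(self, policy: Policy) -> None:
        """Activate a policy, replacing any older version with the same name."""
        self.active[policy.name] = policy

    def pull(self, repo: "PolicyRepository") -> None:
        for policy in repo.latest():
            self.install(policy)

    def enforce(self, facts: dict) -> List[str]:
        """Return the actions whose conditions hold for the current facts."""
        return [p.action for p in self.active.values() if p.condition(facts)]


class PolicyRepository:
    def __init__(self) -> None:
        self.policies: Dict[str, Policy] = {}
        self.subscribers: List[EnforcementPoint] = []  # push targets

    def define(self, policy: Policy) -> None:
        current = self.policies.get(policy.name)
        if current is not None and policy.version <= current.version:
            raise ValueError(f"{policy.name}: version must increase")  # validation
        self.policies[policy.name] = policy
        for ep in self.subscribers:  # push distribution
            ep.install(policy)

    def latest(self) -> List[Policy]:  # pull distribution
        return list(self.policies.values())


repo = PolicyRepository()
pushed, pulled = EnforcementPoint(), EnforcementPoint()
repo.subscribers.append(pushed)
repo.define(Policy("cpu-high", 1, lambda f: f["cpu"] > 0.9, "add_server"))
print(pushed.enforce({"cpu": 0.95}))  # ['add_server']
print(pulled.enforce({"cpu": 0.95}))  # [] until it pulls
pulled.pull(repo)
print(pulled.enforce({"cpu": 0.95}))  # ['add_server']
```

The pushed enforcement point sees the policy immediately, while the pulling one sees it only after its next `pull`; that trade-off between immediacy and load on the repository is the usual reason to support both modes.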

Technologies for Implementing Autonomic Managers
Value: components to simplify the incorporation of autonomic functions into applications
–Building blocks for self-management
–Monitoring, analysis, planning, and execution components
–Including autonomic computing technologies, grid tools, and services
Pluggable
–Defines interfaces and provides implementations for each major toolkit component
Customer pain point: how to implement end-to-end autonomic solutions
Standards-based: OGSA, W3C

Summary of Autonomic Computing Architecture
Based on a distributed, service-oriented architectural approach
–Every component provides or consumes services
–Policy-based management
Autonomic elements
–Make every component resilient, robust, and self-managing
–Behavior is specified and driven by policies
Relationships between autonomic elements
–Based on agreements established and maintained by the autonomic elements
–Governed by policies
–Give rise to the resiliency, robustness, and self-management of the system

The Metaphor
The autonomic nervous system works without requiring our conscious involvement: when we run, it increases our heart and breathing rates.

Integrating Biology and Information Technology: The Autonomic Computing Metaphor
Current programming paradigms, methods, and management tools are inadequate to handle the scale, complexity, dynamism, and heterogeneity of emerging systems
Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
–Self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work
The goal of autonomic computing is to build self-managing systems that address these challenges using high-level guidance
–Unlike AI, duplicating human thought is not the ultimate goal