Improving System Availability in Distributed Environments
Sam Malek, with Marija Mikic-Rakic, Nels Beckman, and Nenad Medvidovic
Motivation How good is this deployment architecture? What are its properties? How should it be modified to ensure higher availability?
Effect of Deployment on Availability
Bad deployment: low availability
Better deployment: higher availability
Redeployment to maximize the availability
 – Frequency and volume of interactions, reliability and capacity of network links
Hard to determine a good deployment in large-scale distributed systems
 – In the small example above, there are 3^10 = 59,049 possible deployments
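The count comes from letting each component be placed independently on any host. The sketch below assumes the small example has 10 components and 3 hosts (the figure is not reproduced here); the function names are illustrative only.

```python
from itertools import product


def count_deployments(num_components: int, num_hosts: int) -> int:
    """Each component can go on any host, so there are hosts ** components options."""
    return num_hosts ** num_components


def enumerate_deployments(components, hosts):
    """Yield every component-to-host mapping (only practical for tiny systems)."""
    for assignment in product(hosts, repeat=len(components)):
        yield dict(zip(components, assignment))


small_example = count_deployments(num_components=10, num_hosts=3)
print(small_example)  # 59049
```

Adding one more component multiplies the count by the number of hosts, which is why exhaustive search stops being practical quickly.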
Availability Definition The degree to which the system is operational and accessible when required for use
System Model Parameters
Software component properties
 – Memory requirements
 – Frequency of interaction
 – Size of the exchanged data
Hardware host properties
 – Memory capacity
 – Network reliability
 – Network bandwidth
Constraints
 – Location
 – Co-location
Problem Definition Find a system deployment architecture such that: It adheres to the system model parameters and constraints It has the greatest availability
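A minimal sketch of how the problem could be encoded. The slides do not give a formula for availability, so this assumes one for illustration: the frequency-weighted fraction of interactions expected to succeed, where co-located components always succeed and remote interactions succeed with the link's reliability. All names are hypothetical, and co-location is modeled only as "must share a host".

```python
def availability(deployment, frequency, reliability):
    """Assumed measure: expected fraction of interactions that succeed.

    deployment:  {component: host}
    frequency:   {(comp_a, comp_b): interactions per unit time}
    reliability: {frozenset((host_a, host_b)): probability in [0, 1]}
    """
    total = sum(frequency.values())
    if total == 0:
        return 1.0
    delivered = 0.0
    for (a, b), freq in frequency.items():
        host_a, host_b = deployment[a], deployment[b]
        # Local interactions are assumed never to fail; missing links count as 0.
        link = frozenset((host_a, host_b))
        link_rel = 1.0 if host_a == host_b else reliability.get(link, 0.0)
        delivered += freq * link_rel
    return delivered / total


def satisfies_constraints(deployment, comp_memory, host_memory,
                          allowed_hosts=None, colocated=()):
    """Check memory capacity, location, and co-location constraints."""
    used = {}
    for comp, host in deployment.items():
        used[host] = used.get(host, 0) + comp_memory[comp]
    if any(used[h] > host_memory[h] for h in used):
        return False
    for comp, hosts in (allowed_hosts or {}).items():
        if deployment[comp] not in hosts:
            return False
    return all(deployment[a] == deployment[b] for a, b in colocated)


frequency = {("gui", "db"): 10, ("db", "log"): 30}
reliability = {frozenset(("h1", "h2")): 0.5}
deployment = {"gui": "h1", "db": "h1", "log": "h2"}
print(availability(deployment, frequency, reliability))  # 0.625
```

Under these assumptions, the problem is to pick the `deployment` that passes `satisfies_constraints` and maximizes `availability`.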
Problem Breakdown
1) Lack of knowledge about runtime system parameters
 – System model parameters are not known at the time of initial deployment
 – System model parameters change at runtime (reliability of links, frequencies of interaction, etc.)
 – Prism-MW monitoring support
2) Exponentially complex problem
 – n components and k hosts = k^n possible deployments
 – DeSi’s polynomial-time approximation algorithms
3) Solution analysis
 – Comparison of different solutions and algorithms
 – Centralized vs. decentralized, performance vs. complexity, etc.
 – DeSi’s visualization and comparison utilities
4) Effecting the selected solution
 – Redeploying components
 – Requires an automated solution
 – Prism-MW deployment support
Approach
[Figure: four-step loop between DeSi and Prism-MW: 1) Monitor, 2) Monitoring Data, 3) Analyze, 4) Redeployment Data]
Prism-MW
 – An architectural middleware that enables efficient implementation, deployment, and execution of distributed systems in terms of their architectural elements: components, connectors, configurations, etc.
 – Support for monitoring
 – Support for redeployment
[Figure: Simplified class diagram of Prism-MW]
Prism-MW’s Role
[Figure: four-step loop between DeSi and Prism-MW: 1) Monitor, 2) Monitoring Data, 3) Analyze, 4) Redeployment Data]
Supports:
 – Step 1, by monitoring events in the system and calculating the system parameters
 – Step 4, by providing an API for the redeployment of components, and meta-level components to automate the tasks
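As an illustration of what "calculating the system parameters" from monitored events might involve, here is a small sketch. It does not use Prism-MW's API; the event tuple layout and the function name are assumptions made for this example.

```python
from collections import Counter


def estimate_parameters(events):
    """Estimate interaction frequencies and link reliabilities from an event log.

    events: iterable of (src_component, dst_component, src_host, dst_host, delivered)
            tuples. This layout is hypothetical, not a Prism-MW format.
    Returns (frequency, reliability):
      frequency:   {frozenset of the two components: number of observed interactions}
      reliability: {frozenset of the two hosts: delivered / attempted}
    """
    frequency = Counter()
    attempts = Counter()
    successes = Counter()
    for src_c, dst_c, src_h, dst_h, delivered in events:
        frequency[frozenset((src_c, dst_c))] += 1
        if src_h != dst_h:  # only remote interactions exercise a network link
            link = frozenset((src_h, dst_h))
            attempts[link] += 1
            successes[link] += int(delivered)
    reliability = {link: successes[link] / attempts[link] for link in attempts}
    return dict(frequency), reliability


log = [
    ("a", "b", "h1", "h2", True),
    ("b", "a", "h2", "h1", False),
    ("a", "b", "h1", "h2", True),
    ("a", "b", "h1", "h2", True),
    ("a", "c", "h1", "h1", True),
]
freq, rel = estimate_parameters(log)
print(rel[frozenset(("h1", "h2"))])  # 0.75
```

Because the parameters change at runtime, estimates like these would need to be recomputed over a recent window of events rather than once.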
Maximizing Availability
A family of centralized algorithms
 – Exact: exponential
 – Stochastic: quadratic
 – Adaptive greedy: cubic
A family of decentralized algorithms
 – DecAp (auction-based): cubic
A set of clustering techniques
 – Reduce complexity
 – Improve performance
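To make the trade-off between exact and approximate search concrete, the sketch below contrasts exhaustive enumeration with a simple one-pass greedy heuristic. These are not DeSi's Exact or Adaptive greedy algorithms. The availability measure (frequency-weighted link reliability) and the memory-only constraint are assumptions chosen for illustration.

```python
from itertools import product


def availability(deployment, frequency, reliability):
    """Assumed measure: frequency-weighted fraction of interactions that succeed.
    Pairs with an unplaced component contribute nothing (useful for partial plans)."""
    total = sum(frequency.values())
    if total == 0:
        return 1.0
    delivered = 0.0
    for (a, b), freq in frequency.items():
        if a in deployment and b in deployment:
            ha, hb = deployment[a], deployment[b]
            link_rel = 1.0 if ha == hb else reliability.get(frozenset((ha, hb)), 0.0)
            delivered += freq * link_rel
    return delivered / total


def fits(deployment, comp_memory, host_memory):
    """True if no host's memory capacity is exceeded."""
    used = {}
    for comp, host in deployment.items():
        used[host] = used.get(host, 0) + comp_memory[comp]
    return all(used[h] <= host_memory[h] for h in used)


def exact(components, hosts, frequency, reliability, comp_memory, host_memory):
    """Try all k^n assignments and keep the best valid one."""
    best, best_score = None, -1.0
    for assignment in product(hosts, repeat=len(components)):
        deployment = dict(zip(components, assignment))
        if not fits(deployment, comp_memory, host_memory):
            continue
        score = availability(deployment, frequency, reliability)
        if score > best_score:
            best, best_score = deployment, score
    return best, best_score


def greedy(components, hosts, frequency, reliability, comp_memory, host_memory):
    """Place components one at a time on the host that looks best so far."""
    deployment = {}
    for comp in components:
        best_host, best_score = None, -1.0
        for host in hosts:
            trial = {**deployment, comp: host}
            if not fits(trial, comp_memory, host_memory):
                continue
            score = availability(trial, frequency, reliability)
            if score > best_score:
                best_host, best_score = host, score
        if best_host is None:
            raise ValueError(f"no host has room for {comp}")
        deployment[comp] = best_host
    return deployment, availability(deployment, frequency, reliability)


components, hosts = ["a", "b", "c"], ["h1", "h2"]
frequency = {("a", "b"): 10, ("b", "c"): 30}
reliability = {frozenset(("h1", "h2")): 0.5}
comp_memory = {"a": 1, "b": 1, "c": 1}
host_memory = {"h1": 2, "h2": 2}
args = (components, hosts, frequency, reliability, comp_memory, host_memory)
print(exact(*args)[1])   # 0.875
print(greedy(*args)[1])  # 0.625
```

In this toy input the greedy pass commits "a" and "b" to the same host before seeing that "b" and "c" interact more often, so it ends up below the optimum. That gap is the precision cost paid for avoiding the exponential search.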
Algorithms’ Results
Assessing the Algorithms
Efficiency
 – Execution time vs. precision
Applicability
 – Centralized vs. decentralized
Effect of system characteristics
Impact of individual parameter changes
Addition of new system parameters
Application to new system properties
Requires “what if” scenario exploration
In comes DeSi!
DeSi’s Architecture
Key properties: tailorability, scalability, efficiency, explorability
DeSi’s View (1)
DeSi’s View (2)
DeSi’s View (3)
DeSi’s View (4)
DeSi’s View (5)
DeSi’s Role
[Figure: four-step loop between DeSi and Prism-MW: 1) Monitor, 2) Monitoring Data, 3) Analyze, 4) Redeployment Data]
Supports:
 – Step 3, by providing several redeployment algorithms and various visualization utilities
 – Steps 2 and 4, by providing the appropriate middleware adapter
Conclusion
 – Suite of automated tools and techniques for improving the availability of a distributed system
 – Currently extending the tools to model, analyze, and improve other non-functional aspects of a distributed system: security, latency, etc.
Questions?