A Scalable Approach to Architectural-Level Reliability Prediction Leslie Cheung Joint work with Leana Golubchik and Nenad Medvidovic
Motivation
- Many design decisions are made early in the software development process
- These decisions affect software quality
- Need to assess software quality early: if problems are discovered later (e.g., after implementation), they may be costly to address
Motivation
- In this talk, we focus on assessing software reliability using architectural models
- Reliability: the fraction of time that the system operates correctly
- Architectural models: describe a system's structure, behavior, and interactions
Case Study: MIDAS
- Goal: measure room temperature and adjust it according to a user-specified threshold by turning the AC on/off
- Sensor: measures temperature and sends the measured data to a Gateway
- Gateway: aggregates and translates the data and sends it to a Hub
- Hub: determines whether it should turn the AC on or off
- AC: controls the AC
- GUI: views the current temperature and changes thresholds
Motivation
- Existing approaches for concurrent systems keep track of the states of all components
- MIDAS example state: (Sensor1, Sensor2, Gateway, Hub, GUI, AC)
  - e.g., (Taking Measurements, Idle, Idle, Idle, Processing User Request, Idle)
  - e.g., (Failed!, Idle, Idle, Idle, Processing User Request, Idle)
- Problem: scalability
  - e.g., 2 Gateways with 10 Sensors each: >5000 states
  - Real-world applications may have 100s of Sensors and Gateways: the models are too big to solve
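The explosion above can be made concrete with a short sketch. The per-component state sets below are hypothetical (the real MIDAS components have more states); the point is only that a flat model must enumerate the cross product of all component states:

```python
from itertools import product

# Hypothetical per-component state sets (illustrative only; real MIDAS
# components have more states, e.g. a separate "Failed" state per mode).
SENSOR_STATES = ["Idle", "TakingMeasurements", "Failed"]
GATEWAY_STATES = ["Idle", "Aggregating", "Failed"]

def flat_state_space(num_sensors, num_gateways):
    """Enumerate the global states a flat model must track:
    one state per combination of all component states."""
    components = ([SENSOR_STATES] * num_sensors +
                  [GATEWAY_STATES] * num_gateways)
    return list(product(*components))

# The flat state space grows exponentially with the number of components:
print(len(flat_state_space(2, 1)))   # 3^3  = 27 global states
print(len(flat_state_space(10, 2)))  # 3^12 = 531441 global states
```

Even with only three states per component, adding components multiplies the state space, which is exactly the scalability problem SHARP targets.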
The SHARP Framework
- SHARP: Scalable, Hierarchical, Architectural-Level Reliability Prediction framework
- Idea: generate part of the system model at a time by leveraging use-case scenarios
- Solving many smaller models is more efficient than solving one huge model
MIDAS Use-Case Scenarios
- Sensor Measurement
- GUI Request
- Control AC
The SHARP Framework
- Modeling concurrency: instances of scenarios may run simultaneously
- MIDAS examples:
  - Processing a GUI request while processing sensor measurements: the Sensor Measurement and GUI Request scenarios run simultaneously
  - Multiple sensors: multiple instances of the Sensor Measurement scenario
The SHARP Framework
- Generate and solve submodels according to the system's use-case scenarios
- Generate and solve a coarser-level model for system reliability
  - Describes what happens when multiple instances of scenarios are running
  - Makes use of results from the submodels
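The two-level idea can be sketched in a few lines. This is a deliberately simplified illustration, not SHARP's actual equations: scenario names, per-step success probabilities, and usage weights below are all hypothetical:

```python
# Minimal sketch of hierarchical reliability prediction (illustrative,
# not SHARP's actual math): solve each scenario submodel for its
# reliability R_i, then combine the R_i in a coarser-level model.

def scenario_reliability(step_success_probs):
    """Reliability of one scenario submodel: here, simply the probability
    that every step in the scenario completes without failure."""
    r = 1.0
    for p in step_success_probs:
        r *= p
    return r

# Hypothetical per-step success probabilities for the MIDAS scenarios.
submodels = {
    "SensorMeasurement": [0.999, 0.998, 0.999],  # Sensor -> Gateway -> Hub
    "GUIRequest":        [0.995, 0.999],         # GUI -> Hub
    "ControlAC":         [0.999, 0.997],         # Hub -> AC
}

# Coarser-level model (here just a usage-weighted average): each
# scenario's reliability R_i is weighted by how often it runs.
usage = {"SensorMeasurement": 0.7, "GUIRequest": 0.2, "ControlAC": 0.1}
system_r = sum(usage[s] * scenario_reliability(p)
               for s, p in submodels.items())
print(round(system_r, 4))
```

Each submodel is small (one scenario's steps), so it stays tractable no matter how many components the full system has; only the coarser-level model sees the combination of scenarios.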
The SHARP Framework
[Figure: each scenario submodel m1, m2, m3 is solved for its reliability R1, R2, R3]
The SHARP Framework
- Generate and solve submodels according to the system's use-case scenarios
- Generate and solve a coarser-level model for system reliability
  - Describes the number of active instances of each scenario
  - Makes use of results from the submodels
The SHARP Framework
[Figure: the submodel results (m1, R1), (m2, R2), (m3, R3) feed the coarser-level model, which yields the system reliability R]
Evaluation
- To show:
  - SHARP has better scalability than a flat model that can be derived from existing approaches
  - SHARP is accurate, using results from the flat model as "ground truth"
- Experiments:
  - Computational cost in practice
  - Sensitivity analysis
Computational Cost in Practice
- Example: MIDAS system, varying the number of Sensor components (x-axis)
- Y-axis: number of operations needed to solve the model
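A back-of-the-envelope comparison illustrates why the hierarchical approach wins. The cubic-cost assumption and the specific state counts below are hypothetical, not the paper's measured data:

```python
# Illustrative operation counts (not the paper's data). Solving a Markov
# model is roughly cubic in its number of states, so solving many small
# submodels beats solving one flat product-space model.

def flat_ops(states_per_component, num_components):
    """Cost of solving one flat model over the full product state space."""
    flat_states = states_per_component ** num_components
    return flat_states ** 3

def sharp_ops(states_per_component, num_components, coarse_states):
    """Cost of solving one small submodel per component/scenario,
    plus one coarser-level model of modest size."""
    per_submodel = states_per_component ** 3
    return num_components * per_submodel + coarse_states ** 3

print(flat_ops(3, 12))       # one huge model: (3^12)^3 operations
print(sharp_ops(3, 12, 50))  # many small models + a coarser model
```

With 12 components of 3 states each, the flat model needs on the order of 10^17 operations, while the hierarchical estimate stays around 10^5, which matches the trend the experiment's plot conveys.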
Sensitivity Analysis
- We are primarily interested in what-if analysis:
  - Is Architecture A "better" than Architecture B?
- but not in absolute questions such as:
  - Will my system's reliability be greater than 90%?
  - What is the probability that I can run my system for 100 hours without any failure?
- Focusing on trends is meaningful at the architectural level
Sensitivity Analysis
- "Ground truth": results from the flat model
- Vary the Sensor failure rate
Conclusions
- Assessing software quality early is desirable
- Scalability is a major challenge in reliability prediction of concurrent systems using architectural models
- We address this challenge in SHARP by leveraging a system's use-case scenarios
- Future work: contention modeling
  - Work thus far assumes no contention
  - However, concurrency introduces contention for shared resources
The End
Defects
- Architectural: mismatches between architectural models
  - e.g., an interaction protocol mismatch between two components
- System: limitations of components
  - e.g., a Sensor has limited power
- Allow system designers to evaluate how much reliability will improve if defects are addressed, and at what cost