Published by Raymond Cooper. Modified over 9 years ago.
1
Cloud Computing and Architecture
Architectural Tactics (Tonight’s guest star: Availability)
2
Quality framework (Bass et al.)
Central quality attributes
– Availability
– Interoperability
– Modifiability
– Performance
– Security
– Testability
– Usability
Other qualities
– Portability
– Scalability
– Variability
– Flexibility
– Cost
– Time to market
– …
Strongly recommended reading!
3
A Writing Template
– Source of stimulus. This is some entity (a human, a computer system, or any other actuator) that generated the stimulus.
– Stimulus. The stimulus is a condition that needs to be considered when it arrives at a system.
– Environment. The stimulus occurs within certain conditions. The system may be in an overload condition or may be running when the stimulus occurs, or some other condition may be true.
– Artifact. Some artifact is stimulated. This may be the whole system or some pieces of it.
– Response. The response is the activity undertaken after the arrival of the stimulus.
– Response measure. When the response occurs, it should be measurable in some fashion so that the requirement can be tested.
4
Example: World of Warcraft
CS@AU, Henrik Bærbak Christensen
5
Example: SkyCave

Quality attribute: Availability
Source: Internal to the system
Stimulus: A crash
Artifact: Database server
Environment: Normal operation
Response: Detects the event, records it in the log, continues in normal operation
Response measure: Within 3 seconds

Quality attribute: Performance
Source: 1000 independent clients
Stimulus: Generate on average 2 character events per second
Artifact: SkyCave App server
Environment: Normal operation
Response: Events are processed, cave state is updated
Response measure: With at most 5 seconds latency
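A scenario written with the template can also be kept as a small data record, which makes the response measure directly checkable in a test. The following is only a sketch: the class name `QualityScenario`, the field `limit_seconds`, and the method `is_met` are illustrative assumptions, not part of the template or of SkyCave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScenario:
    """One quality attribute scenario, one field per template part."""
    quality_attribute: str
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str
    limit_seconds: float  # hypothetical: numeric form of the response measure

    def is_met(self, observed_seconds: float) -> bool:
        # The scenario holds when the observed response time
        # stays within the stated limit.
        return observed_seconds <= self.limit_seconds


# The SkyCave availability scenario from the slide.
skycave_availability = QualityScenario(
    quality_attribute="Availability",
    source="Internal to the system",
    stimulus="A crash",
    artifact="Database server",
    environment="Normal operation",
    response="Detect the event, record it in the log, continue in normal operation",
    response_measure="Within 3 seconds",
    limit_seconds=3.0,
)
```

A test harness could then inject a crash, time the recovery, and call `skycave_availability.is_met(measured_seconds)`.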
6
Tactic
– A design decision that influences the achievement of a quality attribute response
Example of modifiability tactic:
– Encapsulate: Introduce explicit interface to module
7
CloudArch Core Focus
System quality attributes
– Availability
– Modifiability
– Performance
– Security
– Testability
– Usability
– Interoperability
– Scalability
Discussion
– If a system is not available, what is the point of all other QAs?
– Security? Equals slowness
8
Availability
9
Definition(s)
– Availability (1): Property of software that it is there and ready to carry out its task when you need it to be
– Availability (2): Ability of a system to mask or repair faults such that the cumulative service outage period does not exceed a required value over a specified time interval
– Nygard: Stability (resilience, longevity): Ability to keep processing for a long time even when there are transient impulses, persistent stresses, or component failures
10
Measurements
– MTBF: Mean time between failures
– MTTR: Mean time to repair
But often we talk in percentages!
– 99%: 3d 15h downtime per year
– 99.9%: 8h 46m
– 99.99%: 52m
– 99.9999%: 32 seconds (!)
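The downtime figures follow from simple arithmetic: the unavailable fraction times the hours in a year. The sketch below also shows the commonly used steady-state formula availability = MTBF / (MTBF + MTTR). The function names are illustrative, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def downtime_hours_per_year(availability_percent: float) -> float:
    """Yearly downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


for percent in (99, 99.9, 99.99, 99.9999):
    hours = downtime_hours_per_year(percent)
    print(f"{percent}% -> {hours:.4f} hours of downtime per year")
```

For example, a system that fails on average every 999 hours and takes 1 hour to repair has an availability of 999 / 1000, that is 99.9%.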
11
Tactics
Lots of techs!
12
Tactics
Categories
– Fault detection
– Recovery
  – Preparation + Repair
  – Reintroduction
– Prevention
13
Detection
– Ping-echo
– Monitor: Nagios, Zabbix, …
– Exceptions
  – Time out
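Ping-echo and time-out detection can be sketched together: send a ping, and treat a missing or late echo as a detected fault. This is a minimal in-process sketch, not how Nagios or Zabbix work. The names `ping_echo`, `healthy_component`, and `slow_component` are hypothetical, and the "component" is just a callable standing in for a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def ping_echo(send_ping, timeout_s: float) -> bool:
    """Return True only if the component answers 'echo' within timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(send_ping)
    try:
        return future.result(timeout=timeout_s) == "echo"
    except FutureTimeout:
        return False  # no echo in time: report the component as failed
    except Exception:
        return False  # the ping itself raised: also a detected fault
    finally:
        # Do not block on a hung component; its thread finishes on its own.
        executor.shutdown(wait=False)


def healthy_component() -> str:
    return "echo"


def slow_component() -> str:
    time.sleep(0.5)  # simulates a component that answers too late
    return "echo"
```

With a 50 ms limit, `ping_echo(healthy_component, 0.05)` reports the component as alive, while `ping_echo(slow_component, 0.05)` reports a fault.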
14
Recover: Prep and Repair
– Active redundancy (hot standby)
  – All receive and process all events
  – Millisecond failover
– Passive redundancy (warm standby)
  – Master-slave
  – Minute failover
– Spare (cold standby)
  – "I think we have an extra machine in the cellar"
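Active redundancy can be illustrated with a toy in-memory model: every live replica processes every event, so a read can be served by any survivor the moment one replica dies. The classes `Replica` and `ActiveRedundantGroup` are hypothetical names for this sketch, and real systems would add networking and consistency handling.

```python
class Replica:
    """A node holding its own full copy of the state."""

    def __init__(self, name: str):
        self.name = name
        self.state = {}
        self.alive = True

    def apply(self, key: str, value: str) -> None:
        self.state[key] = value


class ActiveRedundantGroup:
    """Hot standby: all live replicas receive and process all events."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key: str, value: str) -> None:
        for replica in self.replicas:
            if replica.alive:
                replica.apply(key, value)

    def read(self, key: str) -> str:
        # Failover is just "ask the next live replica"; no catch-up is
        # needed because every replica already processed every event.
        for replica in self.replicas:
            if replica.alive:
                return replica.state[key]
        raise RuntimeError("no live replica left")


primary, standby = Replica("a"), Replica("b")
group = ActiveRedundantGroup([primary, standby])
group.write("player-position", "(0,0,0)")
primary.alive = False  # simulated crash of the first replica
```

After the simulated crash, `group.read("player-position")` is still answered by the standby. With passive redundancy the standby would first have to receive the state from the primary, which is why failover takes longer.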
15
Recover: Prep and Repair
– Exceptions
– Rollback
  – Used in DB and [exercise: where else?]
  – Check pointing
– Retry
– Degradation
Which Nygard patterns?
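Retry and degradation combine naturally: try the operation a bounded number of times, and if it keeps failing, fall back to a reduced but still useful answer. The helper `call_with_retry` and its parameters are assumptions made for this sketch.

```python
import time


def call_with_retry(operation, fallback, attempts: int = 3, delay_s: float = 0.0):
    """Retry `operation`; degrade to `fallback` if every attempt fails."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay_s)
    # Degradation: serve a reduced answer instead of failing outright.
    return fallback()


calls = {"count": 0}


def flaky_lookup() -> str:
    """Hypothetical operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "fresh data"


def cached_lookup() -> str:
    return "stale cached data"
```

Here `call_with_retry(flaky_lookup, cached_lookup)` succeeds on the third attempt. An operation that never succeeds ends up returning the cached value, so the caller still gets an answer.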
16
Recover: Reintroduction
– Shadow
  – Run in shadow mode until ‘up-to-speed’
– State resync
  – Typical DB behaviour: cold slaves must catch up with primary
  – EcoSense db war story: stale DB
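State resync can be sketched as log replay: the returning replica remembers how many log entries it has applied and replays everything the primary wrote since. This is a simplified model under the assumption of an append-only log of key/value writes. The function `resync` is a hypothetical name and does not reflect any specific database's protocol.

```python
def resync(primary_log, replica_state, replica_applied: int) -> int:
    """Replay the log entries the replica missed; return the new position."""
    for key, value in primary_log[replica_applied:]:
        replica_state[key] = value
    return len(primary_log)


# The primary kept writing while the replica was out of service.
primary_log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]

# The replica went down after applying only the first two entries.
replica_state = {"a": 1, "b": 2}
replica_applied = 2

replica_applied = resync(primary_log, replica_state, replica_applied)
```

Until the replay has finished, the replica holds stale data and should not serve reads.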
17
Preventing
– Removal from service
  – ‘Scrubbing’
  – It used to be that a Tomcat server would respawn every 12 hours: easiest way to fix the numerous memory leaks!
– Transactions
  – ACID guarantees
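The atomicity part of ACID (all or nothing) can be shown with an in-memory checkpoint and rollback. This sketch covers only atomicity, not consistency, isolation, or durability, and the helper `run_in_transaction` is a hypothetical name.

```python
import copy


def run_in_transaction(state: dict, operation) -> None:
    """Apply `operation` to `state`; restore the checkpoint if it fails."""
    checkpoint = copy.deepcopy(state)
    try:
        operation(state)
    except Exception:
        # Roll back: discard partial updates and restore the checkpoint.
        state.clear()
        state.update(checkpoint)
        raise


def broken_transfer(accounts: dict) -> None:
    """Hypothetical operation that fails halfway through."""
    accounts["alice"] -= 50
    raise RuntimeError("crash before crediting bob")


accounts = {"alice": 100, "bob": 0}
try:
    run_in_transaction(accounts, broken_transfer)
except RuntimeError:
    pass  # the failure is still reported, but the state is intact
```

After the failed transfer the balances are unchanged, so no money has disappeared from the system.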
18
Summary
– All things bad can and will happen to real systems having real users operating in the real world!
– Your systems should strive for high availability and graceful degradation
  – If you want to keep your customers!
– The architectural tool box is big!