Published by Deirdre Strickland. Modified over 9 years ago.
1
This presentation can be distributed under a Creative Commons License
2
Dependable Cloud Architecture
Mike Wood @mikewo http://mvwood.com
Image: xkcd.com
3
Questions
Mike Wood @mikewo http://mvwood.com
Thanks
4
“Failure is always an option.” Image: Discovery Channel, Fair Use
5
What are we looking for?
- Hardware Failure
- Data Corruption
- Network Failure
- Loss of Facilities
Check out: http://bit.ly/wazbizcont
Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)
6
Human Error
Image: FOX, Fair Use
7
What we’re trying to achieve
1. Monitoring
2. Resilient Solutions
Image: Cohdra
8
Cost vs Risk
99.999% availability: $1, …,000.00
To get more 9’s in the availability figure, add more 0’s to the cost.
Image: Office ClipArt
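To make the availability side of this trade-off concrete, the sketch below converts a number of nines into the downtime it permits per year. It assumes a 365-day year, and the name `downtime_per_year` is illustrative rather than something taken from the presentation. The cost side is left out because the slide gives no figures for it.

```python
from datetime import timedelta

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_per_year(nines: int) -> timedelta:
    """Downtime allowed per year at `nines` nines of availability (3 -> 99.9%)."""
    unavailability = 10.0 ** -nines
    return timedelta(seconds=SECONDS_PER_YEAR * unavailability)


if __name__ == "__main__":
    for n in range(1, 6):
        availability = 100 * (1 - 10.0 ** -n)
        print(f"{availability:.3f}% -> {downtime_per_year(n)} per year")
```

Each extra nine cuts the permitted downtime by a factor of ten, which is why the cost tends to climb so steeply.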
9
Image: NASA
10
Functional Transparency
- Logging Messages
- Hardware Health
- Dependent Services Health
Image: Office ClipArt
11
Telemetry
12
Analyze your Data
Image: NASA
13
Image: Office ClipArt
14
Common Points of Failure
Remember: Failure is always an option.
- Machine/application crashes
- Throttling (exceeding capacity)
- Connectivity/network
- External service dependencies
15
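Try/catch != Resilient

    // Requires: using System; using System.Diagnostics; using System.IO;
    private void createFile()
    {
        string fileName = @"c:\workingDirectory\someFileName.txt";
        try
        {
            // File.Create returns an open FileStream; dispose it to release the handle.
            using (File.Create(fileName)) { }
        }
        catch (DirectoryNotFoundException ex)
        {
            // Logging and rethrowing records the failure but does not recover from it.
            Trace.WriteLine(
                String.Format("Unable to create {0}. {1}", fileName, ex));
            throw;
        }
    }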
16
Decompose your system…
Image: Michael Wood
17
Capacity Buffering
- Content Delivery Networks (CDNs)
- Distributed Application Cache
- Local Content Cache
Enables recovery during outages or spikes in load
Image: jepler
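One way a local content cache can help during an outage is by serving the last good copy when the origin cannot be reached. The following is a minimal sketch under that assumption; the class `StaleOnErrorCache`, its `fetch` callback, and the TTL default are all hypothetical and not part of the presentation.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Local cache that serves the last good value when the origin fails."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0):
        self._fetch = fetch            # hypothetical callable that hits the origin
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh hit: the origin is not contacted
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]        # origin is down: serve stale content
            raise                      # nothing cached, so the failure surfaces
        self._entries[key] = (now, value)
        return value
```

Fresh hits also absorb load spikes, since repeated requests for the same key never reach the origin within the TTL.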
18
Always carry a spare
- 75% capacity, half of our load
- 50% more capacity than needed
- Can absorb temporary spikes
- Time to react if we need to add capacity
- 100% of load, 150% capacity
- 0% capacity, redirect all load
- Over-allocated, but still functioning
- Degrade, but don’t fail
- SYSTEM FAILURE!!!
Image: Kevin Rosseel
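The spare-capacity arithmetic can be sketched as below. The sketch assumes the figures describe two sites, each sized for 75% of the load, which is a reading of the slide rather than something it states outright. The function name `utilisation_after_failure` is illustrative.

```python
def utilisation_after_failure(site_capacities, total_load, failed_site):
    """Load divided by the capacity that remains once one site is lost."""
    remaining = sum(c for i, c in enumerate(site_capacities) if i != failed_site)
    if remaining == 0:
        return float("inf")  # nothing is left to serve the load
    return total_load / remaining


sites = [75, 75]  # assumption: two sites, each sized for 75% of the load
load = 100        # the full load, as a percentage

normal = load / sum(sites)
degraded = utilisation_after_failure(sites, load, failed_site=0)
print(f"normal: {normal:.0%} utilised, one site lost: {degraded:.0%} utilised")
```

Under these assumptions the system runs at about 67% utilisation normally and about 133% with one site down: over-allocated, so it has to degrade, but it is not at zero capacity.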
19
Request Buffering
- Queues
- Retry Policies
- Async Workloads
Image: Joe Shlabotnik
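As an illustration of a retry policy, here is a minimal sketch of retrying a transient failure with capped exponential backoff and jitter. The presentation does not prescribe a particular policy, so the function name `retry`, its defaults, and the choice of retryable exceptions are assumptions.

```python
import random
import time


def retry(operation, attempts=4, base_delay=0.5, max_delay=8.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff capped at max_delay, with full jitter so
            # that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Only exceptions listed in `retry_on` are retried; anything else propagates immediately, since retrying a non-transient error only adds load. The `sleep` parameter exists so the delay can be stubbed out in tests.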
20
Dept. of Redundancy Dept.
- Have a backup, somewhere else
- More than one? Cost-to-benefit ratio?
Ready State
- Hot = full capacity
- Warm = scaled down, but ready to grow
- Cold = mothballed, starts from zero
Image: Mr. White
21
Redundancy - It’s about probability
Each box has 95% uptime, and the boxes are assumed to fail independently.
- 1 box: 5% downtime, or 438 hrs per year (that’s 18 ¼ days!)
- 2 boxes: 5/100 * 5/100 = 25/10,000 = 0.25% downtime, or 22 hrs per year
- 4 boxes: 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime, or 3.285 MINUTES per year
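The same arithmetic can be reproduced with a short sketch. It assumes the boxes fail independently, so that the service is down only when every box is down at once, and it uses a 365-day year. The function name `combined_downtime` is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a 365-day year


def combined_downtime(uptime: float, boxes: int) -> float:
    """Fraction of time all boxes are down, assuming independent failures."""
    return (1.0 - uptime) ** boxes


for boxes in (1, 2, 4):
    fraction = combined_downtime(0.95, boxes)
    hours = fraction * HOURS_PER_YEAR
    print(f"{boxes} box(es): {fraction:.6%} downtime, "
          f"{hours:.3f} hours ({hours * 60:.3f} minutes) per year")
```

In practice failures are often correlated (a shared network, power feed, or bad deployment), so these numbers are best read as an upper bound on what redundancy alone can buy.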
22
Total Outage duration = Time to Detect + Time to Diagnose + Time to Decide + Time to Act Image: Office ClipArt
23
Dynamic Addressing & Configuration
24
What about your data? Image: barrymieny
25
Availability via Degradation Image: Michael Wood
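A minimal sketch of degrading instead of failing: the essential dependency is allowed to fail the request, while an optional one is replaced by an empty default. The product page scenario and the names `render_product_page`, `load_product`, and `load_recommendations` are hypothetical, chosen only to illustrate the idea.

```python
def render_product_page(product_id, load_product, load_recommendations):
    """Build page data, degrading when the optional dependency fails."""
    product = load_product(product_id)  # essential: a failure here propagates
    try:
        recommendations = load_recommendations(product_id)
        degraded = False
    except Exception:
        # The optional feature is unavailable: serve the page without it.
        recommendations = []
        degraded = True
    return {
        "product": product,
        "recommendations": recommendations,
        "degraded": degraded,
    }
```

Returning a `degraded` flag lets the caller log the event or show a reduced layout, so the fallback does not hide the underlying failure from monitoring.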
26
Virtualization and Automation
Images: Gizmodo
27
Images: Orion Pictures owns Terminator Franchise
28
The “HI” Point Images: Office Clip Art
29
Image: NASA
30
“Don't be too proud of this technological terror you've constructed…”
ADMIT:
- Your solution WILL fail at some point
- You can learn from others just as well as from yourself
DO:
- Root cause analysis
- Read others’ root cause analyses
- Plan for failure
DON’T:
- Get cocky
- Stick your head in the sand
Images: LucasFilm, Fair Use
31
Questions
Mike Wood @mikewo http://mvwood.com http://bit.ly/CloudFailSafe
Thanks