This presentation can be distributed under a Creative Commons License.


1 This presentation can be distributed under a Creative Commons License

2 Image: xkcd.com Dependable Cloud Architecture @mikewo Mike Wood http://mvwood.com

3 Questions @mikewo Mike Wood http://mvwood.com Thank you

4 “Failure is always an option.” Image: Discovery Channel, Fair Use

5 What are we looking for? Hardware Failure, Data Corruption, Network Failure, Loss of Facilities. Check out: http://bit.ly/wazbizcont Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)

6 Image: FOX, Fair Use Human Error

7 What we’re trying to achieve: 1. Monitoring 2. Resilient Solutions Image: Cohdra

8 Image: Office ClipArt Cost vs. Risk: 99.999% availability, $1, …,000.00 cost. To get more 9s here, add more 0s here.
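
To make the cost-versus-risk trade-off concrete, the nines can be converted into allowed downtime. The following is a minimal C# sketch, assuming a 365-day year (8,760 hours); the `Availability` class and `DowntimeMinutesPerYear` method are hypothetical names, not part of any library.

```csharp
using System;

public static class Availability
{
    // Assumption: a 365-day year (8,760 hours); leap years are ignored.
    private const double MinutesPerYear = 365 * 24 * 60;

    // Converts an availability target such as 99.999 (percent) into the
    // downtime it still allows over one year, in minutes.
    public static double DowntimeMinutesPerYear(double availabilityPercent)
    {
        if (availabilityPercent < 0 || availabilityPercent > 100)
            throw new ArgumentOutOfRangeException(nameof(availabilityPercent));
        return (100.0 - availabilityPercent) / 100.0 * MinutesPerYear;
    }
}
```

With this, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime per year, while 99.999% allows only about 5.26 minutes. Each extra nine divides the allowance by ten, which is why the cost side grows so quickly.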

9 Image: NASA

10 Functional Transparency: Logging, Messages, Hardware Health, Dependent Services Health. Image: Office ClipArt
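
Dependent-services health can be surfaced with a status check that probes each dependency and rolls the results into one value. This is a minimal C# sketch; `HealthStatus`, `HealthReport.Aggregate`, and the probe delegates are hypothetical, and a real probe would be a cheap call to the dependency with its own timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public enum HealthStatus { Healthy, Degraded, Unhealthy }

public static class HealthReport
{
    // Runs each dependency probe and rolls the results up into one status:
    // all passing = Healthy, some failing = Degraded, all failing = Unhealthy.
    public static HealthStatus Aggregate(IReadOnlyDictionary<string, Func<bool>> probes)
    {
        int failing = 0;
        foreach (KeyValuePair<string, Func<bool>> probe in probes)
        {
            bool ok;
            try
            {
                ok = probe.Value();
            }
            catch (Exception ex)
            {
                ok = false; // a probe that throws counts as a failed dependency
                Trace.WriteLine($"Probe for {probe.Key} threw: {ex.Message}");
            }
            if (!ok)
            {
                failing++;
                Trace.WriteLine($"Dependency unhealthy: {probe.Key}");
            }
        }

        if (failing == 0) return HealthStatus.Healthy;
        return failing < probes.Count ? HealthStatus.Degraded : HealthStatus.Unhealthy;
    }
}
```

Reporting Degraded separately from Unhealthy lets monitoring tell "one dependency is down" apart from "nothing works", which shortens the time needed to detect and diagnose an outage.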

11 Telemetry

12 Image: NASA Analyze your Data

13 Image: Office ClipArt

14 Remember: Failure is always an option. Common Points of Failure: machine/application crashes; throttling (exceeding capacity); connectivity/network; external service dependencies.

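15 Try/catch != Resilient

    // Requires: using System; using System.Diagnostics; using System.IO;
    private void createFile()
    {
        string fileName = @"c:\workingDirectory\someFileName.txt";
        try
        {
            // File.Create returns an open FileStream; dispose it so the handle is released.
            using (File.Create(fileName)) { }
        }
        catch (DirectoryNotFoundException ex)
        {
            // Logging and rethrowing records the failure, but the caller still fails:
            // nothing here recovers, retries, or degrades.
            Trace.WriteLine(String.Format("Unable to create {0}. {1}", fileName, ex));
            throw;
        }
    }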

16 Image: Michael Wood Decompose your system…

17 Capacity Buffering: Content Delivery Networks (CDNs), Distributed Application Cache, Local Content Cache. Enables recovery during outages or spikes in load. Image: jepler
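
A local content cache supports recovery during outages when it keeps serving its last good copy while the origin is unreachable. The C# sketch below assumes a single-threaded caller and a caller-supplied loader and clock; `StaleOnErrorCache` and its members are hypothetical, not a specific caching library's API.

```csharp
using System;
using System.Collections.Generic;

// Not thread-safe: intended for a single caller, as an illustration only.
public sealed class StaleOnErrorCache<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, (TValue Value, DateTime FetchedAt)> _entries =
        new Dictionary<TKey, (TValue Value, DateTime FetchedAt)>();
    private readonly TimeSpan _timeToLive;
    private readonly Func<DateTime> _clock;

    public StaleOnErrorCache(TimeSpan timeToLive, Func<DateTime> clock)
    {
        _timeToLive = timeToLive;
        _clock = clock;
    }

    public TValue Get(TKey key, Func<TKey, TValue> loadFromOrigin)
    {
        DateTime now = _clock();
        bool haveCopy = _entries.TryGetValue(key, out var entry);
        if (haveCopy && now - entry.FetchedAt < _timeToLive)
            return entry.Value; // fresh enough: no trip to the origin

        try
        {
            TValue value = loadFromOrigin(key);
            _entries[key] = (value, now);
            return value;
        }
        catch (Exception) when (haveCopy)
        {
            // Origin is down or overloaded: keep serving the stale copy.
            return entry.Value;
        }
    }
}
```

The design choice is to prefer stale content over an error; whether that is acceptable depends on the data, so a real cache would probably also cap how stale an entry may get.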

18 Always carry a spare. Each instance: 75% capacity, handling half of our load. Combined: 100% of load on 150% capacity, so 50% more capacity than needed; we can absorb temporary spikes and have time to react if we need to add capacity. One instance drops to 0% capacity, so redirect all load to the other: over allocated, but still functioning. Degrade, but don’t fail. Lose them all: SYSTEM FAILURE!!! Image: Kevin Rosseel
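
The spare-capacity arithmetic on this slide can be checked in a few lines. This C# sketch assumes identical instances that share load evenly, each sized as a fraction of total load (0.75 here); `SpareCapacity`, `Utilization`, and `Describe` are illustrative names.

```csharp
public static class SpareCapacity
{
    // Fraction of remaining capacity in use when `failed` of `instances` identical
    // servers are down and the total load is 1.0 (100%).
    public static double Utilization(int instances, double capacityPerInstance, int failed)
    {
        int alive = instances - failed;
        if (alive <= 0)
            return double.PositiveInfinity; // no capacity left to take the load
        return 1.0 / (alive * capacityPerInstance);
    }

    // Maps a utilization figure onto the states described on the slide.
    public static string Describe(double utilization) =>
        double.IsPositiveInfinity(utilization) ? "system failure"
        : utilization > 1.0 ? "over allocated, degraded"
        : "headroom for spikes";
}
```

For two instances at 0.75, `Utilization(2, 0.75, 0)` is about 0.67 (room for spikes), `Utilization(2, 0.75, 1)` is about 1.33 (over allocated but still serving), and with both down there is no capacity left.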

19 Request Buffering: Queues, Retry Policies, Async Workloads. Image: Joe Shlabotnik
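
A retry policy is the simplest of these to show in code. The sketch below is a hypothetical C# helper, not a specific library: it retries only failures the caller marks as transient and doubles the wait after each attempt (exponential backoff). It blocks with `Thread.Sleep` for simplicity; asynchronous code would use `await Task.Delay` instead, and libraries such as Polly offer production-ready retry policies.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs `operation`, retrying failures that `isTransient` accepts, up to
    // `maxAttempts` tries in total. The wait doubles after each failure.
    public static T Execute<T>(
        Func<T> operation,
        int maxAttempts,
        TimeSpan initialDelay,
        Func<Exception, bool> isTransient)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: back off, then try again.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

Retrying only transient errors matters: retrying a permanent failure just adds load to a system that may already be throttling.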

20 Dept. of Redundancy Dept. Have a backup, somewhere else. More than one? Cost-to-benefit ratio? Ready state: Hot = full capacity; Warm = scaled down, but ready to grow; Cold = mothballed, starts from zero. Image: Mr. White

21 Redundancy: It’s about probability. 95% uptime. 1 box: 5% downtime, or 438 hrs per year (that’s about 18¼ days!). 2 boxes: 5/100 × 5/100 = 25/10,000 = 0.25% downtime, or about 22 hrs per year. 4 boxes: 5/100 × 5/100 × 5/100 × 5/100 = 625/100,000,000 = 0.000625% downtime, or 3.285 MINUTES per year.
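
The slide's numbers follow from multiplying the per-box downtime probabilities, which is valid only if boxes fail independently. A minimal C# sketch of that calculation, assuming a 365-day (8,760-hour) year; `Redundancy` and its methods are hypothetical names:

```csharp
using System;

public static class Redundancy
{
    private const double HoursPerYear = 365 * 24; // 8,760 hours

    // Probability that every one of `boxes` servers is down at once,
    // assuming each fails independently with probability (1 - uptime).
    public static double ProbabilityAllDown(double uptime, int boxes) =>
        Math.Pow(1.0 - uptime, boxes);

    // Expected hours per year during which no box is available.
    public static double DowntimeHoursPerYear(double uptime, int boxes) =>
        ProbabilityAllDown(uptime, boxes) * HoursPerYear;
}
```

This reproduces 438 hours, about 22 hours, and about 3.3 minutes for one, two, and four boxes. Boxes that share a rack, power feed, or facility tend to fail together, so the independence assumption is also an argument for keeping the backup somewhere else.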

22 Total Outage duration = Time to Detect + Time to Diagnose + Time to Decide + Time to Act Image: Office ClipArt

23 Dynamic Addressing & Configuration
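
Dynamic addressing means a component looks up where its dependencies live at run time instead of compiling addresses in, so traffic can be redirected without a redeploy. This C# sketch assumes configuration comes from environment variables named `<SERVICE>_ENDPOINT`, a hypothetical convention; `ServiceEndpoints.Resolve` is likewise an illustrative name, and a real system might read a configuration service instead.

```csharp
#nullable enable
using System;

public static class ServiceEndpoints
{
    // Looks up a service address at run time instead of compiling it in.
    // Hypothetical convention: service "orders" is configured through ORDERS_ENDPOINT.
    public static Uri Resolve(string serviceName, Uri fallback)
    {
        string variable = serviceName.ToUpperInvariant() + "_ENDPOINT";
        string? value = Environment.GetEnvironmentVariable(variable);
        // Missing or malformed values fall back to the default address.
        return Uri.TryCreate(value, UriKind.Absolute, out Uri? uri) ? uri : fallback;
    }
}
```

Changing `ORDERS_ENDPOINT` and restarting (or re-reading configuration) is then enough to point callers at a standby deployment.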

24 What about your data? Image: barrymieny

25 Availability via Degradation Image: Michael Wood
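
Availability via degradation can be sketched as a wrapper that returns a reduced result when a non-essential dependency fails, rather than failing the whole request. The `Degradation.WithFallback` helper below is a hypothetical C# example, not a library API.

```csharp
using System;
using System.Diagnostics;

public static class Degradation
{
    // Calls a non-essential dependency; if it fails, logs the failure and returns
    // a reduced result so the rest of the request can still succeed.
    public static (T Result, bool Degraded) WithFallback<T>(Func<T> primary, Func<T> fallback)
    {
        try
        {
            return (primary(), false);
        }
        catch (Exception ex)
        {
            Trace.WriteLine($"Dependency failed, serving degraded result: {ex.Message}");
            return (fallback(), true);
        }
    }
}
```

It catches every exception to keep the example short; in practice you would likely narrow the catch so genuine bugs still surface, and report the degraded flag to monitoring.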

26 Images: Gizmodo Virtualization and Automation

27 Images: Orion Pictures owns Terminator Franchise

28 The “HI” Point Images: Office Clip Art

29 Image: NASA

30 “Don't be too proud of this technological terror you've constructed…” ADMIT: Your solution WILL fail at some point. You can learn from others just as well as from yourself. DO: Perform root cause analysis. Read other root cause analyses. Plan for failure. DON’T: Get cocky. Stick your head in the sand. Images: LucasFilm, Fair Use

31 Questions @mikewo Mike Wood http://mvwood.com http://bit.ly/CloudFailSafe Thank you

