This presentation can be distributed under a Creative Commons License.

Presentation transcript:

Image: xkcd.com Dependable Cloud Mike Wood

Mike Wood Tack

“Failure is always an option.” Image: Discovery Channel, Fair Use

What are we looking for? Check out: Images: Office ClipArt & Godzilla Releasing Corp (Fair Use) Hardware Failure, Data Corruption, Network Failure, Loss of Facilities

Image: FOX, Fair Use Human Error

What we’re trying to achieve 1. Monitoring 2. Resilient Solutions Image: Cohdra

Image: Office ClipArt Cost vs Risk % $1, …, To get more 9’s here add more 0’s here.

Image: NASA

Functional Transparency Image: Office ClipArt Logging Messages Hardware Health Dependent Services Health

Telemetry

Image: NASA Analyze your Data

Image: Office ClipArt

Remember: Failure is always an option. Common Points of Failure Machine\application crashes Throttling (exceeding capacity) Connectivity\Network External service dependencies

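Try/catch != Resilient
    using System;
    using System.Diagnostics;
    using System.IO;

    // fileName: path of the file to create.
    private void createFile(string fileName)
    {
        try
        {
            // Dispose the returned FileStream so the file handle is released.
            using (File.Create(fileName)) { }
        }
        catch (DirectoryNotFoundException ex)
        {
            // Logging and rethrowing records the failure; it does not recover from it.
            Trace.WriteLine(String.Format("Unable to create {0}. {1}", fileName, ex));
            throw;
        }
    }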

Image: Michael Wood Decompose your system…

Capacity Buffering Content Delivery Networks (CDNs) Distributed Application Cache Local Content Cache Enables recovery during outages or spikes in load Image: jepler

Always carry a spare 75% Capacity, half of our load 50% more capacity than needed Can absorb temporary spikes Time to react if we need to add capacity 100% of load, 150% Capacity 0% Capacity, redirect all load Over allocated, but still functioning Degrade, but don’t fail SYSTEM FAILURE!!! Image: Kevin Rosseel

Request Buffering Image: Joe Shlabotnik Queues Retry Policies Async Workloads
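
A retry policy, one of the request-buffering tools named on this slide, might be sketched roughly as follows. This is a minimal illustration and not the presenter's code: the `RetrySketch` class, its `Execute` method, and the `maxAttempts` and `baseDelay` parameters are all hypothetical names, and the choice of exponential backoff is an assumption made for the example.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not taken from the slides.
public static class RetrySketch
{
    // Runs `operation`, retrying up to `maxAttempts` times in total.
    // The wait between attempts doubles each time (exponential backoff).
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan baseDelay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // The filter lets the exception from the final attempt propagate.
            // Catching every Exception is a simplification: a real policy
            // would retry only errors known to be transient.
            catch (Exception) when (attempt < maxAttempts)
            {
                // Wait baseDelay * 2^(attempt - 1) before the next attempt.
                double factor = Math.Pow(2, attempt - 1);
                Thread.Sleep(TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * factor));
            }
        }
    }
}
```

A caller would wrap only the call that can fail transiently, for example `RetrySketch.Execute(() => client.Fetch(), 3, TimeSpan.FromMilliseconds(200))`, where `client.Fetch` stands in for any dependency call. Backing off between attempts gives a throttled or recovering service room to catch up instead of adding to its load.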

Dept. of Redundancy Dept. Have a backup, somewhere else More than one? Cost to benefit Ratio? Ready State Hot = full capacity Warm = scaled down, but ready to grow Cold = mothballed, starts from zero Image: Mr. White

Redundancy - It's about probability. 95% uptime per box (assuming the boxes fail independently). 1 box: 5% downtime, or 438 hrs per year (that’s about 18 days!). 2 boxes: 5/100 * 5/100 = 25/10,000 = 0.25% downtime, or 22 hrs per year. 4 boxes: 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime, or about 3.3 MINUTES per year.
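
The arithmetic on this slide can be written out in general form. The symbols here are introduced only for illustration and do not appear in the slides: p is the per-box downtime fraction, n the number of boxes, and D_n the expected yearly time with every box down. The calculation assumes the boxes fail independently and a year of 365 × 24 = 8760 hours.

```latex
\begin{align*}
P(\text{all } n \text{ boxes down}) &= p^{n}, \qquad p = 0.05 \\
D_n &= p^{n} \times 8760 \text{ h per year} \\
D_1 &= 0.05 \times 8760 = 438 \text{ h} \approx 18 \text{ days} \\
D_2 &= 0.0025 \times 8760 = 21.9 \text{ h} \\
D_4 &= 0.00000625 \times 8760 \approx 0.055 \text{ h} \approx 3.3 \text{ min}
\end{align*}
```

Real deployments rarely fail fully independently, since shared power, network, or software faults can take several boxes down together, so these figures are best read as an upper bound on what redundancy can buy.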

Total Outage duration = Time to Detect + Time to Diagnose + Time to Decide + Time to Act Image: Office ClipArt

Dynamic Addressing & Configuration

What about your data? Image: barrymieny

Availability via Degradation Image: Michael Wood

Images: Gizmodo Virtualization and Automation

Images: Orion Pictures owns Terminator Franchise

The “HI” Point Images: Office Clip Art

Image: NASA

“Don't be too proud of this technological terror you've constructed…” ADMIT: Your Solution WILL fail at some point You can learn from others just as well as from yourself DO: Root cause analysis Read others' root cause analyses Plan for failure DON’T: Get cocky Stick your head in the sand Images: LucasFilm, Fair Use

Mike Wood Tack