Can Your Data Center Recover from a Disaster?

Presenters
Rick Boyer, Senior Technology Sales Specialist, Technology Sales Group, GE Healthcare IT, rick.boyer@ge.com
Tim Darling, Senior Technology Sales Specialist, Technology Sales Group (TSG), GE Healthcare IT, tim.darling@ge.com

©2015 General Electric Company – All rights reserved. The results expressed in this document may not be applicable to a particular site or installation and individual results may vary. This document and its contents are provided to you for informational purposes only and do not constitute a representation, warranty or performance guarantee. GE disclaims liability for any loss, which may arise from reliance on or use of information, contained in this document. All illustrations are provided as fictional examples only. Your product features and configuration may be different than those shown. Information contained herein is proprietary to GE. No part of this publication may be reproduced for any purpose without written permission of GE. DESCRIPTIONS OF FUTURE FUNCTIONALITY REFLECT CURRENT PRODUCT DIRECTION, ARE FOR INFORMATIONAL PURPOSES ONLY AND DO NOT CONSTITUTE A COMMITMENT TO PROVIDE SPECIFIC FUNCTIONALITY. TIMING AND AVAILABILITY REMAIN AT GE’S DISCRETION AND ARE SUBJECT TO CHANGE AND APPLICABLE REGULATORY CLEARANCE. GE, the GE Monogram, Centricity, and imagination at work are trademarks of General Electric Company. All other product names and logos are trademarks or registered trademarks of their respective companies. General Electric Company, by and through its GE Healthcare division.

Agenda
- Definitions
- Cost of downtime
- Risks: causes of downtime
- Proactive planning for availability
- Planning for disasters: failure and outage scenarios
- Q & A

Definitions

- Business continuity: A general term that refers to the planning and design required to maintain the highest level of service consistency, availability, and recoverability
- Fault tolerance: Totally redundant system and application design with near 100% availability. For example, the Centricity Enterprise Non-Stop is a fault-tolerant hardware and software platform designed to provide 5 or 6 “9’s” of uptime
- High availability: Designing an IT infrastructure with minimal single points of failure; includes clustering or failover features
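To make the "9's" concrete, the sketch below converts a number of nines into a yearly downtime budget. It is only an illustration: the function name `downtime_hours_per_year` is hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def downtime_hours_per_year(nines: int) -> float:
    """Return the downtime budget (hours per year) for a given number of nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # 3 nines -> 0.001, i.e. 99.9% uptime
    return HOURS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours:.4f} hours/year ({hours * 60:.2f} minutes/year)")
```

Under these assumptions, three nines allows a little under nine hours of downtime per year, while five nines allows only a few minutes.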

Definitions
Disaster: Several definitions we’ve found include:
- A natural or man-made catastrophic event resulting in severe damage and/or loss of life, e.g. “Sandy”
- A crisis causing widespread damage that far exceeds the ability to easily recover
- A calamitous event causing considerable outage and/or unrecoverable disruption of services

Definitions
Disaster recovery: The design and preparation needed to recover from a disaster within an organization’s recovery guidelines, goals, and objectives, as defined by:
- RPO (Recovery Point Objective): Amount of data loss your business can sustain, based on your business’s recovery rules
- RTO (Recovery Time Objective): Amount of downtime your business is capable of enduring
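One way to apply these two objectives is to score a real or rehearsed recovery against them. The sketch below is a minimal illustration, not a prescribed method: the names `RecoveryObjectives` and `evaluate_recovery`, and all timestamps and targets in the example, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable data loss, expressed as a time window
    rto: timedelta  # maximum tolerable downtime


def evaluate_recovery(objectives: RecoveryObjectives,
                      last_good_copy: datetime,
                      outage_start: datetime,
                      service_restored: datetime) -> dict:
    """Compare the measured data loss and downtime with the RPO and RTO."""
    data_loss = outage_start - last_good_copy   # work done since the last usable copy
    downtime = service_restored - outage_start  # time users were without the system
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Hypothetical targets and timestamps, for illustration only.
objectives = RecoveryObjectives(rpo=timedelta(hours=24), rto=timedelta(hours=4))
result = evaluate_recovery(
    objectives,
    last_good_copy=datetime(2015, 3, 1, 23, 0),
    outage_start=datetime(2015, 3, 2, 9, 30),
    service_restored=datetime(2015, 3, 2, 12, 0),
)
print(result)
```

Recording these two measurements after every DR test gives a simple record of whether the objectives are actually achievable.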

Definitions
MTD (Maximum Tolerable Downtime)*: The maximum perceived downtime your users experience when trying to use a system that is available, or the length of time that access to the applications users need to perform their jobs is unavailable (a.k.a. the maximum downtime allowed by management before you get fired)
* Thanks to Dave Hoffman from Storex @ IBM Enterprise 2014

Cost of Downtime

Cost of downtime
Questions for the audience:
- How quickly does your management expect your applications and systems to recover from an outage? (RTO)
- How much data is your business willing to lose during an outage? (RPO)
- What do you think your organization’s MTD is?
- How much do you think downtime costs your business per hour or per day? Does anyone in the audience know?

Cost of downtime
Typical RTO and RPO after an unplanned outage
- High-availability (clustered) system: RTO approximately 60 sec to 10 min; RPO minimal data loss, depending on replication method and cause of outage*
- Disaster recovery with hot site: RTO approximately 1 hour to >4 hours; RPO 24 hours*
* Note: recovery depends on how your databases and systems will resume after a failure, and how quickly your IT staff can initialize failover processes and/or DR plans. No guarantees are provided or implied.

Cost of downtime
- Some estimates range from $50K/hr to an average of $200K/hr¹ or more, depending on the type and size of the business; varies by business segment
- Some costs cannot be estimated
  - Loss of reputation and “bad press”
  - In healthcare, an outage can potentially result in loss of life
- Perception of downtime varies by role within any organization
  - Management, front desk users, clinicians, etc.
¹ From Availabilitydigest.com: http://www.availabilitydigest.com/private/0206/benchmarking.pdf, page 3

Cost of downtime
[Infographic] From GE Web site: https://www.gesoftware.com/ge-predictivity-infographic

Cost of downtime
According to a Healthcare IT News article of 12/4/2013 entitled “Data center outages come with whopping $8K per minute price tag”*:
“Healthcare organizations face average costs of $690,000 per outage incident, according to the findings of a new Ponemon Institute/Emerson Network Power report, roughly a 41 percent increase since 2010. Larger groups with more extensive IT systems, however, could pay out nearly $1.74 million per incident”
* http://www.healthcareitnews.com/news/data-center-outages-come-monster-pricetag?topic=06,17,19

Cost of downtime
- Developing a solid ROI will help to justify budgeting for a business continuity plan
- A wide range of options is available, but first you need reasonable estimates of what downtime costs your business
- Example: 9 hours of unplanned outage per year (3 “9’s”, or 99.9% uptime) @ $200K per hour = $1.8M of financial exposure per year
- Unfortunately, most customers experience more unplanned downtime than that per year, so it is the IT manager’s or director’s job to estimate annual exposure to outages so that a reasonable ROI can be developed
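The exposure arithmetic from this slide, plus one simple way to turn it into an ROI figure, can be sketched as follows. The 9 hours and $200K per hour come from the example above; the residual downtime after mitigation (1 hour) and the annual investment ($400K) are hypothetical placeholders, as are the function names.

```python
def annual_exposure(downtime_hours_per_year: float, cost_per_hour: float) -> float:
    """Financial exposure per year = expected downtime hours x cost per hour."""
    return downtime_hours_per_year * cost_per_hour


def simple_roi(exposure_before: float, exposure_after: float,
               annual_investment: float) -> float:
    """Net yearly savings divided by the yearly investment (1.0 means 100%)."""
    if annual_investment <= 0:
        raise ValueError("annual_investment must be positive")
    savings = exposure_before - exposure_after
    return (savings - annual_investment) / annual_investment


before = annual_exposure(9, 200_000)       # slide example: $1.8M per year
after = annual_exposure(1, 200_000)        # hypothetical residual downtime
roi = simple_roi(before, after, 400_000)   # hypothetical yearly investment
print(f"Exposure before: ${before:,.0f}; after: ${after:,.0f}; ROI: {roi:.0%}")
```

Replacing the placeholder values with your own downtime history and cost estimates is the step the slide assigns to the IT manager or director.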

Risks – Causes of Downtime

Risks: causes of downtime
- Operator error or human behavior
- Software or application failures
- Servers, storage, power/AC, networks, physical security failures
Risk aversion planning: IT processes, testing, quality control, redundancy, DR planning, service contracts, facility management, monitoring, training, security

Proactive planning for availability

Application and/or software failures
- Establish consistent processes and quality control
- Strive to remain current on the latest software versions
- Develop complete version testing and validation procedures
- Practice repeatable upgrade processes
- Invest in user training and/or certification

Proactive planning for availability
Data center infrastructure: servers, storage, power, A/C, networking, physical security, etc.
- Configure redundancy of all major components in your infrastructure, eliminating single points of failure
- Develop disaster recovery plans or failover processes: clustering, load balancing, remote replication, DR warm or hot site, etc.
- Obtain enhanced vendor service contracts for all hardware and infrastructure components and operating systems

Proactive planning for availability
Data center infrastructure: servers, storage, power, A/C, networking, physical security, etc.
- Incorporate Service Oriented Architecture (SOA) facility management tools, such as GE Proficy SOA*, to improve interoperability, monitoring, and control
* http://www.ge-ip.com/download/five-essential-components-for-highly-reliable-data-centers/12850/0/

Proactive planning for availability
Use SOA tools for facility management to consolidate operation activities for enhanced data center reliability, monitoring, and control
From GE White Paper found at: http://www.ge-ip.com/download/five-essential-components-for-highly-reliable-data-centers/12850/0/

Proactive planning for availability
Operator error or human behavior
- Deliver proper training and maintain accurate, up-to-date documentation for all staff
- Set clear policies and procedures that the IT staff must follow
- Install and maintain enhanced system monitoring and alerting tools for notification and logging of system access, configuration changes, and resolutions to system issues
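As a rough illustration of detecting configuration changes, the sketch below fingerprints configuration text with SHA-256 and compares it with a stored baseline. It is not how any particular monitoring product works; the function names and the sample configuration entries are hypothetical.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 fingerprint of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def detect_changes(baseline: dict, current: dict) -> dict:
    """Compare two {name: fingerprint} maps and label each difference."""
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        if name not in current:
            changes[name] = "removed"
        elif name not in baseline:
            changes[name] = "added"
        elif baseline[name] != current[name]:
            changes[name] = "modified"
    return changes


# Hypothetical configuration contents, for illustration only.
baseline = {"san.conf": fingerprint("lun=12\n"), "web.conf": fingerprint("port=443\n")}
current = {"san.conf": fingerprint("lun=13\n"), "db.conf": fingerprint("cache=on\n")}
print(detect_changes(baseline, current))
```

In practice the resulting differences would feed the alerting and logging described above, so that every change is tied to a person and a time.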

Planning for disasters: Failure and outage scenarios

Failure and outage scenario #1
Software/application outage scenario: A weekend application upgrade causes major issues with the Scheduling application on Monday morning

Failure and outage scenario #1
Processes or steps you can implement to help avoid risk:
- IT upgrade procedures must be in place to fully test and certify changes before code is merged into production
- There must be at least a separate test database, or preferably a test/development system, where all changes are validated
- IT needs to maintain the ability to revert to the original environment in the event an issue occurs after upgrading

Failure and outage scenario #1 (cont’d)
- Ensure all associated software versions are current and/or compatible, including operating systems, database software, etc.
- Develop repeatable processes that standardize all steps in upgrade planning, testing, and adoption methods
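The upgrade steps for scenario #1 (validate, and keep the ability to revert) can be sketched as a small control flow. This is a minimal sketch under stated assumptions: the four callables stand in for site-specific snapshot, upgrade, validation, and restore procedures, and the in-memory `state` dictionary is a hypothetical stand-in for a real environment.

```python
from typing import Callable


def upgrade_with_rollback(take_snapshot: Callable[[], str],
                          apply_upgrade: Callable[[], None],
                          validate: Callable[[], bool],
                          restore_snapshot: Callable[[str], None]) -> bool:
    """Apply an upgrade; revert to the snapshot if it fails or does not validate."""
    snapshot_id = take_snapshot()  # preserve the original environment first
    try:
        apply_upgrade()
        if validate():
            return True
    except Exception as exc:
        print(f"Upgrade failed: {exc}")
    restore_snapshot(snapshot_id)  # revert to the original environment
    return False


# Hypothetical in-memory stand-ins for a real system.
state = {"version": "1.0"}
snapshots = {}


def take_snapshot() -> str:
    snapshots["pre-upgrade"] = dict(state)
    return "pre-upgrade"


def apply_upgrade() -> None:
    state["version"] = "2.0"


def failing_validation() -> bool:
    return False  # simulate a validation failure after the upgrade


def restore_snapshot(snapshot_id: str) -> None:
    state.clear()
    state.update(snapshots[snapshot_id])


ok = upgrade_with_rollback(take_snapshot, apply_upgrade, failing_validation, restore_snapshot)
print(ok, state)
```

Running the same function against a test system first, and only then against production, is one way to make the process repeatable.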

Failure and outage scenario #2
Data center hardware infrastructure outage scenario: Production site server failure impacts user access to the applications

Failure and outage scenario #2
Protecting against production data center hardware failures
- Configure redundant hardware and failover software to enhance availability
- Users connecting to the cluster are load-balanced to the remaining redundant server(s)

Failure and outage scenario #2 (cont’d)
- Clustering type (active/active or active/passive) is determined by the operating system and clustering software installed
- Reduce single points of failure by designing your infrastructure with the highest level of redundancy available
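The idea of routing users to the remaining servers in scenario #2 can be illustrated with a toy round-robin balancer that skips failed nodes. Real clustering software does far more (health probes, session handling, quorum); the `Cluster` class and node names below are hypothetical.

```python
class Cluster:
    """Toy round-robin load balancer that skips nodes marked as down."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._next = 0  # index of the next node to try

    def mark_down(self, node: str) -> None:
        self._healthy.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self._nodes:
            self._healthy.add(node)

    def route(self) -> str:
        """Return the next healthy node, or raise if every node is down."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy nodes available")


cluster = Cluster(["web1", "web2", "web3"])
cluster.mark_down("web2")  # simulate a server failure
print([cluster.route() for _ in range(4)])
```

With a single node, the same failure would raise `RuntimeError`, which is the single point of failure the slide warns about.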

Web Server Failure

Caché Server Failure

Failure and outage scenario #3
Data center hardware infrastructure outage scenario: Production site experiences a catastrophic data center failure (fire, flood, explosion, tornado, or other event)

Failure and outage scenario #3
Protecting against a catastrophic data center failure
- All databases should be periodically replicated to the DR site using some form of DR replication, with the frequency determined by the RPO
- Manual recovery of affected systems is recommended, using proven scripts, procedures, and vendor assistance
- RPO and RTO depend on the frequency of replication and on failover processes
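To show how replication frequency relates to RPO, the sketch below uses a deliberately simplified model that is an assumption, not part of the original slides: with periodic replication, the worst-case data loss is one full replication interval plus the time a copy takes to reach the DR site. The function names and example durations are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Worst case: the disaster hits just before the next copy finishes arriving."""
    return interval + transfer_time


def meets_rpo(interval: timedelta, transfer_time: timedelta, rpo: timedelta) -> bool:
    """True if the replication schedule keeps worst-case loss within the RPO."""
    return worst_case_data_loss(interval, transfer_time) <= rpo


rpo = timedelta(hours=24)      # example target
transfer = timedelta(hours=1)  # hypothetical time to ship one copy

for hours in (24, 12, 4):
    interval = timedelta(hours=hours)
    print(f"replicate every {hours}h: "
          f"worst case {worst_case_data_loss(interval, transfer)}, "
          f"meets RPO: {meets_rpo(interval, transfer, rpo)}")
```

Under this model, replicating once every 24 hours does not by itself satisfy a 24-hour RPO, because the transfer time also counts against the objective.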

IDRR = Intelligent Disaster Recovery Replication

For more information on ISS IDRR and other related tools, please visit the ISS booth in the vendor exhibitor area.

Failure and outage scenario #4
Human-initiated failure scenario: A SAN storage manager discovers he is being terminated and deletes database files on a business-critical system’s storage subsystem

Failure and outage scenario #4
Steps you can take to mitigate risk
- Install a secure, centralized access control tool for all consoles, with audit trail monitoring and event alerting
  - A product like TDi ConsoleWorks will provide this level of monitoring, audit trail tracking, and much more
- Ensure multiple points of secure database redundancy
- Organize IT for redundancy: more than one person should be responsible for access control and system security
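The combination of an audit trail and shared responsibility can be illustrated with a toy two-person rule for destructive actions. This is not the API of TDi ConsoleWorks or any other product; the `AuditedConsole` class, the user names, and the action string are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedConsole:
    """Toy console gate: destructive actions need a second administrator's approval."""
    admins: set
    audit_log: list = field(default_factory=list)

    def _record(self, requester: str, approver: str, action: str, outcome: str) -> None:
        # Every attempt is logged, whether or not it is allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "requester": requester,
            "approver": approver,
            "action": action,
            "outcome": outcome,
        })

    def request_destructive(self, requester: str, approver: str, action: str) -> bool:
        """Allow the action only if two different administrators are involved."""
        allowed = (requester in self.admins
                   and approver in self.admins
                   and requester != approver)
        self._record(requester, approver, action, "allowed" if allowed else "denied")
        return allowed


console = AuditedConsole(admins={"alice", "bob"})
print(console.request_destructive("alice", "alice", "delete db volume"))  # self-approval
print(console.request_destructive("alice", "bob", "delete db volume"))    # second approver
print(len(console.audit_log))
```

Because denied attempts are logged too, an alert on "denied" entries would surface a self-approval attempt like the first call.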

Takeaways
You should now be familiar with:
1) The different levels of availability (fault tolerance, high availability, and disaster recovery) and some available options to satisfy your organization’s RPO and RTO
2) Some of the major causes of downtime and how they can impact your business
3) Options your IT department can use to design an infrastructure, processes, and monitoring that can help reduce unplanned downtime and risk

Questions?
