Published by Clarissa Lewis. Modified over 9 years ago.
1
IT Business Continuity Briefing March 3, 2011
2
Agenda
Incident Overview
Improving the power posture of the Primary Data Center
STAGEnet Redundancy
Telephone Redundancy
Secondary Data Center and Recovery Point Objectives (RPO)
Secondary Data Center and Recovery Time Objectives (RTO)
Customer communications during outage incidents
3
IT Business Continuity Dependencies
SYSTEMS & DATA
NETWORK SERVICES
POWER & ENVIRONMENTALS
FACILITIES & STAFF
4
Incident Impact
SYSTEMS & DATA
NETWORK SERVICES
POWER & ENVIRONMENTALS
FACILITIES & STAFF
5
January 18th Incident Response
ITD powered down servers and equipment in the Primary Data Center to minimize data loss.
ITD started to provision equipment to allow the Secondary Data Center to assume the role of the primary data center.
Initial estimates projected that power would be restored to the Primary Data Center by 6:00 pm.
Power was restored at 5:50 pm, email and core network services at 6:30 pm, and the final systems and applications by 11:30 pm.
6
Power Posture Improvements
The Primary Data Center and the Secondary Data Center both have generators to provide backup power.
ITD is working with Facilities Management and Sirius Computer Solutions to identify and implement solutions that will provide a second, redundant power source to the Primary Data Center.
The target is to complete this work by the end of 2011.
7
STAGEnet Redundancy
The four-quadrant RPR ring provides redundancy on the statewide ring by allowing traffic to fail over automatically if a core node fails.
The Network Point of Presence in each quadrant has equipment architected for high availability and backup power generation.
The Internet Gateways in Bismarck and Fargo are load balanced and architected to fail over if one of them fails.
Agencies should coordinate with ITD if they require redundancy (network diversity) at individual endpoint locations.
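The ring failover idea can be illustrated with a small sketch. This is not the RPR protocol itself, only a toy model of why a four-node ring survives the loss of one core node: traffic can travel in the other direction. The node labels `NW`, `NE`, `SE`, `SW` and the function `ring_path` are hypothetical and do not come from the briefing.

```python
from typing import List, Optional, Set

# Hypothetical labels for the four quadrant nodes, in ring order.
RING = ["NW", "NE", "SE", "SW"]


def ring_path(src: str, dst: str, failed: Set[str]) -> Optional[List[str]]:
    """Walk the ring in both directions; return the shortest path that avoids failed nodes."""
    n = len(RING)
    start = RING.index(src)
    candidates = []
    for step in (1, -1):  # one direction, then the other
        path: Optional[List[str]] = [src]
        i = start
        while RING[i] != dst:
            i = (i + step) % n
            if RING[i] in failed:
                path = None  # this direction is blocked
                break
            path.append(RING[i])
        if path is not None:
            candidates.append(path)
    return min(candidates, key=len) if candidates else None


print(ring_path("NW", "SE", failed=set()))   # ['NW', 'NE', 'SE']
print(ring_path("NW", "SE", failed={"NE"}))  # ['NW', 'SW', 'SE']
```

With all nodes up, either direction works. With one intermediate node down, the remaining direction still connects the two endpoints, which is the property the slide describes.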
8
Telephone Redundancy - Current
Current design is a standard digital design
  Dependent on the PBX serving the endpoint
  The PBX has high availability components
  Does not provide redundant service if the PBX fails
There is a service agencies can purchase to re-route critical numbers (e.g. crisis hotlines) in the event of a disaster.
9
Telephone Redundancy - VoIP
A new Voice over IP (VoIP) design is planned over the next two years.
As part of the standard VoIP design, there will be four redundant Call Managers on STAGEnet, which provide failover if the primary Call Manager serving a site fails.
Provides the ability to relocate telephone numbers to other sites with network connectivity.
Provides redundant core services for dial tone, call center and automatic call distribution (ACD).
Will not initially provide redundancy for voice mail, mobility and Interactive Voice Response (IVR).
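A minimal sketch of the Call Manager failover idea follows. It assumes each site has an ordered preference list of Call Managers and simply picks the first one that is reachable. The names `cm-1` through `cm-4` and the function `select_call_manager` are hypothetical; the briefing states only that there will be four redundant Call Managers, not how a site selects among them.

```python
from typing import Dict, List, Optional


def select_call_manager(priority: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first reachable Call Manager in a site's priority order, or None."""
    for name in priority:
        if healthy.get(name, False):
            return name
    return None


# Hypothetical names and health states for illustration only.
site_priority = ["cm-1", "cm-2", "cm-3", "cm-4"]
status = {"cm-1": False, "cm-2": True, "cm-3": True, "cm-4": True}

print(select_call_manager(site_priority, status))  # cm-2
```

Here the site's primary (`cm-1`) is down, so service moves to the next Call Manager in the list.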
10
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
11
Recovery Point Objective (RPO)
The Recovery Point Objective (RPO) is the point in time to which you must go back to recover data when a loss incident occurs.
RPO focuses on data and is independent of the time it takes to get a non-functional system back online (the Recovery Time Objective, or RTO).
It is generally a definition of what an agency determines is an “acceptable loss” in a disaster situation.
The value of the data in the “acceptable loss” window can then be weighed against the cost of the additional loss-prevention measures that would be necessary to narrow the window.
12
Recovery Point Objective (RPO)
Generally speaking, backups are performed nightly to tape at our Secondary Data Center.
  Databases have full weekly backups and nightly incremental backups.
  Other data: only items that have changed during the day are backed up.
Generally speaking, the RPO, or potential loss window, for most data is one day. A Tuesday 4 pm disaster would require you to restore the Monday night backup, and the activity for Tuesday would be lost.
Agencies whose business requirements don’t allow for this potential data loss implement data replication.
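The Tuesday 4 pm example can be worked through as a short sketch. It assumes, purely for illustration, that the nightly backup completes at 11:00 pm; the briefing does not give an actual backup time, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption for illustration: the nightly backup finishes at 23:00.
# The briefing does not state an actual backup time.
BACKUP_HOUR = 23


def last_nightly_backup(disaster: datetime) -> datetime:
    """Return the most recent nightly backup completed before the disaster."""
    candidate = disaster.replace(hour=BACKUP_HOUR, minute=0, second=0, microsecond=0)
    if candidate >= disaster:
        # Tonight's backup has not run yet, so fall back to last night's.
        candidate -= timedelta(days=1)
    return candidate


def data_loss_window(disaster: datetime) -> timedelta:
    """Activity after the last backup is what a tape restore cannot recover."""
    return disaster - last_nightly_backup(disaster)


# Tuesday at 4 pm (March 1, 2011 was a Tuesday).
disaster = datetime(2011, 3, 1, 16, 0)
restore_point = last_nightly_backup(disaster)

print(restore_point)               # 2011-02-28 23:00:00 (Monday night)
print(data_loss_window(disaster))  # 17:00:00
```

Under this assumed schedule the loss window is never more than 24 hours, which matches the one-day RPO described above. Replication narrows the window by copying changes between backups.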
13
Recovery Time Objective (RTO)
The Recovery Time Objective (RTO) is a measure of how quickly a system must resume normal operations to avoid unacceptable business impacts.
Prior to 2006, ITD contracted for an out-of-state disaster recovery hot site with a best-case mainframe RTO of 72 hours.
With the deployment of online applications and multiple platforms, a contracted hot site with adequate network bandwidth and processing capacity became unaffordable.
ITD invested in a second data center to improve the State’s RPO and moved to a four-hour RTO for core network services.
14
Recovery Time Objective (RTO)
ITD is now looking to improve the RTO of the second data center from four hours to a matter of minutes for core network services.
Base services that will be up within the first hour:
  E-Mail
  File and print services
  AS/400 platform and applications
  Current replicated hardware
  Disaster Recovery Web Site – basic information
15
Recovery Time Objective (RTO)
Base services that will be up within four to twelve hours:
  Mainframe (must IPL) / DELA
  ConnectND
Selected shared services and some agencies have development and/or test environments residing at the second data center. These environments will be converted to assume the role of production servers in a disaster scenario.
16
Recovery Time Objective (RTO)
Agencies that do not invest in replicated data solutions and backup processing capacity will need to wait for additional storage and servers to be shipped and provisioned.
  Estimated RTO of 3 weeks to 8 weeks for production systems, depending on hardware availability, staffing priorities and the amount of data to restore.
Agencies that invest in replicated data solutions but no backup processing capacity will need to wait for servers to be shipped and provisioned.
  Estimated RTO of 2 weeks to 4 weeks, depending on hardware availability and staffing priorities.
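The two estimates above can be captured as a small lookup keyed on what an agency has invested in. This is only a restatement of the figures on the slide; the function and variable names are hypothetical, and combinations the briefing does not cover return `None` rather than a guessed value.

```python
from typing import Dict, Optional, Tuple

# Estimated RTO ranges in weeks, as stated in the briefing.
# Keys are (has_replicated_data, has_backup_processing_capacity).
ESTIMATED_RTO_WEEKS: Dict[Tuple[bool, bool], Tuple[int, int]] = {
    (False, False): (3, 8),  # wait for storage and servers, then restore data
    (True, False): (2, 4),   # data already replicated; wait for servers only
}


def estimated_rto_weeks(replicated_data: bool, backup_capacity: bool) -> Optional[Tuple[int, int]]:
    """Return (min_weeks, max_weeks), or None if the briefing gives no estimate."""
    return ESTIMATED_RTO_WEEKS.get((replicated_data, backup_capacity))


print(estimated_rto_weeks(False, False))  # (3, 8)
print(estimated_rto_weeks(True, False))   # (2, 4)
print(estimated_rto_weeks(True, True))    # None (not stated in the briefing)
```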
17
Disaster Recovery Communications
We feel we can improve our communications process during any future disaster events.
Planned communication avenues:
  DR Website
  E-mail
  Customer Service Desk
  Notifind – currently used to communicate with our staff
We may be asking for emergency contacts for critical applications.
18
Questions
ITD Contingency Planning Contact
Larry Lee
lalee@nd.gov
328-2721