DATA CENTER OUTAGE BRIEFING JANUARY 10–11, 2014 INFORMATION AND EDUCATIONAL TECHNOLOGY.

Agenda
- Review of Events
- Cause Analysis and Current Efforts
- Communications
- Vulnerabilities
- Mitigation Plans
- Lessons Learned
- Communication Improvements

Review of Events
Summary: 3 incidents
- Friday, Jan 10: Virtualization and uConnect firewall
- Saturday, Jan 11: Virtualization
Virtualization outage
- Affected most major systems on campus
- Some mitigation lessened impact on Saturday
uConnect firewall outage
- Extended email, authentication and DNS service outage for uConnect users (additional 4 hours)

Outage Timeline: 3 incidents
[Timeline figure. Axis: Friday, January 10th, 11 am to 8 pm; Saturday, January 11th, 12 am to 2 pm.]
Virtualization Outage (Friday)
- VM hosts rebooted
- CAS & Smartsite restored
- Routing restored
- VM guests started to restore services
- Most services restored except uConnect
uConnect Firewall Outage (Friday)
- Firewall failover to secondary without success
- Hard power cycle restores firewall and uConnect services
Virtualization Degradation (critical services stable)
Virtualization Outage (Saturday)
- VM hosts rebooted
- VM guests started to restore services
- Most services restored
- All services restored

Services Impacted
- Admissions
- Banner
- Central Authentication Services (CAS)*
- Computing Accounts
- Electronic Death Registry System
- Data Center File Services
- DaFIS
- DavisMail
- Data Center Virtualization
- Final Grade Submission
- Geckomail
- Kuali Financial Services
- Identity and Access Management
- IET Web Sites
- MyInfoVault
- MyUCDavis
- ServiceNow and SSC Case Management
- Shibboleth
- Smartsite*
- Time Reporting System
- Web Content Management System
- uConnect Services
- UC Davis Directory Listings
- UC Davis Home Site
* CAS was restored to physical hardware on Fri 1:40 pm, which restored dependent services such as departmental applications and Smartsite.

Communications
Regular outage communication channels were unavailable
- Websites (status page,
Communications issued
- Automated notices on IT-Express phone system (updated 3 times)
- Twitter updates (8 on 01/10; 5 on 01/11)
- Progress updates on Status web page (status.ucdavis.edu) starting Friday mid-afternoon
- to 300+ campus technologists (01/11)

Vulnerabilities
- Hardware is redundant, but many services are hosted in a single location on a single SAN
- Critical uConnect directory services reside on a single network
- The system status page is dependent on the local infrastructure
- IET is not aware of all critical services that rely on our infrastructure

Mitigation Plans
- SAN software upgrade completed
- Implement diversification for critical services (Authentication, uConnect Directory Services, Status Page, WWW)
- Integrate cloud services to improve diversity
- Develop process to identify critical campus services dependent on IET infrastructure
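One way to approach the last mitigation item (identifying services that depend on IET infrastructure) is to record dependencies as a graph and walk it. The sketch below is illustrative only: the dependency map, the "Departmental App" and "Cloud Status Page" entries, and the `affected_by` helper are assumptions for the example, not an inventory or tool from the briefing.

```python
from collections import deque

# Hypothetical dependency map: service -> components it relies on.
# The edges are illustrative assumptions, not data from the briefing.
DEPENDS_ON = {
    "CAS": ["Data Center Virtualization"],
    "Smartsite": ["CAS"],
    "Departmental App": ["CAS"],
    "Status Page": ["Data Center Virtualization"],
    "Cloud Status Page": [],
}


def affected_by(component, depends_on):
    """Return every service that directly or transitively relies on component."""
    # Invert the map: component -> services that rely on it directly.
    dependents = {}
    for service, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(service)

    # Breadth-first walk over the inverted map.
    affected = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return sorted(affected)


if __name__ == "__main__":
    for name in affected_by("Data Center Virtualization", DEPENDS_ON):
        print(name)
```

A service with no path to the failed component, such as the hypothetical externally hosted status page here, does not appear in the result, which is the property diversification aims for.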

Lessons Learned
- Move from disaster recovery to business continuity
- Normal communication channels were unavailable
- Communication and decision-making protocols when normal channels unavailable
- Not prepared for normal channels being unavailable

Communication Improvements
- Review service outage communication protocols, contacts, and venues
- Ensure multiple modes of communication (text, cell, email, web, phone, social media) are available; leverage new WarnMe system extension for non-emergency notifications
- Closer collaboration with Emergency Manager and Strat Comm
- Ensure broad awareness of outage communication channels
- Launch cloud-based status page: Status Page I/O
- Leverage AggieFeed for broader communication
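The idea of keeping multiple communication modes available can be sketched as a broadcast that tries every channel and reports which ones failed, so that one unavailable channel does not block the others. Everything in this sketch is hypothetical: the `broadcast` helper and the stub channel functions stand in for real integrations and do not call any actual service.

```python
def broadcast(message, channels):
    """Try every channel; return (delivered, failed) lists of channel names."""
    delivered, failed = [], []
    for name, send in channels:
        try:
            send(message)
            delivered.append(name)
        except Exception:
            # One failing channel must not stop delivery on the others.
            failed.append(name)
    return delivered, failed


# Hypothetical stub channels for illustration only.
SENT = []


def local_status_page(message):
    # Simulates a status page hosted on the infrastructure that is down.
    raise ConnectionError("status page unreachable")


def phone_recording(message):
    SENT.append(("phone", message))


def social_media(message):
    SENT.append(("social", message))


if __name__ == "__main__":
    channels = [
        ("status page", local_status_page),
        ("phone", phone_recording),
        ("social media", social_media),
    ]
    delivered, failed = broadcast("Virtualization outage in progress", channels)
    print("delivered:", delivered)
    print("failed:", failed)
```

The failed list gives responders an immediate signal about which channels need a manual fallback.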

Status Page I/O

Architecture