Disaster Documents
The role of documentation in disaster recovery
by Ray Kim

About me
- Ray Kim
- LinkedIn: (yes, I am a musician in my spare time)
- 25+ years of experience working in various tech positions (software developer, web developer, systems analyst, technical writer, adjunct instructor, computer operator)
- BS in computer science, Syracuse University
- MS in technical communication, Rensselaer Polytechnic Institute
- Member of CASSUG (Albany SQL user group) and Albany UX/UI user group; former member of Society for Technical Communication
- Interests: my wife and two cats, playing music (I play four different instruments), listening to music (classical, jazz, progressive rock), Syracuse football and basketball (GO ORANGE!!!), RPI ice hockey (GO RED!!!), NY Yankees baseball, fantasy football, CrossFit, KKΨ band fraternity

About this presentation
- This is NOT a technology-specific presentation
- This presentation is based on personal experience
- I don’t like to lecture – I prefer to discuss issues and act as a facilitator
- Please ask questions! Feel free to engage!

About this presentation
- This presentation does NOT discuss disaster recovery plans, strategies, or technologies.
- Instead, this presentation focuses on documentation in and of itself.
- This presentation is a narrative of the events that occurred after a disaster, and offers some lessons learned.

Abstract
Here’s what we’ll discuss:
- The backstory (company info, environment, etc.)
- About the Server Team
- About the documents
- What was missing
- Where we got lucky
- After the planes hit
- Aftermath
- Lessons learned

The backstory...
- I was an employee of Empire Blue Cross on Sept. 11, 2001, working primarily out of the Albany, NY office.
- Empire had an office in the World Trade Center.
- I worked for a department called Enterprise Server Technologies (a.k.a. the “Server Team”).
- The Server Team supported several hundred servers, including large data centers in Albany and the World Trade Center.
- Our company lost nine employees that day.

About the Server Team
- Manager
  - Department leader
- Server jockeys
  - Responsible for building and maintaining the server environment
- Me (analyst)
  - My job was to provide the rest of the Server Team with whatever information (or resources) they needed, in any way possible
  - I provided documentation and data
  - I did a lot of the Server Team’s behind-the-scenes “dirty work”
- The Server Team was a support department; my job was to support the support department.

The calm before the storm
- I had developed these documents for the Server Team:
  - Server installation checklists
  - Server room maps
  - Instructions for sending and obtaining tape backups to and from Iron Mountain
  - Vendor contact lists
  - Internal telephone contact list
- At the time, all of these documents existed primarily as hardcopy, PDF, Word, or Visio files. Online documentation was still a foreign concept.
- All of these documents were critical when disaster struck.

Server installation checklists
- Why it was important: the build information for each server allowed us to rebuild that server
- Created in MS Word
- Checklist was completed for every server that was built
- Included information about each server, including role, hardware information, operating system, configurations, and installed software
- Hardcopy checklists were stored in a file cabinet; checklists were also scanned and stored on a hard drive as PDF files
- Checklists later evolved into an online version using a SQL Server back-end, and formed the basis for an online server inventory tracking system

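As an illustration only: the checklist fields listed above could be captured as a structured record. This is a minimal sketch in Python, not the original Word checklist or the later SQL Server system. The class name, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ServerChecklist:
    """One completed installation checklist (all field names are hypothetical)."""
    server_name: str
    role: str
    hardware: str
    operating_system: str
    configurations: dict = field(default_factory=dict)
    installed_software: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required text fields that were left blank."""
        required = ("server_name", "role", "hardware", "operating_system")
        return [name for name in required if not getattr(self, name).strip()]


# Example record with made-up values, purely for illustration.
example = ServerChecklist(
    server_name="EXAMPLE-DC-01",
    role="domain controller",
    hardware="example rack server",
    operating_system="example server OS",
    configurations={"ip_address": "192.0.2.10"},
    installed_software=["example backup agent"],
)

# Serialize to JSON so the record can be printed, filed, or copied elsewhere.
example_json = json.dumps(asdict(example), indent=2)
```

A record like this can still be printed and filed as hardcopy; the structured form simply makes it easier to check that nothing needed for a rebuild was left blank.
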
Server room maps
- Why it was important: provided a list of what was in each server room
- Created in Visio
- Provided server room layouts, including rack and server locations
- Multiple hardcopies were kept in all server rooms for use by Server Team personnel
- Manually updated roughly every few weeks (the technology to automate this didn’t yet exist)
- Maintained by yours truly, going from rack to rack while carrying a clipboard
- Eventually evolved into a SQL Server-based server inventory database that included a clickable HTML server room image map on the front-end

Instructions for Iron Mountain
- Why it was important: provided instructions on how to obtain backup tapes from offsite storage
- Provided names and contact information
- Also included instructions on how to prepare backup tapes for pickup, as well as instructions for requesting backups from Iron Mountain
- We had a regular weekly backup pickup schedule. Obtaining backups was done on an as-needed basis.

Vendor contact lists
- Why it was important: provided contact info for suppliers
- Created in MS Word (might have been Excel – I don’t remember)
- Included vendor names and contacts with whom we maintained a business relationship
- Hardware, software, Microsoft products, even some swag

Internal telephone contact list
- Why it was important: provided contact info for key employees
- Created in MS Word
- Small business card-sized document that would fit into an ID badge holder or a wallet
- Included office, home, and mobile phone numbers for all members of the Server Team, as well as other important internal business contacts
- Distributed to all Server Team members and key staff members

What we didn’t have
- Documented business continuity plan
  - At the departmental level, we didn’t have a publicly available document that explained business roles
  - If there was such a document at higher levels, we didn’t know about it
- Documented disaster recovery plan
  - The documents we did have could’ve been included in a DR plan
- Backup departmental intranet/file server
  - What if our intranet/file server had been in the WTC?

Where we got lucky
- The original documents were on my hard drive – which was in Albany, not the WTC office.
- Documents were also accessible on a departmental intranet server located in Albany, not in the WTC.
- We had hardcopies of nearly all these documents.
  - File copies were in multiple locations
  - Multiple people had copies of these documents.
- We had a highly functioning (and close-knit) team.
- What we didn’t have didn’t kill us.

After the planes hit...
- All productivity came to a halt
- Server Team members and corporate leaders from other office locations, including NYC, Harrisburg, and Syracuse, convened in Albany
- We made the decision, and came up with a strategy (remember: we didn’t have a BC or DR plan), to rebuild critical server infrastructure in Albany.
  - Domain controllers came first.
  - Critical data servers would be next.
  - Additional servers were prioritized based on business importance.
- The first documents I was asked to distribute were the contact lists. I was asked to print out as many as possible and make sure everyone got one.

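For illustration, the rebuild order described on this slide (domain controllers, then critical data servers, then everything else by business importance) can be expressed as a simple sort. This is a sketch under assumptions, not the process the team actually used: the role labels, the numeric `importance` score, and the server names are all hypothetical.

```python
# Rebuild tiers from the narrative: domain controllers first, critical data
# servers next. Any other role falls into a third tier.
# The role labels used as keys here are hypothetical.
TIER = {"domain controller": 0, "critical data": 1}


def rebuild_order(servers):
    """Sort by tier, then by descending business importance within a tier."""
    return sorted(
        servers,
        key=lambda s: (TIER.get(s["role"], 2), -s["importance"]),
    )


# Made-up inventory; "importance" is an assumed score (higher = more important).
servers = [
    {"name": "file-01", "role": "file server", "importance": 3},
    {"name": "dc-01", "role": "domain controller", "importance": 5},
    {"name": "db-01", "role": "critical data", "importance": 5},
    {"name": "web-01", "role": "web server", "importance": 4},
]

order = [s["name"] for s in rebuild_order(servers)]
```

Writing the priority rule down in any form, whether code, a spreadsheet, or a paragraph in a plan, means it does not have to be worked out from scratch in the middle of a crisis.
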
The next two weeks...
- We worked around the clock every day for the next two weeks to rebuild the lost infrastructure.
- Server room maps told us what had been in the WTC.
- Vendors supplied us with what we needed.
  - Hardware and software
  - We started with existing inventory
  - Requests were expedited
- Server jockeys rebuilt servers from installation checklists.
- Iron Mountain provided data backups.

The next two weeks...
- We worked in shifts – workers would build servers, then the next shift picked up where they left off.
- Everyone wanted to do their part.
  - One Server Team member was on vacation when the planes hit – he was on an island, unable to catch a flight home.
  - Even I had to be told, “go home and get some sleep!”
- All attempts were made to maintain morale.
  - Workers from out of town were allowed to take trips back home.
  - We were encouraged to maintain normalcy – home, family, extracurricular activities, etc.
  - Breakfast was provided – lots of pancakes, eggs, sausage, and bacon.

Aftermath
- After two weeks...
  - Server infrastructure was operational so that the company could conduct business.
  - Out-of-town workers returned home and were temporarily assigned to other regional offices.
- Long term...
  - NYC office relocated to a new location in Brooklyn (MetroTech)
  - New office included state-of-the-art technology
  - Documentation went online (SQL Server back-end)

Some takeaways from the experience
- Write it down!!!
- Paper documents are NOT obsolete!!!
  - “The Dallas’ chart table was a new gadget…the display moved as the Dallas moved. This made paper charts obsolete, though they were kept anyway. Charts can’t break.” – Tom Clancy, The Hunt for Red October
- Store documents where they’re secure and accessible.
- Have multiple copies, multiple locations, and backups.
- Make sure everyone knows how to get or where to find the documents.
- Sometimes, the most innocuous documents can make a big difference.

Some takeaways from the experience
- Communication and teamwork are important!
  - We had a tight-knit group* that could turn on a dime.
  - *This is not always a good thing – check out my Groupthink presentation sometime.
- Document your plan, and make sure people know what it is!
  - Could we have performed better with a documented business continuity or disaster recovery plan?
- Maintaining morale during a tough time can go a long way.
  - Maintain as much normalcy as possible.
  - Rest!

Be prepared!
- Disaster documentation is worthless if:
  - It’s destroyed in the disaster
  - Nobody can use it
  - Nobody can access it
  - Nobody understands it
  - It doesn’t exist
- Don’t wait until it’s too late to find out how important documentation is for disaster recovery!