Published by Jemimah Lucas. Modified over 8 years ago.
1 Taming the Data Center
  SDR 1.3
  Butch Adkins
  Infrastructure & Operations
2 Data Center World – Certified Vendor Neutral
  Each presenter is required to certify that their presentation will be vendor-neutral. As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor-neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.
3 Agenda
  Background
  Outages
  Outages prevented
  Takeaways
    1. Know your environment
    2. Establish relationships
    3. Plan
    4. Communicate
4 Backstory
  Eastern Kentucky University 1979
  33rd year at UK
  7 significant sets of job responsibilities
  12 offices
  18 managers [four multiple times]
  April 2001
5 McVey Hall Data Center: 2001 20 staff Computing facility with about 60 servers Mainframe Multiple tape robots Research Computer Printing Bubble sheet scanning Building
6 McVey Hall Data Center
  McVey Hall was built in 1928
  Computing Center began development in the basement in the late 1950s
  Five expansions, the last in 1987
  7,400 sq ft raised floor space
7 McVey Hall
9 McVey Hall Data Center
10 Challenges
  Computer Room vs Data Center
  Increased hardware count [100 to over 1,000]
  Decommissioned mainframe [2006]
  Research Computing power requirements [195 kVA]
  Power [725 kVA]
  Air conditioning [155 tons]
  Space
  Virtualization [700 VMs]
  Cloud
  Uptime Expectations
12 Five 9s…really?
  Availability   Downtime per year
  90%            36.5 days
  99%            3.65 days
  99.9%          8 hours, 46 minutes
  99.99%         52.6 minutes
  99.999%        5 minutes, 15 seconds
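The downtime allowed at each availability level follows from simple arithmetic on the length of a year. A minimal sketch, assuming a 365-day year; the names `downtime_minutes` and `describe` are illustrative and not from the presentation:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year, no leap day


def downtime_minutes(availability_percent: float) -> float:
    """Minutes of allowed downtime per year at the given availability."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


def describe(minutes: float) -> str:
    """Format a duration in minutes as days, hours, minutes, seconds."""
    total_seconds = round(minutes * 60)
    days, rem = divmod(total_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    mins, secs = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m {secs}s"


if __name__ == "__main__":
    for level in (90.0, 99.0, 99.9, 99.99, 99.999):
        print(f"{level}% -> {describe(downtime_minutes(level))}")
```

Under that assumption, 99.999% works out to about 5.26 minutes per year, or roughly 5 minutes 15 seconds.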
13 Power
  UPS C
  UPS battery relocation
  Talk-a-phone
  Dual feed outage
15 UPS battery relocation
  UPS A utility by-pass breaker closed
  UPS A “fails”
  UPS C loses power
  MDP2 breaker opens
  60% of the building load dropped
  Power restored after 8 minutes
  Failure was caused by A and C phases on UPS A being reversed
  The original UPS installation was verified…sort of
18 Power
  Dual feed, main-tie-main
  UPS C
  UPS battery relocation
  Talk-a-phone
  Dual feed outage
20 April 2013, just another Tuesday
  8:33 a.m.   McVey Hall loses power from both feeds
  8:45 a.m.   Called UPS CE – Called UPS vendor for support
  8:51 a.m.   Power restored to the building
  8:57 a.m.   Called UPS CE for reset procedure
  9:09 a.m.   First UPS reset
  9:14 a.m.   Second UPS reset
  9:20 a.m.   All PDU breakers reset
  10:08 a.m.  Requested estimate to replace batteries
  10:30 a.m.  Different UPS CE returns call
  11:37 a.m.  Received battery quote for next-day install
  11:45 a.m.  CE arrives onsite [3 hour response]
  12:45 p.m.  Essential functions restored
  4 hours 15 minutes
  4:15 p.m. = newspaper headlines
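The intervals in this timeline can be checked with a few lines of arithmetic. A rough sketch using only the clock times listed on the slide; the event keys and helper names are illustrative assumptions, not anything from the presentation:

```python
def minutes_since_midnight(hour_24: int, minute: int) -> int:
    """Convert a 24-hour clock time to minutes since midnight."""
    return hour_24 * 60 + minute


def elapsed(start: tuple, end: tuple) -> str:
    """Elapsed time between two same-day clock times, as 'Hh MMm'."""
    total = minutes_since_midnight(*end) - minutes_since_midnight(*start)
    hours, mins = divmod(total, 60)
    return f"{hours}h {mins:02d}m"


# Clock times taken from the slide; the keys are hypothetical labels.
EVENTS = {
    "power_lost": (8, 33),
    "vendor_called": (8, 45),
    "power_restored": (8, 51),
    "ce_onsite": (11, 45),
    "essential_restored": (12, 45),
}

if __name__ == "__main__":
    print("Building without power:",
          elapsed(EVENTS["power_lost"], EVENTS["power_restored"]))
    print("CE response time:",
          elapsed(EVENTS["vendor_called"], EVENTS["ce_onsite"]))
    print("Loss of power to essential functions restored:",
          elapsed(EVENTS["power_lost"], EVENTS["essential_restored"]))
```

By these times the building was without power for 18 minutes, yet essential functions took 4 hours 12 minutes to return, which the slide appears to round to 4 hours 15 minutes.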
21 Planned Data Center Outage
  UPS C EOL notification extension expiring
  Reduced need for power
  UPS C 25% rack penetration
  Additional circuit breaker installation
  Required maintenance on rack-out breakers
22 UPS C Decommissioning TCO 2013-2016
  Projected UPS C costs                      Cost
  2013-14 maintenance                        $7,058
  2014-15 maintenance                        $7,058
  2015-16 maintenance                        $7,058
  2013 batteries                             $27,000
  2015 capacitor replacement                 $12,400
  Estimated Total Cost                       $60,574

  Cost of Decommissioning UPS C              Cost
  PPD Electricians                           $7,470
  PPD Materials                              $3,391
  Total Cost                                 $10,861

  7 Month 2013 Maintenance Cost for UPS C    $4,118
  Net Savings                                $45,595
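The totals on this slide can be reproduced directly from the line items. A small sketch, assuming net savings is the projected cost minus the decommissioning cost and the seven months of 2013 maintenance already paid (a formula inferred from the figures, not stated on the slide); variable names are illustrative:

```python
# Line items as listed on the slide, in US dollars.
projected_costs = {
    "2013-14 maintenance": 7_058,
    "2014-15 maintenance": 7_058,
    "2015-16 maintenance": 7_058,
    "2013 batteries": 27_000,
    "2015 capacitor replacement": 12_400,
}
decommissioning_costs = {
    "PPD Electricians": 7_470,
    "PPD Materials": 3_391,
}
maintenance_paid_2013 = 4_118  # seven months of 2013 maintenance

projected_total = sum(projected_costs.values())
decommissioning_total = sum(decommissioning_costs.values())

# Assumed formula: avoided cost less what was still spent.
net_savings = projected_total - decommissioning_total - maintenance_paid_2013

if __name__ == "__main__":
    print(f"Projected cost of keeping UPS C: ${projected_total:,}")
    print(f"Cost of decommissioning:         ${decommissioning_total:,}")
    print(f"Net savings:                     ${net_savings:,}")
```

With that formula the computed values match the slide's $60,574, $10,861, and $45,595.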
24 2012 NFPA 70E
25 Arc Flash
  Arc flash is the light and heat produced from an electric arc supplied with sufficient electrical energy to cause substantial damage, harm, fire, or injury.
26 December 28
27 Planned Data Center Outage
  August 2012 – Began discussing UPS C
  Spring 2013 – Began strategic planning
  July 2013 – Requested estimates
  September 2013 – Plan is blessed
  October 2013 – Plan is unveiled
  November 2013 – Network is not redundant
  December 28, 2013 – Plan is executed
28 Stats page
  Staff directly involved – over 200
  Number of VMs shut down – over 700
  Initial shutdown of systems – 7 p.m. December 27
  Planned outage start – 11 a.m. December 28
  Actual outage start – 11:32 a.m. [+ 32]
  12:02 p.m. – Only fallout; no power to Infoblox
  12:37 p.m. – PPD Electricians complete switchgear maintenance
  3:25 p.m. – PPD Electricians complete the UPS A “backfeed”
  5:37 p.m. – All power restored [- 1:23]
  Estimated outage completion – 7 p.m.
  10:00 p.m. – Substantial return to operation
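The bracketed schedule variances on this slide (+32 minutes at the start, 1:23 ahead of the estimate at power restoration) can be derived from the planned and actual times. A minimal sketch using Python's standard `datetime` module; the function name `signed_variance` and the variable names are illustrative assumptions:

```python
from datetime import datetime, timedelta


def signed_variance(planned: datetime, actual: datetime) -> str:
    """Return actual minus planned as a signed 'H:MM' string."""
    delta = actual - planned
    sign = "+" if delta >= timedelta(0) else "-"
    total_minutes = int(abs(delta).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{sign}{hours}:{minutes:02d}"


# Times from the slide, all on December 28, 2013.
planned_start = datetime(2013, 12, 28, 11, 0)
actual_start = datetime(2013, 12, 28, 11, 32)
estimated_completion = datetime(2013, 12, 28, 19, 0)
power_restored = datetime(2013, 12, 28, 17, 37)

if __name__ == "__main__":
    print("Start variance:     ", signed_variance(planned_start, actual_start))
    print("Completion variance:",
          signed_variance(estimated_completion, power_restored))
    print("Time without power: ", power_restored - actual_start)
```

The outage started 32 minutes late and still finished 1 hour 23 minutes ahead of the estimate, with power off for 6 hours 5 minutes.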
29 Air Conditioning
  Chiller
  Hi-density in-row cooling + 3 CRAHs
  Power sensitivity
  Service clearance
  Thermal expansion valve failure
  Storage
  Fuse lugs
  System thresholds
  Relationships
30 Chiller Maintenance
31 Chiller Maintenance
33 Fire/Power/AC/Storage/VMs
  2:03 p.m.  Two CRACs lose power
  2:35 p.m.  Temperatures increased, reported units down
  3:40 p.m.  First warning from storage
  4:01 p.m.  First system shuts down
             SQL, VMfarm, Blackboard, Drupal, Account Manager, SharePoint, myUK portal
  4:04 p.m.  AC technician arrives
  5:23 p.m.  Electrician arrives
  5:51 p.m.  Temporary fix in place
  6:14 p.m.  Cool enough to restart storage systems
  8:45 p.m.  Major systems restored [+4:44]
35 Fire Outages 1986 cooling fan Halon 2001 Administration Building Popcorn CRAC Lug Window AC unit fan motor PC
37 Water Roof drain Water fountain Transformer vault Drainage Sump Pump Steam Leak
38 Takeaways
  1. Know your environment
  2. Establish relationships
  3. Plan
  4. Communicate
39 Questions???
40 Thank you
  Butch Adkins
  Infrastructure & Operations
  University of Kentucky
  butch@uky.edu