25-29 May 2009, HEPiX Spring ASGC Site Report Jason Shih ASGC/OPS HEPiX Fall 2009 Umea, Sweden.


1 25-29 May 2009, HEPiX Spring ASGC Site Report Jason Shih ASGC/OPS HEPiX Fall 2009 Umea, Sweden

2 Overview
  Fire incident
  Hardware
  Network
  Storage
  Future remarks

3 Fire incident - event summary
Damage analysis:
  Fire was confined to the power room
  Severe damage to the UPS wiring of the power system, AHR
  Smoke dust pervaded and smudged almost everywhere, including the computing and storage systems
History and planning:
  16:53 Feb 25: UPS battery caught fire
  19:50 Feb 25: fire extinguished by the fire department
  10:00 Feb 26: fire scene investigation by the fire department
  15:00 Feb 26 ~ Mar 23: DC cleaning, re-partitioning, re-wiring, deodorization, and re-installation, from ceiling to ground under the raised floor, from power room to machine room, and from power system, air conditioning, and fire prevention system to computing system; all facilities moved outside for cleaning
  Mar 23: computing system installation
  Mar 23 ~ Apr 9: recovery of the monitoring, environment control, and access control systems

4 Fire incident - recovery plan
  DC consultant will review the re-design on Mar 11; the schedule will be revised based on the inspection
  Tier1/Tier2 services will be collocated at the IDC for 3 months from Mar 20

5 Fire incident - review/lessons (I)
DC infrastructure standards to comply with:
  ANSI TIA/EIA
  ASHRAE thermal guidelines for data processing environments
  Guidelines for green data centers are available, e.g., LEED
  NFPA: fire suppression systems
Capacity and type of UPS (min. scale):
  Varies with the response time of the generators
  Adjust the rating of all breakers (NFB and ACB)
Location of UPS (open space and outside the PR)
Regular maintenance of batteries:
  Internal resistance measurement

6 Fire incident - review/lessons (II)
Smoke damage: fire stopping
Improvement of the monitoring system:
  Re-design the monitoring system
  Earlier pre-action; consider VESDA
Emergency response and procedures:
  Routine fire drills are indispensable
  A disaster recovery plan is necessary
Other improvements:
  PP and H/C aisle splitting
  Fiber panels: MDF and FOR
  OH cable tray (existing: PWR tray in subfloor) + fiber guide
  Raised floor grommets

7 Move out all facilities for cleaning
  Container as storage and humidification
  Protect racks from dust
  Ceiling removal

8 Fire incident - Tape system
  Snapshots of decommissioned tape drives after the incident

9 DC recovered - mid-May
  FOR in area #1
  MDF moved to the center of the DC area
  H/C aisles fully split
  Plan to replace racks to provide 1100 mm depth

10 IDC collocation (I)
  Site selection and paperwork: one week
  Preparation at IDC: one week
  15R + reservation for tape system (6R)
  Power (14 kW per rack), cooling (perforated raised floor)
  10G protected SDH (STM-64) networking between IDC and ASGC

11 IDC collocation (II)
  Relocation of 50+% of computing/storage: one week
  2k job slots (3.2 MSI2k), 26 chassis of blade servers
  2.3 PB storage (1 PB allocated dynamically)
  Cabling + setup + reconfiguration: one week

12 IDC collocation (III)
  Facility installation completed on Mar 27
  Tape system delayed until after Apr 9
    Realignment
    RMA for faulty parts

13 T1 performance
  7G peak reached to Amsterdam
  9G peak observed between IDC and ASGC

14 Network - before May
[Network topology diagram; only the labels survive]
  Sites: Sinica, Taipei; HK, Mega-iAdvantage; JP, KDDI Otemachi; SG, KIM CHUNG
  Routers: M120 (x2), M20, M320
  Peers and transit: KREONET2, CSTNet, HARNet, HKIX, Pacnet IP Transit, APAN-JP, KEK, JPIX, SINet, WIDE, NUS, AARNet, NCIC, CERNet, TWGate IP Transit
  Link labels: GE, GE*2, 100M, 2.5G WL non-protect, 2.5G (STM-16) SDH, 622M (STM-4) SDH on APCN2

15 Network - 2009
[Network topology diagram; only the labels survive]
  Sites: Sinica, Taipei; HK, Mega-iAdvantage; JP, KDDI Otemachi; Singapore, Global Switch
  Routers: M120 (x2), M20, M320
  Peers and transit: KREONET2, CSTNet, HARNet, HKIX, Pacnet IP Transit, APAN-JP, KEK, JPIX, SINet, WIDE, SingAREN, AARNet, NUS, CERNet, TWGate IP Transit
  Link labels: GE, GE*2, 100M, STM-16 SDH, 2.5G (STM-16) SDH, 622M (STM-4) SDH on EAC

16 ASGC Resource Level Targets
Date        CPU (MSI2k)   Disk (PB)   Tape (PB)
Current     2.4           1.2         0.8
Year End    5.6           2.4         1.3
MoU 2009    7.55          3.15        2.1
2008:
  0.5 PB expansion of the tape system in Q2
  Meet MoU target by mid-November
  1.3 MSI2k per rack, based on the recent E5450 processor
2009:
  150 QC blade servers
  2 TB per drive for the RAID subsystem
  42 TB net capacity per chassis and 0.75 PB in total
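The distance between each row of the table and the MoU 2009 row can be worked out directly. The sketch below is only an illustration: it assumes the flattened table reads as three rows (Current, Year End, MoU 2009) by three columns (CPU, disk, tape), and the names `TARGETS` and `gap` are invented here rather than taken from the slides.

```python
# Values transcribed from the slide's table: CPU in MSI2k, disk and tape in PB.
# The row/column split is an assumption about how the flattened table reads.
TARGETS = {
    "Current": {"cpu_msi2k": 2.4, "disk_pb": 1.2, "tape_pb": 0.8},
    "Year End": {"cpu_msi2k": 5.6, "disk_pb": 2.4, "tape_pb": 1.3},
    "MoU 2009": {"cpu_msi2k": 7.55, "disk_pb": 3.15, "tape_pb": 2.1},
}


def gap(level, reference="MoU 2009"):
    """Return reference minus level for each resource, rounded to 2 decimals."""
    ref = TARGETS[reference]
    cur = TARGETS[level]
    return {name: round(ref[name] - cur[name], 2) for name in ref}


if __name__ == "__main__":
    for level in ("Current", "Year End"):
        print(level, gap(level))
```

Under this reading, the year-end disk figure sits 0.75 PB below the MoU 2009 row, which is the same size as the 0.75 PB total quoted for the 2009 storage purchase.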

17 Hardware Profile and Selection (I)
CPU:
  2K8 expansion: 330 blade servers providing 3.6 MSI2k
    7U chassis
    SMP Xeon E5430 processors, 16 GB FB-DIMM
    Each blade provides 11 KSI2k
    2 blades/U density, Web/SOL management
  Current capacity: 2.4 MSI2k
  Year-end total computing power: ~5.6 MSI2k
    22 KSI2k/U (24 chassis in 168U)
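The per-blade, per-U, and per-chassis figures on this slide can be cross-checked with a few lines of arithmetic. This is a rough sketch using only numbers quoted on the slide; the constant names are illustrative, and 1 MSI2k is taken as 1000 KSI2k.

```python
# Figures quoted on the slide; constant names are illustrative only.
BLADES = 330            # blade servers in the 2008 expansion
KSI2K_PER_BLADE = 11    # rating of each blade
BLADES_PER_U = 2        # quoted density
CHASSIS = 24            # chassis count
CHASSIS_HEIGHT_U = 7    # height of one chassis in rack units

# Total rating of the expansion, converted from KSI2k to MSI2k.
expansion_msi2k = BLADES * KSI2K_PER_BLADE / 1000

# Compute density per rack unit.
density_ksi2k_per_u = BLADES_PER_U * KSI2K_PER_BLADE

# Rack space taken by all chassis.
total_rack_units = CHASSIS * CHASSIS_HEIGHT_U

if __name__ == "__main__":
    print(expansion_msi2k, density_ksi2k_per_u, total_rack_units)
```

The results (about 3.6 MSI2k for 330 blades, 22 KSI2k/U, and 168U for 24 chassis) agree with the slide's density and rack-space figures.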

18 Tape system
Before the incident:
  LTO3 * 8 + LTO4 * 4
  720 TB with LTO3
  530 TB with LTO4
May 2009:
  Two loaned LTO3 drives
  MES: 6 LTO4 drives at the end of May
  Capacity: 1.3 PB (old) + 0.8 PB (LTO4)
New S54 model introduced:
  2K slots with tier model
  Upgrade ALMS
  Enhanced gripper
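The tape capacities add up as follows. This is a minimal sketch that assumes the "1.3 PB (old)" figure is the rounded sum of the pre-incident LTO3 and LTO4 capacity and that 1 PB is 1000 TB; neither assumption is stated on the slide.

```python
# Pre-incident capacity by media generation, in TB (from the slide).
LTO3_TB = 720
LTO4_TB = 530

# Sum of pre-incident capacity, converted to PB (assuming 1 PB = 1000 TB).
before_incident_pb = (LTO3_TB + LTO4_TB) / 1000

# Planned capacity as quoted on the slide: old media plus new LTO4 media.
OLD_PB = 1.3
NEW_LTO4_PB = 0.8
planned_total_pb = round(OLD_PB + NEW_LTO4_PB, 1)

if __name__ == "__main__":
    print(before_incident_pb, planned_total_pb)
```

The pre-incident sum comes to 1.25 PB, close to the 1.3 PB quoted, and the planned total of 2.1 PB matches the tape figure in the MoU 2009 row of the resource table.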

19 Roadmap - Host I/F, 2009 (Q1 to Q4)
  4G FC (≈ 400 MB/sec)
  8G FC (≈ 800 MB/sec)
  SAS 3G (4-lane ≈ 1200 MB/sec)
  iSCSI - 1 Gb
  U320 SCSI (≈ 320 MB/sec)
  iSCSI - 10 Gb
  SAS 6G (4-lane ≈ 2400 MB/sec)
  3U/16-bay FC-SAS in May; 2U/12-bay and 4U/24-bay in June
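The MB/sec figures for the serial interfaces follow from the nominal link rate once 8b/10b line coding is accounted for. The sketch below is a simplification: it uses the nominal rates (4, 8, 3, and 6 Gbit/s) rather than the slightly higher real Fibre Channel line rates (4.25 and 8.5 GBd), and the function name `payload_mb_per_s` is made up for illustration. U320 SCSI is a parallel bus and iSCSI runs over Ethernet, so neither is covered here.

```python
def payload_mb_per_s(gbit_per_lane, lanes=1, data_bits=8, line_bits=10):
    """Approximate payload bandwidth in MB/s for an 8b/10b-coded serial link.

    gbit_per_lane: nominal line rate of one lane in Gbit/s (a simplification).
    lanes: number of lanes bundled together (e.g. 4 for a wide SAS port).
    data_bits / line_bits: coding ratio, 8/10 for 8b/10b.
    """
    mbit_total = gbit_per_lane * 1000 * lanes          # nominal Mbit/s on the wire
    return mbit_total * data_bits // (line_bits * 8)   # strip coding, bits to bytes


if __name__ == "__main__":
    print("4G FC        :", payload_mb_per_s(4))
    print("8G FC        :", payload_mb_per_s(8))
    print("SAS 3G 4-lane:", payload_mb_per_s(3, lanes=4))
    print("SAS 6G 4-lane:", payload_mb_per_s(6, lanes=4))
```

With these assumptions the function gives 400, 800, 1200, and 2400 MB/s, which are the approximate values listed on the roadmap.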

20 Roadmap - Drive I/F, 2009 (Q1 to Q4)
  4G FC
  SAS 3G
  SAS 6G
  U320 SCSI
  SATA-II
  2.5" SSD (B12F series)

21 Est. Density
  2009 H1: 1 TB drives, 1 rack (42U) = 240 TB
  2009 H2: 2 TB drives, 1 rack (42U) = 480 TB
  2010 H1: 2 TB drives, 1 rack (42U) = 480 TB
  2010 H2: 3 TB drives, 1 rack (42U) = 720 TB
  2012: 5TB…..
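The per-rack estimates all correspond to 240 drives in a 42U rack. One way to arrive at that count, offered here as an assumption rather than something the slide states, is to fill the rack with 4U/24-bay enclosures (ten fit in 42U) and count raw drive capacity. The function names below are illustrative.

```python
def drives_per_rack(enclosure_u, bays, rack_u=42):
    """Number of drive bays when a rack is filled with identical enclosures.

    Assumes the whole rack is available for enclosures (no switches or PDUs).
    """
    return (rack_u // enclosure_u) * bays


def rack_capacity_tb(drive_tb, enclosure_u=4, bays=24, rack_u=42):
    """Raw (not net) capacity of one rack in TB for a given drive size."""
    return drives_per_rack(enclosure_u, bays, rack_u) * drive_tb


if __name__ == "__main__":
    for drive_tb in (1, 2, 3):
        print(drive_tb, "TB drives:", rack_capacity_tb(drive_tb), "TB per rack")
```

For comparison, the same calculation gives 224 bays for 3U/16-bay enclosures and 252 bays for 2U/12-bay enclosures, so the enclosure choice shifts the estimate by a few percent.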

22 Future remarks
  DC fully restored by end of May
  Restart round-the-clock operation
  Relocated resources fully involved in STEP09
  Facility relocation from the IDC at end of June
  New resource expansion at end of July
  Improve DC monitoring

23 Water mist fire suppression system
  Review the implementation of the gas suppression system
  Consider water mist in the power room
  Wall cabinet outside the data center area

24 Water mist - design plan

