
1 PetaByte Storage Facility at RHIC Razvan Popescu - Brookhaven National Laboratory

2 Who are we?
- Relativistic Heavy-Ion Collider (RHIC) @ BNL
  – Four experiments: Phenix, Star, Phobos, Brahms.
  – 1.5 PB per year.
  – ~500 MB/sec.
  – >20,000 SPECint95.
- Startup in May 2000 at 50% capacity, ramping up to nominal parameters within 1 year.

3 Overview
- Data types:
  – Raw: very large volume (1.2 PB/yr.), average bandwidth (50 MB/s).
  – DST: average volume (500 TB), large bandwidth (200 MB/s).
  – mDST: low volume (<100 TB), large bandwidth (400 MB/s).
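The quoted bandwidths are design rates rather than year-averaged ones; a quick back-of-the-envelope check (illustrative Python, assuming a 100% duty cycle) of what the yearly volumes imply as sustained rates:

```python
# Illustrative arithmetic only, using the volumes quoted on this slide.
SECONDS_PER_YEAR = 365 * 24 * 3600          # ~3.15e7 s

def sustained_rate_mb_s(volume_tb_per_year: float) -> float:
    """Average rate needed to move a yearly volume, assuming a 100% duty cycle."""
    return volume_tb_per_year * 1e6 / SECONDS_PER_YEAR   # TB -> MB

print(round(sustained_rate_mb_s(1200), 1))  # raw, 1.2 PB/yr -> ~38 MB/s sustained
print(round(sustained_rate_mb_s(500), 1))   # DST, 500 TB/yr -> ~16 MB/s sustained
```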

4 Data Flow (generic) — diagram: data moves between RHIC (DAQ), the Archive, the Reconstruction Farm (Linux), the Analysis Farm (Linux), and the File Servers (DST/mDST), carrying raw, DST, and mDST streams; the labeled rates are 10, 35, 50, 200, and 400 MB/s.

5 The Data Store
- HPSS (ver. 4.1.1, patch level 2)
  – Deployed in 1998.
  – After overcoming some growing pains, we consider the present implementation successful.
  – One major/total reconfiguration to adapt to new hardware (and improved system understanding).
  – Flexible enough for our needs. One shortcoming: the lack of a preemptible priority scheme.
  – Very high performance.

6 The HPSS Archive
- Constraints: large capacity & high bandwidth:
  – Two tape technologies: SD-3 (best $/GB) & 9840 (best $/MB/s).
  – Two-tape-layer hierarchies, for easy management of the migration (sketched below).
- Reliable and fast disk storage:
  – FC-attached RAID disk.
- Platform compatible with HPSS:
  – IBM, Sun, SGI.
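A minimal sketch, assuming a hypothetical age-based migration policy (the actual HPSS hierarchy configuration is not shown in the slides), of how a two-tape-layer hierarchy lets new data land on the bandwidth-optimized 9840 layer and later migrate to the capacity-optimized SD-3 layer:

```python
# Hypothetical illustration, not the production HPSS policy: files enter on
# the fast 9840 layer and migrate to high-capacity SD-3 once they go cold.
from dataclasses import dataclass

MIGRATE_AFTER_DAYS = 30   # assumed policy knob

@dataclass
class ArchivedFile:
    name: str
    age_days: int
    layer: str = "9840"   # new files land on the bandwidth-optimized layer

def migrate(files: list[ArchivedFile]) -> None:
    """Move cold files from the bandwidth layer to the capacity layer."""
    for f in files:
        if f.layer == "9840" and f.age_days > MIGRATE_AFTER_DAYS:
            f.layer = "SD-3"

files = [ArchivedFile("run42.raw", age_days=90), ArchivedFile("run99.raw", age_days=2)]
migrate(files)
print([(f.name, f.layer) for f in files])  # run42 moves to SD-3, run99 stays on 9840
```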

7 Present Resources
- Tape storage:
  – (1) STK Powderhorn silo (6,000 cartridges).
  – (11) SD-3 (Redwood) drives.
  – (10) 9840 (Eagle) drives.
- Disk storage:
  – ~8 TB of RAID disk: 1 TB for HPSS cache, 7 TB Unix workspace.
- Servers:
  – (5) RS/6000 H50/H70 for HPSS.
  – (6) E450 & E4000 for file serving and data mining.

11 HPSS Structure
- (1) Core server:
  – RS/6000 Model H50.
  – 4x CPU, 2 GB RAM.
  – Fast Ethernet (control).
  – OS-mirrored storage for metadata (6 pv.).

12 HPSS Structure
- (3) Movers:
  – RS/6000 Model H70.
  – 4x CPU, 1 GB RAM.
  – Fast Ethernet (control).
  – Gigabit Ethernet (data), 1500 & 9000 MTU.
  – 2x FC-attached RAID (300 GB) disk cache.
  – (3-4) SD-3 "Redwood" tape transports.
  – (3-4) 9840 "Eagle" tape transports.

13 HPSS Structure
- To guarantee availability of resources for a specific user group, resources are separated: separate PVRs & movers per group.
- One mover per user group means total exposure to a single-machine failure.
- To guarantee availability of resources for the Data Acquisition stream, separate hierarchies are used.
- Result: 2 PVRs, 2 COSs, and 1 mover per group (see the sketch below).
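A hypothetical sketch (group, PVR, COS, and mover names are invented, not the production configuration) of the per-group partitioning and the single-machine exposure it creates:

```python
# Hypothetical model of the "2 PVRs, 2 COSs, 1 mover per user group" split
# described above; names are illustrative, not the actual BNL configuration.
GROUPS = {
    "star":   {"pvrs": ["PVR-star-sd3", "PVR-star-9840"],
               "cos":  ["COS-star-raw", "COS-star-dst"],
               "mover": "mover1"},
    "phenix": {"pvrs": ["PVR-phenix-sd3", "PVR-phenix-9840"],
               "cos":  ["COS-phenix-raw", "COS-phenix-dst"],
               "mover": "mover2"},
}

def affected_groups(failed_mover: str) -> list[str]:
    """With one mover per group, a failed mover takes that group offline."""
    return [g for g, r in GROUPS.items() if r["mover"] == failed_mover]

print(affected_groups("mover1"))  # -> ['star']
```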

14 HPSS Structure (diagram).

15 HPSS Topology (diagram). Labels: Core server and movers M1-M3; Net 1 - Data (1000baseSX); Net 2 - Control (100baseT); STK silo on 10baseT; N x PVR; pftpd; Client (Routing).

16 HPSS Performance
- 80 MB/sec for the disk subsystem.
- ~1 CPU per 40 MB/sec of TCP/IP Gigabit Ethernet traffic @ 1500 MTU, or per 90 MB/sec @ 9000 MTU (worked example below).
- >9 MB/sec per SD-3 transport.
- ~10 MB/sec per 9840 transport.
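A worked example, using only the figures on this slide and the DST rate from slide 3, of what the per-CPU TCP/IP cost means for a 4-CPU H70 mover:

```python
# Rough capacity-planning arithmetic based on the figures quoted above.
def cpus_for_throughput(mb_per_s: float, mtu: int) -> float:
    """CPUs consumed by TCP/IP Gigabit Ethernet traffic on a mover."""
    per_cpu = 40.0 if mtu == 1500 else 90.0   # MB/s handled per CPU
    return mb_per_s / per_cpu

# A 4-CPU H70 mover carrying 200 MB/s of DST traffic:
print(cpus_for_throughput(200, 1500))  # ~5 CPUs -> not feasible at 1500 MTU
print(cpus_for_throughput(200, 9000))  # ~2.2 CPUs -> feasible with jumbo frames
```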

17 I/O-Intensive Systems
- Mining and analysis systems.
- High I/O & moderate CPU usage.
- To avoid heavy network traffic, merge the file servers with the HPSS movers:
  – Major problem: HPSS support on non-AIX platforms.
  – Options: several (Sun) SMP machines, or a large (SGI) modular system.

18 Problems
- Short life cycle of the SD-3 heads:
  – ~500 hours, i.e. <2 months at average usage (6 of 10 drives in 10 months).
  – Built a monitoring tool to try to predict transport failure, based on soft-error frequency (sketched below).
- Low-throughput interface (F/W) for SD-3: high slot consumption.
- SD-3 production discontinued?!
- 9840 ???
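A minimal sketch of the kind of monitoring described above; the window size, threshold, and interface are assumptions, not the actual tool built at BNL:

```python
# Assumed thresholds and log format for illustration only: track the
# soft-error rate per tape transport and flag drives whose rate is climbing.
from collections import defaultdict, deque

WINDOW = 50               # number of recent mounts to consider (assumed)
MAX_SOFT_ERR_RATE = 0.5   # soft errors per mount that triggers a warning (assumed)

soft_errors = defaultdict(lambda: deque(maxlen=WINDOW))

def record_mount(drive: str, soft_error_count: int) -> None:
    """Record the soft-error count reported for one mount of a transport."""
    soft_errors[drive].append(soft_error_count)

def drives_at_risk() -> list[str]:
    """Flag transports whose recent soft-error rate exceeds the threshold."""
    return [d for d, errs in soft_errors.items()
            if len(errs) == WINDOW and sum(errs) / WINDOW > MAX_SOFT_ERR_RATE]
```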

19 Issues
- Tested the two-tape-layer hierarchies:
  – Cartridge-based migration.
  – Manually scheduled reclaim.
- Work with large files: ~1 GB preferable, >200 MB tolerable (see the bundling sketch below).
  – Is this still true with 9840 tape transports?
- Don't even consider NFS; wait for DFS/GPFS?
  – We use pftp exclusively.
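One common way to honor the large-file preference is to bundle small files into ~1 GB tar archives before transferring them with pftp; a hedged sketch (paths and the target size are assumptions, not our production tooling):

```python
# Illustrative only: group many small files into tar archives of roughly 1 GB
# each, so the archive sees few large files instead of many small ones.
import os
import tarfile

TARGET_BYTES = 1 * 1024**3   # aim for ~1 GB bundles (assumed target)

def bundle(files, out_prefix="bundle"):
    """Group files into tar archives of roughly TARGET_BYTES each."""
    batch, size, index = [], 0, 0
    for path in files:
        batch.append(path)
        size += os.path.getsize(path)
        if size >= TARGET_BYTES:
            _write_tar(batch, f"{out_prefix}-{index:04d}.tar")
            batch, size, index = [], 0, index + 1
    if batch:
        _write_tar(batch, f"{out_prefix}-{index:04d}.tar")

def _write_tar(paths, name):
    with tarfile.open(name, "w") as tar:
        for p in paths:
            tar.add(p)
```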

20 Issues
- Guaranteeing availability of resources for specific user groups:
  – Separate PVRs & movers.
  – Total exposure to single-machine failure!
- Reliability:
  – Distributing resources across movers means sharing movers (acceptable?).
  – Inter-mover traffic costs ~1 CPU per 40 MB/sec of TCP/IP per adapter: expensive!

21 Inter-Mover Traffic: Solutions
- Affinity: limited applicability.
- Diskless hierarchies (not for DFS/GPFS): not for SD-3; not enough tests on 9840.
- High-performance networking, i.e. the SP switch ("this is your friend"): IBM only.
- Lighter protocol (HIPPI): expensive hardware.
- Multiply-attached storage (SAN): most promising! See STK's talk. Requires HPSS modifications.

22 Summary
- HPSS works for us.
- Buy an SP2 and the SP switch: simplified administration, fast interconnect, ready for GPFS.
- Keep an eye on STK's SAN/RAIT.
- Avoid SD-3. (Not a risk anymore.)
- Avoid small-file access, at least for the moment.

23 Thank you! Razvan Popescu popescu@bnl.gov

