Mass Storage @ RHIC Computing Facility
Razvan Popescu - Brookhaven National Laboratory
SLAC -- October 1999 | Mass Storage @ RCF

Overview
• Data types:
  – Raw: very large volume (order of PB), moderate bandwidth (50 MB/s).
  – DST: medium volume (hundreds of TB), high bandwidth (hundreds of MB/s).
  – mDST: low volume (tens of TB), high bandwidth (hundreds of MB/s).
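As a rough check on the "order of PB" figure, the sketch below converts the 50 MB/s average raw rate into daily and yearly volumes. It assumes decimal units and a hypothetical duty cycle for data taking; the slides give neither, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-envelope volume estimate for the raw data stream.
# Assumptions (not from the slides): decimal units (1 MB = 10**6 bytes,
# 1 PB = 10**15 bytes) and a hypothetical duty cycle for data taking.

SECONDS_PER_DAY = 86_400
MB = 10**6
TB = 10**12
PB = 10**15


def stored_volume_bytes(rate_mb_s: float, days: float, duty_cycle: float = 1.0) -> float:
    """Bytes written by a stream averaging rate_mb_s for `days`,
    active only a fraction `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return rate_mb_s * MB * SECONDS_PER_DAY * days * duty_cycle


if __name__ == "__main__":
    raw_rate = 50.0  # MB/s, average raw bandwidth from the Overview slide
    print(f"raw per day:  {stored_volume_bytes(raw_rate, 1) / TB:.2f} TB")         # 4.32 TB
    print(f"raw per year: {stored_volume_bytes(raw_rate, 365) / PB:.2f} PB")       # 1.58 PB
    # A hypothetical 50% duty cycle halves the yearly volume.
    print(f"raw per year: {stored_volume_bytes(raw_rate, 365, 0.5) / PB:.2f} PB")  # 0.79 PB
```

Even with generous downtime, the raw stream alone lands in the petabyte range within a year or two, consistent with the slide.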

Data Flow (generic)
[Diagram: RHIC, Reconstruction Farm (Linux), Archive (HPSS), File Servers (DST/mDST) and Analysis Farm (Linux), linked by raw, DST and mDST flows labelled 35, 50, 10, 200 and 400 MB/s.]

Present resources
• Tape storage:
  – (1) STK Powderhorn silo (6000 cartridges).
  – (11) SD-3 (Redwood) drives.
  – (10) 9840 (Eagle) drives.
• Disk storage:
  – ~8 TB of RAID disk: 1 TB for HPSS cache, 7 TB Unix workspace.
• Servers:
  – (5) RS/6000 H50/H70 for HPSS.
  – (6) E450 & E4000 for file serving and data mining.
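To put the silo in the same units as the data volumes, here is a back-of-envelope capacity estimate. The per-cartridge capacities (50 GB for SD-3, 20 GB for 9840, native and uncompressed) and the media split are assumptions for illustration, not figures from the slides.

```python
# Rough native capacity of one STK Powderhorn silo (6000 slots, per the slide)
# under a hypothetical split between SD-3 and 9840 media.

SILO_SLOTS = 6000
GB_PER_CARTRIDGE = {"SD-3": 50, "9840": 20}  # assumed native capacities


def silo_capacity_tb(sd3_fraction: float, slots: int = SILO_SLOTS) -> float:
    """Native capacity in TB (decimal) when sd3_fraction of the slots hold
    SD-3 media and the remaining slots hold 9840 media."""
    if not 0.0 <= sd3_fraction <= 1.0:
        raise ValueError("sd3_fraction must be between 0 and 1")
    sd3_slots = round(slots * sd3_fraction)
    eagle_slots = slots - sd3_slots
    gb = sd3_slots * GB_PER_CARTRIDGE["SD-3"] + eagle_slots * GB_PER_CARTRIDGE["9840"]
    return gb / 1000


for frac in (1.0, 0.5, 0.0):
    print(f"{frac:.0%} SD-3 slots: {silo_capacity_tb(frac):.0f} TB")
```

Under these assumptions a single silo holds a few hundred TB, which is one way to see why dense media (best $/GB) matter for a raw stream measured in PB.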

The HPSS Archive
• Constraints: large capacity and high bandwidth:
  – Two tape technologies: SD-3 (best $/GB) and 9840 (best $/MB/s).
  – Two-tape-layer hierarchies; easy management of the migration.
• Reliable and fast disk storage:
  – FC-attached RAID disk.
• Platform compatible with HPSS:
  – IBM, Sun, SGI.
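A toy model of a hierarchy with two tape layers below the disk cache may help picture the migration path. The ordering (9840 above SD-3), the residency times and the purely age-based rule are assumptions for illustration; real HPSS migration and purge policies are configured per hierarchy and are richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    residency_days: float  # hypothetical time a file stays here before migrating down


# Hypothetical hierarchy: disk cache, then a fast-access tape layer, then a
# high-density tape layer. Order and residency times are assumptions.
HIERARCHY = (
    Level("disk cache", 2.0),
    Level("tape layer 1 (9840)", 30.0),
    Level("tape layer 2 (SD-3)", float("inf")),  # bottom level: data stays here
)


def level_for_age(age_days: float, hierarchy: tuple[Level, ...] = HIERARCHY) -> str:
    """Return the level holding the primary copy of a file of the given age,
    under a simple age-based migration rule."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    boundary = 0.0
    for level in hierarchy:
        boundary += level.residency_days
        if age_days < boundary:
            return level.name
    return hierarchy[-1].name  # reached only if the bottom level has finite residency


for age in (1, 10, 100):
    print(age, "days ->", level_for_age(age))
```

Keeping the policy as a list of levels is what makes the migration "easy to manage" in this model: changing the tape layers means editing one table, not the movers.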

HPSS Structure
• (1) Core Server:
  – RS/6000 Model H50
  – 4x CPU
  – 2 GB RAM
  – Fast Ethernet (control)
  – Hardware RAID (metadata storage)

HPSS Structure
• (3) Movers:
  – RS/6000 Model H70
  – 4x CPU
  – 1 GB RAM
  – Fast Ethernet (control)
  – Gigabit Ethernet (data), 1500 & 9000 MTU
  – 2x FC-attached RAID, 300 GB, disk cache
  – (3-4) SD-3 “Redwood” tape transports
  – (3-4) 9840 “Eagle” tape transports

HPSS Structure
• Guarantee availability of resources for a specific user group → separate resources → separate PVRs (Physical Volume Repositories) & movers.
• One mover per user group → total exposure to single-machine failure.
• Guarantee availability of resources for the Data Acquisition stream → separate hierarchies.
• Result: 2 PVRs & 2 COSs (Classes of Service) & 1 mover per group.
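The trade-off between dedicated and shared movers can be made concrete with a small model of per-group allocations (2 PVRs, 2 COSs and 1 mover each, per the result above). Group, PVR and COS names are hypothetical; mover names M1-M3 follow the topology slide. The model compares how much of each group's mover capacity survives when one mover fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupAllocation:
    """Resources reserved for one user group (hypothetical names)."""
    pvrs: tuple[str, str]
    cos: tuple[str, str]
    movers: frozenset[str]


def capacity_left(allocs: dict[str, GroupAllocation], failed_mover: str) -> dict[str, float]:
    """Fraction of each group's movers still available after one mover fails."""
    return {
        group: len(a.movers - {failed_mover}) / len(a.movers)
        for group, a in allocs.items()
    }


# Dedicated layout: one mover per group.
dedicated = {
    g: GroupAllocation(
        pvrs=(f"{g}-pvr-1", f"{g}-pvr-2"),
        cos=(f"{g}-cos-1", f"{g}-cos-2"),
        movers=frozenset({m}),
    )
    for g, m in (("A", "M1"), ("B", "M2"), ("C", "M3"))
}
# Shared layout: same PVRs and COSs, but every group can use every mover.
shared = {
    g: GroupAllocation(a.pvrs, a.cos, frozenset({"M1", "M2", "M3"}))
    for g, a in dedicated.items()
}

print(capacity_left(dedicated, "M1"))  # {'A': 0.0, 'B': 1.0, 'C': 1.0}
print(capacity_left(shared, "M1"))     # each group keeps 2 of its 3 movers
```

With dedicated movers one failure takes a group completely offline; sharing spreads the loss but gives up the guaranteed per-group resources.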

HPSS topology
[Diagram: Core server and movers M1, M2, M3; Net 1 - Data (1000baseSX); Net 2 - Control (100baseT); STK silo on 10baseT; N x PVR; pftpd; client (routing).]

HPSS Performance
• 80 MB/s for the disk subsystem.
• 1 CPU per 40 MB/s of TCP/IP (Gbit) traffic at 1500 MTU.
• ~8 MB/s per SD-3 transport.
• ? per 9840 transport.
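Combining these measured figures with the mover configuration (4 CPUs and 3-4 SD-3 transports per mover) gives a quick way to see which component limits a single mover. The sketch assumes the bounds are independent and that all CPUs could be devoted to TCP/IP, which overstates the network bound; the 9840 rate is left out because the slide gives no figure for it.

```python
import math

# Measured figures from the Performance slide.
DISK_MB_S = 80.0         # disk subsystem
TCP_MB_S_PER_CPU = 40.0  # Gbit TCP/IP at 1500 MTU
SD3_MB_S = 8.0           # approx. per SD-3 transport


def mover_ceiling(cpus: int, sd3_drives: int) -> dict[str, float]:
    """Independent upper bounds (MB/s) for one mover; the smallest is the bottleneck."""
    return {
        "disk": DISK_MB_S,
        "network": cpus * TCP_MB_S_PER_CPU,  # assumes every CPU serves TCP/IP
        "sd3 tape": sd3_drives * SD3_MB_S,
    }


def cpus_for_network(rate_mb_s: float) -> int:
    """Whole CPUs needed to push rate_mb_s over Gbit TCP/IP at 1500 MTU."""
    return math.ceil(rate_mb_s / TCP_MB_S_PER_CPU)


bounds = mover_ceiling(cpus=4, sd3_drives=4)
print(bounds)                        # {'disk': 80.0, 'network': 160.0, 'sd3 tape': 32.0}
print(min(bounds, key=bounds.get))   # sd3 tape
print(cpus_for_network(DISK_MB_S))   # 2
```

With four SD-3 drives the tape side is the limit, and streaming the full 80 MB/s disk rate over Gigabit Ethernet at 1500 MTU would occupy 2 of the mover's 4 CPUs.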

I/O intensive systems
• Mining and analysis systems.
• High I/O and moderate CPU usage.
• To avoid heavy network traffic, merge the file servers with the HPSS movers:
  – Major problem: HPSS support on non-AIX platforms.
  – Several (Sun) SMP machines or a large (SGI) modular system.

I/O intensive systems
• (6) NFS file servers for work areas:
  – (5) x E450 + (1) x E4000
  – 4 CPUs (6 on the E4000); 2 GB RAM; Fast/Gbit Ethernet
  – 2 x FC-attached hardware RAID, 1.5 TB
• (1) NFS home directory server (E450).
• (3+3) AFS servers (code development & home directories):
  – RS/6000 Model E30 and 43P
• (NFS to AFS migration)

Problems
• Short lifecycle of the SD-3 heads:
  – ~500 hours, i.e. < 2 months at average usage (6 of 10 drives in 10 months).
• Low-throughput interface (F/W) for SD-3 → high slot consumption.
• 9840: ???
• HPSS: tape cartridge closure on transport error.
  – Built a monitoring tool to try to predict transport failure (based on soft error frequency).
• SFS response when heavily loaded: no graceful failure (timeouts & lost connections).
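The slides do not describe how the soft-error monitoring tool works. One simple way to build such a predictor is a sliding-window counter per transport, sketched below; the window length, threshold and drive names are hypothetical, and this is not the RCF tool itself.

```python
from collections import defaultdict, deque


class SoftErrorMonitor:
    """Flags a tape transport when its soft-error count within a sliding
    time window reaches a threshold (hypothetical window and threshold)."""

    def __init__(self, window_hours: float = 24.0, threshold: int = 5):
        self.window_hours = window_hours
        self.threshold = threshold
        self._events: dict[str, deque] = defaultdict(deque)  # drive -> error timestamps

    def record(self, drive: str, t_hours: float) -> bool:
        """Record a soft error on `drive` at time t_hours (non-decreasing per drive).
        Returns True if the drive should be flagged for service."""
        events = self._events[drive]
        events.append(t_hours)
        # Drop errors that have fallen out of the window.
        while events and events[0] <= t_hours - self.window_hours:
            events.popleft()
        return len(events) >= self.threshold


mon = SoftErrorMonitor(window_hours=24, threshold=3)
print([mon.record("sd3-01", t) for t in (0, 30, 40, 45)])  # [False, False, False, True]
```

Flagging a transport before it fails hard gives operators a chance to pull it from service rather than have HPSS close the cartridge on a transport error.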

Issues
• Two-tape-layer hierarchies only partially tested:
  – Cartridge-based migration.
  – Manually scheduled reclaim.
• Integration of file server and mover functions on the same node:
  – Solaris mover port.
  – Not an objective anymore.

Issues
• Guarantee availability of resources for specific user groups:
  – Separate PVRs & movers.
  – Total exposure to single-machine failure!
• Reliability:
  – Distribute resources across movers → share movers (acceptable?).
  – Inter-mover traffic: 1 CPU per 40 MB/s of TCP/IP per adapter. Expensive!
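The cost of inter-mover traffic follows directly from the 1 CPU per 40 MB/s per adapter figure. The sketch assumes a mover-to-mover transfer crosses two adapters (sender and receiver) and that each pays that cost; the function name is hypothetical.

```python
TCP_MB_S_PER_CPU = 40.0  # per adapter, Gbit TCP/IP at 1500 MTU (from the slides)


def inter_mover_cpu_cost(rate_mb_s: float, adapters: int = 2) -> float:
    """CPUs consumed to move rate_mb_s between movers over TCP/IP,
    assuming every adapter on the path pays 1 CPU per 40 MB/s."""
    if rate_mb_s < 0 or adapters < 1:
        raise ValueError("rate must be non-negative and adapters at least 1")
    return adapters * rate_mb_s / TCP_MB_S_PER_CPU


print(inter_mover_cpu_cost(80.0))  # 4.0
```

Moving 80 MB/s, the disk subsystem's rate, from one mover to another would consume about 4 CPUs in total, the equivalent of a whole 4-CPU H70 spread over two machines. Affinity or multiply attached storage remove this hop, and with it this cost.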

Inter-mover traffic: solutions
• Affinity.
  – Limited applicability.
• Diskless hierarchies.
  – Not for SD-3. Not tested on 9840.
• High-performance networking: SP switch.
  – IBM only.
• Lighter protocol: HIPPI.
  – Expensive hardware.
• Multiply attached storage (SAN).
  – Requires HPSS modifications.

Multiply Attached Storage (SAN)
[Diagram: Mover 1, Mover 2 and a client (!), with paths labelled 1 and 2.]

Summary
• Problems with divergent requirements:
  – Cost-effective archive capacity and bandwidth → two tape hierarchies (SD-3 & 9840) → test the configuration.
  – Availability and reliability of HPSS resources → separated COS and shared movers → inter-mover traffic ?!?
• Merger of file servers and HPSS movers?