
Slide 1: Tape Storage Issues
Bernd Panzer-Steindel, LCG Fabric Area Manager, CERN/IT
1 December 2004

Slide 2

Concentrate on linear tape technology, not helical scan (Exabyte, AIT, ...). Today's choices:

- IBM 3592 drives: 40 MB/s, 300 GB cartridges; 25 KCHF per drive, 0.7 CHF/GB media
- StorageTek 9940B drives: 30 MB/s, 200 GB cartridges; 35 KCHF per drive, 0.6 CHF/GB media
- LTO consortium (HP, IBM, Certance) LTO-2 drives: 20 MB/s, 200 GB cartridges; 15 KCHF per drive, 0.4 CHF/GB media (prices have halved over the last 18 months)
- LTO-3 drives have been available for about three weeks: 40-60 MB/s, 400 GB cartridges, ~0.4 CHF/GB media. The standalone drive costs about 6 KCHF; robotic versions follow 3-6 months later, and the modifications (Fibre Channel, extra mechanics, etc.) add another ~10 KCHF.

The error on the drive costs is up to 50%; media costs vary by about 10%. A side-by-side comparison is sketched below.
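To put the quoted figures side by side, a minimal Python sketch could tabulate media cost per PB (CHF/GB is numerically equal to MCHF/PB) and drive cost per MB/s. The LTO-3 entries (50 MB/s as a mid-range value, 6 + ~10 KCHF for a robotic drive) are assumptions within the ranges stated above, not slide figures:

```python
# Quoted list figures from the slide; errors are up to 50% on drives, ~10% on media.
drives = {
    # name: (speed MB/s, drive cost KCHF, media CHF/GB)
    "IBM 3592":  (40, 25, 0.7),
    "STK 9940B": (30, 35, 0.6),
    "LTO-2":     (20, 15, 0.4),
    "LTO-3":     (50, 16, 0.4),  # assumed: mid-range speed, 6 + ~10 KCHF robotic
}

for name, (mbs, drive_kchf, media) in drives.items():
    # media CHF/GB * 1e6 GB/PB = media MCHF/PB, so the number carries over directly
    print(f"{name:10s}: {media:.1f} MCHF media per PB, "
          f"{drive_kchf / mbs:5.2f} KCHF per MB/s of drive bandwidth")
```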

Slide 3

There are very few choices in the large robotic tape storage area.

- A 6500-cartridge silo including robotics costs ~1 MCHF +/- 30% (the best, most flexible library on the market is currently probably the STK 8500).
- An old STK Powderhorn silo (5500 slots) costs about 200 KCHF, but does not support LTO or the new IBM drives.

To be considered:
- single large installation versus distributed, separate installations
- locality of data: load balancing for reading is defined at writing time
- regular physical movement of tapes is not really an option
- the pass-through mechanism between silos still has 'locality' restrictions → one move of a tape = one mount of a tape
- prices of robots and drives have large error margins (50%), because these are non-commodity products and depend heavily on the negotiations with the vendors (level of discount)

Slide 4: Time matrix of drives

STK : 9940B,  200 GB cassettes,  30 MB/s, today
STK : 'STKA',  500 GB cassettes, 120 MB/s, mid 2005
STK : 'STKB', 1000 GB cassettes, 240 MB/s, beg. 2008 ?
IBM : 3592A,  300 GB cassettes,  40 MB/s, today
IBM : 3592B,  600 GB cassettes,  80 MB/s, beg. 2006
LTO : LTO-2,  200 GB cassettes,  20 MB/s, today
LTO : LTO-3,  400 GB cassettes,  60 MB/s, mid 2005
LTO : LTO-4,  800 GB cassettes, 120 MB/s, beg. 2008

Boundaries:
- would like to 'see' a drive in the market for about one year before committing
- 5 years maximum lifetime for a drive technology
- one year of overlap between old and new tape drive installations
- have a new service ready for 2007 → not easy to achieve (see the sketch below)
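One way to see why "ready for 2007" is tight is a rough date check: apply the one-year market observation and one-year overlap rules to the announced availabilities above. The timing model (observation and overlap strictly in sequence, availability dates taken as mid-year or start of year) is an assumption for illustration, not the slide's calculation:

```python
from datetime import date

SERVICE_READY = date(2007, 1, 1)  # boundary: new service ready for 2007

candidates = {  # drive: announced availability, from the time matrix above
    "STK 'STKA'": date(2005, 7, 1),
    "IBM 3592B":  date(2006, 1, 1),
    "LTO-3":      date(2005, 7, 1),
}

for drive, avail in candidates.items():
    observed = avail.replace(year=avail.year + 1)      # ~1 year in the market
    ready = observed.replace(year=observed.year + 1)   # ~1 year old/new overlap
    verdict = "fits" if ready <= SERVICE_READY else "misses"
    print(f"{drive:11s}: earliest full service {ready} -> {verdict} the 2007 target")
```

Under these assumptions every candidate misses the start of 2007, which matches the slide's conclusion.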

Slide 5: Copying of tape data

If one assumes a 4-year effective lifetime of tape drives and linear data growth, then the average lifetime of tape data is less than 3 years (≈ disk lifetime).

Cost of the copy, example:
Assume: 10 PB of data, new cartridges at 0.2 CHF/GB, copy to be done within 300 days, drive performance 25 MB/s (inefficiencies included).
Gain: double-density tapes, 10 PB of freed space == 2 MCHF
Loss: 10 PB in 300 days == 390 MB/s copy speed == 16 input drives (model A) + 16 output drives (model B) ~ 1.3 MCHF; 1 FTE + 48h disk buffer + extra tapes/slots ~ 0.3 MCHF
→ Gain 2 MCHF, Loss 1.6 MCHF

Copying is mandatory at the end of a tape drive technology cycle. In between, it might be cost-effective to re-use the already existing tapes at double density. There are some uncertainties in the calculations and it might not be cost-effective.
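The arithmetic behind these figures can be reproduced in a few lines. A minimal sketch, where the ~40 KCHF per-drive price is an assumption chosen to match the ~1.3 MCHF drive figure (the slide does not state it); everything else uses the slide's numbers:

```python
import math

DATA_PB = 10
DAYS = 300
DRIVE_MBS = 25          # effective speed, inefficiencies included
CART_CHF_PER_GB = 0.2   # new double-density cartridges
DRIVE_KCHF = 40         # assumed per-drive cost incl. infrastructure

rate_mbs = DATA_PB * 1e9 / (DAYS * 86400)          # 1 PB = 1e9 MB
drives_per_side = math.ceil(rate_mbs / DRIVE_MBS)  # read side = write side
drive_cost_mchf = 2 * drives_per_side * DRIVE_KCHF / 1000
gain_mchf = DATA_PB * 1e6 * CART_CHF_PER_GB / 1e6  # value of 10 PB freed space

print(f"required copy speed: {rate_mbs:.0f} MB/s")       # ~386 -> ~390 MB/s
print(f"drives: {drives_per_side} in + {drives_per_side} out, "
      f"~{drive_cost_mchf:.1f} MCHF")                    # 16 + 16, ~1.3 MCHF
print(f"gain from freed space: ~{gain_mchf:.0f} MCHF")   # ~2 MCHF
```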

Slide 6: Comparing disk and tape storage

A few example calculations (a bit provocative): need 1 PB of storage with 1 GB/s performance (today).

Tape: 50 tape drives with 35 MB/s at 40 KCHF each (including infrastructure), 50% efficiency; 5000 cartridges, 200 GB, 0.5 CHF/GB; plus 5000 slots in a robotic silo, 1 MCHF → total 3.5 MCHF
Disk (1): NAS servers with 2 TB each, RAID5, 3 CHF/GB → 3 MCHF, 500 servers, 50 GB/s
Disk (2): NAS servers, each with 100 × 200 GB USB/FireWire disks, RAID5, ~1.5 CHF/GB; 65 servers + 6500 disks → 325 KCHF + 1.5 MCHF ~ 1.8 MCHF (6.5 GB/s) (feasibility study under way...)

Need 10 PB with 5 GB/s:
Tape: 10 MCHF (silos + robotics) + 5 MCHF (tapes) + 10 MCHF (drives) == 25 MCHF
Disk (1): NAS servers ~ 30 MCHF
Disk (2): NAS servers with USB/FireWire disks ~ 18 MCHF

But the cost of complexity is NOT included and the cost uncertainties are large. This just shows that disk and tape storage costs are already comparable.
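A hedged sketch of the 1 PB / 1 GB/s comparison, using the slide's unit prices. The disk models are deliberately capacity-dominated; the effective ~1.8 CHF/GB for the USB/FireWire option is an assumption derived from the slide's 1.8 MCHF total, and the "cost of complexity" is left out, as the slide notes:

```python
def tape_cost_mchf(pb, gbps):
    drives = gbps * 1000 / (35 * 0.5)       # 35 MB/s raw, 50% efficiency
    cartridges = pb * 1e6 / 200             # 200 GB cartridges
    return (drives * 40 / 1000              # 40 KCHF per drive + infrastructure
            + cartridges * 200 * 0.5 / 1e6  # 0.5 CHF/GB media
            + cartridges / 5000)            # ~1 MCHF per 5000-slot silo

def disk_cost_mchf(pb, chf_per_gb):
    return pb * 1e6 * chf_per_gb / 1e6      # capacity-dominated NAS pricing

print(f"tape:         {tape_cost_mchf(1, 1):.1f} MCHF")   # ~3.8; slide rounds
                                                           # to 50 drives -> 3.5
print(f"disk, RAID5:  {disk_cost_mchf(1, 3.0):.1f} MCHF") # ~3 MCHF
print(f"disk, USB/FW: {disk_cost_mchf(1, 1.8):.1f} MCHF") # ~1.8 MCHF
```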

Slide 7: Disk space versus tape space

This is application access-pattern dependent. If re-use of cached files is low (e.g. random sampling over large data sets, reprocessing production) → minimum disk space == average aggregate drive speed × ~24h (e.g. 100 MB/s == ~10 TB). This is correlated with the efficient use of tape drives at speeds higher than ~50 MB/s.

Number of I/O streams from batch jobs: ranges from sequential access of large files to sequential access of small files, 'hopping' through a large number → random access on disks.

Consequence: moving from a fully distributed, 'averaged' access to a more synchronous access with guaranteed stream speeds → much more sophistication needed in file system, disk server and HSM tuning and organization.
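The rule of thumb above reduces to one line: the disk cache must absorb roughly one day of aggregate tape-drive bandwidth. A minimal sketch (the 24-hour window is the slide's figure):

```python
def min_disk_buffer_tb(aggregate_mbs, hours=24):
    # cache sized to one day of aggregate tape bandwidth; MB -> TB
    return aggregate_mbs * hours * 3600 / 1e6

print(f"{min_disk_buffer_tb(100):.1f} TB")  # ~8.6 TB, i.e. the ~10 TB quoted
```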

Slide 8: Mass storage performance

A first set of parameters defining the access performance of an application to the mass storage system:

- speed of the robot
- distribution of tapes in silos (at the time of writing the data)
- number of running batch jobs
- internal organization of jobs (exp.) (e.g. just request file before usage)
- priority policies (between experiments and within an experiment)
- HSM scheduling implementation
- tape drive speed and tape drive efficiency
- disk server filesystem, OS + driver
- HSM database performance
- HSM load-balancing mechanism
- monitoring, fault tolerance
- disk server optimization, data layout on disk
- experiment policies, access patterns (exp.)
- overall file size
- bugs and features

Slide 9: Tape storage, example: influence on the batch system

[Plot: Lxbatch inefficiency trends; wait time due to tape queues; stager hits its #files limit.]

Slide 10: Example: file sizes

Average file size on disk:
ATLAS    43 MB
ALICE    27 MB
CMS      67 MB
LHCb    130 MB
COMPASS 496 MB
NA48     93 MB

Large numbers of files are < 10 MB.

Slide 11: Analytical calculation of tape drive efficiencies

Assumptions: tape mount time ~120 s, per-file overhead ~4.4 s, average ~1.3 files per mount; a large number of batch jobs requesting files one by one.

[Plot: tape drive efficiency [%] as a function of file size [MB].]
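The curve in the plot can be reproduced from the stated parameters: efficiency is transfer time divided by total mount-plus-transfer time. A minimal sketch; the 30 MB/s drive speed (a 9940B-class drive) is an assumption, the overheads are the slide's figures:

```python
MOUNT_S = 120.0        # tape mount time
FILE_OVERHEAD_S = 4.4  # per-file positioning overhead
FILES_PER_MOUNT = 1.3  # average, jobs requesting files one by one
DRIVE_MBS = 30.0       # assumed raw drive speed

def efficiency(file_mb):
    transfer = FILES_PER_MOUNT * file_mb / DRIVE_MBS
    overhead = MOUNT_S + FILES_PER_MOUNT * FILE_OVERHEAD_S
    return 100 * transfer / (transfer + overhead)

for size in (10, 100, 1000):
    print(f"{size:5d} MB files -> {efficiency(size):4.1f} % drive efficiency")
```

With these numbers, 10 MB files keep the drive below 1% efficiency and even 1 GB files reach only ~25%, which is why the small average file sizes of slide 10 matter so much.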

Slide 12: Storage performance

Considerable effort currently goes into analyzing all the available monitoring information, to understand much better the influence of the different parameters on the overall performance. The goal is to be able to calculate the cost of data transfers from tape to the application → CHF per MB/s for a volume of X TB (see the sketch below).

Combination of problems, example: small files + randomness of access.
Possible solutions:
- concatenation of files at the application or MSS level
- an extra layer of disk cache, vendor-supplied or 'home-made'
- a hierarchy of fast- and slow-access tape drives
- very large amounts of disk space
- ...
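As a hedged illustration of the "CHF per MB/s" figure of merit: to sustain a target application rate, drives must be over-provisioned by the inverse of the slide-11 efficiency. The drive price (40 KCHF) and speed (30 MB/s) are assumptions, and this simple model is not the slide's actual calculation:

```python
MOUNT_S, FILE_OVH_S, FILES_PER_MOUNT = 120.0, 4.4, 1.3  # slide-11 figures
DRIVE_MBS, DRIVE_KCHF = 30.0, 40.0                      # assumed drive specs

def chf_per_mbs(file_mb):
    transfer = FILES_PER_MOUNT * file_mb / DRIVE_MBS
    eff = transfer / (transfer + MOUNT_S + FILES_PER_MOUNT * FILE_OVH_S)
    useful_mbs = DRIVE_MBS * eff      # bandwidth actually delivered per drive
    return DRIVE_KCHF * 1000 / useful_mbs

for size in (10, 100, 1000):
    print(f"{size:5d} MB files -> {chf_per_mbs(size):8.0f} CHF per MB/s")
```

In this model, delivered bandwidth from 10 MB files costs roughly 75 times more per MB/s than from 1 GB files, which is the economic case for the file-concatenation and caching solutions listed above.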

Slide 13: Summary

- cost calculations are difficult and site-specific
- access patterns are very important, but hard to predict
- disk versus tape needs more investigation
- the ratio of disk space to tape space depends on the access patterns
- storage performance efficiencies depend on a lot of parameters

More work is needed, in close collaboration between the experiments (computing model implications) and the IT side (Tiers and CERN).

