1 Storage for Run 3
Rainer Schwemmer, LHCb Computing Workshop 2015

2 Current Situation
daqarea
– Mix of 2 and 4 TB disks
– 340 TB total
– Max sustained read or write: ~7 GB/s
– Mixed read and write: ~3 GB/s (per-drive implications sketched below)
farm
– Mix of 2 and 4 TB disks
– HLT1 reduces the data rate by a factor of approximately 4 (250 kHz)
– Writing at ~7 MB/s
– No problem for current-generation disk drives
– Min write and read: ~60 MB/s [1]
– Concurrent read/write: ~20 MB/s [2]
[1] Disks can do 150 MB/s, but only on the very outside of the platters.
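As a rough cross-check, a back-of-envelope sketch of the per-drive rates implied by the daqarea aggregates; the drive count of roughly 140 is taken from slide 8, and assuming an even load split glosses over the real mix of 2 and 4 TB disks:

```python
# Rough cross-check of the per-drive rates implied by the daqarea aggregates.
# Assumption: ~140 drives (figure taken from slide 8) sharing the load evenly,
# which glosses over the real mix of 2 and 4 TB disks.
DRIVES = 140
SEQ_AGGREGATE_GBS = 7.0    # max sustained read OR write (this slide)
MIXED_AGGREGATE_GBS = 3.0  # concurrent read AND write (this slide)

seq_per_drive_mbs = SEQ_AGGREGATE_GBS * 1000 / DRIVES
mixed_per_drive_mbs = MIXED_AGGREGATE_GBS * 1000 / DRIVES

print(f"Sequential per drive: ~{seq_per_drive_mbs:.0f} MB/s")
print(f"Mixed per drive:      ~{mixed_per_drive_mbs:.0f} MB/s")
print(f"Mixed workload keeps ~{mixed_per_drive_mbs / seq_per_drive_mbs:.0%} of sequential throughput")
```

The implied ~50 MB/s sequential and ~20 MB/s mixed per drive line up with the per-disk figures quoted for the farm nodes above.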

3 Problems with the current system
daqarea
– File fragmentation due to an excessive number of parallel streams, O(100-1000)
– Throughput drops below what is needed once the area is more than 70% full
– Have to write all data once (1 GB/s)
– Need to read all data twice (+2 GB/s) for verification/checksums and the CASTOR copy
– The system needs to be severely overdesigned to cope with the mixed read/write workload (see the sketch below)
Farm
– Every node has its own, individual file system
  → Every farm node processes runs at its own pace
  → Which leads to the fragmentation problem in the daqarea
– Failing drives/farm nodes: every failing node delays processing of all the runs it holds files for
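To make the overdesign point concrete, a minimal sketch comparing the load implied above (write once, read twice) with the measured mixed read/write capability from slide 2; all numbers come from the slides, and the zero-headroom conclusion is exactly the point of the last daqarea bullet:

```python
# Minimal sketch of the overdesign argument: required mixed throughput in the
# daqarea (this slide) vs. the measured mixed capability (slide 2).
write_gbs = 1.0             # write all data once
read_gbs = 2.0              # read all data twice (verification/checksums + CASTOR copy)
required_gbs = write_gbs + read_gbs

mixed_capability_gbs = 3.0  # measured mixed read/write throughput (slide 2)

headroom = mixed_capability_gbs - required_gbs
print(f"Required mixed throughput: {required_gbs:.1f} GB/s")
print(f"Measured mixed capability: {mixed_capability_gbs:.1f} GB/s")
print(f"Headroom: {headroom:.1f} GB/s (none left for fragmentation or a >70% full area)")
```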

4 Future

5 Farm
HLT1 output rate is estimated at 1 MHz → 100 GB/s instead of the current 13 GB/s
Need to scale everything up by about a factor of 10! (rough numbers sketched below)
If the size of the farm is comparable:
– 4 TB disks → 40 TB disks
– 7 MB/s → 70 MB/s, plus 70 MB/s of concurrent reading
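A small sketch of that scaling argument: the raw ratio is about 100/13 ≈ 7.7, which the slide rounds up to a factor of 10, and the per-node figures simply scale with it if the farm size stays comparable. Only the rates and the current per-node figures come from the slides.

```python
# Sketch of the factor-10 scaling above. Only the 13 -> 100 GB/s rates and the
# current per-node figures come from the slides; rounding to 10 follows the slide.
current_rate_gbs = 13.0
run3_rate_gbs = 100.0

raw_factor = run3_rate_gbs / current_rate_gbs   # ~7.7
scale = 10                                      # rounded up, as on the slide

current_disk_tb = 4.0     # per-node disk capacity today
current_write_mbs = 7.0   # per-node HLT1 write rate today

print(f"Raw scale factor: ~{raw_factor:.1f}x (rounded to {scale}x)")
print(f"Per-node disk needed: ~{current_disk_tb * scale:.0f} TB")
print(f"Per-node write rate:  ~{current_write_mbs * scale:.0f} MB/s "
      "(plus the same again for concurrent reading)")
```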

6 Kryder's law
[Figure: historical disk price/capacity trend; source: http://www.jcmit.com/diskprice.htm]

7 Farm cont.
Biggest drives on the market today: 8 TB
– Will most certainly not increase by 5x over the next 3-4 years
Disk throughput is already not increasing at the rate of capacity
– 70 MB/s of mixed read/write is certainly not possible
→ Depending on the farm size (< ~8000 nodes) we will not be able to store HLT1 output locally anymore (parametric sketch below)
The current model of individual, local file systems per node is, in my opinion, already not sustainable
– Too much manual intervention
– If we do continue with local storage in the farm nodes, we need a better system for securing data against node failure
– Processing of runs needs to become more synchronized
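A parametric sketch of the local-storage question, so the trade-off can be explored by plugging in numbers: only the 100 GB/s HLT1 output rate and the 8 TB drive ceiling come from the slides, while the node counts and the one-drive-per-node assumption are illustrative placeholders, not the derivation behind the ~8000-node figure above.

```python
# Parametric sketch: how long could the farm buffer HLT1 output locally, and
# what write rate would each node have to sustain? The 100 GB/s rate and the
# 8 TB drive ceiling come from the slides; node counts and drives per node
# are illustrative assumptions.
def local_buffering(nodes, drives_per_node=1, drive_tb=8.0, hlt1_output_gbs=100.0):
    total_tb = nodes * drives_per_node * drive_tb
    buffer_seconds = total_tb * 1000 / hlt1_output_gbs    # TB -> GB, divided by GB/s
    per_node_write_mbs = hlt1_output_gbs * 1000 / nodes
    return buffer_seconds / 3600, per_node_write_mbs

for nodes in (1000, 2000, 4000, 8000):
    hours, write_mbs = local_buffering(nodes)
    print(f"{nodes:5d} nodes: ~{hours:6.1f} h of local buffer, "
          f"~{write_mbs:5.1f} MB/s write per node")
```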

8 daqarea
Projected output rate (at 100 kHz): 10 GB/s
– Need to read the data at least twice and write it once
  → 30 GB/s minimum
  → Need O(200-400) GB/s of aggregated disk performance
Individual disks might get to 100-120 MB/s
– Need O(3000) drives for throughput reasons alone (back-of-envelope below)
– Currently: 140 drives
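The drive count follows from simple division; a back-of-envelope sketch using the figures on this slide, with the aggregate requirement kept as a range rather than a single overdesign factor:

```python
# Back-of-envelope for the daqarea drive count, using the figures on this slide.
output_gbs = 10.0                   # projected output at 100 kHz
required_min_gbs = 3 * output_gbs   # read twice + write once
aggregate_gbs_range = (200, 400)    # aggregated disk performance after overdesign
per_drive_mbs = (100, 120)          # plausible sustained rate per drive

drives_low = aggregate_gbs_range[0] * 1000 / per_drive_mbs[1]
drives_high = aggregate_gbs_range[1] * 1000 / per_drive_mbs[0]

print(f"Minimum aggregate throughput: {required_min_gbs:.0f} GB/s")
print(f"Drives needed for throughput alone: ~{drives_low:.0f} to ~{drives_high:.0f}")
```

That works out to roughly 1700 to 4000 drives, i.e. the O(3000) quoted above, versus 140 today.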

9 Discussion
We need to overcome the storage gap created by the slowdown in disk capacity growth
It might be worth looking into common storage for deferred data and output data
– The requirements are similar
– Everything will be on the surface in the future anyway
– It seems very unlikely that we can fit all the deferred data into the farm nodes
Possibly have individual "small" storage clusters at the sub-farm level for deferred data and output data
Look into rotated reading/writing à la ALICE to cut down on the overdesign for mixed read/write rates (toy illustration below)
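The rotated reading/writing idea could look roughly like the toy scheduler below: split the storage into groups and, at any moment, dedicate each group either to writing or to reading, so that no drive sees a mixed (and therefore much slower) workload. This is only an illustration of the concept, not ALICE's actual scheme; the group names and rotation trigger are placeholders.

```python
# Toy illustration of rotated reading/writing: storage is split into groups,
# and at any given time each group serves only writes or only reads, so no
# drive sees a mixed workload. A sketch of the idea, not ALICE's implementation.
from itertools import cycle

class RotatedStorage:
    def __init__(self, groups):
        self.groups = list(groups)          # names of the storage groups
        self._writer = cycle(self.groups)   # rotation of the write role
        self.current_writer = next(self._writer)

    def rotate(self):
        """Advance the write role to the next group (e.g. once per run)."""
        self.current_writer = next(self._writer)

    def write_target(self):
        return self.current_writer

    def read_targets(self):
        """All groups not currently taking writes are free for pure reads."""
        return [g for g in self.groups if g != self.current_writer]

storage = RotatedStorage(["group-A", "group-B", "group-C"])
for _ in range(3):
    print(f"write -> {storage.write_target()}, read <- {storage.read_targets()}")
    storage.rotate()
```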

