Kian-Tat Lim Offline Computing November 12 th, LCLS Offline Data Management
Data Requirements
At full capacity, 120 Hz, we will see:
    Up to 240 MB/s per experiment.
    Up to 100 TB/day across the entire system.
    400–600 TB raw data per run, but only 10% of the data is expected to be useful.
We have designed and are building a storage system able to scale to these volumes. (Capacity depends on budget.)
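The data-rate figures can be related to one another with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and sustained rates, neither of which is stated on the slide, and the function names are invented for this example.

```python
SECONDS_PER_DAY = 86_400
MB = 10**6   # decimal megabyte (assumption, not stated in the source)
TB = 10**12  # decimal terabyte (assumption, not stated in the source)


def tb_per_day(rate_mb_per_s):
    """Volume in TB accumulated over 24 hours at a sustained rate in MB/s."""
    return rate_mb_per_s * MB * SECONDS_PER_DAY / TB


def mb_per_frame(rate_mb_per_s, frame_rate_hz):
    """Average data volume per frame in MB at a given frame rate."""
    return rate_mb_per_s / frame_rate_hz


def average_rate_mb_per_s(volume_tb_per_day):
    """Average rate in MB/s needed to accumulate a daily volume in TB."""
    return volume_tb_per_day * TB / SECONDS_PER_DAY / MB


print(f"One experiment at 240 MB/s: {tb_per_day(240):.1f} TB/day")
print(f"Per-frame volume at 120 Hz: {mb_per_frame(240, 120):.1f} MB")
print(f"100 TB/day system-wide: {average_rate_mb_per_s(100):.0f} MB/s average")
```

Under these assumptions a single experiment running continuously at the peak rate would produce roughly a fifth of the 100 TB/day system-wide ceiling.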
Offline System Architecture
File Handling: Export
Interface: HDF5 files plus metadata from the science metadata database and the electronic logbook.
Network transport: Implemented using GridFTP, scp, bbcp.
Disk transport: Implemented using eSATA or USB 2.0.
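As a rough illustration of the network transport options, the sketch below assembles a command line for each of the three tools named on the slide. It is not the LCLS export implementation: the helper name, host, user, and paths are hypothetical, no tool-specific options are set, and the commands are only built, not executed. `globus-url-copy` is used here as the GridFTP command-line client.

```python
def build_transfer_command(tool, local_path, remote_host, remote_path, remote_user=None):
    """Return an argv list that copies one local file to a remote host.

    Hypothetical helper for illustration; both paths must be absolute.
    """
    target = f"{remote_user}@{remote_host}" if remote_user else remote_host
    if tool == "scp":
        return ["scp", local_path, f"{target}:{remote_path}"]
    if tool == "bbcp":
        return ["bbcp", local_path, f"{target}:{remote_path}"]
    if tool == "gridftp":
        # GridFTP clients take URLs; authentication is handled outside the URL here.
        return [
            "globus-url-copy",
            f"file://{local_path}",
            f"gsiftp://{remote_host}{remote_path}",
        ]
    raise ValueError(f"unknown transfer tool: {tool}")


# Hypothetical file, host, and user names.
for tool in ("gridftp", "scp", "bbcp"):
    argv = build_transfer_command(
        tool, "/export/run0001.h5", "remote.example.org", "/data/run0001.h5", "alice"
    )
    print(" ".join(argv))
```

Returning an argument list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without shell quoting, should one choose to execute it.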
Export Times
Entire datasets are too large for disk export.
Assume one run is copied at 100 MB/s (very-high-speed network). The times below correspond to an effective throughput of about 80 MB/s:
    40 TB takes 5.8 days.
    600 TB takes 87 days.
Can possibly overlap export with data-taking.
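The export times can be checked with a few lines of arithmetic. This is a sketch assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a constant transfer rate; the function names are illustrative. With those assumptions, the quoted 5.8 and 87 days are reproduced at about 80 MB/s, while a fully sustained 100 MB/s would be somewhat faster.

```python
SECONDS_PER_DAY = 86_400
MB = 10**6   # decimal megabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def export_days(volume_tb, rate_mb_per_s):
    """Days needed to copy volume_tb terabytes at a constant rate in MB/s."""
    return volume_tb * TB / (rate_mb_per_s * MB) / SECONDS_PER_DAY


def implied_rate_mb_per_s(volume_tb, days):
    """Constant rate in MB/s implied by copying volume_tb terabytes in the given days."""
    return volume_tb * TB / (days * SECONDS_PER_DAY) / MB


for volume_tb, quoted_days in ((40, 5.8), (600, 87)):
    print(
        f"{volume_tb} TB: {export_days(volume_tb, 100):.1f} days at 100 MB/s, "
        f"{export_days(volume_tb, 80):.1f} days at 80 MB/s, "
        f"quoted {quoted_days} days implies "
        f"{implied_rate_mb_per_s(volume_tb, quoted_days):.0f} MB/s"
    )
```

Either way, the conclusion is unchanged: even the useful 10% of a run takes days to move, and a full run takes months.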
Analysis Requirements
2-D FFTs on each of 30 million frames, at 100 MFLOP/frame = 3000 TFLOP in total.
Completing the analysis in 1 day requires a sustained 35 GFLOPS.
Three levels of sophistication:
    Analyze off-site after exporting data.
    Analyze on-site using external code running on SLAC facilities.
    Analyze on-site using external code written with SLAC frameworks running on SLAC facilities.
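The compute estimate follows directly from the frame count and the per-frame cost. The sketch below reproduces it; the per-frame cost of 100 MFLOP is taken from the slide as given, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def total_flop(frames, flop_per_frame):
    """Total floating-point operations for the whole dataset."""
    return frames * flop_per_frame


def required_flops(total, seconds):
    """Sustained floating-point rate (FLOP/s) needed to finish in the given time."""
    return total / seconds


frames = 30_000_000      # 30 million frames
flop_per_frame = 100e6   # 100 MFLOP per 2-D FFT, as quoted

total = total_flop(frames, flop_per_frame)
rate = required_flops(total, SECONDS_PER_DAY)

print(f"Total work: {total / 1e12:.0f} TFLOP")
print(f"Sustained rate for a 1-day turnaround: {rate / 1e9:.1f} GFLOPS")
```

This is a sustained figure for the FFTs alone; it does not account for I/O or any other processing steps.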
Processing Components
We have proposed a placeholder Processing Cluster and Workflow Manager, to be tightly integrated with the data storage.
Summary
We are building a large-scale data storage infrastructure.
Export of full datasets is impractical.
Initial analysis should be done on-site.
Analysis facilities can be supported on the current design, but:
    They are not fully defined.
    They are not funded.
An LCLS computing coordinator is needed immediately to prepare an analysis plan to avoid having science limited by computing rather than the accelerator.