15 March 2000 / Manuel Delfino / CERN IT Division / Mass Storage Management

Mass Storage Management
Improvised report for LHC Computing Review Software Panel
Manuel Delfino, CERN IT Division

Disclaimer: I wrote this mostly from memory and had no time to check with experts. I am sure general ideas are right, some details may be slightly misrepresented.
What is the problem?
  Copying a file between disk and tape is easy
    Except if the tape breaks
    Except if the drive is being used by someone
    Except if you have forgotten the tape number
    Except if you have forgotten the file number
    (maybe) except if you need high performance
    (maybe) except if hunting for a needle in a haystack
  Another major issue is “caring for the media”
    Tapes should be wound at least once a year
    Migrate between media to follow technology
    Replicate media between regional centers
What is the solution?
  Managed Storage (of a kind we can performance tune)
  If fully deployed, it is a bit more than Hierarchical Storage Management
    Inventory
    Error statistics and failure prediction
    Automatic migration and replication
  One question: How much functionality do we need vs. how much can we easily afford?
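To make the "inventory", "error statistics" and "caring for the media" ideas concrete, here is a minimal sketch in Python. It is not taken from any of the systems discussed in these slides: the class names, the in-memory dictionaries, the error threshold of 3 and the tape labels are all assumptions chosen for illustration. It only shows the kind of bookkeeping that frees users from remembering tape and file numbers and lets the system flag tapes for winding or migration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Tape:
    """One tape volume and the bookkeeping kept for it (hypothetical layout)."""
    label: str
    last_wound: date
    read_errors: int = 0
    files: dict = field(default_factory=dict)  # logical name -> file number on tape


class Inventory:
    """Hypothetical in-memory catalogue; a real system would use a database."""

    def __init__(self):
        self.tapes = {}

    def add_tape(self, label, last_wound):
        self.tapes[label] = Tape(label, last_wound)

    def register(self, name, label):
        """Record a new file on a tape and return its file number."""
        tape = self.tapes[label]
        number = len(tape.files) + 1
        tape.files[name] = number
        return number

    def locate(self, name):
        """Return (tape label, file number) so users need not remember either."""
        for tape in self.tapes.values():
            if name in tape.files:
                return tape.label, tape.files[name]
        raise KeyError(name)

    def record_error(self, label):
        """Count one read error against a tape (input to failure prediction)."""
        self.tapes[label].read_errors += 1

    def needs_winding(self, today, max_age=timedelta(days=365)):
        """Labels of tapes not wound within max_age (assumed: one year)."""
        return sorted(t.label for t in self.tapes.values()
                      if today - t.last_wound > max_age)

    def migration_candidates(self, error_threshold=3):
        """Labels of tapes whose error count reaches the assumed threshold."""
        return sorted(t.label for t in self.tapes.values()
                      if t.read_errors >= error_threshold)
```

A caller would register files as they are written, look them up by logical name with `locate`, and periodically run `needs_winding` and `migration_candidates` to drive media care. How much of this a production system needs is exactly the functionality-versus-cost question raised above.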
Why multiple solutions?
  History dating back many years
  Each major lab has ~3 FTEs on this
    Most of their effort goes into running the production systems
    Enough left to do something
    Probably not enough to “productize”
  Historically, software very tied to the hardware (not so true anymore)
  Needs pushed emphasis on performance rather than functionality (not so true anymore)
  Lack of coordination of “next step”
Eurostore (I and II)
  Beginning of HERA era, a glimmer of hope: Lachman OSM commercial HSM
  Alas, it went bankrupt…
  Released the code but:
    Unmaintainable
    Missing functionality
  DESY leads EU project to continue OSM
    CERN involved in testing
  Eurostore did not manage to deliver a production quality system
    Hence Eurostore II
  CERN’s commitment is in testing (again)
    Main test will be ALICE Data Challenge 2001
HPSS from IBM Government Sys
  Another possible commercial solution, but…
    “Works best” on IBM computers/tapes
    Dominated by supercomputer crowd
    Is it really a mainstream IBM product?
  CERN involvement:
    Evaluate product
    Port some components to Digital Unix (project funded by Digital/Compaq)
    Real tests with production: NA45 data
    Current backend support for “hsm” command
    Objectivity (interface developed orig. by SLAC)
CASTOR
  Modernize and clean up CERN Stage
  Move from “Tape oriented” to “Managed Storage” concepts
  Complete backward compatibility
  “Phase I” functionality now being deployed is enough for LEP, fixed target
  Not a “black box” anymore, but a set of well defined components that interact
  Preparation for ALICE Data Challenge 2000 achieved >100 MB/s sustained
    Surprise! See performance limitations due to memory bandwidth in the machines
Others…
  interSTAGE: Attempt by CERN and FNAL
    Introduce requirement of WAN transfers
    Try to entice all labs to work together on “next step”
    Mostly fizzled out
  ENSTORE: FNAL, based on OSM design
  “No name”: Vestige of interSTAGE, agreement to produce a set of APIs for HEP mass storage management
Which way in future?
  Lots of experience being gained
    ALICE data challenges
    COMPASS and HARP
    RUN II, BaBar, HERA, etc…
  Relatively relaxed about it because:
    The big problem is disk network program
  Nevertheless, future plans should include:
    Demonstrate simple sure shot for CDR
    Decide needed functionality and have one system used by all LHC experiments if possible