Hall D Computing Facilities
  Ian Bird
  16 March 2001
Overview
  Comparisons – Hall D computing
  Estimates of needs
    – As an illustration – but actual needs require a model
    – Costs
    – Staffing
    – Timeline
  Other projects – Data Grids
  Some comments
Some comparisons: Hall D vs other HENP
  Columns: Data volumes (tape) TB/year; Data rates MB/s; Disk cache TB; CPU SI95/year; People
    CMS: 2 000 (total); ~1800
    US Atlas (Tier 1): ~500 (?)
    STAR: 200; 40; >20; 7000; ~300
    D0/CDF Run II: 300; ~500
    BaBar: 300; ~500
    JLAB – current: ~240 (CLAS)
    Hall D:
  Not just an issue of equipment. These experiments all have the support of
    – large dedicated computing groups within the experiments
    – well-defined computing models
Process
  For the CDR: computing/analysis chapter
    – Define the Hall D computing model
        Distributed architecture (facilities)
        Data model
        Software architecture
        Collaboratory tools and infrastructure
        Estimate of costs and funding profile, and management plan
        Can set bounds (best/worst case) based on technology guesses
    – All this must be based on:
        Analysis models – e.g. how will the PWA be done, etc.
    – Needs strong management – hire a computing professional “now” to lead this
  Write a Computing Technical Design Report
    – This can come after the CDR; it fixes the ideas from the CDR and provides a detailed implementation plan
JLAB Facilities for Hall D
  Some crude estimates
    – No computing model – has to come first
    – Maybe too soon to fix technologies
Mass storage – at JLAB
  How much needs to be on tape depends on the computing model and on how well managed the activities are
  Assume:
    – 0.75 PB/year raw data
    – 0.75 PB/year reconstructed
    – 0.3 PB/year other
    – All simulated data stored off-site
    – 1.8 PB/year (minimum) to be stored
        At 300 GB/tape: 6000 tapes = 1 silo
        At 750 GB/tape: 2400 tapes = about ½ silo
    – Need other tapes (DST on fast access, lower density)
    – Keep data available for 2 years:
        Pessimistic – 2 silos, Optimistic – 1 silo (for Hall D)
    – Realistic guess – Hall D should have at least 2 dedicated silos
    – Number of drives – depends on technology and access model
        Experience shows a need for at least 30 drives
        Lab needs more for other parts of the program
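The tape counts on this slide follow from simple division. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and taking the slide's equivalence of 6000 tapes to one silo as the silo capacity; the function and constant names are illustrative and not from the original talk:

```python
# Yearly volumes assumed on the slide, in terabytes (decimal units).
YEARLY_TB = {"raw": 750, "reconstructed": 750, "other": 300}
TAPES_PER_SILO = 6000  # assumption: the slide equates 6000 tapes with 1 silo


def tapes_needed(total_tb: int, tape_gb: int) -> int:
    """Number of tapes needed to hold total_tb, rounded up."""
    total_gb = total_tb * 1000
    return -(-total_gb // tape_gb)  # ceiling division on integers


def silos_needed(tapes: int, years: int = 1) -> float:
    """Silos occupied when `years` of data at `tapes` per year are kept."""
    return tapes * years / TAPES_PER_SILO


total_tb = sum(YEARLY_TB.values())  # 1800 TB = 1.8 PB per year
for tape_gb in (300, 750):
    tapes = tapes_needed(total_tb, tape_gb)
    print(f"{tape_gb} GB/tape: {tapes} tapes/year, "
          f"{silos_needed(tapes):.1f} silo/year, "
          f"{silos_needed(tapes, years=2):.1f} silos for 2 years")
```

With these inputs the 750 GB case comes to 0.4 silo per year, which the slide rounds to ½, and the two-year figures (2.0 and 0.8 silos) line up with the pessimistic and optimistic bounds quoted for Hall D.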
Other storage
  Disk
    – Again, amount and type depend strongly on the computing model
    – Not unreasonable to expect to want 20% of data on disk
        200 TB ?
    – Current costs – 1 TB / $10K (IDE), – expect 10?
        Cost and type depend on requirements
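A rough sketch of what the disk figures imply at the quoted IDE price. The slide does not say which data volume the 20% applies to, so both inputs below are assumptions: the slide's own 200 TB guess, and 20% of the 1.8 PB/year from the mass storage estimate. The function names are illustrative only.

```python
DOLLARS_PER_TB = 10_000  # slide's current IDE price: 1 TB per $10K


def disk_tb(data_tb: int, percent_on_disk: int) -> int:
    """Disk capacity needed to hold a given percentage of the data."""
    return data_tb * percent_on_disk // 100


def disk_cost(tb: int, dollars_per_tb: int = DOLLARS_PER_TB) -> int:
    """Purchase cost in dollars at a flat price per terabyte."""
    return tb * dollars_per_tb


slide_guess_tb = 200                 # the slide's "200 TB ?"
full_year_tb = disk_tb(1800, 20)     # assumption: 20% of 1.8 PB/year

for label, tb in (("slide guess", slide_guess_tb),
                  ("20% of 1.8 PB", full_year_tb)):
    print(f"{label}: {tb} TB -> ${disk_cost(tb):,}")
```

At 2001 prices either input gives a cost in the millions of dollars, so any expected drop in price per terabyte has a large effect on the total.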
CPU & Networks
  Not a computing problem for reconstruction
    – All significant computing is in simulation – most not at JLAB?
  Level 3 trigger farm
    – It will be cheaper to compute more and store (and move) less
    – Conservative assumption – 500 SI95/processor
        2 procs per 1U box = 40,000 SI95/rack (i.e. 40 such boxes per rack)
  Networking
    – Will be of critical importance to success of Hall D
        Distributed computing model
        Transparent access to all data for all users
    – Expect 10-Gigabit Ethernet (perhaps first deployment of subsequent generation)
    – Assume JLAB will have OC12 (622 Mb/s) to ESnet
        Even today just a configuration change
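The rack and network numbers can be checked with a few lines of arithmetic. A minimal sketch: the 40 boxes per rack is an assumption inferred from the slide's 40,000 SI95/rack total, and the link throughput is the raw line rate with no allowance for protocol overhead or sharing with other traffic. All names are illustrative.

```python
SI95_PER_PROC = 500   # slide's conservative assumption
PROCS_PER_1U = 2      # slide: 2 processors per 1U box
UNITS_PER_RACK = 40   # assumption: inferred from the 40,000 SI95/rack total

OC12_MBIT_PER_S = 622  # slide's OC12 line rate
SECONDS_PER_DAY = 86_400


def rack_si95(si95_per_proc: int, procs_per_unit: int, units: int) -> int:
    """Aggregate SPECint95 rating of one fully populated rack."""
    return si95_per_proc * procs_per_unit * units


def mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in megabits/s to megabytes/s."""
    return mbit_per_s / 8


def tb_per_day(rate_mbyte_per_s: float) -> float:
    """Data volume moved in one day at a sustained rate (decimal TB)."""
    return rate_mbyte_per_s * SECONDS_PER_DAY / 1_000_000


print(rack_si95(SI95_PER_PROC, PROCS_PER_1U, UNITS_PER_RACK), "SI95/rack")
rate = mbyte_per_s(OC12_MBIT_PER_S)
print(f"OC12: {rate:.2f} MB/s, {tb_per_day(rate):.2f} TB/day at line rate")
```

At full line rate an OC12 link moves roughly 6.7 TB per day, which gives a sense of scale against the 1.8 PB/year storage estimate when planning off-site access.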
Staffing
  Experiment needs to have a strong dedicated computing group
  Computer Center – needs depend on facilities, which depend on the computing model
  Estimate:
    – Support of Hall D Level 3 farm: 0.5
    – Support of offline MSS, farm: 3.0
    – Additional network support: 0.5
    – Development/experiment support: 2.0
        » Total 6.0
Costs
  Real cost will be >> the $3M in the report – probably closer to $5-6M
    – Cf. RHIC computing facilities, which was a $12M project over 5 years
  Costs cannot be defined without a clear vision for the computing model
  New Computer Center is already in lab building plan
Integration
  Technologies will be there
  Challenge is in the software (middleware): integrating all the distributed pieces into a seamless system that is usable and responsive
Development activities
  Grid computing, collaboratory environments and Data Grids
LHC Concept of Computing Hierarchy – Data Grid
  LHC Grid Hierarchy Example
    Tier0: CERN
    Tier1: National “Regional” Center
    Tier2: Regional Center
    Tier3: Institute Workgroup Server
    Tier4: Individual Desktop
    Total 5 Levels
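The tier hierarchy can be written down as a small data structure. This is only a sketch: the tier names come from the slide, while the `Tier` class, the `upstream` helper, and the idea that each tier looks to the one directly above it are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    level: int
    role: str


# Tier names as listed on the slide.
LHC_TIERS = [
    Tier(0, "CERN"),
    Tier(1, 'National "Regional" Center'),
    Tier(2, "Regional Center"),
    Tier(3, "Institute Workgroup Server"),
    Tier(4, "Individual Desktop"),
]


def upstream(tier: Tier) -> Optional[Tier]:
    """Return the tier one level above, or None for Tier0.

    Assumption: each tier is served by the tier directly above it.
    """
    if tier.level == 0:
        return None
    return LHC_TIERS[tier.level - 1]


for tier in LHC_TIERS:
    parent = upstream(tier)
    source = parent.role if parent else "(top of hierarchy)"
    print(f"Tier{tier.level}: {tier.role} <- {source}")
```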
Data Grid activities
  Particle Physics Data Grid (PPDG)
    – DOE funded – labs (inc. JLAB) + universities
  GriPhyN (Grid Physics Network)
    – NSF funded
  Computing grids are heavily funded
    – US, Europe, Japan
    – LHC computing relies on these technologies
    – Not just academic interest – industry as well
PPDG
  Has been funded for the last 2 years
  New PPDG proposal using DOE SciDAC funds – just submitted
    – Other (complementary) proposals relevant to JLAB or Hall D:
        FSU/IU proposal – Hall D portal
        FIU proposal
        LQCD
  PPDG will:
    – “…provide a distributed (grid-enabled) data access and management service for the large collaborations of current and future particle and nuclear physics experiments. It is a collaborative effort between physicists and computer scientists at several DOE laboratories and universities. This is accomplished by applying existing grid middleware to current problems and providing feedback to middleware developers on additional features required or shortcomings in the current implementations.”
    – For JLAB, will provide directly useful services for the current program
        These funds could be targeted next year for Hall D development activities (needs a context first)
Comments
  Absolute requirement:
    – Need a clear vision for the computing/analysis model
    – Computing requires a dedicated group within Hall D – the leader of that group should be found now
  Management
    – Badly managed computing costs real money
        Well managed – calibration & reconstruction are immediate – need less long-term storage; do not need to keep simulated data, …
        Badly managed software architecture will kill the L3 trigger – you have to trust it
    – Computing task is not trivial and not overwhelming, but it is at least as complex as the detector
        Must be recognized and treated as such by the collaboration