CDF Grid Status Stefan Stonjek 05-Jul th GridPP meeting / Durham
Tue 05-Jul-2005 CDF Grid status report (Stefan Stonjek)

Outline
- SAM: Sequential Access via Metadata
  - file catalogue
  - metadata
- CAF: Central Analysis Farm
- JIM: Job Information and Monitoring
- Lessons learned
- Summary
CDF is running
- CDF is an experiment currently taking data
  - For a limited time
- Stable offline computing is high priority
  - Limited resources for Grid development
  - Limited possibilities to introduce new software
  - New software is accepted if it provides new functionality
- CDF is using some Grid technology
  - Large parts of the software will stay non-Grid aware
- We can learn from the experience gained at CDF
SAM
- SAM is currently used by DØ, CDF and MINOS
- SAM was originally developed for DØ
- SAM is used in production at CDF
  - Production output is going directly into SAM
- SAM is now the only supported data-handling system at CDF
  - Some users know how to circumvent SAM
SAM problems
- Performance problems with db-servers
  - db-server = CORBA to SQL bridge
  - Large queries (many files) consume much memory
  - Currently solved by creating multiple db-server instances; this is not optimal
- Recovery from failed projects
  - A project covers many input files in many jobs
  - SAM “thinks” file based
  - Several input files, one output file and a crash in the middle cause a problem
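
The recovery problem can be illustrated with a small sketch. It assumes a hypothetical bookkeeping record (`MergeJob`) that is not SAM's actual data model: several input files feed one output file, and per-file "consumed" flags alone do not tell which inputs must be redone after a crash.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergeJob:
    """Hypothetical record of one job that merges several inputs into one output."""
    inputs: list[str]
    output: str
    consumed: set[str] = field(default_factory=set)  # inputs fully read so far
    output_stored: bool = False                      # True once the output is catalogued


def files_to_reprocess(jobs: list[MergeJob]) -> list[str]:
    """Return all inputs of every job whose output was never stored.

    A crash in the middle of a job leaves some inputs marked as consumed although
    the single output file is lost, so all inputs of that job are redone together.
    """
    redo: list[str] = []
    for job in jobs:
        if not job.output_stored:
            redo.extend(job.inputs)
    return redo


jobs = [
    MergeJob(["a.root", "b.root"], "out1.root", {"a.root", "b.root"}, True),
    # This job crashed after reading only c.root; its output was never stored.
    MergeJob(["c.root", "d.root", "e.root"], "out2.root", {"c.root"}, False),
]
print(files_to_reprocess(jobs))
```

Note that `c.root` is returned even though it was marked as consumed: a purely file-based view would skip it and lose its events.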
SAM points of failure
- SAM strongly depends on central services
- Database is a single point of failure
  - SAM writes to the database for every action
  - To solve the problem:
    - complete replication (with write access)
    - distributed database
  - No “off the shelf” solution
- CORBA naming service is a single point of failure
  - Needed by every client to talk to the rest of the SAM universe
  - To solve the problem:
    - redundant naming service
    - distributed naming service
  - Not enough manpower
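
What a redundant naming service would mean from the client side can be sketched as follows. This is only an illustration under stated assumptions, not something SAM implements: each replica is modelled as a plain Python function, and the names `resolve`, `primary_down`, `backup` and the address are made up.

```python
from typing import Callable, Sequence

# A "replica" is modelled as a function that maps a service name to an address.
# In a real deployment this would be a network call to one naming-service instance.
Lookup = Callable[[str], str]


class AllReplicasFailed(Exception):
    """Raised when none of the naming-service replicas answered."""


def resolve(name: str, replicas: Sequence[Lookup]) -> str:
    """Ask each replica in turn and return the first successful answer."""
    failures = 0
    for lookup in replicas:
        try:
            return lookup(name)
        except ConnectionError:
            failures += 1  # this replica is down, try the next one
    raise AllReplicasFailed(f"all {failures} replicas failed for {name!r}")


def primary_down(name: str) -> str:
    # Stand-in for an unreachable primary naming service.
    raise ConnectionError("primary naming service unreachable")


def backup(name: str) -> str:
    # Stand-in for a healthy replica; the address is made up.
    return f"{name}@backup.example.org"


print(resolve("db-server", [primary_down, backup]))
```

With a single naming service the first `ConnectionError` would stop every client; with a second replica the lookup still succeeds.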
SAM upload
- Tool to insert files into SAM from arbitrary nodes
- Important for the acceptance of SAM at CDF
- Intense use
  - Causes performance problems
  - Each client starts a thread in the db-server
Metadata
- SAM selects files based upon file metadata
- Two types of metadata
  - Physical file parameters (file size, checksum etc.)
  - Physics file parameters (run and event numbers, event information, time etc.)
- Only the physical file parameter schema is fixed
- Physics file parameter schema has to be dynamic (many changes required)
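
The split between a fixed and a dynamic schema can be sketched as a data structure. The class and field names below are assumptions for illustration only and do not reflect SAM's real schema: physical parameters get a fixed, typed record, while physics parameters live in an open key/value mapping that can grow without a schema change.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PhysicalParams:
    """Fixed schema: the same fields exist for every file."""
    file_size: int
    checksum: str


@dataclass
class FileMetadata:
    name: str
    physical: PhysicalParams
    # Dynamic schema: new physics keys can be added at any time.
    physics: dict[str, Any] = field(default_factory=dict)


def select(files: list[FileMetadata], **criteria: Any) -> list[str]:
    """Return the names of files whose physics metadata match all criteria."""
    return [
        f.name
        for f in files
        if all(f.physics.get(key) == value for key, value in criteria.items())
    ]


catalogue = [
    FileMetadata("a.root", PhysicalParams(250, "abc1"), {"run": 1001, "trigger": "muon"}),
    FileMetadata("b.root", PhysicalParams(300, "abc2"), {"run": 1002}),
]
print(select(catalogue, run=1001))
```

A file that lacks a key, such as `trigger` on `b.root`, simply does not match a query on that key; no migration of existing entries is needed when a new key appears.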
Metadata (cont.)
- SAM uses a metadata query language
  - Called “dimensions”
  - Protects the user from SQL difficulties
  - Protects the database from user mistakes
  - Therefore less flexible than plain SQL
  - Requires constant adaptation to new requirements
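
The trade-off of a restricted query language can be shown with a toy translator. The query syntax, the dimension names and the table layout below are invented for this sketch and are not the real SAM "dimensions" syntax. Only whitelisted dimensions and operators are accepted, and values are passed as bound parameters, which protects the database but also means every new requirement needs a change to the whitelist.

```python
import sqlite3

# Hypothetical mapping from user-visible dimension names to table columns.
ALLOWED = {"run": "run_number", "size": "file_size", "dataset": "dataset"}
OPERATORS = {"=", "<", ">", "<=", ">="}


def translate(query: str):
    """Translate 'dim op value and dim op value ...' into parameterized SQL."""
    clauses, params = [], []
    for term in query.split(" and "):
        parts = term.split()
        if len(parts) != 3:
            raise ValueError(f"expected 'dimension operator value', got {term!r}")
        dim, op, value = parts
        if dim not in ALLOWED:
            raise ValueError(f"unknown dimension {dim!r}")
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator {op!r}")
        clauses.append(f"{ALLOWED[dim]} {op} ?")
        # Values never enter the SQL text; they are bound as parameters.
        params.append(int(value) if value.isdigit() else value)
    return "SELECT file_name FROM files WHERE " + " AND ".join(clauses), params


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files (file_name TEXT, run_number INTEGER, file_size INTEGER, dataset TEXT)"
)
db.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [("a.root", 1001, 250, "x"), ("b.root", 1001, 50, "x"), ("c.root", 1002, 300, "y")],
)

sql, params = translate("run = 1001 and size > 100")
matches = [row[0] for row in db.execute(sql, params)]
print(sql, params, matches)
```

A query on anything outside `ALLOWED` is rejected outright, which is the loss of flexibility compared with plain SQL.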
Lessons learned (SAM, metadata)
- Avoid single points of failure
  - Not new, but difficult with a database
- Keep as much information as possible local
  - Minimizes the impact of problems in the central database
- Need a flexible metadata query language
CAF
- CAF: Central (or CDF) Analysis Farm
- Good sandbox technology
- Good graphical job submission interface
- Does job multiplication for the user
  - Submit once, execute multiple times
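
"Submit once, execute multiple times" can be sketched as expanding one submission into numbered sections. This is a minimal illustration, not the CAF implementation: the `Section` record, the `multiply` function and the `{section}` placeholder are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One concrete job generated from a single user submission."""
    index: int
    command: str


def multiply(command_template: str, first: int, last: int) -> list[Section]:
    """Expand one command template into one job per section number.

    Every occurrence of the (hypothetical) placeholder '{section}' is replaced
    by the section number, so each job can pick its own share of the work.
    """
    if last < first:
        raise ValueError("last section must not be smaller than first section")
    return [
        Section(i, command_template.replace("{section}", str(i)))
        for i in range(first, last + 1)
    ]


for section in multiply("run_mc.sh --seed {section}", 1, 3):
    print(section.index, section.command)
```

The user writes the command once; the section number is the only thing that distinguishes the resulting jobs.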
CAF (cont.)
- Distributed CAF (DCAF)
  - Many sites around the world
  - In use for Monte-Carlo production
  - Human-based resource brokering
- CondorCAF (Glide-ins)
  - New CAF version uses Condor
  - Allows Glide-ins
- GridCAF
  - “edg-*” compatible job submission
  - CAF-GUI submits to the grid, no job multiplication
JIM
- JIM: Job Information and Monitoring
- Together with SAM, the system which produces CDF Monte-Carlo
- Requires additional software to be installed on Grid sites
  - SAM
  - Small differences in resource advertising
- Working towards interoperability between JIM and LCG-Grid sites
Summary
- CDF is using some Grid tools
- LHC experiments can learn from CDF experience
  - SAM
    - central database
    - metadata
  - CAF
    - submission GUI
    - job multiplication