CDF Grid Status. Stefan Stonjek, 05-Jul-2005, 13th GridPP meeting / Durham.


1 CDF Grid Status
Stefan Stonjek, 05-Jul-2005, 13th GridPP meeting / Durham

Tue 05-Jul-2005, CDF Grid status report (Stefan Stonjek)

2 Outline
- SAM: Sequential Access via Metadata
  - file catalogue
  - metadata
- CAF: Central Analysis Farm
- JIM: Job Information and Monitoring
- Lessons learned
- Summary

3 CDF is running
- CDF is an experiment currently taking data
- For a limited time
- Stable offline computing is high priority
- Limited resources for Grid development
- Limited possibilities to introduce new software
- New software is accepted if it provides new functionality
- CDF is using some Grid technology
- Large parts of the software will stay non-Grid aware
- We can learn from the experience gained at CDF

4 SAM
- SAM is currently used by DØ, CDF and MINOS
- SAM was originally developed for DØ
- SAM is used in production at CDF
- Production output is going directly into SAM
- SAM is now the only supported data-handling system at CDF
- Some users know how to circumvent SAM

5 SAM problems
- Performance problems with db-servers
  - db-server = CORBA to SQL bridge
  - Large queries (many files) consume much memory
  - Currently solved by creating multiple db-server instances; this is not optimal
- Recovering from failed projects
  - A project covers many input files in many jobs
  - SAM “thinks” in terms of files
  - Several input files, one output file, and a crash in the middle causes a problem
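The recovery problem on the slide above can be illustrated with a small sketch. It is not SAM code: `MergeJob`, `naive_inputs_to_redo` and `inputs_to_redo` are hypothetical names, and the file names are made up. The assumption is a job that merges several input files into one output file and crashes before the output is stored. Purely file-based bookkeeping would only redo the inputs that were never delivered, although the inputs already read are lost as well.

```python
from dataclasses import dataclass, field


@dataclass
class MergeJob:
    """Hypothetical job that merges several input files into one output file."""
    inputs: list
    consumed: set = field(default_factory=set)  # inputs already handed to the job
    output_stored: bool = False                 # True once the merged output is safe


def naive_inputs_to_redo(job):
    """File-based view: only inputs that were never delivered need redoing."""
    return [name for name in job.inputs if name not in job.consumed]


def inputs_to_redo(job):
    """Job-based view: without a stored output, every input must be redone."""
    return [] if job.output_stored else list(job.inputs)


# The job read two of three inputs, then crashed before writing its output.
crashed = MergeJob(inputs=["in_a", "in_b", "in_c"], consumed={"in_a", "in_b"})
print(naive_inputs_to_redo(crashed))
print(inputs_to_redo(crashed))
```

The two functions disagree exactly in the crash-in-the-middle case, which is the situation the slide describes as causing a problem.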

6 SAM points of failure
- SAM strongly depends on central services
- Database is a single point of failure
  - SAM writes to the database for every action
  - To solve the problem
    - complete replication (with write access)
    - distributed database
  - No “off the shelf” solution
- CORBA naming service is a single point of failure
  - Needed by every client to talk to the rest of the SAM universe
  - To solve the problem
    - redundant naming service
    - distributed naming service
  - Not enough manpower

7 SAM upload
- Tool to insert files into SAM from arbitrary nodes
- Important for the acceptance of SAM at CDF
- Intense use
- Causes performance problems
- Each client starts a thread in the db-server

8 Metadata
- SAM selects files based upon file metadata
- Two types of metadata
  - Physical file parameters (file size, checksum etc.)
  - Physics file parameters (run and event numbers, event information, time etc.)
- Only the schema for physical file parameters is fixed
- The physics file parameter schema has to be dynamic (many changes required)
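One way to picture the split between a fixed and a dynamic schema is the sketch below. It is only an illustration and not SAM's data model: the class names, field names and all values are assumptions. The physical parameters are fixed fields, while the physics parameters live in an open key/value mapping that can grow without a schema change.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class PhysicalParams:
    """Fixed schema: every file has exactly these fields."""
    file_name: str
    size_bytes: int
    checksum: str


@dataclass
class FileRecord:
    """One catalogue entry: fixed physical part plus open physics part."""
    physical: PhysicalParams
    physics: Dict[str, Any] = field(default_factory=dict)


# All names and values below are made up for illustration.
record = FileRecord(
    physical=PhysicalParams("example_0001.dat", 1_048_576, "0a1b2c3d"),
    physics={"first_run": 100, "last_run": 110},
)

# A new physics parameter can be added without touching the class definitions.
record.physics["event_count"] = 5000
print(sorted(record.physics))
```

The frozen dataclass rejects changes to the physical fields, whereas the `physics` dictionary accepts new keys at any time, which mirrors the "many changes required" point on the slide.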

9 Metadata (cont.)
- SAM uses a metadata query language
- Called “dimensions”
- Protects users from SQL difficulties
- Protects the database from user mistakes
- Therefore less flexible than plain SQL
- Requires constant adaptation to new requirements
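The trade-off described above, safety at the cost of flexibility, can be sketched with a tiny restricted query layer. This is not the syntax or implementation of SAM "dimensions"; the field names, the `files` table and the function `to_sql` are assumptions made for the example. Only whitelisted fields and operators are accepted, and values are passed as parameters rather than pasted into the SQL text.

```python
# Hypothetical whitelist: query field name -> database column name.
ALLOWED_FIELDS = {
    "run_number": "run_number",
    "file_size": "size_bytes",
    "data_tier": "data_tier",
}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def to_sql(constraints):
    """Translate (field, operator, value) triples into SQL plus parameters."""
    clauses, params = [], []
    for name, op, value in constraints:
        if name not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {name}")
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{ALLOWED_FIELDS[name]} {op} ?")
        params.append(value)  # value never becomes part of the SQL string
    sql = "SELECT file_name FROM files"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = to_sql([("run_number", ">=", 100), ("data_tier", "=", "raw")])
print(sql)
print(params)
```

Anything outside the whitelist is rejected, so supporting a new physics parameter means extending `ALLOWED_FIELDS`. That is the kind of constant adaptation the slide mentions.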

10 Lessons learned (SAM, metadata)
- Avoid single points of failure
- Not new, but difficult with a database
- Keep as much information as possible local
- Minimizing the impact of problems in the central database
- Need a flexible metadata query language

11 CAF
- CAF: Central (or CDF) Analysis Farm
- Good sandbox technology
- Good graphical job submission interface
- Does job multiplication for the user
- Submit once, execute multiple times
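"Submit once, execute multiple times" can be pictured as expanding one job description into numbered segments. The sketch below is an assumption about the general idea, not the CAF interface: the function `multiply_job`, the `{segment}` placeholder and the script name are all hypothetical.

```python
def multiply_job(command_template, n_segments):
    """Expand one submitted command into n_segments numbered commands."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # Segments are numbered from 1; each one gets its own copy of the command.
    return [
        command_template.format(segment=index)
        for index in range(1, n_segments + 1)
    ]


# The user submits one template; the farm would run one command per segment.
for command in multiply_job("run_mc.sh {segment}", 3):
    print(command)
```

Each segment receives its own number, which a script could use, for example, to pick a distinct random seed or slice of the input.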

12 CAF (cont.)
- Distributed CAF (DCAF)
  - Many sites around the world
  - In use for Monte-Carlo production
  - Human-based resource brokering
- CondorCAF (glide-ins)
  - New CAF version uses Condor
  - Allows glide-ins
- GridCAF
  - “edg-*” compatible job submission
  - CAF-GUI submits to the grid, no job multiplication

13 JIM
- JIM: Job Information and Monitoring
- Together with SAM, the system that produces CDF Monte-Carlo
- Requires additional software to be installed on Grid sites
- SAM
- Small differences in resource advertising
- Working towards interoperability between JIM and LCG-Grid sites

14 Summary
- CDF is using some Grid tools
- LHC experiments can learn from CDF experience
- SAM
  - central database
  - metadata
- CAF
  - submission GUI
  - job multiplication

