3rd June 2004 CDF Grid
SAM: Metadata and Middleware Components
Mòrag Burgon-Lyon, University of Glasgow
Contents
– CDF Computing Goals
– SAM
– CAF
– DCAF
– JIM
– How it all fits together
– SAM TV
CDF Computing Goals
The CDF experiment intends to have:
– 25% of its computing offsite by June 2004
– 50% by June 2005
To achieve these goals, several components are being developed and deployed:
– SAM: data handling system
– CAF & DCAF: batch systems
– JIM: Grid extension to SAM
– SAM TV: monitoring for SAM stations
SAM
SAM (Sequential Access via Metadata) is a mature data handling system.
Users can start SAM projects, e.g. running AC++Dump.
Large volumes of files, grouped into datasets, can be requested through SAM and are then processed by the SAM projects.
The files are transferred either from the main cache at Fermilab or from neighbouring SAM stations.
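The file delivery described above can be sketched as follows. Everything here is hypothetical. The `Station` class, the station and cache names, and the preference order are all assumptions chosen for illustration. The assumed order is the local cache first, then a neighbouring station, then the Fermilab main cache. The slide does not describe how SAM actually picks a source.

```python
from dataclasses import dataclass, field

# Hypothetical model of a SAM station and the files held in its cache.
@dataclass
class Station:
    name: str
    cached: set = field(default_factory=set)  # file names present in this station's cache

def choose_source(file_name, local, neighbours, main_cache="fermilab-main-cache"):
    """Return where a file would be fetched from (assumed preference order)."""
    if file_name in local.cached:
        return local.name  # already cached locally, no transfer needed
    for station in neighbours:
        if file_name in station.cached:
            return station.name  # copy from a neighbouring SAM station
    return main_cache  # fall back to the main cache at Fermilab

def run_project(dataset, local, neighbours):
    """Deliver every file of a dataset to a project, recording each file's source."""
    deliveries = []
    for file_name in dataset:
        source = choose_source(file_name, local, neighbours)
        local.cached.add(file_name)  # after delivery the file sits in the local cache
        deliveries.append((file_name, source))
    return deliveries
```

For example, suppose a local station holds `f1` and a neighbour holds `f2`. Running a project over `["f1", "f2", "f3"]` then delivers `f1` locally, copies `f2` from the neighbour and fetches `f3` from the Fermilab main cache.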
CAF
The original CDF Analysis Farm.
The CAF is a 600-CPU farm of Linux computers. It has access to the CDF data handling system and databases, allowing CDF collaborators to run batch analysis jobs.
Standard Unix accounts are not created for users (i.e. you cannot "log into" the CAF). Custom software therefore provides remote job submission, control, monitoring and an output interface for the user.
Access is strongly authenticated via Kerberos.
CAF
Users compile and link their analysis jobs on their desktops.
The required files are archived into a temporary tar file and copied to the CAF head node.
Jobs are executed using a distributed batch system, Farm Batch System Next Generation (FBSNG).
Output is tarred up. It is either returned to the user's desktop or saved to scratch space on the CAF FTP server for later retrieval.
A cdfsoft installation is required to submit jobs. Two 8-way Linux SMP systems are provided for two purposes: for users without cdfsoft on their local desktops, and as a general reference for users having problems with their local installations.
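The packaging step above can be sketched with Python's standard `tarfile` module. This is a hypothetical illustration, not the actual CAF submission code. The `package_job` helper and the top-level `job` directory name inside the archive are assumptions. Copying the archive to the head node is deliberately left out.

```python
import os
import tarfile
import tempfile

def package_job(job_dir):
    """Archive a compiled job directory into a temporary .tar.gz and return its path.

    Hypothetical helper: the real CAF tools handle this internally.
    """
    fd, tar_path = tempfile.mkstemp(suffix=".tar.gz")
    os.close(fd)  # tarfile reopens the path itself
    with tarfile.open(tar_path, "w:gz") as tar:
        # Store everything under a fixed top-level name ("job", an assumption)
        # so the archive unpacks the same way on any worker node.
        tar.add(job_dir, arcname="job")
    return tar_path
```

Using a temporary file mirrors the slide's "temporary tar file". The caller should delete the archive once it has been copied.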
CAF
The CAF was initially configured to favour large reads and small writes (e.g. producing small skims, histograms, etc. from official secondary datasets).
Extensions now allow users to store their output files back into the SAM data handling system, so jobs with larger writes can run easily.
The CAF has also been used for large-scale Monte Carlo and tertiary dataset production.
Users typically submit jobs through the CAF GUI, though command-line submission is also possible.
CAF Monitoring
[Figure: CAF monitoring; image not recovered]
DCAF
Decentralised CDF Analysis Farm: the CAF implemented at several remote sites, from Taiwan to Canada.
– Rollout began in January 2004
– A core set of 6 DCAF sites provides the backbone
– New sites are continually being added
– The user selects the site on which to run
DCAF Hardware Resources
Site | GHz now | TB now | GHz summer | TB summer | Notes
INFN | | | | | Priority to INFN users; pinned datasets exist
Taiwan | | | | | Pinned datasets exist
Korea | | | | | Running MC only now
UCSD | | | | | Pools resources from several US groups; min. guaranteed from x2 larger farm (CDF+CMS)
Rutgers | | | | | In-kind, will do MC production
TTU | | | | | DCAFs, test site + CDF+CMS cluster
Germany GridKa | ~200 | 16 | ~240 | 18 | Min. guaranteed CPU from x8 larger pool; open to all by ~Dec (JIM)
Canada | | | | | In-kind, doing MC production, + common pool
Japan | | | | | Under construction
Cantabria | | | | | ~1 month away
MIT | | | | | ~1 month away
UK | | | | | Open to all by ~Dec (JIM), + common pool
DCAF
Recent DCAF report (1st June):
– Taiwan DCAF has finished copying and pinning 3 large muon datasets with no major problems.
– A request for ~600 GHz of MC production for June has been received.
– Storing MC results in a timely way was a priority.
– The MC producers have been educated in storing files through SAM (web pages, tutorials). This requires only the CDF dataset name or MC request ID.
JIM
Job and Information Management: a Grid extension to SAM allowing users to submit jobs using a local thin client.
A remote broker assigns each job to an execution site, choosing where the most data is present and where the queue is shortest.
Job progress can be monitored through a web page.
Job output can be downloaded using a web browser.
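The broker's site choice can be illustrated with a small ranking function. This is a sketch under stated assumptions, not JIM's actual matchmaking. The `Site` record and its fields are hypothetical, and so is the way the two criteria are combined. Here the sites are ranked by the number of input files already present, with ties broken by the shortest queue.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    files_present: int  # how many of the job's input files are already at this site
    queue_length: int   # jobs currently waiting in the site's batch queue

def choose_site(sites):
    """Pick the site holding the most input data; break ties by the shortest queue.

    The lexicographic (data first, then queue) ordering is an assumption.
    """
    if not sites:
        raise ValueError("no execution sites available")
    return max(sites, key=lambda s: (s.files_present, -s.queue_length))
```

A weighted score would trade data locality against queue length more smoothly. The lexicographic key simply makes "most data" the dominant criterion.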
JIM
JIM can run on shared resources and can interface with most batch systems.
The CDF environment can be tar-balled for running Monte Carlo on non-CDF equipment.
D0 have successfully run large-scale Monte Carlo production.
CDF Monte Carlo has been run interactively on the D0 cluster; the next step is submission through JIM.
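One way to picture "interfacing with most batch systems" is a thin adapter layer that maps a generic submission onto each local system's command. This is a hypothetical sketch, not JIM's code. The class names and the `submit_command_for` helper are invented for illustration. Only the PBS `qsub <script>` and Condor `condor_submit <submit file>` invocations are real. FBSNG is omitted because its command line is not described here.

```python
from abc import ABC, abstractmethod

class BatchAdapter(ABC):
    """Hypothetical interface hiding the differences between local batch systems."""

    @abstractmethod
    def submit_command(self, job_file):
        """Return the argv that would submit job_file to this batch system."""

class PBSAdapter(BatchAdapter):
    def submit_command(self, job_file):
        return ["qsub", job_file]  # PBS/Torque: qsub <job script>

class CondorAdapter(BatchAdapter):
    def submit_command(self, job_file):
        return ["condor_submit", job_file]  # Condor: condor_submit <submit description file>

ADAPTERS = {"pbs": PBSAdapter(), "condor": CondorAdapter()}

def submit_command_for(batch_system, job_file):
    """Look up the adapter for a site's batch system and build its submit command."""
    try:
        return ADAPTERS[batch_system].submit_command(job_file)
    except KeyError:
        raise ValueError(f"unsupported batch system: {batch_system}") from None
```

The sketch only builds the command. Actually running it (e.g. with `subprocess.run`) is left to the caller.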
How it all fits together
[Figure: image not recovered]
SAM TV
Adam Lyon at Fermilab has created a set of web pages that can be used to monitor SAM stations and projects.
Demo: /samTV.html
SAM TV
– Snapshot summaries: lists the stations, with a pie chart showing the number of file transfers.
– SAM project snapshot: all the projects on the selected station, with a plot of file deliveries over time.
– Project details: including the time and a plot of the last file delivery.
– Consumer and process: consumer and process IDs, application, node, user, etc.
– Files: list of files desired by a project.
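The snapshot summary's pie chart boils down to counting file transfers per station. The sketch below shows that aggregation. It is not SAM TV's code. The input format, `(station, file_name)` pairs, and the station names in the example are assumptions.

```python
from collections import Counter

def transfers_per_station(transfers):
    """Count file transfers per station (the numbers behind a snapshot pie chart).

    `transfers` is an iterable of (station, file_name) pairs; this format is hypothetical.
    """
    return Counter(station for station, _ in transfers)

def pie_fractions(counts):
    """Convert per-station counts into the fraction of the pie each station gets."""
    total = sum(counts.values())
    return {station: n / total for station, n in counts.items()} if total else {}
```

For example, three transfers at `station-a` and one at `station-b` give pie fractions of 0.75 and 0.25.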
Challenges and Future Work
– Implementation and rollout of JIM for MC
– More DCAF installations
– Encourage user migration
– Solve the fragmented disks and caches problem (suggestions welcome!)