SAMGrid for CDF MC (and beyond)
Igor Terekhov, FNAL/CD/CCF/SAM, for the JIM team

Plan of Attack
- General (but technical!) intro into Grid computing
- Overview of some of the benefits of SAMGrid computing, for CDF MC etc.
- Architectural perspective:
  - SAMGrid as a whole
  - SAM data handling
  - JIM job submission
- A more practical, detailed description of the CDF MC process with JIM/SAMGrid
- JIM project status

Global and Grid Computing in HEP: the Evolution
I. Globally distributed computing
II. Automated, Grid-like globally distributed computing (SAMGrid is here)
III. True Grid computing

Globally Distributed Computing
- Multiple participating sites (especially for MC)
- Experts on sites
- Centrally provided KITS and other software repositories
- Locally developed/modified infrastructure for production tracking, workflow and job management, etc.
- E-mail and phone communications (what to install, how to patch, who's doing what)

Grid-like GDC: SAMGrid
- Sites have "standard" infrastructure (SAM stations and other SAMGrid servers), but no pre-installed D0/CDF software or data
- All data files are delivered from the SAM data grid:
  - D0 example: minbias mix-in files used to be all different; JIM uses a SAM dataset, thus guaranteeing consistency
- All job files are delivered from the SAM data grid:
  - Release files are globally distributed and cached; no need for explicit software synchronization
- Remote job submission, with placement directed by the system or user:
  - Brokering in (D)CAF, peer-to-peer submission
  - In JIM: client -> system -> execution site
- Spooling of small input and output files
  - For fun: web-based retrieval of output
- Expertise on sites: less than 1 person, almost never needed beyond the initial SAM install phase
- Monitoring of the overall state of the system

La Grille Pure (the "pure Grid")
- Computer Science origins
- Common "middleware" infrastructure
- True distributed ownership of resources: run MC on a biologist's cluster
- No preinstalled software except "standard" tools like Globus
- A bit of utopia?

We're at the midpoint: SAMGrid
- Principal benefits for you, CDF MC physicists:
  - Higher degree of automation makes MC easier and "more fun"
  - Considerably higher degree of consistency and independence of physics from the site (job/data files, request details in the DB)
  - Better utilization of resources (eventually)
  - Reduction of expertise at sites from O(N) to O(1)
  - Core SAMGrid software (SAM+JIM) is common across D0 and CDF (but not necessarily with LHC), with CD support etc.
- Possible future of SAMGrid and Run II computing (I'm not authorized to predict it):
  - Full integration into The Grid (la grille pure) is unlikely IMHO; it will probably continue to run on resources at least partially affiliated with Run II experiments
  - Gradual convergence with LHC technologies, while providing stable services to Run II physicists
  - And/or integration into US grid efforts (Open Science Grid)

Routing + Caching = Replication
[Diagram: data sites connected over the WAN; data flow is routed and cached between sites, which yields replication]

[Diagram: SAMGrid job management architecture. A submission client and user interface talk to a broker hosting the match-making service and information collector; jobs are dispatched to execution sites #1 through #n, each running a queuing system, grid sensors, a computing element, a storage element, and the data handling system]

Grid-to-Fabric Job Submission
[Diagram: the Grid feeds SAMGrid job managers, which use JIM sandboxing, SAM batch adapters, and batch-system (BS) idealizers to submit jobs to the local batch system]

Enough of General Stuff
- Install and configure SAMGrid software at participating sites: SAM station, JIM, and related software. Very good document: http://www-d0.fnal.gov/computing/grid/SAMGridManual.htm
- Prepare an input sandbox!!!
- Create a request in the SAM DB!
- Write a small job description file (JDF)
- Do "samg submit"
- Et voila! (See the web monitoring, etc.)

Sample CDF MC JDF

job_type = cdfmc

# Experiment and universe
sam_experiment = cdf
sam_universe = prd

# SAM group and station
group = test
station_name = samgfarm

# CDF job details
requestid = 34
numevts = 1000
events_per_job = 500
job_specification = cdf_mc_jobspec.xml
input_sandbox_tgz = /tmp/cdfuser.tar.gz

# Jobfile dataset
jobfiles_dataset = jobset_igor_2
instances = 1
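Submission then comes down to pointing "samg submit" at the JDF. Below is a minimal sketch of such a session, assuming the JDF above is saved as cdf_mc.jdf and that the JDF path is passed as the command argument; the file names, sandbox contents, and exact invocation are assumptions, not taken from the manual.

# Hedged sketch of a submission session; paths and file names are illustrative.

# Package the user sandbox referenced by input_sandbox_tgz in the JDF
# (the sandbox contents shown here are assumed).
tar -czf /tmp/cdfuser.tar.gz run1run tcl/

# Submit the job description file to SAMGrid; the argument form is assumed
# from the "Do samg submit" step on the previous slide.
samg submit cdf_mc.jdf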

Present CDF features
- Takes a "job dataset" and delivers it to the worker node
- Takes job specification files, an XML map of run number to number of events (if you prefer, a list of run/numevents pairs); a hedged sketch follows below
- Accepts user.tar.gz (will transfer it to the worker node)
- Having routed the job to an execution site, will compute the detailed plan:
  - Each local job is assigned 1 or more (run, event range) pairs
  - The total number of local jobs is a function of both the job specification (total number of events) and the site's capabilities (e.g. the "optimal" CPU per local job)
  - The user-supplied "run1run" script is invoked for each run's event range
- All output data files are stored back to SAM
- Output non-data files (stdout, logs, etc.) are viewable on the Web
- Output data files can be merged later (see next slide)
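The slide gives neither the XML schema of the job specification nor the calling convention of run1run, so the following is only an illustrative sketch; every element, attribute, and argument name is an assumption.

# Hypothetical job specification: a run-number -> number-of-events map.
# The tag and attribute names are assumptions, not the real CDF schema.
cat > cdf_mc_jobspec.xml <<'EOF'
<job_specification>
  <run number="186590" events="500"/>
  <run number="186591" events="500"/>
</job_specification>
EOF

# Skeleton of a user-supplied "run1run" script; the argument list
# (run number, first event, number of events) is likewise an assumption.
cat > run1run <<'EOF'
#!/bin/sh
run=$1
first_event=$2
nevents=$3
echo "Generating $nevents events for run $run starting at event $first_event"
# ...invoke the CDF MC executable from the SAM-delivered release here...
EOF
chmod +x run1run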

Output merging (concatenation)
- One of the JIM/SAMGrid benefits
- The problem is caused by the existing storage systems being unable to swallow a large number of small files; important for both D0 and CDF
- Our plan has been implemented for D0 (CDF to come); a hedged sketch follows below:
  - Put output data files into "durable storage": sam store --dest=XXX
  - Define a SAMGrid job that looks like a SAM analysis job, taking a SAM dataset as input
  - Submit it to any execution site (the site of the original production will possibly be preferred)
  - Merged output is automagically stored back to SAM
- Principal benefits:
  - Can merge files produced at very different times/places
  - Bookkeeping and robustness features of SAM are leveraged
- Difficulties:
  - Bookkeeping backfires (mix of merged/unmerged files)
  - The all-at-once approach overfills scratch space; real streaming is needed (as in true SAM)
  - Core SAM is being enhanced accordingly to overcome these issues and improve the service
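A minimal sketch of that merge workflow is shown below; apart from "sam store --dest=..." and "samg submit", which appear on the slides, every argument, path, and file name is an assumption rather than documented syntax.

# Hedged sketch; arguments and paths are illustrative only.

# 1) Store the unmerged output data files to durable storage instead of tape.
sam store --dest=/durable/cdf_mc_unmerged mc_output_0001.root

# 2) Submit a SAMGrid job that looks like a SAM analysis job and takes the
#    SAM dataset of unmerged files as input; the merged output is then
#    stored back to SAM automatically. The merge JDF is hypothetical.
samg submit merge.jdf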

Near (and not-so-near) Future for CDF MC
- Decouple MC production phases:
  - Be able, for example, to retrieve generated files that were previously produced and read that input from SAM
  - This has been in D0 JIM for quite a while already
- Improve concatenation (first implement it for CDF)
- Fuller MC request system, integration with CDF JIM
- Incorporate any new requirements from you, the users
- Perhaps a "workflow manager" (application manager) such as D0/CMS mc_runjob
- Perhaps full-fledged brokering (employ multiple sites for a single large request)
- Continuous monitoring improvements
- Understand the relation with CAF

Manpower resources
- Unfortunately, I am moving out of SAMGrid
- The remaining person (Gabriele Garzoglio) and two JIM students will have to be split between D0 and CDF
- Expertise must grow within the experiment to:
  - Set up new sites
  - Understand the JIM software and tweak the "job managers" etc. accordingly
- Morag Burgon-Lyon and Valeria Bartsch are ramping up; Ulrich Kerzel is expanding his expertise from SAM to SAMGrid
- The CD/Run II department SAMGrid project (co-led by Rick St Denis and Wyatt Merritt) will cough up other resources
- But once again, this will die if the experiment doesn't pick up!