LcgCAF: CDF submission portal to LCG


LcgCAF: CDF submission portal to LCG
Federica Fanzago for the CDF-Italian Computing Group
Gabriele Compostella, Francesco Delli Paoli, Donatella Lucchesi, Daniel Jeans, Subir Sarkar and Igor Sfiligoi

CDF experiment and data volumes
The Tevatron collides protons and antiprotons at 1.96 TeV with 36 bunches and a 396 ns crossing time; the peak luminosity is about 2.3x10^32 cm^-2 s^-1. The CDF and D0 experiments sit on the Main Injector & Recycler complex near Chicago.
[Slide figure: projected integrated luminosity (fb^-1) versus fiscal year 2003-2009 for antiproton stacking rates of 15-30 mA/hr. By FY07 the dataset is expected to double with respect to FY06, and by FY09 to double again with respect to FY07; data volumes and CPU needs grow accordingly.]

CDF Data Handling Model: how it was
Data processing runs on a dedicated production farm at FNAL; user analysis and Monte Carlo production run on the CAF (Central Analysis Facility). FNAL hosts the data and the farm used to analyze them, while remote sites run decentralized CAFs (dCAFs) that produce Monte Carlo; some sites (e.g. CNAF) also host data. The CAF and dCAFs are CDF-dedicated farms, structured as in the slide diagram and developed by H. Shih-Chieh, E. Lipeles, M. Neubauer, I. Sfiligoi and F. Wuerthwein.
On the user desktop a CAF GUI is opened from the command line after setting up the CDF environment; there the user selects the dedicated CAF on which to run, gives the directory path of the job tarball, and chooses whether to receive a mail when the job finishes. The submitter uses Condor and ships the user tarball together with CafExe, which bundles the most commonly used parts of the CDF software. On the worker node CafExe sets up the CDF environment, monitors the running job, allows interactive commands on it and writes a log, while MyJob.sh sets up the user environment and runs MyJob.exe.
[Slide diagram: User Desktop (GUI) -> Head Node (Submitter, Mailer, Monitor, Condor) -> dedicated CAF worker node running CafExe, MyJob.sh and MyJob.exe from the user tarball (MyJob.exe).]
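
To make the wrapper's role concrete, here is a minimal Python sketch of a CafExe-like wrapper, written only for illustration: it is not the real CafExe, and the file names (MyJob.sh, wrapper.log) and the CDF_ENV variable are placeholders. It sets up an environment, runs the user script, keeps a log and returns the exit status.

    import os
    import subprocess
    import sys
    import time

    def run_wrapped_job(user_script="MyJob.sh", log_file="wrapper.log"):
        """Minimal CafExe-like wrapper: set up the environment, run the
        user job, keep a log and return its exit code (illustrative only)."""
        env = os.environ.copy()
        # Placeholder for the CDF environment setup done by the real CafExe.
        env["CDF_ENV"] = "setup"
        with open(log_file, "w") as log:
            log.write("wrapper started %s\n" % time.ctime())
            proc = subprocess.Popen(["/bin/sh", user_script],
                                    stdout=log, stderr=subprocess.STDOUT,
                                    env=env)
            rc = proc.wait()
            log.write("user job finished with exit code %d\n" % rc)
        return rc

    if __name__ == "__main__":
        sys.exit(run_wrapped_job())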

Moving to a GRID Computing Model
Reasons to move to a Grid computing model:
- Resources need to expand: the luminosity is expected to increase by a factor of ~4 before CDF ends.
- Resources were in dedicated pools, with limited room for expansion, and had to be maintained by CDF people.
- Grid resources can be fully exploited before the LHC experiments start analyzing data.
CDF has no hope of getting such a large amount of resources without the Grid! Hence LcgCAF.

LcgCAF: General Architecture
The cdf-ui ~ cdf-head machine is a central Grid User Interface: the user tarball is sent to it exactly as it was sent to the old CAF head node, and considerable work went into interfacing it with the Grid while keeping the basic idea of the old CAF head node. To use it, a CDF user must have access to the CAF client and a CDF-specific Grid certificate. Jobs are dispatched from the head node to the Grid sites; the output goes to a CDF-SE (the Storage Element of a Grid site) and from there to the robotic tape storage at FNAL.
[Slide diagram: User Desktop (GUI) -> cdf-ui ~ cdf-head -> jobs on Grid sites -> output to CDF-SE (SE of a Grid site) -> robotic tape storage (FNAL).]
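
Because a valid Grid certificate is required, a submission client would typically check the user's proxy before shipping the tarball to cdf-ui. Below is a minimal sketch of such a check, assuming the standard Globus grid-proxy-info tool is available on the desktop; the one-hour threshold and the error handling are illustrative, not LcgCAF's actual logic.

    import subprocess

    def proxy_time_left():
        """Return the remaining lifetime of the Grid proxy in seconds,
        or 0 if no valid proxy is found (uses the standard Globus
        grid-proxy-info command)."""
        try:
            out = subprocess.run(["grid-proxy-info", "-timeleft"],
                                 capture_output=True, text=True, check=True)
            return int(out.stdout.strip())
        except (OSError, subprocess.CalledProcessError, ValueError):
            return 0

    if __name__ == "__main__":
        # Illustrative policy: require at least one hour of proxy lifetime
        # before sending the tarball to cdf-ui.
        if proxy_time_left() < 3600:
            raise SystemExit("No valid Grid proxy: run grid-proxy-init first")
        print("Proxy OK, ready to submit to cdf-ui")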

Job Submission and Execution
Daemons running on the CDF head node:
- Submitter: receives the user job, creates a wrapper so that it can run in the Grid environment, makes the user tarball available and hands the job to the Grid sites through the WMS.
- Monitor: provides job information to the user.
- Mailer: sends an e-mail to the user when the job is over, with a job summary (useful for user bookkeeping).
Job wrapper (CafExe): runs the user job and copies whatever is left at the end of the user code to the user-specified location.
[Slide diagram: cdf-ui ~ cdf-head (Submitter, Monitor, Mailer) -> WMS -> Grid sites running CafExe.]
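
As an illustration of the Submitter's wrapping step, the sketch below builds a gLite JDL description for the wrapped job; the attribute values (CafExe.sh, the sandbox file names, the retry count) are placeholders and not the actual LcgCAF configuration.

    def build_jdl(wrapper="CafExe.sh", tarball="user_tarball.tgz", retries=3):
        """Return a gLite JDL string describing the wrapped user job.
        File names and the retry count are illustrative placeholders."""
        return "\n".join([
            "[",
            '  Executable    = "%s";' % wrapper,
            '  Arguments     = "MyJob.sh";',
            '  StdOutput     = "job.out";',
            '  StdError      = "job.err";',
            '  InputSandbox  = {"%s", "%s"};' % (wrapper, tarball),
            '  OutputSandbox = {"job.out", "job.err"};',
            "  RetryCount    = %d;" % retries,
            "]",
        ])

    if __name__ == "__main__":
        # The resulting description would then be handed to the WMS by the
        # Submitter daemon (e.g. via the standard WMS submission client).
        print(build_jdl())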

Job Monitor
The WMS keeps track of the job status on the Grid and passes the job information to the cdf-ui ~ cdf-head; on the worker node the job wrapper collects information such as load, directory listing, ps and top output and publishes it to the LcgCAF Information System.
Submission to the Grid is asynchronous, so the interactive commands are not really interactive: their information is provided by separate scripts forked alongside the user job.
Interactive job monitor: information on running jobs is read from the LcgCAF Information System and sent to the user desktop. Available commands: CafMon list, CafMon kill, CafMon dir, CafMon tail, CafMon ps, CafMon top.
Web-based monitor: information on the running jobs of all users, and the job/user history, is read from the LcgCAF Information System.
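
The sketch below shows one way such a client could dispatch CafMon-style commands to queries against the information system; the query_info_system helper and the command-to-query mapping are invented for illustration, only the command names come from the slide.

    def query_info_system(job_id, what):
        """Placeholder for a lookup in the LcgCAF Information System;
        a real client would contact the head node over the network."""
        return "<%s for job %s>" % (what, job_id)

    # Mapping of the CafMon commands listed on the slide to queries
    # (the mapping itself is an assumption made for this sketch).
    COMMANDS = {
        "list": lambda job: query_info_system(job, "running sections"),
        "dir":  lambda job: query_info_system(job, "working directory listing"),
        "tail": lambda job: query_info_system(job, "tail of the job log"),
        "ps":   lambda job: query_info_system(job, "process list"),
        "top":  lambda job: query_info_system(job, "top output"),
        "kill": lambda job: query_info_system(job, "kill request"),
    }

    def cafmon(command, job_id):
        if command not in COMMANDS:
            raise ValueError("unknown CafMon command: %s" % command)
        return COMMANDS[command](job_id)

    if __name__ == "__main__":
        print(cafmon("tail", "12345.0"))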

CDF Code Distribution
A CDF job needs the CDF code and the run-condition DB to be available on the worker node. Since we cannot expect all Grid sites to host them, Parrot is used as a virtual file system for the code and FroNTier for the DB. Parrot translates local commands such as ls into operations on remote resources over protocols such as HTTP (http://dcafmon.fnal.gov/cdfsoft), so /home/cdfsoft appears local on the worker node while being served from the Fermilab server. Since most requests are the same, a Squid web proxy cache is used to avoid overloading the system; there is one cache per country (e.g. the LCG sites CNAF and Lyon each use their own proxy cache).
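
A hedged sketch of how a worker-node wrapper might point a job at its national Squid cache before starting it: the cache host names, the choice of environment variables and the launch command are assumptions made for illustration, not the actual LcgCAF configuration.

    import os
    import subprocess

    # Hypothetical per-country Squid caches (host names are made up).
    SQUID_BY_COUNTRY = {
        "IT": "http://squid.cnaf.example:3128",
        "FR": "http://squid.lyon.example:3128",
    }

    def launch_with_cache(country, user_job="MyJob.sh"):
        """Export proxy settings so that HTTP requests for code and DB data
        go through the national Squid cache, then start the wrapped job."""
        env = os.environ.copy()
        proxy = SQUID_BY_COUNTRY.get(country)
        if proxy:
            env["http_proxy"] = proxy
            env["HTTP_PROXY"] = proxy
        # The real wrapper would start the job under Parrot so that
        # /home/cdfsoft is served remotely; here we just run the job.
        return subprocess.call(["/bin/sh", user_job], env=env)

    if __name__ == "__main__":
        launch_with_cache("IT")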

CDF Output storage
The user job output is transferred from the worker nodes to a CDF Storage Element (CDF-SE) via gridftp, or to a CDF-Storage location via KRB5 rcp. From the CDF-SE or CDF-Storage the output is then copied with gridftp to the FNAL fileservers; an automatic procedure copies the files from the fileservers to tape and declares them in the CDF catalogue.
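
A minimal sketch of the two transfer paths just described, assuming the standard globus-url-copy and rcp clients are available on the worker node; the destination URLs and host names are placeholders, and trying one path then falling back to the other is an illustrative policy rather than LcgCAF's actual one.

    import subprocess

    def ship_output(local_file,
                    se_url="gsiftp://cdf-se.example.org/cdf/output/",
                    krb5_host="cdf-storage.example.org",
                    krb5_path="/cdf/output/"):
        """Copy a job output file to the CDF-SE via gridftp; if that fails,
        fall back to a KRB5 rcp copy to a CDF-Storage location."""
        src = "file://" + local_file
        if subprocess.call(["globus-url-copy", src, se_url]) == 0:
            return "gridftp"
        if subprocess.call(["rcp", local_file,
                            "%s:%s" % (krb5_host, krb5_path)]) == 0:
            return "krb5-rcp"
        raise RuntimeError("could not store %s" % local_file)

    if __name__ == "__main__":
        print(ship_output("/tmp/myjob_output.root"))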

LcgCAF Usage: first GRID exercise
LcgCAF has been used by power users for B Monte Carlo production, but generic users are starting to use it as well. The LcgCAF monitor shows the running sections over the last month; the CDF VO is also monitored with standard LCG tools such as GridIce (one week of Italian resources shown on the slide). This was the first exercise in using the Grid for Monte Carlo production.

LcgCAF Usage
CDF B Monte Carlo production: ~5000 jobs in ~1 month.

Samples                                     Events on tape   ε      ε with 1 recovery
Bs->Dsπ with Ds->φπ, Ds->K*K, Ds->KsK       ~6.2 x 10^6      ~94%   ~100%

User jobs: ~1000 jobs in a week, ~100% of them executed on these sites.
Failures:
- GRID: site misconfiguration, temporary authentication problems, WMS overload.
- LcgCAF: Parrot/Squid cache got stuck.
Retry mechanism: GRID failures are handled by the WMS retry, LcgCAF failures by a "home-made" retry based on log parsing (see the sketch below).
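
A hedged sketch of what such a log-parsing retry could look like: the failure signatures and the resubmission hook are invented for illustration, and only the failure categories come from the slide.

    import re

    # Invented signatures standing in for the patterns the real log parser
    # would look for (the real categories include a stuck Parrot/Squid cache).
    LCGCAF_FAILURES = [
        re.compile(r"parrot.*(stuck|timeout)", re.IGNORECASE),
        re.compile(r"squid.*(stuck|cache error)", re.IGNORECASE),
    ]

    def needs_resubmission(log_text):
        """Return True if the job log shows an LcgCAF-side failure that the
        WMS retry would not catch."""
        return any(pat.search(log_text) for pat in LCGCAF_FAILURES)

    def resubmit(section_id):
        # Placeholder: the real system would resubmit the failed section
        # through the Submitter daemon.
        print("resubmitting section %s" % section_id)

    if __name__ == "__main__":
        sample_log = "section 42: parrot mount of /home/cdfsoft stuck, giving up"
        if needs_resubmission(sample_log):
            resubmit("42")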

Conclusions and Future developments
CDF has developed a portal to access LCG resources, used to produce Monte Carlo data and for every-day user jobs: the Grid can serve a running experiment too.
There is no policy mechanism on LCG; user policy relies on site rules, so LcgCAF users effectively get a "random priority". We plan to implement a policy-management mechanism on top of the Grid policy tools.
An interface between SAM (the CDF catalogue) and the Grid services is under development to allow:
- output file declaration into the catalogue directly from LcgCAF;
- automatic transfer of output files from local CDF storage to FNAL tape;
- data analysis on LCG sites.


More details …
Currently CDF accesses the following sites:

Site             Country   KSpecInt2K
CNAF-T1          Italy     1500
INFN-Padova      Italy       60
INFN-Catania     Italy      248
INFN-Bari        Italy       87
INFN-Legnaro     Italy      230
IN2P3-CC         France     500
FZK-LCG2         Germany   2000
IFAE             Spain       20
PIC              Spain      150
UKI-LT2-UCL-HE   UK         300
Liverpool        UK         100

CDF has produced and is producing Monte Carlo data on these resources, demonstrating that the Grid can also serve a running experiment.

Performances (details)
CDF B Monte Carlo production:

Samples               Events on tape   ε      ε with 1 recovery
Bs->Dsπ, Ds->φπ       1969964          ~92%   ~100%
Bs->Dsπ, Ds->K*K      2471751          ~98%
Bs->Dsπ, Ds->KsK      1774336

where the efficiency is defined as ε = (# segments succeeded) / (# segments submitted).

GRID failures: site misconfiguration, temporary authentication problems, WMS overload.
LcgCAF failures: Parrot/Squid cache got stuck.
The retry mechanism is necessary to achieve ~100% efficiency: WMS retry for GRID failures, and the LcgCAF "home-made" retry based on log parsing.