CRAB: a user-friendly tool to perform CMS analysis in a grid environment

F. Fanzago – INFN Padova; S. Lacaprara – INFN LNL; D. Spiga – Università di Perugia; M. Corvo – CERN; N. De Filippis – Università di Bari; A. Fanfani – Università di Bologna; F. Farina – INFN Milano; O. Gutsche – Fermilab

CRAB and the CMS distributed analysis chain

The CMS collaboration is developing a set of tools, interfaced with the grid services, to allow data analysis in a distributed environment. They include:
● installation of the CMS software on remote resources via grid
● a data transfer service to move and manage the large flow of data among the Tiers
● a data validation system to ensure data consistency and readiness
● a data location system to keep track of the data available at each remote site, composed of different kinds of catalogues:
  - the Dataset Bookkeeping System (DBS), which knows which data exist and contains the CMS-specific description of the event data
  - the Data Location Service (DLS), which knows where the data are stored (mapping between file-blocks and Storage Elements)
  - the local file catalogue, which gives the physical location of the local data on the remote SE
● a job monitoring and logging/bookkeeping system

CRAB (CMS Remote Analysis Builder) is a friendly interface built on top of this chain to simplify the creation and submission of analysis jobs to the grid. Its purpose is to let users with no knowledge of the grid infrastructure run their analysis code on data available at remote sites as easily as in a local environment, hiding the grid details. Users just develop their analysis code in an interactive environment and decide which data to analyze; data discovery on the remote resources, resource availability, status monitoring and output retrieval of the submitted jobs are fully handled by CRAB. CRAB can submit analysis jobs to different grid flavours (gLite, LCG and OSG) and can create jobs for different CMS software (job types): ORCA and CMSSW for analysis, FAMOS for fast simulation.

CRAB input: the user has to provide
● the data parameters in the crab.cfg file: dataset name and number of events
● the analysis code and parameter cards
● the output file names and how to manage them (return the files to the UI or store them on a Storage Element)

Main CRAB functionalities (a schematic sketch is given below):
● input data discovery: the list of sites (SE names) where the data are stored, obtained by querying the data location system
● packaging of the user code: creation of a tgz archive with the user code and parameters
● job creation:
  - a wrapper of the user executable to be run on the Worker Node
  - the JDL file: the site location of the data (the SE names) is passed to the Resource Broker as a requirement to drive the resource matchmaking
  - job splitting according to the user request
● job submission to the grid
● monitoring of the job status and output retrieval
● handling of the user output: copy to the UI or to a generic Storage Element

[Workflow diagram: starting from the dataset name and number of events, CRAB on the UI queries the Data Bookkeeping System and the Data Location System (plus the local file catalogues) to obtain the list of SEs hosting the data; it then submits the JDL and the job to the WMS, which forwards them to a CE and its WNs close to the data; the job output data are returned to the UI or stored on an SE.]
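To make this workflow concrete, here is a minimal Python sketch, illustrative only and not the actual CRAB implementation: it reads a crab.cfg-like configuration, splits the requested events into jobs and turns the list of SEs returned by the data location system into a JDL requirement for the Resource Broker. The configuration keys, the discover_sites() helper, the SE names and the exact form of the requirement expression are assumptions made for this example.

# Schematic sketch of a CRAB-like workflow: configuration, data discovery,
# job splitting and JDL requirement generation (all names are illustrative).
from configparser import ConfigParser

EXAMPLE_CFG = """
[CMSSW]
datasetpath            = /MyPrimaryDataset/MyProcessedDataset/RECO
pset                   = analysis_cfg.py
total_number_of_events = 10000
events_per_job         = 1000
output_file            = histos.root
"""

def discover_sites(dataset):
    """Placeholder for the DBS/DLS query: dataset -> list of SE names."""
    return ["se01.pd.infn.it", "cmssrm.fnal.gov"]   # dummy answer

def split_jobs(total_events, events_per_job):
    """Split the requested events into (first_event, n_events) ranges."""
    jobs, first = [], 0
    while first < total_events:
        n = min(events_per_job, total_events - first)
        jobs.append((first, n))
        first += n
    return jobs

def jdl_requirement(se_list):
    """Build a schematic EDG/LCG-style JDL requirement asking for a CE
    close to one of the SEs hosting the data."""
    members = ['Member("%s", other.GlueCESEBindGroupSEUniqueID)' % se
               for se in se_list]
    return "Requirements = ( %s );" % " || ".join(members)

if __name__ == "__main__":
    cfg = ConfigParser()
    cfg.read_string(EXAMPLE_CFG)
    dataset = cfg.get("CMSSW", "datasetpath")
    total = cfg.getint("CMSSW", "total_number_of_events")
    per_job = cfg.getint("CMSSW", "events_per_job")

    sites = discover_sites(dataset)
    for i, (first, n) in enumerate(split_jobs(total, per_job), start=1):
        print("job %d: skip %d events, process %d" % (i, first, n))
    print(jdl_requirement(sites))

In CRAB each job also carries the wrapper of the user executable and the tgz archive of the user code, as described above; the sketch only covers the job splitting and the data-location requirement used for matchmaking.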
CMS computing model and grid infrastructure

CMS (Compact Muon Solenoid) is one of the four particle physics experiments that will collect data at the LHC (Large Hadron Collider) starting in 2007, aiming to discover the Higgs boson. CMS will produce a large amount of data that has to be stored in many computing centres in the countries participating in the CMS collaboration and made available for analysis to world-wide distributed physicists:
● a large amount of data to be analyzed
● a large community of physicists who want to access the data
● many distributed sites where the data will be stored

CMS will therefore use a distributed architecture based on the grid infrastructure to ensure the availability of remote resources and remote data access to authorized users (belonging to the CMS Virtual Organization). The tools for accessing distributed data and resources are provided by the Worldwide LHC Computing Grid (WLCG), which covers the different grid flavours: LCG/gLite in Europe and OSG in the US. Remote data are thus accessible via grid.

During data acquisition, the data from the detector that pass the different trigger levels will be sent to the Tier-0, where they are stored and first-pass reconstructed, and then spread over the other Tiers depending on the kind of physics data. Until real data are available, the CMS community needs simulated data to study the detector response and the foreseen physics interactions and to gain experience with the management and analysis of the data, so a large number of simulated events are produced and distributed among the computing centres. The grid infrastructure also guarantees enough computing power for the simulation, processing and analysis of the data.

Amount of data (events):
● ~2 PB/year (assuming the startup luminosity of 2×10^33 cm^-2 s^-1)
● all events will be stored in files: O(10^6) files/year
● files will be grouped in file-blocks, the data location unit: O(10^3) file-blocks/year
● file-blocks will be grouped in datasets: O(10^3) datasets (after 10 years of CMS)

[Diagram: the CMS offline computing system is arranged in four geographically distributed Tiers: the online system and online farm feed the Tier-0 at the CERN computer centre; the recorded data go to the Tier-1 regional centres (e.g. the France, Italy and Fermilab regional centres); below them sit the Tier-2 centres and the Tier-3 resources at the institutes (workstations and UIs).]

[Diagram: the job submission tools on the UI query the data location system for the data and the Resource Broker / Workload Management System, fed by the Information Service collector, for the matchmaking; the job is then dispatched to a CE close to the SEs holding the data.]

Main LCG middleware components:
● Virtual Organizations (CMS, ...)
● Resource Broker (RB) / Workload Management System
● Replica Catalog (LFC)
● Computing Elements (CEs)
● Storage Elements (SEs)
● Worker Nodes (WNs)
● User Interfaces (UIs)

CRAB usage

[Plot: number of jobs submitted to the different grid flavours.] Each bar represents the total number of jobs and is divided into three categories:
- jobs whose user executable ends with an Exit Code equal to 0
- jobs whose user executable ends with an Exit Status different from 0
- jobs that could not run because of grid problems
The job success rate is about 75%, where success means that the jobs arrive at the remote sites and produce their outputs; the remaining 25% abort because of site setup problems or grid service failures.

[Plots: top 20 used CEs and datasets; number of jobs submitted each month; number of submitted jobs sorted by job type; number of jobs submitted with CRAB and with CRAB + jobRobot.]

More than … jobs have been submitted to the grid using the CRAB tool, and tens of physicists are using CRAB to analyze remote data stored at LCG and OSG sites. Roughly 1500 jobs are submitted each day; the peaks of daily activity were in March–April 2006 for the preparation of the Physics Technical Design Report, in October 2005 for Service Challenge 3 (SC3) and in August–September 2006 for Service Challenge 4 (SC4). When real data are available, the expected daily rate of submitted jobs is ~…. CMSSW and ORCA are the CMS analysis software; ORCA works with the old CMS framework and is no longer supported.

During SC3 and SC4, CRAB has also been used through an automatic tool called jobRobot to spread jobs continuously over all the published data (a schematic sketch of such a loop follows).
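The jobRobot itself is an existing CMS tool; the following Python sketch is only a schematic, hypothetical illustration of the idea described above: loop continuously over the published datasets, submit a CRAB-like test job to each of them and classify the outcome into the three categories used in the plots (user exit code 0, non-zero exit status, grid abort). The functions submit_crab_job() and get_job_outcome() and the dataset names are placeholders, not real CRAB or jobRobot interfaces.

# Schematic, hypothetical sketch of a jobRobot-like continuous submission loop.
import time
from collections import Counter

PUBLISHED_DATASETS = [                     # dummy list of published datasets
    "/Dataset1/Reco/RECO",
    "/Dataset2/Reco/RECO",
]

def submit_crab_job(dataset):
    """Placeholder: create and submit one CRAB test job for the dataset."""
    return {"dataset": dataset, "job_id": hash(dataset) % 10000}

def get_job_outcome(job):
    """Placeholder: poll the job and return 'exit_code_0',
    'exit_code_nonzero' or 'grid_abort'."""
    return "exit_code_0"

def robot_cycle(counters):
    """One pass over all published datasets."""
    for dataset in PUBLISHED_DATASETS:
        job = submit_crab_job(dataset)
        counters[get_job_outcome(job)] += 1

if __name__ == "__main__":
    counters = Counter()
    for _ in range(3):                     # a few cycles instead of an endless loop
        robot_cycle(counters)
        time.sleep(1)                      # pacing between cycles
    total = sum(counters.values())
    # "success" means the job reached the site and ran the user executable,
    # matching the ~75% success rate quoted above.
    success = counters["exit_code_0"] + counters["exit_code_nonzero"]
    print(counters)
    if total:
        print("success rate: %.0f%%" % (100.0 * success / total))

Tracking the outcomes per site in this way is what makes the continuous load useful as a probe of the infrastructure, as summarized below.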
This continuous usage helps to expose the weaknesses of the computing infrastructure, site installation problems and bottlenecks in the analysis chain, and to test the Workload Management components.

The CRAB tool is used both to analyze remote data and to test the distributed analysis chain. CRAB proves that CMS users are able to use the available grid services and that the full analysis chain works in a distributed environment.