Experience with ATLAS Data Challenge Production on the U.S. Grid Testbed
Kaushik De, University of Texas at Arlington
CHEP03, March 27, 2003

The ATLAS Experiment
  - Multi-purpose experiment at the Large Hadron Collider, CERN
  - 14 TeV c.m. pp collisions starting in 2007
  - Physics: Higgs, SUSY, new searches...
  - Petabytes/year of data analyzed by >2000 physicists worldwide - need the GRID

U.S. ATLAS Grid Testbed
  - BNL - U.S. Tier 1, 2000 nodes, 5% ATLAS, 10 TB
  - LBNL - pdsf cluster, 400 nodes, 5% ATLAS, 1 TB
  - Boston U. - prototype Tier 2, 64 nodes
  - Indiana U. - prototype Tier 2, 32 nodes
  - UT Arlington - 20 nodes
  - Oklahoma U.
  - U. Michigan - 10 nodes
  - ANL - test nodes
  - SMU - 6 nodes
  - UNM - new site

U.S. Testbed Goals
  - Deployment
    - Set up grid infrastructure and ATLAS software
    - Test installation procedures (PACMAN)
  - Development & testing
    - Grid applications - GRAT, Grappa, Magda...
    - Other software - monitoring, packaging...
  - Run production
    - For U.S. physics data analysis and tests
    - Main focus - ATLAS Data Challenges: simulation, pile-up, reconstruction
  - Connection to grid projects
    - GriPhyN - Globus, Condor, Chimera... use & test
    - iVDGL - VDT, glue schema testbed, WorldGrid testbed, demos... use & test
    - EDG, LCG - testing & deployment

ATLAS Data Challenges
  DCs - generate and analyse simulated data (see talk by Gilbert Poulard on Tuesday)
  - Original goals (Nov 15, 2001)
    - Test the computing model, its software and its data model, and ensure the correctness of the technical choices to be made
    - Data Challenges should be executed at the prototype Tier centres
    - Data Challenges will be used as input for a Computing Technical Design Report due by the end of 2003 (?) and for preparing a MoU
  - Current status
    - Goals are evolving as we gain experience
    - Sequence of increasing scale & complexity
    - DC0 (completed), DC1 (underway)
    - DC2, DC3, and DC4 planned
    - Grid deployment and testing are a major part of the DCs

GRAT Software
  - GRid Applications Toolkit
  - Used for U.S. Data Challenge production
  - Based on Globus, Magda & MySQL
  - Shell & Python scripts, modular design
  - Rapid development platform
    - Quickly develop packages as needed by DC
      - Single particle production
      - Higgs & SUSY production
      - Pileup production & data management
      - Reconstruction
    - Test grid middleware, test grid performance
  - Modules can be easily enhanced or replaced by Condor-G, EDG resource broker, Chimera, replica catalogue, OGSA... (in progress)

GRAT Execution Model
  1. Resource discovery
  2. Partition selection
  3. Job creation
  4. Pre-stage
  5. Batch submission
  6. Job parameterization
  7. Simulation
  8. Post-stage
  9. Cataloging
  10. Monitoring
  [Diagram in the original slide: DC1 production host (UTA) interacting with a remote gatekeeper, local replica storage, Magda (BNL), the parameter database (CERN), and the batch execution scratch area]
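
The following is a minimal sketch of the ten-step flow above, only to make the sequence concrete. All names here (site list, helper functions, file and script names) are invented for illustration; the real GRAT was a set of shell/Python scripts built on Globus job submission, Magda data management and MySQL bookkeeping.

    """Illustrative sketch of one pass through the GRAT execution model.
    Helper names and sites are hypothetical, not the actual GRAT code."""

    import random

    SITES = ["bnl", "lbnl", "bu", "iu", "uta", "ou"]   # placeholder gatekeeper names


    def discover_resources():
        # 1. Resource discovery: GRAT queried site information services;
        #    here we simply pick a site at random.
        return random.choice(SITES)


    def select_partition():
        # 2. Partition selection: the production MySQL database held the
        #    list of logical partitions still to be simulated.
        return {"partition": "dc1.002000.simul._00001", "events": 100}


    def create_job(partition, site):
        # 3-4. Job creation and pre-stage: build the job script and copy
        #      parameter/input files to the site's scratch area.
        return {"site": site, "script": "atlsim_job.sh", **partition}


    def submit(job):
        # 5-6. Batch submission with the job parameters, via the Globus
        #      gatekeeper at the chosen site (command printed, not executed).
        cmd = ["globus-job-submit", f"{job['site']}.example.edu/jobmanager-pbs",
               job["script"], job["partition"]]
        print("would run:", " ".join(cmd))


    def post_stage_and_catalog(job):
        # 8-9. Post-stage the output to mass storage and register it in the
        #      Magda catalogue and the production database.
        print("register", job["partition"], "output in Magda + production DB")


    if __name__ == "__main__":
        site = discover_resources()
        job = create_job(select_partition(), site)
        submit(job)                    # 7. the simulation itself runs in the batch job
        post_stage_and_catalog(job)    # 10. monitoring scripts track status in MySQL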

Middleware Evolution of U.S. Applications
  [The original slide listed specific middleware components under each of these status categories:]
  - Used in current production software (GRAT & Grappa)
  - Tested successfully (not yet used for large-scale production)
  - Under development and testing
  - Tested for simulation (will be used for large-scale reconstruction)

Databases used in GRAT
  - MySQL databases central to GRAT
  - Production database
    - defines logical job parameters & filenames
    - tracks job status, updated periodically by scripts
  - Data management (Magda)
    - file registration/catalogue
    - grid-based file transfers
  - Virtual Data Catalogue
    - simulation job definition
    - job parameters, random numbers
  - Metadata catalogue (AMI)
    - post-production summary information
    - data provenance
  - Similar scheme being considered ATLAS-wide by the Grid Technical Board
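
As an illustration of the production bookkeeping just listed, the sketch below defines a partition table, claims a job and marks it done. GRAT used central MySQL databases; this example uses sqlite3 purely so it runs standalone, and the table and column names are invented, not the actual GRAT schema.

    """Hypothetical sketch of GRAT-style production bookkeeping."""

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE partitions (
            logical_file  TEXT PRIMARY KEY,   -- logical filename for the partition
            dataset       TEXT,               -- e.g. a DC1 dataset number
            events        INTEGER,            -- events to simulate
            status        TEXT,               -- 'defined', 'running', 'done', 'failed'
            site          TEXT                -- testbed site that claimed the job
        )
    """)

    # Define a logical job (normally loaded from the CERN parameter lists).
    conn.execute("INSERT INTO partitions VALUES (?, ?, ?, ?, ?)",
                 ("dc1.002000.simul._00001.zebra", "002000", 100, "defined", None))

    # A submission script claims the next free partition and marks it running.
    row = conn.execute(
        "SELECT logical_file FROM partitions WHERE status = 'defined' LIMIT 1"
    ).fetchone()
    conn.execute("UPDATE partitions SET status = 'running', site = ? WHERE logical_file = ?",
                 ("uta", row[0]))

    # Periodic status scripts update the record once the job finishes and the
    # output has been registered in Magda.
    conn.execute("UPDATE partitions SET status = 'done' WHERE logical_file = ?", (row[0],))
    print(conn.execute("SELECT * FROM partitions").fetchall())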

DC1 Production on U.S. Grid
  - August/September 2002
    - 3-week DC1 production run using GRAT
    - Generated 200,000 events, using ~1,300 CPU days, 2,000 files, 100 GB storage at 4 sites
  - December 2002
    - Generated 75k SUSY and Higgs events for DC1
    - Total DC1 files generated and stored > 500 GB, total CPU used > 1,000 CPU days in 4 weeks
  - January 2003
    - More SUSY samples
    - Started pile-up production on the grid, both high and low luminosity, for 1-2 months at all sites
  - February/March 2003
    - Discovered bug in software (non-grid part)
    - Regenerating all SUSY, Higgs & pile-up samples
    - ~15 TB data, 15k files, 2M events, 10k CPU days
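
For a rough sense of scale, a back-of-the-envelope calculation on the figures quoted above (sample mixes differ between runs, so these are only averages):

    # Rough rates implied by the numbers on this slide.
    runs = {
        "Aug/Sep 2002":  {"events": 200_000,   "cpu_days": 1_300,  "files": 2_000,  "gb": 100},
        "regeneration":  {"events": 2_000_000, "cpu_days": 10_000, "files": 15_000, "gb": 15_000},
    }
    for name, r in runs.items():
        print(f"{name}: {r['events'] / r['cpu_days']:.0f} events/CPU-day, "
              f"{r['gb'] / r['files'] * 1024:.0f} MB/file on average")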

DC1 Production Examples
  Each production run requires development & deployment of new software at selected sites.

DC1 Production Experience
  - Grid paradigm works, using Globus
    - Opportunistic use of existing resources, run anywhere, from anywhere, by anyone...
  - Successfully exercised grid middleware with increasingly complex tasks
    - Simulation: create physics data from pre-defined parameters and input files, CPU intensive
    - Pile-up: mix ~2500 min-bias data files into physics simulation files, data intensive
    - Reconstruction: data intensive, multiple passes
    - Data tracking: multiple steps, one -> many -> many more mappings (sketched below)
  - Tested grid applications developed by the U.S.
    - PACMAN (Saul Youssef - BU)
    - Magda (see talk by Wensheng Deng)
    - Virtual Data Catalogue (see poster by P. Nevski)
    - GRAT (this talk), GRAPPA (see talk by D. Engh)
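
A toy illustration of the "one -> many -> many more" data-tracking problem mentioned above: each simulated file is mixed into several pile-up files (per luminosity), each of which may be reconstructed more than once. File names and counts here are invented for illustration, not actual DC1 naming.

    """Toy sketch of DC1-style provenance fan-out."""

    from collections import defaultdict

    provenance = defaultdict(list)          # parent logical file -> derived files

    simulated = "dc1.002000.simul._00001.zebra"
    for lumi in ("lumi02", "lumi10"):       # low- and high-luminosity pile-up
        piled = simulated.replace("simul", f"pileup.{lumi}")
        provenance[simulated].append(piled)
        for pass_no in (1, 2):              # multiple reconstruction passes
            provenance[piled].append(piled.replace("pileup", f"recon.pass{pass_no}"))

    for parent, children in provenance.items():
        print(parent, "->", children)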

Grid Quality of Service
  - Anything that can go wrong, WILL go wrong
    - During 18 days of grid production (in August), every system died at least once
    - Local experts were not always accessible
    - Examples: scheduling machines died 5 times (thrice power failure, twice system hung), network outages multiple times, gatekeeper died at every site at least 2-3 times
    - Three databases used - production, Magda and virtual data - and each died at least once!
    - Scheduled maintenance - HPSS, Magda server, LBNL hardware, LBNL RAID array...
    - Poor cleanup, lack of fault tolerance in Globus
  - These outages should be expected on the grid - software design must be robust
  - We managed > 100 files/day (~80% efficiency) in spite of these problems!
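
A minimal sketch of the kind of defensive retry wrapper this experience argues for. The command shown (globus-url-copy with a gsiftp source) is just an example of a grid operation that can fail transiently; the hostnames, paths and retry policy are illustrative and not taken from GRAT.

    """Retry wrapper for flaky grid operations (illustrative only)."""

    import subprocess
    import time


    def run_with_retries(cmd, attempts=3, wait_seconds=60):
        """Run a grid command, retrying on failure instead of losing the job."""
        for attempt in range(1, attempts + 1):
            try:
                subprocess.run(cmd, check=True, timeout=3600)
                return True
            except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc:
                print(f"attempt {attempt}/{attempts} failed: {exc}")
                time.sleep(wait_seconds)    # gatekeeper or network may recover
        return False                        # give up: mark the partition 'failed' in the DB


    if __name__ == "__main__":
        # Short waits here only so the demo finishes quickly; production
        # scripts would back off for minutes between attempts.
        ok = run_with_retries([
            "globus-url-copy",
            "gsiftp://gatekeeper.example.edu/scratch/output.zebra",
            "file:///data/dc1/output.zebra",
        ], attempts=2, wait_seconds=1)
        print("transfer succeeded" if ok else "transfer failed after retries")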

Conclusion
  - The largest (>10 TB) grid-based production in ATLAS was done by the U.S. testbed
  - Grid production is possible, but not easy right now - need to harden middleware, need higher-level services
  - Many tools are missing - monitoring, operations center, data management
  - Requires an iterative learning process, with rapid evolution of software design
  - Pile-up was a major data management challenge on the grid - moving >0.5 TB/day
  - Successful so far
    - Continuously learning and improving
    - Many more DCs coming up!