ATLAS and Grid Computing, RWL Jones, GridPP 13, 5th July 2005

ATLAS Computing Timeline
2003: POOL/SEAL release (done)
ATLAS release 7 (with POOL persistency) (done)
LCG-1 deployment (done)
ATLAS complete Geant4 validation (done)
ATLAS release 8 (done)
DC2 Phase 1: simulation production (done)
DC2 Phase 2: intensive reconstruction (the real challenge!) LATE!
Combined test beams (barrel wedge) (done)
Computing Model paper (done)
Computing Memorandum of Understanding (done)
ATLAS Computing TDR and LCG TDR (in progress)
Computing System Commissioning
Physics Readiness Report
Start cosmic ray run: GO!
NOW: Commissioning takes priority!

Computing TDR structure
The TDR describes the whole Software & Computing Project as defined within the ATLAS organization:
– Major activity areas within the S&C Project
– Liaisons to other ATLAS projects

Massive productions on 3 Grids

Massive productions on 3 Grids (3)
July-September 2004: DC2 Geant4 simulation (long jobs)
– 40% on the LCG/EGEE Grid, 30% on Grid3 and 30% on NorduGrid
February-May 2005: Rome production
– 70% on the LCG/EGEE Grid, 25% on Grid3, 5% on NorduGrid
LCG/EGEE Grid resources were always difficult to saturate by "traditional" means
– A new approach (Lexor-CondorG) used Condor-G to submit directly to the sites (sketched below); in this way the job rate was doubled on the same total available resources
– Much more efficient usage of the CPU resources
– The same approach is now also being evaluated for Grid3/OSG job submission, which suffered from similar job-rate problems
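The Lexor-CondorG idea is, in essence, to hand jobs straight to a site's gatekeeper through Condor-G rather than routing everything via the resource broker. The sketch below is a minimal Python illustration of that pattern, assuming Condor's grid universe; the gatekeeper host, jobmanager name and script names are hypothetical, and the submit-file syntax should be checked against the Condor version actually deployed.

import subprocess

# Illustrative only: a Condor-G submit description for direct submission to a
# site gatekeeper, bypassing the EDG/LCG resource broker. Host and jobmanager
# names are hypothetical examples.
submit_description = """\
universe      = grid
grid_resource = gt2 ce.example.ac.uk/jobmanager-lcgpbs
executable    = run_atlas_job.sh
output        = job.out
error         = job.err
log           = job.log
queue
"""

with open("atlas_job.sub", "w") as f:
    f.write(submit_description)

# condor_submit hands the job to Condor-G, which contacts the site gatekeeper directly.
subprocess.run(["condor_submit", "atlas_job.sub"], check=True)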

Massive productions on 3 Grids (4)
73 datasets containing 6.1M events were simulated and reconstructed (without pile-up)
Total simulated data: 8.5M events
Pile-up was done later (1.3M events done up to last week)

Experience with LCG-2 Operations
Support for our productions from the CERN IT EIS team was excellent
Other LCG/EGEE structures were effectively invisible (GOC, ROCs, GGUS etc.)
– No communication line between the experiments and the Grid Operations Centres
– Operational trouble information always came through the EIS group
– Sites scheduled major upgrades or downtimes during our productions
No concept of "service" among the service providers yet!
– Many sites consider themselves part of a test structure set up (and funded) by EGEE, but we consider the LCG Grid an operational service
– Many sites do not have the concept of "permanent disk storage" in a Storage Element: if they change something in their filing system, our catalogue has to be updated!

Second ProdSys development cycle
The experience with DC2 and the Rome production taught us that we had to rethink at least some of the ProdSys components
The ProdSys review defined the way forward:
– Frederic Brochu was one of the reviewers
– Keep the global ProdSys architecture (system decomposition)
– Replace or rework all individual components to address the identified shortcomings of the Grid middleware: reliability and fault tolerance first of all
– Redesign the Distributed Data Management system to avoid single points of failure and scaling problems
Work is now underway
– The target is the end of summer for integration tests
– Ready for LCG Service Challenge 3 from October onwards

Distributed Data Management
Accessing distributed data on the Grid is not a simple task
Several central databases are needed to hold dataset information
"Local" catalogues hold information on local data storage (see the sketch below)
The new DDM system is under test this summer
It will be used for all ATLAS data from October onwards (LCG Service Challenge 3)
This affects GridPP effort
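To make the dataset/catalogue split concrete, here is a deliberately simplified toy model in Python: a central catalogue maps dataset names to the sites holding them, while each site's local catalogue maps logical file names to physical replicas. This only illustrates the idea; the site names, dataset identifiers and storage URLs are invented, and this is not the real ATLAS DDM (DQ2) interface.

# Toy two-level catalogue: central dataset catalogue plus per-site local catalogues.
central_catalogue = {
    "rome.004100.recon.AOD": ["RAL", "CERN", "LYON"],   # dataset -> sites holding it
}

local_catalogues = {
    "RAL":  {"AOD._00001.pool.root": "srm://ral.example/atlas/AOD._00001.pool.root"},
    "CERN": {"AOD._00001.pool.root": "srm://cern.example/atlas/AOD._00001.pool.root"},
}

def locate(dataset, lfn):
    """Return the physical replicas of one logical file in a dataset, site by site."""
    replicas = []
    for site in central_catalogue.get(dataset, []):
        pfn = local_catalogues.get(site, {}).get(lfn)
        if pfn:
            replicas.append((site, pfn))
    return replicas

if __name__ == "__main__":
    print(locate("rome.004100.recon.AOD", "AOD._00001.pool.root"))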

Computing Operations
The Computing Operations organization is likely to change:
a) Grid Tools
b) Grid operations:
– Tier-0 operations
– Re-processing of real and simulated data at Tier-1s
– Data distribution and placement
– Software distribution and installation
– Site and software installation validation and monitoring
– Coordination of Service Challenges
User Support:
– Proposal to use Frederic Brochu in front-line triage
– Credited contribution
– Contingent on Distributed Analysis planning

Software Installation
Software installation continues to be a challenge
– Rapid roll-out of releases to the Grid is important for ATLAS UK eScience goals (3.1.4)
– Vital for user code in distributed analysis
Grigori Rybkine (50/50 GridPP/ATLAS eScience):
– Working towards 3.1.5: kit installation and package management in distributed analysis
– The package manager implementation supports both tarballs and locally-built code (a packaging sketch follows below)
– Essential support role
– 3.1.5 is progressing well, but may see some delays because of the external effort on nightly deployable packages
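As a concrete illustration of the tarball route, the sketch below packs a locally built work area so it can travel with a grid job's input sandbox. It is only a minimal example, not the actual package manager; the directory and file names are hypothetical.

import os
import tarfile

def pack_user_code(source_dir, output_tar="user_code.tar.gz"):
    """Pack a locally built work area into a compressed tarball for the job sandbox."""
    with tarfile.open(output_tar, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    return output_tar

if __name__ == "__main__":
    # e.g. pack the user's locally built InstallArea before submitting the job
    print(pack_user_code("InstallArea"))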

Current plans for EGEE/gLite
Ready to test new components as soon as they are released from the internal certification process
– We assume the LCG Baseline Services
So far we have only seen the File Transfer Service (FTS) and the LCG File Catalogue (LFC)
– Both are being actively tested by our DDM group
– FTS will be field-tested by Service Challenge 3, starting in July (a usage sketch follows below)
– LFC is in our plan for the new DDM (summer deployment)
We have not really seen the new Workload Management System nor the new Computing Element
– Some informal ATLAS access to pre-release versions
As soon as the performance is acceptable we will ask to have them deployed
– This is NOT a blank check!
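For orientation, the sketch below shows how a script might hand a single copy to FTS using the gLite command-line clients. The service endpoint and SURLs are invented, and the command options (e.g. "-s" for the service) are stated from memory as assumptions; they should be checked against the gLite release actually deployed before use.

import subprocess

# Hypothetical FTS endpoint; the real one is site- and VO-specific.
FTS_SERVICE = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"

def submit_transfer(source_surl, dest_surl):
    """Submit one source -> destination copy to FTS and return the transfer job ID."""
    result = subprocess.run(
        ["glite-transfer-submit", "-s", FTS_SERVICE, source_surl, dest_surl],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

def transfer_status(job_id):
    """Query the state of a previously submitted transfer job."""
    result = subprocess.run(
        ["glite-transfer-status", "-s", FTS_SERVICE, job_id],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    job = submit_transfer("srm://source.example.org/atlas/file.root",
                          "srm://dest.example.org/atlas/file.root")
    print(job, transfer_status(job))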

Distributed Analysis System
ATLAS and GANGA work is now focused on Distributed Analysis
LCG RTAG 11 in 2003 did not produce a common analysis system project as hoped; ATLAS therefore planned to combine the strengths of various existing prototypes:
– GANGA provides a Grid front-end for Gaudi/Athena jobs (a session sketch follows below)
– DIAL provides fast, quasi-interactive access to large local clusters
– The ATLAS Production System interfaces to the three Grid flavours
Alvin Tan:
– Work on the job-building GUI and Job Options Editor (JOE) well received; there is a wish from LBL to merge JOE with the Job Options Tracer project
– Monitoring work also well received; prototypes perform well
Frederic Brochu:
– Provided a beta version of new job submission from GANGA direct to the Production System
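To give a flavour of the GANGA front-end, here is a minimal sketch of the kind of session a physicist might run for an Athena job on LCG. It assumes an interactive ganga session with the GangaAtlas plugins loaded, which provide Job, Athena and LCG; the attribute names and the job options file are assumptions and should be checked against the installed GANGA release.

# Run inside an interactive "ganga" session; Job, Athena and LCG come from the
# GangaAtlas plugins and are already in the session namespace.
j = Job()
j.application = Athena()                                       # wrap a Gaudi/Athena job
j.application.option_file = "AnalysisSkeleton_jobOptions.py"   # hypothetical job options file
j.backend = LCG()                                              # target the LCG/EGEE Grid
j.submit()
print(j.status)                                                # e.g. 'submitted', then 'running'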

Distributed Analysis System (2)
Currently reviewing this activity to define a baseline for the development of the start-up Distributed Analysis System
– All of this has to work together with the DDM system described earlier
– Decide a baseline "now", so that we can have a testable system by this autumn
– The outcome of the review may change GridPP plans

Conclusions
ATLAS is (finally) getting effective throughput from LCG
The UK effort is making an important contribution
Distributed Analysis continues to pose a big challenge
– ATLAS is taking the right management approach
– GridPP effort will have to be responsive