ATLAS Grid Activities
Mark Sosebee, University of Texas at Arlington
DOSAR Workshop V, Louisiana Tech University, September 27, 2007

Overview - Introduction
- ATLAS grid activities comprise a diverse range and scope of work, both in the U.S. and worldwide
- Two of the primary activities we're involved with in the U.S. are:
  - Managed Monte Carlo production
  - User analysis
- In the U.S. we utilize the resources at the Tier 1 center (BNL), five Tier 2's, plus a mix of Tier 3 and non-ATLAS-dedicated OSG sites
- A central component of U.S. grid operations is PanDA - more about this later

ATLAS Grids - Big Picture
- ATLAS is divided into three production grids (makes life interesting...): EGEE, NDGF and OSG (10 Tier 1's, ~40 Tier 2's, +...)
- Approximate resources (disk space is quoted for the Tier 1's):
  - NDGF (Nordic): ~500 CPU's, ~60 TB disk
  - OSG (U.S.): ~2500 CPU's, ~500 TB disk
  - EGEE: ~3500 CPU's, ~400 TB disk
- Common features:
  - Task/job definition, job dispatch (supervisor), metadata (AMI)
  - Access to data through the ATLAS DDM system
- Differences:
  - Production software (executors), grid middleware
  - Independent operations team for each grid
  - Service architecture, storage systems...

ATLAS Production: Computer System Commissioning (CSC)
- Software integration and operations exercise
- Started more than 1.5 years ago (end of 2005)
- Distributed (MC) production goals:
  - Validation of Athena software using 'standard' samples
  - Continuous production of high-statistics physics samples
  - Exercise the widely distributed computing infrastructure
- Distributed Data Management (DDM) goals:
  - Worldwide data distribution and replication
  - Controlled flow of data: T0 => T1 => T2 => T1
  - Bring sites and services to steady operations mode

Details of Production Operations
- First, some nomenclature (see the sketch below):
  - Job: atomic unit of execution ('n' events processed by Athena using a single CPU, real or virtual)
  - Task: collection of jobs through a processing step (evgen, recon, pileup, digit...) for a physics topic (top, Higgs, SUSY...)
  - Dataset: collection of similar files from a single task
- MC production steps:
  - Physics groups define tasks (assigned to a specific grid)
  - Tasks are converted to jobs and stored centrally
  - Shift teams manage successful execution of jobs
  - The DDM operations team monitors and manages data flow
  - Physicists analyze the data on the grid
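The job/task/dataset nomenclature above can be summarized as a simple data model. The sketch below is illustrative only; the class and field names are assumptions made for clarity and are not the actual PanDA/ProdSys schema.

```python
# Minimal sketch of the Job/Task/Dataset nomenclature on this slide.
# Names and fields are illustrative assumptions, not the real production schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Job:
    """Atomic unit of execution: n events processed by Athena on one CPU."""
    job_id: int
    n_events: int
    processing_step: str      # e.g. "evgen", "simul", "digit", "recon"
    status: str = "defined"   # defined -> activated -> running -> done/failed

@dataclass
class Task:
    """Collection of jobs for one processing step of one physics topic."""
    task_id: int
    physics_topic: str        # e.g. "top", "Higgs", "SUSY"
    processing_step: str
    assigned_grid: str        # "OSG", "EGEE" or "NDGF"
    jobs: List[Job] = field(default_factory=list)

@dataclass
class Dataset:
    """Collection of similar output files produced by a single task."""
    name: str
    task_id: int
    files: List[str] = field(default_factory=list)
```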

CSC Production Statistics
- Many hundreds of physics processes have been simulated
- Tens of thousands of tasks spanning two major releases
- Dozens of sub-releases (about every three weeks) have been tested and validated
- Thousands of 'bug reports' fed back to software and physics
- 50M+ events done from CSC12
- >300 TB of MC data on disk
- Impressive team effort...

Resource Usage
- CPU and disk resources available to ATLAS are rising steadily
- Production system efficiencies are increasing
- But much more will be required for data taking
- Additional resources are steadily coming online

Resource Allocation (U.S.)
- All U.S. ATLAS Tier 1/2 sites provide resources:
  - Reliable storage of distributed data
  - CPU's for managed production (ATLAS-wide groups)
  - CPU's for regional/local production of large samples through PanDA
  - CPU's for user analysis through pathena
  - CPU's for interactive Athena testing / software development (unlikely to be available at all Tier 2's; will be available at BNL)
- ROOT analysis of AANtuple's is expected to be done on personal workstations and at Tier 3 sites
- The U.S. Resource Allocation Committee (chaired by Jim Shank) oversees fair-share usage of resources (see the sketch below):
  - Sets allocations between ATLAS-wide and U.S. usage
  - Sets allocations between different groups
  - Sets quotas for individual users
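To make the fair-share idea concrete, here is a minimal sketch of what a quota check might look like. The share fractions, group names, and function are assumptions for illustration only; they are not the actual RAC policy or PanDA brokerage code.

```python
# Illustrative fair-share check: all numbers and names are assumed, not RAC policy.
SHARES = {
    "atlas-wide": 0.75,    # assumed fraction for ATLAS-wide managed production
    "us-regional": 0.25,   # assumed fraction retained for U.S. regional use
}

GROUP_QUOTAS = {"ushiggs": 0.10, "usbphysics": 0.05}  # fractions of the U.S. share

def within_quota(group: str, used_cpu_hours: float, total_cpu_hours: float) -> bool:
    """Return True if the group is still below its allocated fraction of CPU."""
    allocation = SHARES["us-regional"] * GROUP_QUOTAS.get(group, 0.0)
    return used_cpu_hours < allocation * total_cpu_hours

if __name__ == "__main__":
    print(within_quota("ushiggs", used_cpu_hours=1.0e4, total_cpu_hours=5.0e5))
```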

Data Production / Processing (U.S.)
- ATLAS managed production (MC, reprocessing):
  - Historically, the U.S. has contributed ~25% of MC production
  - The Tier 1 and Tier 2's provide dedicated queues and storage for this
  - Physics groups directly manage task requests (we will have quotas/allocations per group, arbitrated by the RAC)
  - Detector, calibration, particle ID, test beam, commissioning groups... will also have allocations
- Regional U.S. production:
  - Same as ATLAS managed production: physics groups define tasks needed by U.S. physicists, with a special group name (e.g. ushiggs)
  - PanDA manages the quota (currently 20-25% for U.S. production)
  - So far, U.S. physicists have been slow to take advantage of this (less than 25% of the quota allocated by the RAC is being used)

PanDA
- PanDA is the distributed computing service infrastructure deployed in the U.S. and Canada (Canada joined over the summer)
  - Includes all U.S. and Canadian Tier 1's and Tier 2's
  - Plus some opportunistic/shared sites
  - Works with both OSG (U.S.) and EGEE (Canada) middleware
- PanDA uses pilot jobs and provides a single task queue (see the sketch below)
  - Pilot jobs allow instant activation of the highest-priority tasks
  - The task queue brings the 'batch system' concept to distributed grids
  - Athena jobs transparently become 'pathena' jobs running in PanDA
- PanDA provides an integrated software and computing system for U.S. / Canadian ATLAS sites, managing all production and user analysis activities
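The pilot-job idea described above can be sketched in a few lines: a pilot occupies a worker-node slot first, then pulls the highest-priority job from a central queue and runs it. This is a minimal illustration only; the class names and dispatch details are assumptions, not the real PanDA server/pilot protocol.

```python
# Stripped-down sketch of pilot-based dispatch from a single priority queue.
# Everything here is illustrative; it is not the actual PanDA implementation.
import heapq
import subprocess

class TaskQueue:
    """Central queue: jobs are handed out strictly by priority."""
    def __init__(self):
        self._heap = []  # entries: (negative priority, job_id, command)

    def submit(self, job_id: int, priority: int, command: list):
        heapq.heappush(self._heap, (-priority, job_id, command))

    def get_highest_priority_job(self):
        return heapq.heappop(self._heap) if self._heap else None

def run_pilot(queue: TaskQueue):
    """A pilot occupies a CPU slot first, then pulls real work from the queue."""
    job = queue.get_highest_priority_job()
    if job is None:
        return  # nothing to do: the pilot simply exits
    _, job_id, command = job
    result = subprocess.run(command, capture_output=True)
    print(f"job {job_id} finished with exit code {result.returncode}")

if __name__ == "__main__":
    q = TaskQueue()
    q.submit(job_id=1, priority=5, command=["echo", "simulating events"])
    q.submit(job_id=2, priority=9, command=["echo", "urgent user analysis"])
    run_pilot(q)  # runs job 2 first: the highest-priority task is activated immediately
```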

More on PanDA
- The PanDA distributed production and distributed analysis models adhere tightly to the ATLAS Computing Model
- Goals of the PanDA model:
  - Support ATLAS-wide multi-tiered computing (no longer U.S. only)
  - Support the analysis needs of ATLAS physicists
  - Data availability is critically important!
  - Sufficient computing cycles for a variety of use cases
- PanDA and pathena are the only distributed processing tools currently supported by U.S. ATLAS computing
- Interactive Athena-based computing is provided primarily at BNL (and the CERN CAF, competing with the rest of ATLAS)

PanDA (U.S. & Canada) Production Statistics
- PanDA has completed ~30M fully simulated physics events (simul+digit step), >30% of total central production
- Also successfully completed >15M single-particle events
- Since November, all available CPU's have been occupied (ran out of jobs only for a few days, plus a few days of service outages)
- About 400 TB of original data stored at the BNL Tier 1 (includes data generated on other grids)
- An additional ~100 TB of replicas kept at U.S. ATLAS Tier 2 sites
- Canadian sites are ramping up rapidly, with large contributions

Number of R12 PanDA Jobs DONE (by group)

  Group         Jobs done
  susy              24783
  standard           8123
  physics          437435
  ushiggs           12263
  higgs             14942
  panda             13917
  prodsys          153460
  usbphysics         3505
  egamma           131114
  btagging           2148
  top                4790
  bphysics           9771
  admin              1323
  exotics            3996
  validation        11144

PanDA Central Production
[Plots of PanDA central production, cumulative since 1/1/06 and since 3/1/07, not reproduced in this transcript.]

DDM - Distributed Data Management
- DDM is the service that distributes data among the tiers of ATLAS according to the Computing Model (the current implementation is called DQ2)
- DDM is critical for ATLAS: it catalogs and manages the flow of data
- In some respects it has not performed at the level needed
- Main issues:
  - Robustness of the site services (transfer agents)
  - Quality of service (completion of data movement tasks)
  - Too much manual intervention required
- What can be done?
  - Ask ATLAS management for more effort
  - Develop a backup plan for data movement (the DQ2 catalogs are fine)
  - Giving up on the ATLAS computing model is not an option...
  - These topics are actively under discussion
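The "site services / transfer agent" role mentioned above can be pictured as a loop that resolves a dataset subscription into the files still missing at a destination site. The sketch below is a simplified illustration under assumed names; it is not the DQ2 site-services API.

```python
# Illustrative subscription resolution, assuming invented class/method names.
from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class Subscription:
    dataset: str
    destination_site: str

@dataclass
class Catalog:
    """Which files belong to each dataset, and which files each site holds."""
    dataset_files: Dict[str, Set[str]]
    site_replicas: Dict[str, Set[str]]

class TransferAgent:
    def __init__(self, catalog: Catalog):
        self.catalog = catalog

    def process(self, sub: Subscription) -> Set[str]:
        """Return the files still missing at the destination; a real agent
        would queue these for transfer and retry failures (the robustness
        issue raised on this slide)."""
        wanted = self.catalog.dataset_files.get(sub.dataset, set())
        present = self.catalog.site_replicas.setdefault(sub.destination_site, set())
        missing = wanted - present
        present |= missing  # pretend the transfers succeeded
        return missing

if __name__ == "__main__":
    cat = Catalog(
        dataset_files={"csc12.example.AOD": {"f1.root", "f2.root"}},  # hypothetical dataset
        site_replicas={"MWT2": {"f1.root"}},
    )
    print(TransferAgent(cat).process(Subscription("csc12.example.AOD", "MWT2")))  # {'f2.root'}
```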

Data Locations in the U.S.
- Tier 1: main repository of data (MC & primary)
  - Stores the complete set of ESD, AOD, AANtuple & TAG's on disk
  - A fraction of RAW and all U.S.-generated RDO data
- Tier 2: repository of analysis data
  - Stores the complete set of AOD, AANtuple & TAG's on disk
  - The complete set of ESD data is divided among the five Tier 2's
  - Data distribution to Tier 1 & Tier 2's will be managed
- Tier 3: unmanaged data, matched to local interests
  - Data arrives through locally initiated subscriptions
  - Mostly AANtuple's, some AOD's
  - Tier 3's will be associated with Tier 2 sites?
- The Tier 3 model is still not fully developed; everyone's input is needed
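The placement policy above amounts to a simple lookup table of which formats live at which tier. The snippet below restates the slide's content as such a table; the dictionary layout and helper function are assumptions for clarity, not an ATLAS tool.

```python
# U.S. data placement from this slide as a lookup table (illustrative structure).
DATA_PLACEMENT = {
    "Tier1": {"RAW (fraction)", "RDO (U.S.-generated)", "ESD", "AOD", "AANtuple", "TAG"},
    "Tier2": {"ESD (1/5 share per site)", "AOD", "AANtuple", "TAG"},
    "Tier3": {"AANtuple", "AOD (some, via local subscriptions)"},
}

def tiers_holding(format_name: str):
    """Return the tiers expected to hold a given data format on disk."""
    return [tier for tier, formats in DATA_PLACEMENT.items()
            if any(f.startswith(format_name) for f in formats)]

print(tiers_holding("AOD"))  # ['Tier1', 'Tier2', 'Tier3']
print(tiers_holding("ESD"))  # ['Tier1', 'Tier2']
```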

Computing Service Model
- PanDA production (including pathena) is managed at the Tier 1 and Tier 2 resources by the U.S. Production Team, working with local site support teams
- We have started integrating the Canadian sites (evolving)
- The DDM operations team (coordinated by Alexei Klimentov) manages data distribution and data access issues
- The production team operates shifts to provide QoS
- Hypernews, the RT user support system, and the Savannah bug reporting system are available to users
  - Various categories are available: PanDA, production, site issues, data management...

Projections for U.S. Tier 2's
- Totals include capacity committed to international ATLAS and capacity retained under U.S. control for U.S. physicists
- Most Tier 2's are currently at approximately half of the value projected for 2007
[Capacity projection table not reproduced in this transcript.]

Where Is This All Headed?
- Goal: a steady ramp-up of operations through the end of 2007
- MC production:
  - Achieve 10 million events/week by the end of 2007
  - All targets set in 2005 have been met, so far
- DDM operations:
  - Steady operations
  - Automation and monitoring improvements
  - Computing Model data flow implemented by the end of 2007

Conclusion
- Computing systems are generally in good shape, thanks to the long-running commissioning (CSC) exercise
  - 2006: successful transition into operations mode
  - 2007: high-volume production and DDM operations
  - Successful distributed MC production for the CSC notes
- But many more challenges are to come
- We need more participation by American physicists: as users, developers, shifters... and as a source of feedback on what works and what doesn't
- Lots of resources are available for physicists
- Overall, we are progressing well towards full readiness for LHC data

Links O'Interest
- BNL Physics, Software and Computing page: http://www.usatlas.bnl.gov/USATLAS_TEST/Physics.shtml
- The PanDA Twiki: https://twiki.cern.ch/twiki/bin/view/Atlas/PanDA
- U.S. ATLAS grid computing page (tons of info, maintained by Horst): http://www.usatlas.bnl.gov/computing/grid/
- Obtaining a grid certificate: http://www.usatlas.bnl.gov/twiki/bin/view/AtlasSoftware/ObtainingGridCertificate.html