Accelerator-Based Physics: ATLAS, CDF, CMS, DØ, STAR
Amber Boehnlein, FNAL
OSG Consortium Meeting, January 24, 2006
Particle Physics
These five experiments are physics facilities intended to test the Standard Model.
What are the questions?
- What causes electroweak symmetry breaking?
- Does Quantum Chromodynamics precisely describe the behavior of quarks and gluons?
- What is the mechanism of CP violation?
- What is the wave function of the proton? Of a heavy nucleus?
- ...
What we measure: the production and decay of particles and their associated properties
- Cross sections, spectra (E, Pt, eta, ...), angular distributions, particle correlations
- The top-quark mass and properties
- Properties of the electroweak bosons
- Flavor physics; mixing …
What we seek:
- The Higgs boson
- SUSY and other new phenomena beyond the Standard Model
The Road to Physics
- Raw Data: detector, trigger system, data acquisition, …
- Calibration: pedestals, gains, linearity, …
- Reconstruction (RECO): detector algorithms, particle identification, production farm, user-ready data format, …
- Monte Carlo: event generation, Geant detector simulation, fast simulations, …
- Physics Analysis: event selections, efficiencies & backgrounds, …
- Across all stages: databases, network, releases, operation, data handling & access, trigger simulations, luminosity, …
Everything passes through software and computing.
OSG is a road to physics
- Raw Data (detector, trigger system, data acquisition, …): ATLAS, CMS, CDF, DØ, STAR
- Calibration: pedestals, gains, linearity, …
- Reconstruction (RECO): DØ (reconstruction from raw and from derived data), STAR, …
- Monte Carlo: ATLAS, CDF, CMS, DØ
- Physics Analysis: ATLAS, CMS, CDF, STAR
Implications
- Calibration database connectivity, via some mechanism, is essential for reconstruction.
- "User" application code/macros are distributed as self-contained tarballs or as an advertised local installation of the code distribution.
- Computations can be compute-intensive.
  - ALPGEN simulates multi-parton processes well, but is much slower than other standard packages.
  - Flagship analysis: CDF estimates 84 GHz-years for the top mass and cross-section analyses (manipulating about 10 TB of data).
- Computations can be data-intensive.
  - Reconstruction jobs typically read and write GBs of data.
  - Jobs run over terabytes of input data, clustered into datasets of hundreds of GB for bookkeeping purposes.
  - Job management is shaped around this clustering, resulting in bursts of hundreds of local jobs submitted at the same time.
  - Jobs typically run for several hours and typically require external network connectivity.
  - For efficient storage, output files might require merging.
- OSG provides a maturing infrastructure to run within this paradigm: resources are made available via standard interfaces for job and data management.
- Operational issues include time synchronization for security and local scratch management.
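The point that output files might require merging can be illustrated with a small planning step. The sketch below is a hypothetical illustration, not any experiment's actual merge tool: it assumes the outputs of a burst of jobs are known as (name, size) pairs and greedily packs them, in order, into merge batches no larger than a chosen target size. The function name, file names, and sizes are all made up.

```python
from typing import List, Tuple


def plan_merges(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Greedily pack job outputs, in order, into merge batches.

    A batch is closed when adding the next file would push it past
    target_bytes; a single file larger than the target gets its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1000 ** 2
GB = 1000 ** 3
# Ten hypothetical 400 MB outputs from one burst of jobs, merged toward 2 GB files.
outputs = [(f"job{i:03d}.root", 400 * MB) for i in range(10)]
batches = plan_merges(outputs, target_bytes=2 * GB)
print([len(b) for b in batches])  # [5, 5]
```

A greedy, order-preserving packing keeps files from the same dataset adjacent, which fits the bookkeeping-driven clustering described above; a real tool would also have to respect dataset boundaries and file-format constraints.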
Operations
CDF, DØ, STAR
- Mature experiments accumulating ~1 PB/year: billions of events, millions of files…
- Well-established and stable applications
- Anticipating upgrades in detectors and luminosity
- All depend on distributed computing
ATLAS, CMS
- Use of MC data challenges and test-beam data to test infrastructure and prepare for physics
- Cosmic-ray commissioning
- Computing scales up dramatically compared to current experiments in all dimensions, including the number of collaborators
My thanks to all those who contributed to this talk!
CDF Operational Modes
- OSG for MC production; targeting other production-chain tasks such as generating user-level ntuples
  - Condor-G submission
  - Self-contained tarball for production applications
  - DB access via a squid server or a connection to FNAL
- Pursuing user analysis using “glide CAF”
  - Provides a familiar user environment
  - Investigating user-level mounting of a remote filesystem over HTTP, with local squid servers for caching, to provide the flexibility of the full CDF software distribution
  - Will rely on SAM for data handling
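DB access through a squid server pays off because many jobs in a burst ask for the same calibration constants. A minimal sketch of that read-through idea follows. It is not squid itself, and the key layout (run number, calibration name), the class name, and the `fake_central_db` callback are hypothetical stand-ins for a connection to the central database at FNAL.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (run number, calibration name).
Key = Tuple[int, str]


class ReadThroughCache:
    """Serve repeated lookups locally; contact the central DB only on a miss."""

    def __init__(self, fetch_from_central: Callable[[Key], bytes]) -> None:
        self._fetch = fetch_from_central
        self._store: Dict[Key, bytes] = {}
        self.misses = 0  # number of round trips to the central DB

    def get(self, key: Key) -> bytes:
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_central_db(key: Key) -> bytes:
    """Hypothetical stand-in for a query against the central database."""
    run, name = key
    return f"{name}@{run}".encode()


# A burst of 200 jobs, all processing run 12345, needing the same constants.
cache = ReadThroughCache(fake_central_db)
for _ in range(200):
    cache.get((12345, "pedestals"))
print(cache.misses)  # 1
```

In a real deployment the cache is a shared HTTP proxy at the site rather than an in-process dictionary, so the saving applies across all the jobs running there, not only within a single job.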
CMS Operations
CMS relies on OSG for two significant activities:
- Centralized production of simulated events in the US
  - Opportunistic submission to non-CMS sites
  - Centralized submission by a dedicated team to US-CMS sites
- Remote user submission on the US-CMS Tier-2 sites
  - User simulation
  - Jobs that access data published as being available at the site
CMS Simulated Event Production
- Over the last 4 years, CMS has successfully submitted simulated-production jobs to distributed computing sites using ever-improving grid middleware: first CMS-dedicated infrastructure, then Grid3, then OSG.
- Over 5 months in 2006 we expect to generate 50M events for the next challenge; the OSG share is 15M-20M.
CMS Analysis Activities
- During the Worldwide LCG (WLCG) service challenge, CMS submitted analysis jobs to access local data: thousands of jobs, tens of TB of data accessed.
- During the challenge only dedicated expert users took part; the next step will include normal users.
CMS Simulation
- Submitted centrally from UFL by a dedicated team; this adds up to 1 FTE of effort spread over three people.
- During a relatively quiet period for CMS in the final quarter of 2005, CMS ran 5M events with three processing steps on OSG resources, representing about 40 CPU-years of computing.
- During the ramp-up for DC04, CMS used several hundred CPU-years, including more than 100 CPU-years of opportunistic resources.
- CMS expects to generate a simulated sample roughly the size of the raw data at startup.
- The USCMS contribution is roughly 30% of this.
- 800 TB per year of simulation by the start of high-luminosity running.
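The Q4 2005 figures (5M events, three processing steps, about 40 CPU-years) imply a per-event cost, which is a useful check when sizing the 2006 request. The arithmetic below is a back-of-the-envelope sketch: it uses a 365.25-day year, and the per-step figure assumes the three steps cost the same, which the slide does not state.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, ~3.156e7 s

cpu_years = 40        # "about 40 CPU years" (Q4 2005, OSG)
events = 5_000_000    # "5M events with three processing steps"
steps = 3

cpu_s_per_event = cpu_years * SECONDS_PER_YEAR / events
# Assumption: the three steps cost the same, which the slide does not say.
cpu_s_per_step = cpu_s_per_event / steps

print(f"~{cpu_s_per_event:.0f} CPU-s per event, all steps")  # ~252
print(f"~{cpu_s_per_step:.0f} CPU-s per step if equal")      # ~84
```

Roughly four CPU-minutes per fully processed event is the scale that makes opportunistic OSG cycles worth pursuing for the 50M-event 2006 sample.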
CMS Analysis Experience
- In Service Challenge 3, CMS ran over 18k jobs on OSG-connected Tier-2 resources.
  - 14k jobs completed; the total data read was ~20 TB.
  - Data were preloaded at the sites using PhEDEx.
- Submission and completion efficiency still need to be improved; many of the failures were uniquely attributable to CMS.
- This was the first large-scale analysis attempt for CMS on OSG.
- Extending OSG analysis to the whole collaboration and improving the user experience are part of the 2006 program of work.
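The Service Challenge 3 numbers give a rough completion efficiency and data volume per job. The sketch below is simple arithmetic on the slide's figures; because "over 18k" is a lower bound on submissions, the efficiency is an upper bound, and attributing all ~20 TB to the completed jobs is an assumption.

```python
submitted = 18_000   # "over 18k jobs" (a lower bound)
completed = 14_000
data_read_tb = 20.0  # "~20 TB" total data read

completion_efficiency = completed / submitted
# Assumption: all data read is attributed to completed jobs.
gb_per_completed_job = data_read_tb * 1000 / completed

print(f"completion efficiency <= {completion_efficiency:.0%}")   # <= 78%
print(f"~{gb_per_completed_job:.1f} GB read per completed job")  # ~1.4 GB
```

An upper bound near 78% is consistent with the statement that submission and completion efficiency still need to be improved.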
STAR Operations
- SUMS based (STAR Unified Meta-Scheduler)
  - A high-level user JDL describes the task, the code needed, and the dataset; SUMS submits to appropriate sites depending on user resource requirements or hints.
  - Software is assumed to be installed.
  - Input is transferred using GRAM input (archive/tarball).
  - Output is transferred using GRAM output.
  - Integrated cataloging is possible via RRS (Replica Registration Service), making this fully automated.
- MC
  - All MC is SUMS based.
  - MC jobs only; the nightly test (QA) has moved to the Grid.
  - PACMAN packages are available for STAR software for one OS (Linux).
  - An archive sandbox is used for the specific codes (the mode mostly used).
  - Assumes DB connectivity and outbound connections.
  - More recently: SRM transfer of output.
  - Job submission is Condor-G based.
- Plan to migrate all MC to OSG
  - Offload from the Tier-0 and Tier-1 centers to ANY resource
  - Allow Tier-2 sites to submit R&D simulations (RHIC-II detector simulation)
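The SUMS model, where a high-level request names a task and a dataset and the scheduler splits it into jobs and picks sites, can be sketched in miniature. This is a hypothetical illustration of the idea, not SUMS's actual JDL or dispatch policy. The `Request` and `Site` fields, the "most free slots" rule, and all names and file lists are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Site:
    name: str
    free_slots: int
    has_star_software: bool


@dataclass
class Request:
    """Hypothetical stand-in for a high-level job description."""
    command: str
    files: List[str]
    files_per_job: int
    needs_preinstalled_software: bool = True


def split_and_dispatch(req: Request, sites: List[Site]) -> Dict[str, List[List[str]]]:
    # 1. Split the dataset into per-job file lists.
    chunks = [req.files[i:i + req.files_per_job]
              for i in range(0, len(req.files), req.files_per_job)]
    # 2. Keep only sites that satisfy the request's requirements.
    eligible = [s for s in sites
                if s.has_star_software or not req.needs_preinstalled_software]
    if not eligible:
        raise RuntimeError("no site satisfies the request")
    remaining = {s.name: s.free_slots for s in eligible}
    plan: Dict[str, List[List[str]]] = {s.name: [] for s in eligible}
    # 3. Send each job to the eligible site with the most free slots left
    #    (ties go to the site listed first; counts may go negative, in which
    #    case the excess jobs simply queue there).
    for chunk in chunks:
        target = max(eligible, key=lambda s: remaining[s.name]).name
        plan[target].append(chunk)
        remaining[target] -= 1
    return plan


req = Request("run_analysis.sh", [f"file_{i}.root" for i in range(10)], files_per_job=3)
sites = [Site("siteA", 3, True), Site("siteB", 1, True), Site("siteC", 10, False)]
plan = split_and_dispatch(req, sites)
print({name: len(jobs) for name, jobs in plan.items()})  # {'siteA': 3, 'siteB': 1}
```

siteC has the most free slots but is skipped because the request assumes preinstalled software, which mirrors the "software assumed installed" mode above. A request shipping its own archive sandbox would set `needs_preinstalled_software=False`.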
STAR Analysis Experience
- STAR has a very positive user analysis experience, with 10K jobs/user.
- User analysis is “expert” only; STAR has a strong incentive to encourage generic users.
- Users are already severely constrained.
- Opportunistic computing for user analysis makes more sense at this stage: jobs are smaller, and their run time and input can be adapted to even the smallest site.
- RHIC-II running will require more resources.
- Data are moved, relocated, and managed on demand (in the background).
  - Generic user analysis would require a mechanism to locate "hot" datasets.
  - It would require SE-enabled sites and an asynchronous CPU/data transfer mechanism (like SRM now).
  - An RRS-like service is essential for automating data mining and registration on arrival (immediate access and exploitation).
- Concern: user needs may not match the available QoS and "help desk" support. OSG is our best hope.
DØ Operational Modes
- DØ depends on distributed computing for MC and production-chain activities.
  - SAMGrid is used to submit jobs; SAMGrid can broker jobs or forward them.
  - Data handling via SAM: datasets are delivered to a local cache.
  - Self-contained tarball distributed via SAM
  - DB access via proxy servers
- Next steps will be toward targeted ID activities, such as jet energy scale determination, to improve the systematic error:
  M(top) = … ± 3.0 (stat) ± 3.2 (JES) ± 1.7 (other) GeV
DØ Operations
- Monte Carlo production
- Reprocessing
  - Improved tracking and EM calorimeter calibration
  - ~1B-event effort using 4000 GHz CPU equivalents for 9 months at 12 sites (3 of them OSG sites)
  - Would have taken ~5 years on FNAL DØ dedicated resources
  - Calibration DB access via proxy servers
- Refixing
  - DØ applied a new hadronic calorimeter calibration as post-processing on FNAL dedicated analysis resources. A problem was found, and the step is being redone, with a six-week target, using remote facilities.
  - Some skims were fixed for immediate use; a QCD sample was processed on the CMS farm (an OSG site).
  - The full effort is ramping up; CPU needs are on the same scale as reprocessing.
  - Moving aggressively to use 1000 GHz equivalents on OSG!
- Every DØ publication depends on Grid computing.
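The reprocessing figures can be turned into total work and an implied size for the dedicated farm. The sketch below measures work in GHz-years (CPU clock speed times wall time, the same unit as CDF's 84 GHz-year estimate). The implied dedicated capacity is an inference from "~5 years", not a number stated in the talk.

```python
grid_ghz = 4000        # "4000 GHz cpu equivalents"
months = 9
dedicated_years = 5    # "~5 years on FNAL DØ dedicated resources"

ghz_years = grid_ghz * months / 12                   # total work delivered
implied_dedicated_ghz = ghz_years / dedicated_years  # inferred, not stated
speedup = dedicated_years * 12 / months              # grid vs. dedicated wall time

print(ghz_years)               # 3000.0
print(implied_dedicated_ghz)   # 600.0
print(round(speedup, 1))       # 6.7
```

About 3000 GHz-years spread over 12 sites is what turned a multi-year job into a 9-month campaign.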
ATLAS Production Runs
[Table: ATLAS grid production for Data Challenge 2 (DC2) and the Rome Physics Workshop: worldwide jobs (k) and events (M), U.S. jobs (k) and events (M), and percentage of U.S. jobs done by the three U.S. Tier 2 sites]
- The U.S. Tier 2 role was critical to the success of ATLAS production.
- Over 400 physicists attended the Rome workshop; 100 papers were presented based on the data produced during DC2 and the Rome production.
- The U.S. provided resources on the appropriate scale for U.S. physicists (60k CPU-days, >50 TB of data), as well as leadership roles in organizing the challenges, in key software development, and in production operations.
- Production during DC2 and Rome established a hardened Grid3 infrastructure benefiting all Grid3 participants.
Next ATLAS Production
- Formerly DC3, now Computing System Commissioning
- Simulate 10^7 events (same order as DC2)
- Full software commissioning: calibration and alignment
- Will need ~2000 CPUs in the U.S. continuously in 2006; OSG opportunistic resources will provide an important part of these.
- Started last week.
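To put ~2000 CPUs running continuously in 2006 in context, it helps to compare them with the 60k CPU-days the U.S. provided for DC2 and the Rome production. The sketch below assumes "continuously" means all 365 days of 2006. It compares a full year of the new request with the total for the earlier campaigns, not rate with rate.

```python
cpus = 2000                    # "~2000 CPU in the U.S. continuously in 2006"
cpu_days_2006 = cpus * 365     # assumption: continuous use for the full year
us_dc2_rome_cpu_days = 60_000  # U.S. resources for DC2 + Rome ("60k CPU-days")

ratio = cpu_days_2006 / us_dc2_rome_cpu_days
print(cpu_days_2006)    # 730000
print(round(ratio, 1))  # 12.2
```

An order-of-magnitude jump over the previous U.S. contribution is why the opportunistic OSG share matters for this round.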
ATLAS MC Analysis
Running ALPGEN is possible with OSG resources.
Conclusions
- OSG is providing a progressively more mature infrastructure.
- Increased use is leading to positive feedback from the perspective of users and of middleware and facility providers.
- The accelerator-based experiments are relying on it to deliver their physics programs.