1 ATLAS Computing on Grid3 and the OSG
Rob Gardner, University of Chicago
Mardi Gras Conference, February 4, 2005
2 New Physics Discoveries with ATLAS
At CERN, Geneva, Switzerland: the Large Hadron Collider (LHC), the ATLAS detector, and the 27 km long LHC tunnel.
3
4 Scale of the detector
Diameter: 25 m
Barrel toroid length: 26 m
End-cap end-wall chamber span: 46 m
Overall weight: 7000 tons
5 ATLAS superimposed on a 5-story building at CERN
6
7 ATLAS Collaboration
8
9 The Higgs Particle
To understand the Higgs mechanism, imagine that a room full of physicists chattering quietly is like space filled with the Higgs field... a well-known scientist walks in, creating a disturbance as he moves across the room and attracting a cluster of admirers with each step... this increases his resistance to movement; in other words, he acquires mass, just like a particle moving through the Higgs field... if a rumor crosses the room, it creates the same kind of clustering, but this time among the scientists themselves. In this analogy, these clusters are the Higgs particles.
Credit: the ATLAS Outreach Team (http://atlas.ch) and Prof. David J. Miller of University College London
10 ATLAS
11 The Higgs Particle: Discovery
If we can start up at 1/10th of design luminosity, we will discover a Higgs with mass greater than 130 GeV within 1 year. One year of design luminosity will cover the entire theoretically allowed mass range.
12 New Physics
The Higgs mechanism for generating mass has a problem: it explains the masses of the known particles, but it has a mathematical problem (a divergence) at high energies. To fix this, there must be new particles, and these new particles must show up at the energies we will explore at the LHC.
13 Supersymmetric Signatures
We will discover supersymmetry if it is what stabilizes the Higgs mass. Dramatic event signatures mean we will discover it quickly.
14 Channels of electronics @ 40 MHz
Electronic channel counts: 1.45×10^8, 1.2×10^6, 190,000, 10,000, 3,600.
15 Distributed Computing Centers
Event Builder (10 GB/sec) and Event Filter (~7.5 MSI2k) feed the Tier 0 at CERN at 320 MB/sec; the raw detector rate is ~Pb/sec.
Tier 0: ~5 MSI2k, ~5 PB/year, no simulation; 100-1000 MB/s links to the Tier 1s.
Tier 1 regional centers (US: BNL; UK: RAL; French; Dutch): ~2 MSI2k/T1, ~2 PB/year/T1, ~75 MB/s/T1 for ATLAS.
Tier 2 centers: ~200 kSI2k and ~200 TB/year each, 622 Mb/s links; each Tier 2 has ~20 physicists working on one or more channels, holds the full AOD, TAG and relevant Physics Group summary data, and does the bulk of simulation.
Tier 3: ~0.25 TIPS; workstations and desktops (a PC in 2004 = ~1 kSpecInt2k); physics data cache.
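As a quick sanity check on these figures, the sketch below (not from the talk; a decimal petabyte and a full calendar year are assumed) converts the ~75 MB/s per Tier 1 into a yearly volume and compares it with the quoted ~2 PB/year per Tier 1.

```python
# Back-of-envelope check, not from the talk: convert ~75 MB/s per Tier 1
# into a yearly data volume and compare with the quoted ~2 PB/year/T1.
SECONDS_PER_YEAR = 365 * 24 * 3600       # ~3.15e7 s

rate_mb_per_s = 75                       # ~75 MB/s/T1 for ATLAS (slide figure)
volume_pb = rate_mb_per_s * SECONDS_PER_YEAR / 1e9   # 1 PB = 1e9 MB (decimal units)

print(f"~{volume_pb:.1f} PB/year per Tier 1")   # ~2.4 PB/year, close to the quoted ~2 PB/year/T1
```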
16 US common Grid infrastructure
A collection of Grid services via the VDT and other providers.
17 Prototype Tier2 Center in 2004
Tier2 prototype cluster on a private network behind public/Internet-facing head nodes, serving Grid3/OSG and interactive/local users:
tier2-01: gatekeeper, Condor master
tier2-02: gridFTP
tier2-03: SRM, gridFTP
tier2-u1, tier2-u2, tier2-u3: local job analysis
tier2-mgt: Rocks frontend, web server, Ganglia, MonALISA
se1-se4: Grid3 /tmp, /app, /data; ATLAS SE
home: home directories
compute01-compute64: worker nodes
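To make the gatekeeper role concrete, here is a minimal sketch of a classic Condor-G submit description aimed at a gatekeeper like tier2-01, written out from Python. The hostname, jobmanager, executable and file names are illustrative assumptions, not values taken from the slide.

```python
# Illustrative sketch only: a classic Condor-G submit description targeting a
# Grid3 gatekeeper such as tier2-01. The hostname, jobmanager, executable and
# file names below are assumptions, not values from the talk.
submit_description = """\
universe        = globus
globusscheduler = tier2-01.uchicago.edu/jobmanager-condor
executable      = run_athena.sh
arguments       = dc2.simul 250
output          = job.out
error           = job.err
log             = job.log
queue
"""

with open("dc2_job.sub", "w") as f:
    f.write(submit_description)

# The job would then be handed to the gatekeeper with: condor_submit dc2_job.sub
```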
18 Simulation framework for the 3 Grids (US ATLAS GTS)
19 System Architecture for US ATLAS
Components: the production database (ProdDB); the Windmill supervisor; the Capone executor; Condor-G (schedd, GridManager); Chimera and Pegasus with the Virtual Data Catalog (VDC); the RLS; DonQuijote; monitoring and information services (MonALISA, GridCat, MDS); and Grid3 sites with their CE, SE, worker nodes, gsiftp and GRAM interfaces.
20 Capone Production on Grid3
ATLAS environment for Grid3 (VDT based)
Accept ATLAS DC2 jobs from Windmill
Manage all steps in the job life cycle: prepare, submit, monitor, output & register
Manage workload and data placement
Process messages from the Production Supervisor
Provide useful logging information to the user
Communicate executor and job state information
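A minimal sketch of that life cycle, assuming hypothetical grid, rls and log interfaces; this illustrates the steps listed above and is not the actual Capone code.

```python
import time

# Minimal sketch of the job life cycle listed above (prepare, submit, monitor,
# output, register). The grid, rls and log objects are assumed interfaces;
# this is not the real Capone implementation.
def run_job(job, grid, rls, log):
    plan = grid.prepare(job)             # prepare: build the workflow for this job
    handle = grid.submit(plan)           # submit: hand it to the grid (e.g. via Condor-G)
    while not grid.finished(handle):     # monitor: poll the remote job status
        time.sleep(60)
    outputs = grid.stage_out(handle)     # output: move result files to a storage element
    rls.register(outputs)                # register: record the replicas in the catalog
    log.info("job %s completed", job.id)
```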
21 Capone System Elements
GriPhyN Virtual Data System:
VDC – catalog containing all US ATLAS transformations, derivations and job records
Transformation – a workflow accepting input data (datasets) and parameters and producing output data (datasets)
Derivation – a transformation whose formal parameters have been bound to actual parameters
Directed Acyclic Graph (DAG):
Abstract DAG (DAX) – created by Chimera, with no reference to concrete elements in the Grid
Concrete DAG (cDAG) – created by Pegasus, where CE, SE and PFNs have been assigned
Globus, RLS and Condor-G are all used
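To illustrate the transformation/derivation distinction, here is a small sketch; the class and field names are assumptions for illustration, not the VDC schema.

```python
from dataclasses import dataclass

# Illustration of the VDS vocabulary above: a transformation is a parameterized
# recipe; a derivation binds its formal parameters to actual values. Names here
# are assumptions, not the actual VDC schema.
@dataclass
class Transformation:
    name: str
    inputs: list        # formal input dataset parameters
    outputs: list       # formal output dataset parameters
    parameters: list    # other formal parameters (e.g. number of events)

@dataclass
class Derivation:
    transformation: Transformation
    bindings: dict      # formal parameter -> actual value

    def is_complete(self) -> bool:
        formals = set(self.transformation.inputs
                      + self.transformation.outputs
                      + self.transformation.parameters)
        return formals <= set(self.bindings)

# Example: a hypothetical simulation step bound to concrete file names.
simulate = Transformation("atlas_g4sim", ["evgen_file"], ["hits_file"], ["n_events"])
job = Derivation(simulate, {"evgen_file": "evgen.pool.root",
                            "hits_file": "hits.pool.root",
                            "n_events": 250})
assert job.is_complete()
```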
22 Capone Architecture
Message interface: Web Service and Jabber
Translation layer: Windmill schema
CPE (Process Engine) processes: Grid3 (GCE interface), Stub (local shell testing), DonQuijote (future)
Layering: message protocols (Jabber, Web Service) → translation (ADA, Windmill) → process execution (PXE, CPE-FSM) → Stub / Grid (GCE-Client) / DonQuijote
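The layering on this slide can be sketched roughly as below; all class, method and message names are illustrative assumptions rather than Capone's actual interfaces.

```python
# Rough sketch of the layering above: messages arrive over a web service or
# Jabber, a translation layer maps them onto the Windmill schema, and the
# process engine (CPE) routes each request to a processing step. All names
# here are assumptions, not the Capone source.
class TranslationLayer:
    def to_request(self, raw_message: dict) -> dict:
        # Map an incoming message onto a (verb, payload) pair in the Windmill schema.
        return {"verb": raw_message.get("method"), "payload": raw_message.get("body", {})}

class ProcessEngine:
    def __init__(self):
        self.handlers = {}          # verb -> callable

    def register(self, verb, handler):
        self.handlers[verb] = handler

    def handle(self, request: dict):
        handler = self.handlers.get(request["verb"])
        if handler is None:
            raise ValueError(f"no handler for verb {request['verb']!r}")
        return handler(request["payload"])

# Usage: wire a handler for a hypothetical executeJobs-style message and dispatch one.
cpe = ProcessEngine()
cpe.register("executeJobs", lambda payload: f"submitting {len(payload.get('jobs', []))} jobs")
print(cpe.handle(TranslationLayer().to_request({"method": "executeJobs", "body": {"jobs": [1, 2]}})))
```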
23 Effective Access for DC2
~150K ATLAS jobs from July 1 to Dec 8, 2004
Grid3 sites with >100 successful DC2 jobs: 21
24 U.S. ATLAS Grid Production (plot of total validated jobs per day; G. Poulard, 9/21/04)
3M Geant4 events of ATLAS, roughly 1/3 of International ATLAS
Over 150K 20-hour jobs executed
Competitive with the peer European Grid projects LCG and NorduGrid
25 Data Challenge Summary by Grid
LCG: included some non-ATLAS sites; used the LCG-Grid-Canada interface
NorduGrid: Scandinavian resources plus sites in Australia, Germany, Slovenia and Switzerland
Grid3: used computing resources that are not dedicated to ATLAS
26 Global production by facility: 69 sites, ~276,000 jobs
27 Job Failure Analysis on Grid3
Failures                    Cumulative till end Nov 2004   Sep 2004   Oct.-Nov. 2004
Submission                  894                            472        422
Exe check                   428                                       0
Run-End                     10131                          1147       8984
StageOut                    10833                          8037       2796
RLS registration            1065                           989        76
Capone host interruption    3975                           2725       1250
Windmill                    564                            57         507
Other                       5225                           5139       86
TOTAL                       33165                          19303      13862
Not all failures are "equal" – some are more costly than others…
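For convenience, the snippet below computes each category's share of the cumulative total in the table above (using the cumulative column as reconstructed here).

```python
# Share of the cumulative failure count (through end of Nov 2004) per category,
# using the cumulative column of the table above.
failures = {
    "Submission": 894,
    "Exe check": 428,
    "Run-End": 10131,
    "StageOut": 10833,
    "RLS registration": 1065,
    "Capone host interruption": 3975,
    "Windmill": 564,
    "Other": 5225,
}
total = 33165   # TOTAL row of the table

for category, count in sorted(failures.items(), key=lambda kv: -kv[1]):
    print(f"{category:25s} {count:6d}  {100 * count / total:5.1f}%")
```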
28 Summary and Outlook
ATLAS has made good use of Grid3 and other grid infrastructures, and will continue to accumulate lessons and experience for the OSG.
Challenges for 2005:
Address submit-host scalability issues and job recovery
Focus on data management, policy-based scheduling, and user access
Support on-going production while developing new services and adapting to changes in the infrastructure
29 References and Acknowledgements
PPDG, GriPhyN, iVDGL Collaborations
US ATLAS Software and Computing: http://www.usatlas.bnl.gov/computing/
US ATLAS Grid Tools and Services: http://grid.uchicago.edu/gts
UC Prototype Tier 2: http://grid.uchicago.edu/tier2/
iVDGL, "The International Virtual Data Grid Laboratory": http://www.ivdgl.org/
Grid3, "Application Grid Laboratory for Science": http://www.ivdgl.org/grid3/
OSG, Open Science Grid Consortium: http://www.opensciencegrid.org/
30 A job in Capone (1, submission)
Reception: job received from Windmill
Translation: un-marshalling, ATLAS transformation
DAX generation: Chimera generates the abstract DAG
Input file retrieval from the RLS catalog: check RLS for input LFNs (retrieval of GUID, PFN)
Scheduling: CE and SE are chosen
Concrete DAG generation and submission: Pegasus creates the Condor submit files; DAGMan is invoked to manage the remote steps
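A compressed sketch of that submission path, with the collaborators (translator, chimera, rls, pegasus, scheduler, dagman) passed in as assumed interfaces; this illustrates the steps above and is not the actual Capone code.

```python
# Sketch of the submission path above (reception -> translation -> DAX ->
# RLS lookup -> scheduling -> concrete DAG -> DAGMan). The collaborator
# objects and their method names are assumptions, not Capone's real API.
def submit_job(msg, translator, chimera, rls, pegasus, scheduler, dagman):
    job = translator.unmarshal(msg)        # reception & translation of the Windmill message
    dax = chimera.generate_dax(job)        # abstract DAG (DAX): no concrete grid elements yet
    for lfn in dax.input_lfns:             # input file retrieval from the RLS catalog
        guid, pfn = rls.lookup(lfn)        # LFN -> GUID, PFN
        dax.bind_input(lfn, pfn)
    ce, se = scheduler.choose(dax)         # scheduling: a CE and an SE are chosen
    cdag = pegasus.plan(dax, ce, se)       # concrete DAG plus Condor submit files
    return dagman.submit(cdag)             # DAGMan manages the remote steps
```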
31 A job in Capone (2, execution)
Remote job running / status checking: stage-in of input files, creation of the POOL FileCatalog, Athena (ATLAS code) execution
Remote execution check: verification of output files and exit codes; recovery of metadata (GUID, MD5sum, exe attributes)
Stage out: transfer from the CE site to the destination SE
Output registration: registration of the output LFN/PFN and metadata in RLS
Finish: job completed successfully; Capone tells Windmill the job is ready for validation
Job status is sent to Windmill throughout execution; Windmill/DQ validate & register the output in ProdDB
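And the matching sketch for the execution and finish steps, again with assumed interfaces rather than the real code.

```python
# Sketch of the execution/finish steps above: remote running, output checks,
# metadata recovery, stage out, RLS registration, and the final report back to
# Windmill. Object and method names are assumptions, not Capone's real API.
def finish_job(handle, grid, rls, windmill):
    grid.wait(handle)                                   # remote job running / status checking
    outputs = grid.check_outputs(handle)                # verify output files and exit codes
    metadata = grid.collect_metadata(outputs)           # GUID, MD5 sum, executable attributes
    stored = grid.stage_out(outputs)                    # transfer from the CE site to the destination SE
    rls.register(stored, metadata)                      # register output LFN/PFN and metadata in RLS
    windmill.report(handle, "ready_for_validation")     # Windmill/DonQuijote then validate & record in ProdDB
```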