Computing Issues for the ATLAS SWT2

What is SWT2?
SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium
UTA is the lead institution, along with the University of Oklahoma and Langston University
One of five Tier 2 centers in the U.S.
Builds, operates and maintains computing resources for U.S. ATLAS

ATLAS
ATLAS is a particle detector being constructed as part of the Large Hadron Collider (LHC)
The LHC is at CERN, near Geneva, Switzerland
Collides protons at a center-of-mass energy of 14 TeV, with a 40 MHz bunch crossing rate
The goal is to record events at ~100 Hz
~1.5 PB of raw data per year
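The rates on this slide imply how selective the trigger must be and roughly how large each recorded event is. The sketch below works this out; the ~10^7 seconds of data taking per year is an assumed rule of thumb, not a number from the slide.

```python
# Numbers from the slide
bunch_crossing_rate_hz = 40e6   # 40 MHz proton bunch crossing rate
recorded_rate_hz = 100          # ~100 Hz event recording target
raw_bytes_per_year = 1.5e15     # ~1.5 PB of raw data per year

# Assumption (not from the slide): ~1e7 seconds of data taking per year.
live_seconds_per_year = 1e7

# Fraction of bunch crossings the trigger must discard
rejection_factor = bunch_crossing_rate_hz / recorded_rate_hz
events_per_year = recorded_rate_hz * live_seconds_per_year
# Average raw event size implied by the yearly volume
implied_event_size_mb = raw_bytes_per_year / events_per_year / 1e6

print(f"trigger keeps 1 in {rejection_factor:,.0f} crossings")
print(f"{events_per_year:.1e} events/year, ~{implied_event_size_mb:.1f} MB/event")
```

Under that assumption the trigger keeps one crossing in 400,000, and each recorded event averages about 1.5 MB.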

Monte-Carlo Simulations in ATLAS
Why simulate?
  –Trigger refinement
  –Reconstruction refinement
  –Understanding new phenomena
Four-step process
  –Generation
  –Simulation
  –Digitization
  –Reconstruction

MC Generation
Generation step (Pythia is one example of a generator)
  –Works from a description of the type of event to generate:
    Physics process
    Initial energy
    Selection criteria
  –Software uses MC techniques to generate candidate events
  –Candidates that don’t meet the criteria are rejected and more events are generated
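The generate-then-select loop can be sketched as an accept/reject loop. This is a toy, not Pythia: the event is reduced to one hypothetical transverse momentum drawn from an assumed exponential spectrum, and the selection criterion (a minimum pT) is made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical, drastically simplified event: one particle's transverse momentum.
    pt_gev: float


def generate_candidate(rng: random.Random) -> Event:
    # Stand-in for a real generator: draw from a falling spectrum with mean 10 GeV.
    return Event(pt_gev=rng.expovariate(1 / 10.0))


def passes_selection(event: Event, min_pt_gev: float) -> bool:
    # Hypothetical selection criterion: keep only high-pT events.
    return event.pt_gev > min_pt_gev


def generate_events(n_wanted: int, min_pt_gev: float, seed: int = 42):
    rng = random.Random(seed)
    accepted, tried = [], 0
    while len(accepted) < n_wanted:
        candidate = generate_candidate(rng)
        tried += 1
        if passes_selection(candidate, min_pt_gev):
            accepted.append(candidate)
        # Rejected candidates are discarded and another one is generated.
    return accepted, tried


events, tried = generate_events(n_wanted=1000, min_pt_gev=30.0)
print(f"kept {len(events)} of {tried} candidates")
```

With this spectrum only about 5% of candidates pass a 30 GeV cut, which illustrates why tight selection criteria make generation more expensive.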

Simulation
Uses generator output (particles/momenta)
Physical description of the detector (materials/environment)
  –Based on GEANT
Produces an accurate depiction of how the particles move through the various parts of the detector, losing/depositing energy in the process
This step consumes the most CPU time
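As a rough illustration of what the simulation step produces, the toy below passes a particle through a hypothetical sequence of detector layers and records how much energy each layer absorbs. The layer list and absorption fractions are invented; GEANT instead tracks individual physics interactions using a detailed description of the detector's geometry and materials.

```python
import random

# Hypothetical detector layers: (name, mean fraction of incoming energy absorbed).
LAYERS = [
    ("inner tracker", 0.02),
    ("electromagnetic calorimeter", 0.60),
    ("hadronic calorimeter", 0.90),
    ("muon spectrometer", 0.05),
]


def simulate(energy_gev, rng):
    """Return per-layer energy deposits and the energy escaping the detector."""
    deposits = []
    for name, mean_fraction in LAYERS:
        # Fluctuate the absorbed fraction to mimic stochastic energy loss,
        # clamped so a layer never absorbs less than 0% or more than 100%.
        fraction = min(1.0, max(0.0, rng.gauss(mean_fraction, 0.05)))
        deposited = energy_gev * fraction
        deposits.append((name, deposited))
        energy_gev -= deposited
    return deposits, energy_gev


rng = random.Random(1)
deposits, escaped = simulate(100.0, rng)
for name, e in deposits:
    print(f"{name:30s} {e:6.2f} GeV")
print(f"escaped: {escaped:.2f} GeV")
```

By construction the deposits plus the escaping energy always add up to the initial energy.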

Digitization
Introduces detector measurements
Simulates the behavior of the detector electronics
Input is taken from the simulation step
Output is a description of the event as seen by the measurement systems
Output “looks like” real data from the detector
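A minimal sketch of digitization, assuming a made-up readout: true energy deposits per channel become integer ADC counts with electronics noise, saturation and zero suppression. The calibration constant, noise level, threshold and 12-bit range are hypothetical values chosen for illustration.

```python
import random

# Hypothetical readout parameters, for illustration only.
GEV_PER_ADC_COUNT = 0.01   # calibration constant
NOISE_ADC = 3.0            # electronics noise (standard deviation, ADC counts)
ZERO_SUPPRESS_ADC = 10     # channels below this are not read out
ADC_MAX = 4095             # a 12-bit ADC saturates here


def digitize(deposits_gev, rng):
    """Convert true per-channel energy deposits into raw ADC readings."""
    readout = {}
    for channel, energy in enumerate(deposits_gev):
        counts = round(energy / GEV_PER_ADC_COUNT + rng.gauss(0.0, NOISE_ADC))
        counts = max(0, min(ADC_MAX, counts))   # clamp to the ADC's range
        if counts >= ZERO_SUPPRESS_ADC:          # zero suppression
            readout[channel] = counts
    return readout


rng = random.Random(7)
raw = digitize([0.0, 0.05, 1.2, 50.0], rng)
print(raw)
```

The output is a sparse `{channel: counts}` map with no record of the true energies, which is what makes it “look like” detector data.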

Reconstruction
Reconstructs physics objects from the raw data
The same code is used for MC data and real data
Input is taken from the digitization step (or the detector)
Output is the data physicists use for analysis
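The sketch below shows a toy reconstruction step. It groups adjacent fired channels into clusters and turns each cluster into an object with an energy and a position. The calibration constant and the clustering rule are assumptions. The function only sees a `{channel: adc_counts}` map, so it runs unchanged on simulated or real readout, which reflects the point that the same code serves both.

```python
# Hypothetical calibration constant (illustration only).
GEV_PER_ADC_COUNT = 0.01


def reconstruct(readout):
    """Group adjacent fired channels into clusters and estimate their energy."""
    clusters, current = [], []
    for channel in sorted(readout):
        # A gap between channels closes the current cluster.
        if current and channel != current[-1] + 1:
            clusters.append(current)
            current = []
        current.append(channel)
    if current:
        clusters.append(current)

    objects = []
    for chans in clusters:
        total_counts = sum(readout[c] for c in chans)
        energy = total_counts * GEV_PER_ADC_COUNT
        # Energy-weighted mean channel as a crude position estimate.
        position = sum(c * readout[c] for c in chans) / total_counts
        objects.append({"channels": chans, "energy_gev": energy, "position": position})
    return objects


objs = reconstruct({4: 120, 5: 300, 6: 90, 20: 45})
for obj in objs:
    print(obj)
```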

Computing
Event simulation is computationally intensive
The simulation step takes from a few minutes up to an hour per event
Large amounts of simulated data are needed over the lifetime of the experiment
Yields a large need for compute cycles
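The link between per-event simulation time and compute cycles is simple arithmetic: total CPU seconds divided by the seconds in a year. In the sketch below, the yearly target of 100 million simulated events and the efficiency parameter are hypothetical inputs, not figures from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def cores_needed(events_per_year, minutes_per_event, efficiency=1.0):
    """Cores needed to simulate events_per_year events within one year.

    minutes_per_event: CPU time per event for the simulation step
    efficiency: fraction of wall-clock time a core spends doing useful work
    """
    cpu_seconds = events_per_year * minutes_per_event * 60
    return cpu_seconds / (SECONDS_PER_YEAR * efficiency)


# Hypothetical target: 100 million simulated events per year.
for minutes in (5, 15, 60):  # "a few minutes up to an hour" per event
    print(f"{minutes:>2} min/event -> {cores_needed(1e8, minutes):,.0f} cores")
```

At 5 minutes per event this is roughly 950 fully busy cores, and at an hour per event more than 11,000. Less-than-perfect efficiency raises both figures.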

Computing (cont.)
At full luminosity, raw data will be ~1.5 PB/year
Data for analysis will be an order of magnitude smaller (but retained)
Most ATLAS physicists need access to this data
Yields a large need for storage resources
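Because data are retained, the storage requirement grows every year. The sketch below assumes that both raw and analysis data are kept, and takes “an order of magnitude smaller” to mean exactly 10% of the raw volume. It also converts the yearly raw volume into an average data rate over a full calendar year.

```python
RAW_PB_PER_YEAR = 1.5          # from the slide, at full luminosity
ANALYSIS_FRACTION = 0.1        # assumption: analysis data exactly 10x smaller
SECONDS_PER_YEAR = 365 * 24 * 3600


def cumulative_storage_pb(years):
    """Total raw + analysis data accumulated after each year of running."""
    per_year = RAW_PB_PER_YEAR * (1 + ANALYSIS_FRACTION)
    return [per_year * y for y in range(1, years + 1)]


def average_raw_rate_mb_s():
    """Raw data rate averaged over a full calendar year (1 PB = 1e9 MB)."""
    return RAW_PB_PER_YEAR * 1e9 / SECONDS_PER_YEAR


for year, total in enumerate(cumulative_storage_pb(5), start=1):
    print(f"after year {year}: {total:.2f} PB")
print(f"average raw rate: {average_raw_rate_mb_s():.1f} MB/s")
```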

ATLAS Tier Model
ATLAS uses a “tiered” computing structure
Tier 0: CERN
Tier 1: National facility (Brookhaven National Laboratory)
Tier 2: Regional facilities (SWT2, NET2, AGLT2, MWT2, SLAC)
Tier 3: University facilities (e.g. DPCC, HPC)

Purpose of a Tier 2
Perform Monte-Carlo-based simulations of collision events
Perform reprocessing, converting raw data to a physics description
Provide resources for user analysis

Distributed Computing in ATLAS
Data handling is the central issue
Move jobs or move data?
Production versus analysis
  –Production is CPU intensive
  –Analysis is I/O intensive
How to manage user access
Grid computing
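The move-jobs-or-move-data question can be framed as comparing estimated completion times. The sketch below uses a hypothetical policy, not ATLAS's actual brokering logic. The input data sit at a remote site, and the choice is to copy the data over the wide-area network and run locally, or to send the job to the remote site. CPU time is assumed equal at both sites. The example jobs, queue waits and WAN bandwidth are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    input_gb: float   # size of the input dataset the job reads
    cpu_hours: float  # CPU time the job needs (assumed equal at both sites)


def transfer_hours(size_gb, bandwidth_mb_s):
    return size_gb * 1000 / bandwidth_mb_s / 3600


def choose_site(job, local_queue_hours, remote_queue_hours, wan_mb_s):
    """Return 'move data' or 'move job' for a job whose input sits at a remote site."""
    move_data = transfer_hours(job.input_gb, wan_mb_s) + local_queue_hours + job.cpu_hours
    move_job = remote_queue_hours + job.cpu_hours
    return "move data" if move_data < move_job else "move job"


# CPU-intensive production job with a small input, remote site busy
print(choose_site(Job(input_gb=1, cpu_hours=24), 0, 10, wan_mb_s=10))
# I/O-intensive analysis job with a large input
print(choose_site(Job(input_gb=2000, cpu_hours=2), 0, 1, wan_mb_s=10))
```

With these inputs the small-input production job is better served by moving its data. The analysis job, whose 2 TB input would take about 55 hours to copy, is better sent to where the data already are.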

How to Satisfy Demands
Balance cost / performance
SMP vs. commodity processors
SMPs
  –Great for memory-intensive multi-threaded applications
  –Expensive
Computing clusters
  –Less expensive
  –Lower performance for multi-threaded applications
Cluster performance can be improved by spending money on the interconnect network

UTA Computing Clusters
DPCC (joint project between Physics and CSE)
  –Installed 2003
  –~80-node cluster (160 processors)
  –50 TB storage
UTA_SWT2
  –Installed 2006
  –160-node cluster (320 processors)
  –16 TB storage
UTA_CPB
  –Being installed now
  –50-node cluster (200 cores)
  –60 TB storage

What Makes a Computing Cluster
Head node(s)
  –Allow interactive login, compilation
Worker nodes
  –Provide computing cycles; may not be directly accessible
Network
  –Can be the most important aspect, depending on the computing model
Storage access
  –NFS, global file systems
Batch system
  –Controls access to worker nodes by scheduling work
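The batch system's core job, assigning queued work to free worker nodes, can be sketched as a first-come, first-served scheduler. This is a simplified model that assumes all jobs are submitted at once. Production batch systems add priorities, fair-share policies, backfilling and resource requests. The job names and runtimes below are hypothetical.

```python
import heapq
from collections import deque


def schedule_fifo(jobs, n_workers):
    """Simulate a first-come, first-served batch queue.

    jobs: list of (job_id, runtime_hours) in submission order, all submitted at t=0.
    Returns {job_id: (worker, start, end)}.
    """
    queue = deque(jobs)
    # Min-heap of (time the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    placement = {}
    while queue:
        job_id, runtime = queue.popleft()
        start, worker = heapq.heappop(free_at)  # earliest available worker node
        end = start + runtime
        placement[job_id] = (worker, start, end)
        heapq.heappush(free_at, (end, worker))
    return placement


plan = schedule_fifo([("a", 3), ("b", 1), ("c", 2), ("d", 1)], n_workers=2)
for job_id, (worker, start, end) in plan.items():
    print(f"job {job_id}: worker {worker}, t={start:g}..{end:g} h")
```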

Networking
Cost vs. performance for the application
  –Main issue is communication latency
Low cost (Ethernet)
  –Suitable for single-threaded applications
  –Can be used for multi-threaded applications
  –Higher latency
High cost (Myricom, InfiniBand)
  –Low latency
  –Best suited for improving multi-threaded applications
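A toy runtime model shows why latency matters for parallel (multi-process or multi-threaded) applications that exchange many small messages, and not at all for single-threaded ones. The model ignores bandwidth, contention and any overlap of communication with computation. The latencies (~50 µs for commodity Ethernet, ~5 µs for a low-latency interconnect) and the workload are assumed order-of-magnitude values, not measurements.

```python
def parallel_runtime_s(work_s, n_procs, messages_per_proc, latency_us):
    """Toy model: perfectly divisible compute plus a latency cost per message."""
    compute = work_s / n_procs
    # A single process sends no messages; otherwise each process pays the
    # latency once per message (assumed not overlapped with computation).
    n_messages = messages_per_proc if n_procs > 1 else 0
    communication = n_messages * latency_us * 1e-6
    return compute + communication


# Assumed one-way latencies, order of magnitude only.
for name, latency in (("Ethernet", 50.0), ("low-latency", 5.0)):
    for n in (1, 16, 64):
        t = parallel_runtime_s(work_s=100.0, n_procs=n,
                               messages_per_proc=200_000, latency_us=latency)
        print(f"{name:12s} {n:3d} procs: {t:7.2f} s")
```

In this model 16 processes take about 16 s over Ethernet but about 7 s over the low-latency network, while the single-process run takes 100 s on either network.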

DPCC Resources
80 worker nodes
  –Dual 2.4 / 2.66 GHz Xeon processors
  –2 GB RAM
  –60/80 GB local disk
50 TB RAID storage
  –11 RAID servers (3 x 1.5 TB RAID cards each)
1000 Mb/s Ethernet internal network
DPCC Diagram

UTA_SWT2 Resources
Wholly operated by UTA for ATLAS
160 worker nodes
  –Dual 3.2 GHz Xeon EM64T processors
  –4 GB RAM
  –160 GB local disk
16 TB storage
  –DataDirect S2A3000 SAN
  –6 I/O servers running IBRIX
Dual internal networks
  –Management (100 Mb/s)
  –Storage (1000 Mb/s)
UTA_SWT2

UTA_CPB
Being constructed now
50 worker nodes
  –Dual dual-core 2.4 GHz Opteron (2216)
  –8 GB RAM
  –80 GB local storage
60 TB storage
  –6 I/O servers with attached storage
  –10 Dell MD1000 storage units (7.5 TB raw each)
Single 1000 Mb/s network
Will likely be supplemented this year with additional storage
UTA_CPB

Summary
520 dedicated cores + ~100 additional available
90 TB disk space + ~30 TB additional available
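The 520 dedicated cores appear to be the UTA_SWT2 and UTA_CPB clusters combined. Each cluster's total is nodes × sockets × cores per socket. Each UTA_SWT2 Xeon is counted as one core, which matches the 320 processors listed for that cluster. The grouping into "dedicated" clusters is an interpretation of the summary, not something the slides state.

```python
# (nodes, sockets per node, cores per socket), taken from the cluster slides
CLUSTERS = {
    "UTA_SWT2": (160, 2, 1),  # dual Xeon EM64T, counted as one core per processor
    "UTA_CPB": (50, 2, 2),    # dual dual-core 2.4 GHz Opteron 2216
}


def total_cores(clusters):
    return sum(nodes * sockets * cores for nodes, sockets, cores in clusters.values())


print(total_cores(CLUSTERS))  # 520
```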