The ATLAS Computing Model: Status, Plans and Future Possibilities
Shawn McKee, University of Michigan
CCP 2006, Gyeongju, Korea
August 29th, 2006

Overview
- The ATLAS collaboration has only a year before it must manage large amounts of "real" data for a globally distributed collaboration
- ATLAS physicists need the software and physical infrastructure required to:
  - Calibrate and align detector subsystems to produce well-understood data
  - Realistically simulate the ATLAS detector and its underlying physics
  - Provide access to ATLAS data globally
  - Define, manage, search and analyze data-sets of interest
- I will cover current status, plans and some of the relevant research in this area, and indicate how it might benefit ATLAS in augmenting and extending its infrastructure

The ATLAS Computing Model
- The Computing Model is fairly well evolved and documented in the Computing TDR (C-TDR)
- There are many areas with significant questions/issues to be resolved:
  - Calibration and alignment strategy is still evolving
  - Physics data access patterns MAY be exercised (SC4: since June), but we are unlikely to know the real patterns until 2007/2008!
  - Still uncertainties on the event sizes and reconstruction time
  - How best to integrate ongoing "infrastructure" improvements from research efforts into our operating model?
- Lesson from the previous round of experiments at CERN (LEP): reviews in 1988 underestimated the computing requirements by an order of magnitude!

ATLAS Computing Model Overview
- We have a hierarchical model (EF-T0-T1-T2) with specific roles and responsibilities
- Data will be processed in stages: RAW → ESD → AOD → TAG
- Data "production" is well-defined and scheduled; roles and responsibilities are assigned within the hierarchy
- Users will send jobs to the data and extract relevant data, typically NTuples or similar
- The goal is a production and analysis system with seamless access to all ATLAS grid resources
- All resources need to be managed effectively to ensure ATLAS goals are met and resource providers' policies are enforced; grid middleware must provide this

ATLAS Facilities and Roles
- Event Filter Farm at CERN
  - Assembles data (at CERN) into a stream to the Tier-0 Center
- Tier-0 Center at CERN
  - Data archiving: raw data to mass storage at CERN and to Tier-1 centers
  - Production: fast production of Event Summary Data (ESD) and Analysis Object Data (AOD)
  - Distribution: ESD and AOD to Tier-1 centers and mass storage at CERN
- Tier-1 Centers distributed worldwide (10 centers)
  - Data stewards: re-reconstruction of the raw data they archive, producing new ESD and AOD
  - Coordinated access to full ESD and AOD (all AOD, a fraction of ESD depending upon site)
- Tier-2 Centers distributed worldwide (approximately 30 centers)
  - Monte Carlo simulation, producing ESD and AOD; ESD and AOD sent to Tier-1 centers
  - On-demand user physics analysis of shared datasets
- Tier-3 Centers distributed worldwide
  - Physics analysis
- A CERN Analysis Facility
  - Analysis; enhanced access to ESD and RAW/calibration data on demand

Computing Model: Event Data Flow from EF
- Events written in "ByteStream" format by the Event Filter farm in 2 GB files
  - ~1000 events/file (nominal size is 1.6 MB/event)
  - 200 Hz trigger rate (independent of luminosity)
- Currently 4+ streams are foreseen:
  - Express stream with "most interesting" events
  - Calibration events (including some physics streams, such as inclusive leptons)
  - "Trouble maker" events (for debugging)
  - Full (undivided) event stream
- One 2 GB file every 5 seconds will be available from the Event Filter
- Data will be transferred to the Tier-0 input buffer at 320 MB/s (average)
- The Tier-0 input buffer will have to hold raw data waiting for processing, and also cope with possible backlogs
  - ~125 TB will be sufficient to hold 5 days of raw data on disk

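As a quick cross-check, the 320 MB/s and ~125 TB figures follow directly from the nominal event size and trigger rate quoted above; a minimal sketch of the arithmetic (the assumption here is that the quoted ~125 TB corresponds to binary terabytes):

```python
# Back-of-the-envelope check of the Tier-0 input rates quoted above
# (a sketch using the slide's nominal figures, not official numbers).

event_size_mb = 1.6        # nominal RAW event size (MB)
trigger_rate_hz = 200      # Event Filter output rate, independent of luminosity

rate_mb_s = event_size_mb * trigger_rate_hz          # -> 320 MB/s average
buffer_days = 5
buffer_tb = rate_mb_s * 86400 * buffer_days / 1e6    # MB -> decimal TB

print(f"Average rate into the Tier-0 buffer: {rate_mb_s:.0f} MB/s")
print(f"Raw data for {buffer_days} days: {buffer_tb:.0f} TB "
      f"(~{buffer_tb * 1e12 / 2**40:.0f} TiB, consistent with the ~125 TB quoted)")
```
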
ATLAS Data Processing
- Tier-0:
  - Prompt first-pass processing of the express/calibration and physics streams; full physics streams processed with reasonable calibrations within hours of data-taking
  - Implies large data movement from T0 → T1s, and some T0 ↔ T2 (calibration)
- Tier-1:
  - Reprocess 1-2 months after arrival with better calibrations
  - Reprocess all local RAW at year end with improved calibration and software
  - Implies large data movement T1 ↔ T1 and T1 → T2

ATLAS Partial & "Average" T1 Data Flow (2008)
[Data-flow diagram, slide from D. Barberis: RAW, ESD, AOD and merged-AOD (AODm) flows between the Tier-0 CPU farm, this Tier-1, other Tier-1s, each associated Tier-2, and tape/disk storage, with per-stream file sizes (RAW 1.6 GB, ESD 0.5 GB, AOD 10 MB, AODm 500 MB), rates, files/day and daily volumes; plus simulation and analysis data flows.]
- There are a significant number of flows to be managed and optimized

ATLAS Event Data Model
- RAW: "ByteStream" format, ~1.6 MB/event
- ESD (Event Summary Data): full output of reconstruction in object (POOL/ROOT) format: tracks (+ their hits), calo clusters, calo cells, combined reconstruction objects, etc.
  - Nominal size 500 kB/event; currently 2.5 times larger: contents and technology under revision
- AOD (Analysis Object Data): summary of event reconstruction with "physics" (POOL/ROOT) objects: electrons, muons, jets, etc.
  - Nominal size 100 kB/event; currently 70% of that: contents and technology under revision
- TAG: database used to quickly select events in AOD and/or ESD files

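To put these per-event sizes in context, a rough, unofficial estimate of one year's first-pass data volume, assuming the nominal 200 Hz rate from the previous slide and ~1e7 live seconds per year:

```python
# Rough annual-volume estimate from the nominal per-event sizes above
# (assumes 200 Hz and ~1e7 live seconds/year; illustrative, not official planning figures).

live_seconds = 1e7
rate_hz = 200
events_per_year = rate_hz * live_seconds             # ~2e9 events

sizes_mb = {"RAW": 1.6, "ESD": 0.5, "AOD": 0.1}       # nominal sizes in MB/event
for sample, size in sizes_mb.items():
    volume_pb = events_per_year * size / 1e9          # MB -> PB
    print(f"{sample}: ~{volume_pb:.1f} PB/year for one processing pass")
```
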
ATLAS Data Streaming
- The ATLAS Computing TDR had 4 streams from the Event Filter: primary physics, calibration, express, problem events
  - The calibration stream has split at least once since!
- Discussions are focused upon optimisation of data access
- At the AOD level we envisage ~10 streams
- TAGs are useful for event selection and data-set definition
- We are now planning ESD and RAW streaming
  - Straw-man streaming schemes (trigger-based) are being agreed
  - We will explore the access improvements in large-scale exercises
  - Also looking at overlaps, bookkeeping, etc.

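A trigger-based streaming scheme of this kind can be pictured as a simple routing of each event into one or more streams according to which triggers fired; the sketch below is purely illustrative (the stream names and trigger categories are invented for the example, not the agreed straw-man scheme):

```python
# Illustrative trigger-based stream assignment (hypothetical stream names and rules).

def assign_streams(trigger_bits: set[str]) -> set[str]:
    """Route one event to output streams based on which triggers fired."""
    streams = set()
    if "express_select" in trigger_bits:
        streams.add("express")                     # "most interesting" events
    if trigger_bits & {"calib_pixel", "calib_muon", "inclusive_lepton"}:
        streams.add("calibration")                 # includes some physics triggers
    if "debug_flag" in trigger_bits:
        streams.add("trouble_makers")              # events needing debugging
    streams.add("physics_full")                    # full (undivided) event stream
    return streams

print(assign_streams({"inclusive_lepton", "express_select"}))
# e.g. {'express', 'calibration', 'physics_full'} (set ordering may vary)
```
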
HEP Data Analysis
- Raw data: hits, pulse heights
- Reconstructed data (ESD): tracks, clusters, …
- Analysis Objects (AOD): physics objects, summarized and organized by physics topic
- Ntuples, histograms, statistical data

Production Data Processing
[Diagram: the production chain — data acquisition and the Level-3 trigger produce raw data and trigger tags; reconstruction produces Event Summary Data (ESD) and event tags. In parallel, physics models and detector simulation produce Monte Carlo truth data, MC raw data, MC ESD and MC event tags. Calibration data, run conditions and the trigger system feed both chains.]
- System coordination required at the collaboration and group levels

Physics Analysis
[Diagram: the analysis chain — event selection via event tags over raw data and ESD, using calibration data, at Tier 0/1 (collaboration-wide); analysis processing producing analysis objects at Tier 2 (analysis groups); and physics analysis of physics/statistics objects at Tier 3/4 (individual physicists).]

ATLAS Resource Requirements for 2008
[Table of 2008 resource requirements from the Computing TDR.]
- Recent (July 2006) updates have reduced the expected contributions

ATLAS Grid Infrastructure
- ATLAS plans to use grid technology to meet its resource needs and to manage those resources
- Three grids: LCG, NorduGrid, OSG
  - Significant resources, but different middleware
  - Teams working on solutions are typically associated with one grid and its middleware
- In principle all ATLAS resources are available to all ATLAS users
  - Works out to O(1) CPU per user
- ATLAS users are interested in using their local systems with priority
  - Not only a central system; flexibility concerning middleware
- Plan "A" is "the Grid"… there is no plan "B"

ATLAS Virtual Organization
- Until recently the Grid has been a "free for all":
  - no CPU or storage accounting (accounting is new, in a prototyping/testing phase)
  - no or limited priorities (roles mapped to a small number of accounts: atlas01-04)
  - no storage space reservation
- Last year ATLAS saw competition for resources between "official" Rome productions and "unofficial", but organized, productions (B-physics, flavour tagging, ...)
- The latest release of the VOMS (Virtual Organisation Management Service) middleware package allows the definition of user groups and roles within the ATLAS Virtual Organisation, and is used by all ATLAS grid flavours!
- Relative priorities are easy to enforce IF all jobs go through the same system
- For a distributed submission system, it is up to the resource providers to:
  - agree on each site's policies with ATLAS
  - publish and enforce the agreed policies

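The kind of policy a resource provider would publish and enforce can be expressed as a mapping from VOMS groups/roles (FQANs) to local shares and priorities; the sketch below is a hypothetical illustration of such a mapping (only /atlas itself comes from the slide — the group names and share values are invented):

```python
# Hypothetical mapping of VOMS FQANs to a site's local fair-share policy.
# Group names and share values are invented for illustration.

SITE_POLICY = {
    "/atlas/Role=production":      {"share": 0.50, "priority": "high"},
    "/atlas/phys-beauty":          {"share": 0.15, "priority": "normal"},
    "/atlas/detector-calibration": {"share": 0.20, "priority": "high"},
    "/atlas":                      {"share": 0.15, "priority": "normal"},  # default
}

def policy_for(fqans: list[str]) -> dict:
    """Return the first (more specific) matching policy for a proxy's FQANs."""
    for fqan in fqans:
        for pattern, policy in SITE_POLICY.items():
            if fqan.startswith(pattern) and pattern != "/atlas":
                return policy
    return SITE_POLICY["/atlas"]

print(policy_for(["/atlas/detector-calibration/Role=NULL"]))
# -> {'share': 0.2, 'priority': 'high'}
```
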
Calibrating and Aligning ATLAS
- Calibrating and aligning detector subsystems is a critical process: without well-understood detectors we will have no meaningful physics data
- The default option for offline prompt calibrations is processing at Tier-0 or at the CERN Analysis Facility; however, the TDR states that:
  - "Tier-2 centres will provide analysis facilities, and some will provide the capacity to produce calibrations based on processing raw data."
  - "Tier-2 facilities may take a range of significant roles in ATLAS such as providing calibration constants, simulation and analysis."
  - "Some Tier-2s may take significant role in calibration following the local detector interests and involvements."
- ATLAS will have some subsystems utilizing Tier-2 centers as calibration and alignment sites
  - Must ensure we can support the data flow without disrupting other planned flows
  - The real-time aspect is critical: the system must account for "deadlines"

Proposed ATLAS Muon Calibration System
[Diagram: L2PU threads feed memory queues that are dequeued to local calibration servers, which a gatherer aggregates onto the calibration-farm disk over the control network; indicative per-link rates of ~500 kB/s (TCP/IP, UDP, etc.) and ~10 MB/s aggregate are shown. Quoted bandwidths are for a 10 kHz muon rate.]

ATLAS Simulations
- Within ATLAS the Tier-2 centers will be responsible for the bulk of the simulation effort
- Current planning assumes ATLAS will simulate approximately 20% of the real data volume
  - This number is dictated by resources; ATLAS may need to find a way to increase this fraction
- The event generator framework interfaces multiple packages, including the Genser distribution provided by LCG-AA
- Simulation with Geant4 since early 2004
  - Automatic geometry build from GeoModel
  - >25M events fully simulated since mid-2004, with only a handful of crashes!
- Digitization tested and tuned with Test Beam

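Under the ~20% assumption and the nominal figures quoted earlier in this talk, one can sketch the implied simulated sample size (illustrative only; real planning depends on the simulation CPU cost per event, which is not quoted here):

```python
# Rough size of the simulated sample implied by the "~20% of real data" assumption
# (illustrative; uses the nominal rates and event sizes quoted elsewhere in this talk).

real_events_per_year = 200 * 1e7          # 200 Hz x ~1e7 live seconds
sim_fraction = 0.20
sim_events = sim_fraction * real_events_per_year

esd_mb, aod_mb = 0.5, 0.1                 # nominal per-event sizes
print(f"~{sim_events:.1e} simulated events/year")
print(f"~{sim_events * esd_mb / 1e9:.2f} PB ESD, "
      f"~{sim_events * aod_mb / 1e9:.2f} PB AOD (before truth/hits overhead)")
```
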
ATLAS Analysis Computing Model
- The ATLAS analysis model is broken into two components:
  - Scheduled: central production of augmented AOD, tuples & TAG collections from ESD; derived files moved to other T1s and to T2s
  - Chaotic: user analysis of augmented AOD streams, tuples, new selections, etc., and individual user simulation and CPU-bound tasks matching the official MC production
- Modest to large(?) job traffic between T2s (and T1s, T3s)

Distributed Analysis
- At this point the emphasis is on a batch model to implement the ATLAS Computing Model
  - Interactive solutions are difficult to realize on top of the current middleware layer
- We expect ATLAS users to send large batches of short jobs to optimize their turnaround
- Key issues: scalability, data access, analysis in parallel to production, job priorities
- Distributed analysis effectiveness depends strongly upon the hardware and software infrastructure
- Analysis is divided into "group" and "on demand" types

ATLAS Group Analysis
- Group analysis is characterised by access to full ESD and perhaps RAW data
  - This is resource intensive and must be a scheduled activity
  - Can back-navigate from AOD to ESD at the same site
  - Can harvest small samples of ESD (and some RAW) to be sent to Tier-2s
  - Must be agreed by physics and detector groups
- Group analysis will produce:
  - Deep copies of subsets
  - Dataset definitions
  - TAG selections
- Big Trains
  - Most efficient access if analyses are blocked into a "big train"
  - Idea around for a while, already used in e.g. heavy ions
  - Each wagon (group) has a wagon master (= production manager), who must ensure it will not derail the train
  - The train must run often enough (every ~2 weeks?)

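The "big train" idea is essentially one scheduled pass over the (expensive) ESD during which every group's ("wagon's") selection is applied; a minimal sketch of the single-pass dispatch, with invented selection functions and event model:

```python
# Minimal sketch of the "big train" idea: one scheduled pass over the ESD,
# with each group's ("wagon's") selection applied to every event.
# The wagons and event model here are invented for illustration.

def heavy_flavour_wagon(event):   # hypothetical group selection
    return event.get("n_btag", 0) >= 2

def dilepton_wagon(event):        # hypothetical group selection
    return len(event.get("leptons", [])) >= 2

WAGONS = {"heavy_flavour": heavy_flavour_wagon, "dilepton": dilepton_wagon}

def run_train(esd_events):
    """Read the ESD once; collect each wagon's selection (deep copy of a subset)."""
    outputs = {name: [] for name in WAGONS}
    for event in esd_events:                  # single pass over the data
        for name, select in WAGONS.items():
            if select(event):
                outputs[name].append(event)   # in reality: write skimmed files
    return outputs

sample = [{"n_btag": 2, "leptons": ["mu"]}, {"n_btag": 0, "leptons": ["e", "mu"]}]
print({k: len(v) for k, v in run_train(sample).items()})  # -> counts per wagon
```
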
ATLAS On-demand Analysis
- Restricted Tier-2s and the CAF
  - Could specialize some Tier-2s for some groups
  - ALL Tier-2s are for ATLAS-wide usage
- Role- and group-based quotas are essential; quotas to be determined per group, not per user
- Data selection
  - Over small samples with the Tier-2 file-based TAG and the AMI dataset selector
  - TAG queries over larger samples by batch job to the database TAG at Tier-1s/large Tier-2s
- What data?
  - Group-derived EventViews
  - ROOT trees
  - Subsets of ESD and RAW, pre-selected or selected via a Big Train run by the working group
- Each user needs 14.5 kSI2k (about 12 current boxes) and, on average, 2.1 TB "associated" with each user (see the rough sizing sketch below)

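These per-user figures scale straightforwardly to a site's analysis commitment; a rough illustration for a hypothetical Tier-2 analysis community (the user count is invented, not an ATLAS number):

```python
# Rough per-site analysis sizing from the per-user figures above
# (the 50 active users is an invented example).

per_user_cpu_ksi2k = 14.5
per_user_disk_tb = 2.1
active_users = 50   # hypothetical Tier-2 analysis community

print(f"CPU:  ~{per_user_cpu_ksi2k * active_users:.0f} kSI2k")
print(f"Disk: ~{per_user_disk_tb * active_users:.0f} TB")
```
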
ATLAS Data Management
- Based on datasets
- The PoolFileCatalog API is used to hide grid differences
  - On LCG, the LFC acts as the local replica catalogue
  - Aims to provide uniform access to data on all grids
- FTS is used to transfer data between sites
  - To date FTS has tried to manage data flow by restricting the allowed endpoints ("channel" definitions)
  - Interesting possibilities exist to incorporate network-related research advances to improve performance, efficiency and reliability
- Data management is a central aspect of Distributed Analysis
  - PANDA is closely integrated with DDM and operational
  - The LCG instance was closely coupled with SC3; right now we run a smaller instance for test purposes
  - The final production version will be based on new middleware for SC4 (FPS)

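Conceptually, dataset-based data management resolves a dataset name to its constituent files, checks the local replica catalogue for what a site already holds, and hands the missing files to a transfer service such as FTS. The sketch below is a conceptual illustration only; the function and catalogue names are invented and are not the actual DQ2/LFC/FTS interfaces:

```python
# Conceptual flow of dataset-based data management: dataset -> files -> replicas
# -> transfers. All names below are invented for illustration; this is not the
# real DQ2 / LFC / FTS API.

def resolve_dataset(dataset, central_catalogue):
    """Central catalogues map a dataset name to its logical file names (LFNs)."""
    return central_catalogue[dataset]

def missing_at_site(lfns, local_replica_catalogue, site):
    """The local replica catalogue (LFC on LCG) says which LFNs a site already has."""
    return [lfn for lfn in lfns if site not in local_replica_catalogue.get(lfn, set())]

def plan_transfers(dataset, central_catalogue, local_replica_catalogue, dest_site):
    """Return the list of files a transfer service (e.g. FTS) would be asked to move."""
    lfns = resolve_dataset(dataset, central_catalogue)
    return missing_at_site(lfns, local_replica_catalogue, dest_site)

central = {"mc06.zmumu.AOD": ["f1.root", "f2.root", "f3.root"]}
replicas = {"f1.root": {"T1_A", "T2_B"}, "f2.root": {"T1_A"}, "f3.root": {"T1_A"}}
print(plan_transfers("mc06.zmumu.AOD", central, replicas, "T2_B"))
# -> ['f2.root', 'f3.root'] would be scheduled for transfer to T2_B
```
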
Distributed Data Management
- Accessing distributed data on the Grid is not a simple task
- Several DBs are needed centrally to hold dataset information
- "Local" catalogues hold information on local data storage
- The new DDM system is under test this summer
- It will be used for all ATLAS data from October on (LCG Service Challenge 3)

ATLAS Plans for Using FTS
[Diagram: FTS servers at the Tier-0 and at each Tier-1; VO boxes at the sites; the LFC is local within each "cloud"; all SEs are SRM.]
- Tier-0 FTS server:
  - Channel from Tier-0 to all Tier-1s: used to move "Tier-0" data (raw and first-pass reconstruction data)
  - Channel from the Tier-1s to Tier-0/CAF: to move e.g. AOD (the CAF also acts as a "Tier-2" for analysis)
- Tier-1 FTS server:
  - Channel from all other Tier-1s to this Tier-1 (pulling data): used for DQ2 dataset subscriptions (e.g. reprocessing, or massive "organized" movement during Distributed Production)
  - Channels to and from this Tier-1 and all its associated Tier-2s
    - Association defined by ATLAS management (along with LCG)
  - "Star" channel for all remaining traffic [new: low-traffic]
(The routing sketch below illustrates these rules.)

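The channel layout above amounts to a simple rule for which FTS server handles a given source → destination pair; a hedged sketch of that routing logic (the site names, cloud map and tier-naming helper are invented, only the rules themselves follow the slide):

```python
# Sketch of the FTS channel-selection rules described above. Site-name parsing
# and the example cloud map are invented; only the routing rules
# (T0->T1, T1->T0/CAF, T1<-T1 pull, T1<->associated T2, star channel) follow the slide.

T2_ASSOCIATION = {"T2_Michigan": "T1_BNL", "T2_Tokyo": "T1_ASGC"}   # example cloud map

def tier(site):                      # invented helper: "T1_BNL" -> "T1"
    return site.split("_")[0]

def fts_server_for(src, dst):
    """Return which FTS server's channel handles a transfer from src to dst."""
    if tier(src) == "T0":
        return "Tier-0 FTS server (T0 -> T1 channel)"
    if tier(dst) == "T0":
        return "Tier-0 FTS server (T1 -> T0/CAF channel)"
    if tier(src) == "T1" and tier(dst) == "T1":
        return f"{dst} FTS server (pulls from other Tier-1s)"
    if T2_ASSOCIATION.get(dst) == src or T2_ASSOCIATION.get(src) == dst:
        t1 = src if tier(src) == "T1" else dst
        return f"{t1} FTS server (channel to/from its associated Tier-2s)"
    return "star channel (all remaining, low-traffic flows)"

print(fts_server_for("T0_CERN", "T1_BNL"))
print(fts_server_for("T1_BNL", "T2_Michigan"))
print(fts_server_for("T1_ASGC", "T2_Michigan"))
```
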
ATLAS and Related Research
- Up to now I have focused on the ATLAS computing model
- Implicit in this model, and central to its success, are:
  - High-performance, ubiquitous and robust networks
  - Grid middleware to securely find, prioritize and manage resources
- Without either of these capabilities the model risks melting down or failing to deliver the required capabilities
- Efforts to date have (necessarily) focused on building the most basic capabilities and demonstrating they can work
- Being truly effective will require updating and extending this model to include the best results of ongoing networking and resource-management research projects
- A quick overview of some selected (US) projects follows…

The UltraLight Project
- UltraLight is:
  - A four-year, $2M NSF ITR funded by MPS (2005-08)
  - Application-driven network R&D
  - A collaboration of BNL, Buffalo, Caltech, CERN, Florida, FIU, FNAL, Internet2, Michigan, MIT, SLAC, Vanderbilt
  - Significant international participation: Brazil, Japan, Korea among many others
- Goal: enable the network as a managed resource
- Meta-goal: enable physics analysis and discoveries which could not otherwise be achieved

ATLAS and UltraLight: Disk-to-Disk Research
- ATLAS MDT subsystems need a very fast calibration turn-around time (< 24 hours)
- Initial estimates plan for as much as 0.5 TB/day of high-pT muon data for calibration
- UltraLight could enable us to quickly transport (~1/4 hour) the needed events to Tier-2 sites for calibration
- Michigan is an ATLAS Muon Alignment and Calibration Center, a Tier-2 and an UltraLight site
- Muon calibration work has presented an opportunity to couple research efforts into production

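The "~1/4 hour" figure corresponds to a sustained rate of a few Gbps for the quoted 0.5 TB/day sample; a quick check of the arithmetic (assuming decimal TB/Gb, a rough cross-check rather than a design number):

```python
# Sustained bandwidth needed to move the daily calibration sample in ~15 minutes
# (assumes decimal TB and Gb; a rough check of the slide's figures).

sample_tb = 0.5
transfer_minutes = 15

gbits = sample_tb * 1e12 * 8 / 1e9                 # 0.5 TB -> 4000 Gb
rate_gbps = gbits / (transfer_minutes * 60)
print(f"~{rate_gbps:.1f} Gbps sustained")          # ~4.4 Gbps -> needs a 10 GE path
```
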
Networking at KNU (Korea)
- Uses the 10 Gbps GLORIAD link from Korea to the US, called BIG-GLORIAD, also part of UltraLight
- Trying to saturate this BIG-GLORIAD link with servers and cluster storage connected at 10 Gbps
- Korea is planning to be a Tier-1 site for the LHC experiments
[Map: the BIG-GLORIAD link between Korea and the U.S.]

VINCI: Virtual Intelligent Networks for Computing Infrastructures
- A network global scheduler implemented as a set of collaborating agents running on distributed MonALISA services
- Each agent uses policy-based priority queues and negotiates for an end-to-end connection using a set of cost functions
- A lease mechanism is implemented for each offer an agent makes to its peers
- Periodic lease renewal is used for all agents; this gives a flexible response to task completion, as well as to application failure or network errors
- If network errors are detected, supervising agents cause all segments along a path to be released; an alternative path may then be set up rapidly enough to avoid a TCP timeout, allowing the transfer to continue uninterrupted

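The lease mechanism described here is essentially a soft-state protocol: an offered segment stays allocated only while it keeps being renewed, so a failed agent or application releases its resources automatically. A minimal sketch of that idea (not the actual MonALISA/VINCI code):

```python
# Minimal soft-state lease sketch illustrating the VINCI mechanism described above
# (illustrative only; not the MonALISA/VINCI implementation).

import time

class SegmentLease:
    def __init__(self, segment, duration_s=30):
        self.segment = segment
        self.duration_s = duration_s
        self.expires_at = time.time() + duration_s

    def renew(self):
        """Called periodically by the agent holding the segment."""
        self.expires_at = time.time() + self.duration_s

    def expired(self):
        """If renewals stop (agent/application failure), the lease lapses."""
        return time.time() > self.expires_at

def release_expired(leases):
    """A supervising agent frees every segment whose lease was not renewed."""
    for lease in leases:
        if lease.expired():
            print(f"releasing segment {lease.segment}")   # path can then be rebuilt elsewhere
    return [l for l in leases if not l.expired()]
```
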
Lambda Station
- A network path-forwarding service to interface production facilities with advanced research networks:
  - Goal is selective forwarding on a per-flow basis
  - Alternate network paths for high-impact data movement
  - Dynamic path modification, with graceful cutover & fallback
- The current implementation is based on policy-based routing & DSCP marking
- Lambda Station interacts with:
  - Host applications & systems
  - LAN infrastructure
  - Site border infrastructure
  - Advanced-technology WANs
  - Remote Lambda Stations
(D. Petravick, P. DeMar)

TeraPaths (LAN QoS Integration)
[Diagram: two sites (A and B) connected across the WAN; at each site, user-manager, scheduler, site-monitor and router-manager components behind web-page, API and command-line QoS request interfaces, with hardware drivers, WAN web services and WAN monitoring in between.]
- The TeraPaths project investigates the integration and use of LAN QoS and MPLS/GMPLS-based differentiated network services in the ATLAS data-intensive distributed computing environment, in order to manage the network as a critical resource
- TeraPaths includes: BNL, Michigan, ESnet (OSCARS), FNAL (Lambda Station), SLAC (DWMI)

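At the end-host level, the QoS/differentiated-service approach used by projects such as TeraPaths and Lambda Station ultimately relies on packets carrying a DSCP marking that LAN and WAN devices can match. As a hedged illustration (the DSCP value chosen is arbitrary, and real deployments assign code points per site policy), an application can request such marking on its own traffic via the IP TOS byte:

```python
# Illustration of DSCP marking from an end host: set the IP TOS byte so that
# LAN/WAN devices configured for differentiated service can classify the flow.
# The DSCP value (EF, 46) is only an example; real deployments assign values per policy.

import socket

DSCP_EF = 46                               # "Expedited Forwarding" code point
tos = DSCP_EF << 2                         # DSCP occupies the upper 6 bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
# ... connect() and send data as usual; packets now carry the DSCP marking,
# which the site LAN and WAN (e.g. an MPLS/QoS tunnel) can match and prioritize.
sock.close()
```
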
Integrating Research into Production
- As you can see, there are many efforts, even just within the US, to help integrate a managed network into our infrastructure
- There are also many similar efforts in computing, storage, grid middleware and applications (EGEE, OSG, LCG, …)
- The challenge will be to harvest these efforts and integrate them into a robust system for LHC physicists
- I will close with an "example" vision of what could result from such integration…

An Example: UltraLight/ATLAS Application (2008)

Node1> fts -vvv -in mercury.ultralight.org:/data01/big/zmumu05687.root -out venus.ultralight.org:/mstore/events/data -prio 3 -deadline +2:50 -xsum
FTS: Initiating file transfer setup…
FTS: Remote host responds ready
FTS: Contacting path discovery service
PDS: Path discovery in progress…
PDS: Path RTT ms, best-effort path bottleneck is 10 GE
PDS: Path options found:
PDS:   Lightpath option exists end-to-end
PDS:   Virtual pipe option exists (partial)
PDS:   High-performance protocol capable end-systems exist
FTS: Requested transfer: 1.2 TB file transfer within 2 hours 50 minutes, priority 3
FTS: Remote host confirms available space for
FTS: End-host agent contacted… parameters transferred
EHA: Priority 3 request allowed for
EHA: Request scheduling details
EHA: Lightpath prior scheduling (higher/same priority) precludes use
EHA: Virtual pipe sizeable to 3 Gbps available for 1 hour starting in 52.4 minutes
EHA: Request monitoring prediction along path
EHA: FAST-UL transfer expected to deliver 1.2 Gbps (+0.8/-0.4) averaged over next 2 hours 50 minutes

ATLAS FTS 2008 Example (cont.)

EHA: Virtual pipe (partial) expected to deliver 3 Gbps (+0/-0.3) during reservation; variance from unprotected section < 0.3 Gbps 95% CL
EHA: Recommendation: begin transfer using FAST-UL with network identifier #5A-3C1. Connection will migrate to MPLS/QoS tunnel in 52.3 minutes. Estimated completion in 1 hour minutes.
FTS: Initiating transfer between mercury.ultralight.org and venus.ultralight.org using #5A-3C1
EHA: Transfer initiated… tracking at URL: fts://localhost/FTS/AE13FF132-FAFE39A-44-5A-3C1
EHA: Reservation placed for MPLS/QoS connection along partial path: 3 Gbps beginning in 52.2 minutes; duration 60 minutes
EHA: Reservation confirmed, rescode #9FA-39AF2E, note: unprotected network section included.
FTS: Transfer proceeding, average 1.1 Gbps, GB transferred
EHA: Connecting to reservation: tunnel complete, traffic marking initiated
EHA: Virtual pipe active: current rate 2.98 Gbps, estimated completion in minutes
FTS: Transfer complete, signaling EHA on #5A-3C1
EHA: Transfer complete received… hold for xsum confirmation
FTS: Remote checksum processing initiated…
FTS: Checksum verified; closing connection
EHA: Connection #5A-3C1 completed… closing virtual pipe with 12.3 minutes remaining on reservation
EHA: Resources freed. Transfer details uploading to monitoring node
EHA: Request successfully completed, transferred 1.2 TB in 1 hour 41.3 minutes (transfer: 1 hour 34.4 minutes)

Conclusions
- ATLAS is quickly approaching "real" data, and our computing model has been successfully validated (as far as we have been able to take it)
- Some major uncertainties exist, especially around "user analysis" and the resource implications it may have
- There are many active R&D programs in areas of special importance to ATLAS (and the LHC) which could significantly strengthen the core model
- The challenge will be to select, integrate, prototype and test the R&D developments in time to have a meaningful impact upon the ATLAS (or LHC) program

Questions?
