1
The ATLAS Strategy for Distributed Analysis on Several Grid Infrastructures
D. Liko, IT/PSS, for the ATLAS Distributed Analysis Community
2
Overview
- Distributed Analysis in ATLAS: grids, computing model
- The ATLAS strategy: production system, direct submission
- Common aspects: data management, transformations, GUI
- Initial experiences: production system on LCG, PANDA on OSG, GANGA
3
ATLAS Grid Infrastructure
- Three grids: LCG, OSG, NorduGrid
- Significant resources, but different middleware
- Teams working on solutions are typically associated with one grid and its middleware
- In principle, all ATLAS resources are available to all ATLAS users
- Users are nevertheless interested in using their local systems with priority
- Hence not only a central system: flexibility concerning middleware
- Poster 181: Prototype of the Swiss ATLAS Computing Infrastructure
4
Distributed Analysis
- At this point the emphasis is on a batch model to implement the ATLAS Computing Model
- Interactive solutions are difficult to realize on top of the current middleware layer
- We expect our users to send large batches of short jobs to optimize their turnaround
- Key concerns: scalability, data access, analysis in parallel to production, job priorities
5
ATLAS Computing Model
- Data for analysis (AOD & ESD) will be available, distributed across all Tier-1 and Tier-2 centers
- T1 & T2 are open for analysis jobs
- The computing model foresees 50% of grid resources allocated to analysis
- Users will send jobs to the data and extract the relevant output, typically NTuples or similar
6
Requirements
- Data for a year of data taking: AOD – 150 TB; ESD
- Scalability: last year up to 10000 jobs per day for production (job duration up to 24 hours); the grid and our needs will grow
- We expect our analysis users to run much shorter jobs
- Job delivery capacity of the order of 10^6 jobs per day (peak capacity, involving several grids); longer jobs can reduce this number, but that might not always be practical
- Job priorities: today we need short queues; in the future we need to steer the resource consumption of our physics and detector groups based on VOMS groups
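The 10^6 jobs/day figure can be sanity-checked with simple arithmetic. A minimal sketch (the helper function is hypothetical; the job rates and durations are the slide's own figures):

```python
def required_slots(jobs_per_day, job_hours):
    """Concurrent CPU slots needed to sustain a given daily job rate."""
    return round(jobs_per_day * job_hours / 24.0)

# 10^6 half-hour analysis jobs per day need ~21k concurrent slots...
print(required_slots(1_000_000, 0.5))   # 20833
# ...and the same slots deliver only 250k jobs/day if jobs last 2 hours,
# which is why longer jobs reduce the required delivery rate.
print(required_slots(250_000, 2.0))     # 20833
```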
7
ATLAS Strategy
- Production system: seamless access to all ATLAS grid resources
- Direct submission to the grid:
  - LCG: LCG/gLite Resource Broker, CondorG
  - OSG: PANDA
  - NorduGrid: ARC middleware
8
[Diagram: ATLAS Prodsys – ProdDB feeding the executors (Lexor via the LCG RB, Dulcinea, CondorG, PANDA), which submit jobs to the grid CEs]
9
Production System
- Provides a layer on top of the middleware
- Increases the robustness of the system: retrial and fallback mechanisms for both workload and data management
- Our grid experience is captured in the executors; jobs can be run on all systems
- Redesign based on last year's experience: new supervisor (Eowyn), new executors, connection to the new Data Management
- Adaptation for Distributed Analysis: configurable user jobs, access control based on X.509 certificates, graphical user interface (ATCOM)
- Presentation 110: ATLAS Experience on Large Scale Production on the Grid
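The retrial-and-fallback idea behind the production system can be sketched as follows. This is an illustrative toy, not the actual Prodsys interface; the executor names merely stand in for Lexor and CondorG:

```python
def run_with_fallback(job, executors, max_retries=3):
    """Try each executor in turn; retry a failing executor before falling back."""
    for executor in executors:
        for attempt in range(max_retries):
            try:
                return executor(job)
            except RuntimeError:
                pass  # transient grid failure: retry on the same executor
        # retries exhausted: fall back to the next executor (i.e. another grid flavour)
    raise RuntimeError(f"job {job!r} failed on all executors")

# Stand-ins for two executors; the first always fails, the second succeeds.
def lexor(job):    raise RuntimeError("RB timeout")
def condorg(job):  return f"{job} done via CondorG"

print(run_with_fallback("analysis-042", [lexor, condorg]))  # analysis-042 done via CondorG
```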
10
LCG Resource Broker
- Concerns: scalability, reliability, throughput
- New gLite Resource Broker: bulk submission and many other enhancements
- Studied in the ATLAS LCG/EGEE Taskforce
- Special setup in Milano & Bologna:
  - gLite: 2-way Intel Xeon 2.8 GHz (with hyper-threading), 3 GB memory
  - LCG: 2-way Intel Xeon 2.4 GHz (without hyper-threading), 2 GB memory
  - Both use the same BDII (52 CEs in total)
- Several bug fixes and optimizations; steady collaboration with the developers
11
LCG vs gLite Resource Broker
- Bulk submission much faster
- Sandbox handling better and faster
- Now the matchmaking is the limiting factor; strong effect from ranking

Timing (sec/job):         gLite                              LCG
                 Submission  Match-making  Overall   Submission  Match-making  Overall
To any CE        0.3         2.7           3.0       2.41        ~0.01         2.42
To one CE        0.3         0.5           0.8       1.78        ~0.16         1.94
12
CondorG
- Conceptually similar to the LCG RB, but different architecture
- Scaling by increasing the number of schedulers
- No logging & bookkeeping, but a scheduler keeps track of the job
- Used in parallel during DC2 & the Rome production; increased our use of grid resources
- Submission via the production system, but direct submission is also conceivable
- Presentation 401: A Grid of Grids using CondorG
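Scaling by adding schedulers amounts to spreading submissions over independent scheduler instances, each tracking its own jobs. A minimal round-robin sketch (scheduler names are hypothetical, not real CondorG configuration):

```python
from itertools import cycle

def assign_jobs(jobs, schedulers):
    """Round-robin jobs over independent scheduler instances."""
    assignment = {s: [] for s in schedulers}
    for job, sched in zip(jobs, cycle(schedulers)):
        assignment[sched].append(job)
    return assignment

jobs = [f"job{i}" for i in range(7)]
print(assign_jobs(jobs, ["schedd1", "schedd2", "schedd3"]))
```

Each scheduler sees only its own share of the load, so submission capacity grows roughly linearly with the number of instances.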
13
Last year's experience
- Adding a CondorG-based executor to the production system helped us increase the number of jobs on LCG
14
PANDA
- New Prodsys executor for OSG
- Pilot jobs
- Resource brokering
- Close integration with DDM
- Operational in production since December
- Presentation 347: PANDA: Production and Distributed Analysis System for ATLAS
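The pilot-job model mentioned above reverses the usual push submission: a pilot starts on a worker node first and only then pulls a matching payload. A schematic sketch; the in-memory queue and payloads are stand-ins for the real PANDA server, which the deck does not detail:

```python
from collections import deque

# Stand-in for the central PANDA job queue.
task_queue = deque([
    {"id": 1, "site": "BNL", "cmd": "athena analysis.py"},
    {"id": 2, "site": "UTA", "cmd": "athena reco.py"},
])

def pilot(site):
    """A pilot job: lands on a worker node, then asks the server for work."""
    for _ in range(len(task_queue)):
        job = task_queue.popleft()
        if job["site"] == site:          # late binding: payload chosen at run time
            return f"pilot@{site} runs job {job['id']}: {job['cmd']}"
        task_queue.append(job)           # not for this site: put it back
    return f"pilot@{site} found no work"

print(pilot("UTA"))  # pilot@UTA runs job 2: athena reco.py
```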
15
PANDA
- Direct submission: regional production, analysis jobs
- Key features for analysis: analysis transformations, job chaining, easy job submission, monitoring, DDM end-user tool, transformation repository
16
ARC Middleware
- Standalone ARC client software – 13 MB installation
- The CE has extended functionality: input files can be staged in and are cached; output files can be staged out
- Controlled by xRSL, an extended version of Globus RSL
- Brokering is part of the submission step in the client software
- Job delivery rates of 30 to 50 jobs per minute have been reported
- Logging & bookkeeping on the site
- Currently about 5000 CPUs, 800 available for ATLAS
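Since brokering runs in the ARC client at submission time, it boils down to ranking advertised site information locally. A toy version of such client-side ranking (site records and field names are hypothetical, not the actual ARC information schema):

```python
def broker(sites, needed_cpus=1):
    """Pick the site with the most free CPUs that satisfies the request."""
    candidates = [s for s in sites if s["free_cpus"] >= needed_cpus]
    if not candidates:
        raise RuntimeError("no matching site")
    return max(candidates, key=lambda s: s["free_cpus"])["name"]

sites = [
    {"name": "site-a", "free_cpus": 120},
    {"name": "site-b", "free_cpus": 800},
    {"name": "site-c", "free_cpus": 0},
]
print(broker(sites))  # site-b
```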
17
Common Aspects
- Data management
- Transformations
- GUI
18
ATLAS Data Management
- Based on datasets
- The PoolFileCatalog API is used to hide grid differences; on LCG, the LFC acts as the local replica catalog
- Aims to provide uniform access to data on all grids
- FTS is used to transfer data between the sites
- Data management is evidently a central aspect of Distributed Analysis
- PANDA is closely integrated with DDM and operational
- The LCG instance was closely coupled with SC3; right now we run a smaller instance for test purposes
- The final production version will be based on new middleware for SC4 (FPS)
- Presentation 75: A Scalable Distributed Data Management System for ATLAS
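The PoolFileCatalog-style indirection (one lookup API in front of per-grid replica catalogs) can be sketched as below. Class, method and replica names are illustrative, not the actual DDM or POOL API:

```python
class ReplicaCatalog:
    """Uniform logical-name -> physical-replica lookup over per-grid catalogs."""

    def __init__(self):
        self._backends = []          # e.g. the LFC on LCG, another catalog on OSG

    def add_backend(self, backend):
        self._backends.append(backend)

    def replicas(self, lfn):
        """Return all known physical file names for a logical file name."""
        found = []
        for backend in self._backends:
            found.extend(backend.get(lfn, []))
        return found

# Plain dicts stand in for the per-grid catalog services.
lfc = {"aod.pool.root": ["srm://lcg-site/atlas/aod.pool.root"]}
osg = {"aod.pool.root": ["srm://osg-site/atlas/aod.pool.root"]}

cat = ReplicaCatalog()
cat.add_backend(lfc)
cat.add_backend(osg)
print(cat.replicas("aod.pool.root"))
```

The user code only ever sees logical names; which grid's catalog answered is an implementation detail.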
19
Transformations
- Common transformations are a fundamental aspect of the ATLAS strategy
- Overall no homogeneous system... but a common transformation system allows running the same job on all supported systems
- All systems should support them
- In the end, users can adapt easily to a new submission system if they do not need to adapt their jobs
- Separation of functionality into grid-dependent wrappers and grid-independent execution scripts
- A set of parameters is used to configure the specific job options
- A new implementation in Python is under way
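The wrapper/script split can be sketched like this: the execution script is identical on every grid, while each grid supplies its own stage-in/stage-out wrapper. Function names, parameters and file names are hypothetical, not the actual ATLAS transformation code:

```python
def execution_script(params):
    """Grid-independent part: the same payload for LCG, OSG and NorduGrid."""
    return f"athena {params['job_options']} -c 'EvtMax={params['max_events']}'"

def lcg_wrapper(params):
    """Grid-dependent part: stage data in/out around the common payload."""
    return "\n".join([
        f"stage-in  {params['input_lfn']} input.pool.root",    # grid-specific copy
        execution_script(params),                              # common payload
        f"stage-out ntuple.root {params['output_lfn']}",       # grid-specific copy
    ])

params = {
    "job_options": "AnalysisSkeleton_jobOptions.py",
    "max_events": 1000,
    "input_lfn": "lfn:/grid/atlas/aod.pool.root",
    "output_lfn": "lfn:/grid/atlas/user/ntuple.root",
}
print(lcg_wrapper(params))
```

Porting the job to another grid means writing a new wrapper; the user's job options and parameters are untouched.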
20
GANGA – The GUI for the Grid
- Common project with LHCb
- Plugins allow defining applications – currently Athena and Gaudi, ADA (DIAL)
- ...and backends – currently Fork, LSF, PBS, Condor, LCG, gLite, DIAL and DIRAC
- Presentation 318: GANGA – A Grid User Interface
21
GANGA latest developments
- New version 4
- Job splitting
- GUI
- Work on plugins for various systems is ongoing
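Job splitting turns one user job into many subjobs, typically by partitioning the input dataset. A generic sketch of the idea, not the actual GANGA splitter API:

```python
def split_job(input_files, files_per_subjob):
    """Split one analysis job into subjobs of at most files_per_subjob inputs."""
    return [input_files[i:i + files_per_subjob]
            for i in range(0, len(input_files), files_per_subjob)]

# Hypothetical input dataset of 10 AOD files.
files = [f"AOD._{i:05d}.pool.root" for i in range(1, 11)]
subjobs = split_job(files, 4)
print(len(subjobs))   # 3
print(subjobs[-1])    # the last subjob gets the remaining 2 files
```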
22
Initial experiences
- PANDA on OSG
- Analysis with the production system
- GANGA
23
PANDA on OSG
- pathena: lightweight submission interface to PANDA
- The DIAL system submits analysis jobs to PANDA to get access to grid resources
- First users are working on the system
- Presentation 38: DIAL: Distributed Interactive Analysis of Large Datasets
24
Distributed Analysis using Prodsys
- Currently based on CondorG; a Lexor-based system is on its way
- GUI: ATCOM
- A central team operates the executor as a service
- Several analyses were ported to the system; selected users are testing it
- Poster 264: Distributed Analysis with the ATLAS Production System
25
GANGA
- Most relevant: the Athena application with the LCG backend
- Evaluated by several users for simulation & analysis
- Feedback: faster submission is necessary → Prodsys/PANDA/gLite/CondorG
- All based on the CLI; a new GUI will be presented soon
26
Summary
- The systems have been exposed to selected users
- Positive feedback, but direct contact with the experts is still essential
- For this year: power users and grid experts...
- Main issues:
  - Data distribution → new DDM
  - Scalability → new Prodsys/PANDA/gLite/CondorG
  - Analysis in parallel to production → job priorities
27
Conclusions
- As of today, Distributed Analysis in ATLAS is still work in progress (as is the detector)
- The expected data volume requires us to perform analysis on the grid
- Important pieces are coming into place
- We will verify Distributed Analysis according to the ATLAS Computing Model in the context of SC4