
1 The ATLAS Strategy for Distributed Analysis on several Grid Infrastructures
D. Liko, IT/PSS for the ATLAS Distributed Analysis Community

2 Overview
Distributed Analysis in ATLAS
- Grids, Computing Model
The ATLAS Strategy
- Production system
- Direct submission
Common Aspects
- Data management
- Transformations
- GUI
Initial experiences
- Production system on LCG
- PANDA on OSG
- GANGA

3 ATLAS Grid Infrastructure
Three grids
- LCG
- OSG
- NorduGrid
Significant resources, but different middleware
- Teams working on solutions are typically associated with one grid and its middleware
In principle ATLAS resources are available to all ATLAS users
- Users are interested in using their local systems with priority
- Not only a central system; flexibility concerning middleware
Poster 181: Prototype of the Swiss ATLAS Computing Infrastructure

4 Distributed Analysis
At this point the emphasis is on a batch model to implement the ATLAS Computing Model
- Interactive solutions are difficult to realize on top of the current middleware layer
We expect our users to send large batches of short jobs to optimize their turnaround
- Scalability
- Data access
Analysis in parallel to production
- Job priorities

5 ATLAS Computing Model
Data for analysis will be distributed across all Tier-1 and Tier-2 centers
- AOD & ESD
- T1 & T2 are open for analysis jobs
- The computing model foresees 50% of grid resources being allocated to analysis
Users will send jobs to the data and extract the relevant output
- typically NTuples or similar

6 Requirements
Data for a year of data taking
- AOD – 150 TB
- ESD
Scalability
- Last year up to 10,000 jobs per day for production (job duration up to 24 hours); the grid and our needs will grow
- We expect that our analysis users will run much shorter jobs
- Job delivery capacity of the order of 10^6 jobs per day (a quick rate conversion is sketched after this list)
  - Peak capacity
  - Involves several grids
  - Longer jobs can reduce this number (but might not always be practical)
Job priorities
- Today we need short queues
- In the future we need to steer the resource consumption of our physics and detector groups based on VOMS groups
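As a quick scale check (my own arithmetic on the figures above, not from the slide), the quoted per-day numbers translate into the following sustained rates:

```python
# Convert the per-day job counts quoted above into sustained rates.
# The two input figures are from the slide; everything else is plain arithmetic.

SECONDS_PER_DAY = 24 * 60 * 60

production_jobs_per_day = 10_000     # last year's production load (slide figure)
analysis_peak_jobs_per_day = 10**6   # targeted peak job-delivery capacity (slide figure)

print(f"production:    {production_jobs_per_day / SECONDS_PER_DAY:.2f} jobs/s sustained")
print(f"analysis peak: {analysis_peak_jobs_per_day / SECONDS_PER_DAY:.1f} jobs/s across all grids")
# roughly 0.12 jobs/s for production vs. ~11.6 jobs/s at the analysis peak,
# i.e. about two orders of magnitude more job deliveries than last year's production.
```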

7 ATLAS Strategy
Production system
- Seamless access to all ATLAS grid resources
Direct submission to the grid
- LCG: LCG/gLite Resource Broker, CondorG
- OSG: PANDA
- NorduGrid: ARC middleware

8 ATLAS Prodsys
[Slide diagram of the ATLAS production system: ProdDB, the executors Lexor, Dulcinea, CondorG and PANDA, the Resource Broker (RB) and the computing elements (CE)]

9 Production System
Provides a layer on top of the middleware
- Increases the robustness of the system: retry and fallback mechanisms for both workload and data management (a conceptual sketch follows below)
- Our grid experience is captured in the executors
- Jobs can run on all systems
Redesign based on last year's experience
- New supervisor - Eowyn
- New executors
- Connects to the new Data Management
Adaptation for Distributed Analysis
- Configurable user jobs
- Access control based on X.509 certificates
- Graphical user interface ATCOM
Presentation 110: ATLAS Experience on Large Scale Production on the Grid
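The retry-and-fallback behaviour mentioned above can be illustrated with a minimal sketch. All names below (submit_to_grid, run_with_retries, GridError) are hypothetical and do not correspond to the real Eowyn/executor interfaces; this only shows the control flow of retrying on one grid and falling back to the next.

```python
# Conceptual sketch only: retry on one grid, then fall back to the next.
# The names here are illustrative, NOT the real production-system APIs.

import random
import time


class GridError(Exception):
    """A submission attempt to one grid flavour failed."""


def submit_to_grid(grid, job):
    """Stand-in for a grid-specific executor (e.g. Lexor, Dulcinea, PANDA)."""
    if random.random() < 0.3:          # simulate a transient middleware failure
        raise GridError(f"{grid}: submission failed")
    return f"{grid}-job-{job['id']}"


def run_with_retries(job, grids=("LCG", "OSG", "NorduGrid"),
                     retries_per_grid=3, backoff_s=1.0):
    """Retry on the same grid, then fall back to the next one in the list."""
    for grid in grids:
        for attempt in range(1, retries_per_grid + 1):
            try:
                handle = submit_to_grid(grid, job)
                print(f"submitted via {grid} on attempt {attempt}")
                return handle
            except GridError as err:
                print(f"warning: {err} (attempt {attempt})")
                time.sleep(backoff_s)
    raise RuntimeError(f"all grids exhausted for job {job['id']}")


if __name__ == "__main__":
    print(run_with_retries({"id": 42, "transformation": "example_reco"}))
```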

10 LCG
Resource Broker
- Scalability
- Reliability
- Throughput
New gLite Resource Broker
- Bulk submission
- Many other enhancements
- Studied in the ATLAS LCG/EGEE task force
Special setup in Milano & Bologna
- gLite – 2-way Intel Xeon 2.8 GHz (with hyper-threading), 3 GB memory
- LCG – 2-way Intel Xeon 2.4 GHz (without hyper-threading), 2 GB memory
- Both use the same BDII (52 CEs in total)
Several bug fixes and optimizations
- Steady collaboration with the developers

11 LCG vs gLite Resource Broker
Bulk submission much faster
Sandbox handling better and faster
Now the matchmaking is the limiting factor (a rough scaling check on these numbers follows below)
- Strong effect from ranking

Timing per job (sec/job):
             gLite                               LCG
             Submission  Matchmaking  Overall    Submission  Matchmaking  Overall
To any CE    0.3         2.7          3.0        2.41        ~0.01        2.42
To one CE    0.3         0.5          0.8        1.78        ~0.16        1.94
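To put the per-job figures in perspective, here is a small arithmetic check using only the numbers from the table above; the bulk size of 1000 jobs is an illustrative choice, not a number from the slide.

```python
# Broker-side time for a bulk of jobs, using the per-job figures from the table.
# Only simple arithmetic on the quoted numbers; the bulk size is illustrative.

per_job_overall_s = {
    "gLite, to any CE": 3.0,
    "LCG,   to any CE": 2.42,
    "gLite, to one CE": 0.8,
    "LCG,   to one CE": 1.94,
}

n_jobs = 1000  # a bulk of short analysis jobs

for case, seconds in per_job_overall_s.items():
    print(f"{case}: {n_jobs * seconds / 60:.0f} min of broker time for {n_jobs} jobs")
# In the gLite 'any CE' case, matchmaking accounts for 2.7 of the 3.0 s/job,
# which is why it is quoted above as the limiting factor.
```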

12 CondorG
Conceptually similar to the LCG RB, but different architecture
- Scaling by increasing the number of schedulers
- No logging & bookkeeping, but a scheduler keeps track of the jobs
Used in parallel during the DC2 & Rome productions and increased our use of grid resources
Submission via the production system, but direct submission is also conceivable
Presentation 401: A Grid of Grids using CondorG

13 Last year's experience
Adding a CondorG-based executor to the production system helped us to increase the number of jobs on LCG

14 PANDA
New prodsys executor for OSG
- Pilot jobs
- Resource brokering
- Close integration with DDM
Operational in production since December
Presentation 347: PANDA: Production and Distributed Analysis System for ATLAS

15 PANDA
Direct submission
- Regional production
- Analysis jobs
Key features for analysis
- Analysis transformations
- Job chaining
- Easy job submission
- Monitoring
- DDM end-user tool
- Transformation repository

16 ARC Middleware
Standalone ARC client software – 13 MB installation
CE has extended functionality
- Input files can be staged and are cached
- Output files can be staged
- Controlled by XRSL, an extended version of Globus RSL
Brokering is part of the submission in the client software
- Job delivery rates of 30 to 50 per minute have been reported
- Logging & bookkeeping on the site
Currently about 5000 CPUs, 800 available for ATLAS

17 Common Aspects Data management Transformations GUI

18 ATLAS Data Management
Based on datasets
The PoolFileCatalog API is used to hide grid differences (illustrated in the sketch below)
- On LCG, LFC acts as the local replica catalog
- Aims to provide uniform access to data on all grids
FTS is used to transfer data between the sites
Data management is evidently a central aspect of Distributed Analysis
- PANDA is closely integrated with DDM and operational
- The LCG instance was closely coupled with SC3
- Right now we run a smaller instance for test purposes
- The final production version will be based on new middleware for SC4 (FPS)
Presentation 75: A Scalable Distributed Data Management System for ATLAS
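The idea that one catalog interface hides per-grid differences can be sketched as follows. The catalog endpoints, the dataset name and the helper function are assumptions made for illustration only; they are not the real DDM or PoolFileCatalog code.

```python
# Illustrative sketch only: a single lookup interface over per-grid replica
# catalogs, in the spirit of "the PoolFileCatalog API hides grid differences".
# Endpoints, dataset name and function are fabricated for the example.

GRID_CATALOGS = {
    "LCG": "lfc://lfc.example.org",        # LFC as the local replica catalog on LCG
    "OSG": "catalog://osg.example.org",    # placeholder for the OSG-side catalog
    "NG":  "catalog://ng.example.org",     # placeholder for the NorduGrid catalog
}


def replicas_for_dataset(dataset, grid):
    """Return (fake) physical file names for the files of a dataset on one grid.

    A real implementation would go through the catalog API and query the
    grid's own replica catalog; here the answer is fabricated so that only
    the control flow is visible.
    """
    catalog = GRID_CATALOGS[grid]
    return [f"{catalog}/{dataset}/file_{i:03d}.pool.root" for i in range(3)]


# The analysis job only names the dataset; which catalog is consulted is
# resolved behind the interface, so the same user code works on every grid.
for pfn in replicas_for_dataset("example.dataset.AOD", grid="LCG"):
    print(pfn)
```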

19 Transformations
Common transformations are a fundamental aspect of the ATLAS strategy
Overall there is no homogeneous system … but a common transformation system allows the same job to run on all supported systems
- All systems should support them
- In the end users can easily adapt to a new submission system if they do not need to adapt their jobs
Separation of functionality into grid-dependent wrappers and grid-independent execution scripts; a set of parameters is used to configure the specific job options (see the sketch below)
A new implementation in Python is under way
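A minimal sketch of the wrapper/execution-script split described above: a grid-independent part driven by a parameter set, which a thin grid-dependent wrapper would call after staging the input files. All names, parameters and the generated options are illustrative, not the actual ATLAS transformation code.

```python
# Sketch of a grid-independent execution part configured by a parameter set.
# A grid-dependent wrapper (one per middleware flavour, not shown) would stage
# the inputs in, call run_transformation(), and stage the outputs back out.

import subprocess


def run_transformation(params):
    """Write job options from the parameters, then run Athena."""
    job_options = (
        "# generated job options (illustrative, not real Athena syntax)\n"
        f"max_events = {params['max_events']}\n"
        f"input_files = {params['input_files']!r}\n"
        f"output_ntuple = {params['output']!r}\n"
    )
    with open("generated_jobOptions.py", "w") as fh:
        fh.write(job_options)
    # Nothing above is middleware specific, so the same call works on any grid
    # (assuming athena.py is available in the runtime environment).
    return subprocess.call(["athena.py", "generated_jobOptions.py"])


if __name__ == "__main__":
    run_transformation({
        "max_events": 1000,
        "input_files": ["AOD.pool.root"],
        "output": "ntuple.root",
    })
```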

20 GANGA – The GUI for the Grid
Common project with LHCb
Plugins allow applications to be defined (a job-definition sketch follows below)
- Currently: Athena and Gaudi, ADA (DIAL)
And backends
- Currently: Fork, LSF, PBS, Condor, LCG, gLite, DIAL and DIRAC
Presentation 318: GANGA – A Grid User Interface
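Inside a GANGA session the plugin structure shows up as configurable Job objects. The snippet below is a sketch based on the application and backend plugin names listed on the slide; attribute names are indicative of the GANGA 4 interface and should be checked against the installed version.

```python
# To be run inside a GANGA session, where Job and the plugin classes are
# pre-loaded (this is not a standalone script).

j = Job()                  # generic job object
j.application = Athena()   # application plugin from the slide's list
j.backend = LCG()          # backend plugin; could be Fork(), LSF(), DIRAC(), ...
j.submit()                 # swapping the backend leaves the rest of the job untouched
```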

21 GANGA latest developments
New version 4
Job splitting
GUI
Work on plugins for various systems is ongoing

22 Initial experiences PANDA on OSG Analysis with the Production System GANGA

23 PANDA on OSG
pathena
- Lightweight submission interface to PANDA
DIAL
- The system submits analysis jobs to PANDA to get access to grid resources
First users are working on the system
Presentation 38: DIAL: Distributed Interactive Analysis of Large Datasets

24 Distributed Analysis using Prodsys
Currently based on CondorG
- Lexor-based system on its way
GUI ATCOM
A central team operates the executor as a service
Several analyses were ported to the system
Selected users are testing it
Poster 264: Distributed Analysis with the ATLAS Production System

25 GANGA
Most relevant
- Athena application
- LCG backend
Evaluated by several users
- Simulation & analysis
- Faster submission necessary
  - Prodsys/PANDA/gLite/CondorG
Feedback
- All based on the CLI
- New GUI will be presented soon

26 Summary
Systems have been exposed to selected users
- Positive feedback
- Direct contact with the experts is still essential
- For this year – power users and grid experts …
Main issues
- Data distribution → new DDM
- Scalability → new Prodsys/PANDA/gLite/CondorG
- Analysis in parallel to production → job priorities

27 Conclusions
As of today, Distributed Analysis in ATLAS is still work in progress (as is the detector)
The expected data volume requires us to perform analysis on the grid
Important pieces are coming into place
We will verify Distributed Analysis according to the ATLAS Computing Model in the context of SC4

