1
Future of Distributed Production in US Facilities
Kaushik De, Univ. of Texas at Arlington
US ATLAS Distributed Facility Workshop, Santa Cruz
November 13, 2012
2
Background
  Distributed production requires many different ATLAS-specific SW components/applications
    Athena and Transformations – core software
    ProdSys – task management system
    AMI – production tags and metadata
    PanDA – job execution system
    DQ2 – data management system
    Monitoring of tasks, data and jobs
  They utilize common tools like Globus, VDT, XRootD, dCache, CVMFS, … deployed at our facilities
3
Overview
  Many distributed production components used in ATLAS are being upgraded after ~5 years of continuous use
  In this talk we will focus on their evolution in 2013-2014
    Athena on many fronts: AthenaMP, Athena64, AthenaGPU, AthenaPhi, Athena event service
    trf -> tf
    DQ2 -> Rucio
    ProdSys -> ProdSys II
    PanDA -> CAF
    PanDA -> BigData
    New monitoring capabilities
4
AthenaXX
  Many future paths for Athena driven by hardware – will not talk about them here
  Interesting topic for distributed production – event service
    Basic unit of measurement in HEP is events – not bits, bytes or files
    Multi-core is the new paradigm (same as the old one)
    Caching technologies may be best optimized at event level
  Started discussions during SW week for event service
    Client-server architecture in Athena desirable long term
    PanDA server with Athena client will be first step to try
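The event service idea above (a server handing out events rather than files to a client) can be illustrated with a toy sketch. This is not PanDA or Athena code: `EventServer`, `next_range`, `report` and `run_client` are hypothetical names, and everything runs in one process purely to show the dispatch pattern.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

EventRange = Tuple[int, int]  # half-open range [first, last) of event indices


class EventServer:
    """Hypothetical server that hands out event ranges instead of whole files."""

    def __init__(self, total_events: int, chunk: int) -> None:
        self._pending: Deque[EventRange] = deque(
            (start, min(start + chunk, total_events))
            for start in range(0, total_events, chunk)
        )
        self.done: List[EventRange] = []

    def next_range(self) -> Optional[EventRange]:
        # Return the next unprocessed range, or None when nothing is left.
        return self._pending.popleft() if self._pending else None

    def report(self, event_range: EventRange) -> None:
        # The client reports a finished range back to the server.
        self.done.append(event_range)


def run_client(server: EventServer) -> int:
    """Hypothetical client loop: request ranges until the server runs dry."""
    processed = 0
    while True:
        event_range = server.next_range()
        if event_range is None:
            return processed
        first, last = event_range
        processed += last - first  # stand-in for real event processing
        server.report(event_range)
```

Because work is requested in small ranges, a client that stops early only leaves the unrequested ranges on the server, which another client could then pick up.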
5
Job Transforms
  Job transforms – trf – workflow wrapper around Athena
    All production jobs use trf
    Most major ATLAS workloads are supported
    Including multi-step jobs
    New workloads like overlay, FTK … are being added
  Major changes underway
    See recent talks by Graeme Stewart
      https://indico.cern.ch/getFile.py/access?contribId=35&sessionId=19&resId=0&materialId=slides&confId=169697
      https://indico.cern.ch/getFile.py/access?contribId=7&resId=0&materialId=slides&confId=214562
  Highlights of future changes in next few slides
14
https://indico.cern.ch/getFile.py/access?contribId=1&sessionId=5&resId=2&materialId=slides&confId=169697
19
What is ProdSys
  Task management system
    Interface to request production tasks
    Generate jobs for execution by PanDA
    Manage task completion
  Consisting of many scripts
    Web interface for task request
    Bulk task submission interface
    Auto generation of jobs from tasks
    Scripts for task completion
    Interacts with AMI and DQ2
  And add-ons
    Task-list creation scripts developed by production managers
    Task monitoring
20
Current System
  [Diagram. Recoverable labels: Production Manager, Submits, Tasks, ProdSys, Jobs, Bamboo, PanDA, User]
21
What is ProdSys II
  Split ProdSys into two parts
    DEfT – task request and task definition
      Some components will be taken from current ProdSys
    JeDi – dynamic job definition and task execution
      Integrated with PanDA (replaces Bamboo)
      Will also be the engine for user analysis tasks
  Need to work closely with Transforms & Rucio groups
    All three systems should evolve together
  Integration with monitoring
    Will be planned from the beginning
22
Future System
  [Diagram. Recoverable labels: Production Manager, DEfT, JeDi, PanDA, User]
23
DEfT
  Key features
    Web UI for simplified interactive task request
    Task request system based on physics requirements
    Managers/users insulated from execution details
    Deprecate/remove script-based task submission
    Error checking of task requests
    Built-in authentication and approval mechanisms
    Creates task according to a new simplified schema
24
Tasks, Meta-tasks, Basket-tasks
  New extensions to the concept of task
  Task – basic unit
    Input dataset -> Output dataset
  Meta-task – chain of tasks, which will be auto-generated
    Manager/user makes single request
    Successive processing steps (transforms) created by DEfT
    Intermediate steps in chain may be specified as transient
  Basket-task – group of related tasks (e.g. same tag)
    Manager/user can define basket of tasks
    Manager/user makes single request for execution
  Ability to clone tasks, meta-tasks and basket-tasks
    From previous tasks, meta-tasks and basket-tasks
    Or from predefined templates
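The three task concepts above can be sketched as simple data structures. This is an illustration only, not the DEfT schema: the class and function names, the fields, and the output-dataset naming rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class Task:
    """Basic unit: one transform mapping an input dataset to an output dataset."""
    transform: str
    input_dataset: str
    output_dataset: str
    transient: bool = False  # True for intermediate outputs that need not be kept


def build_meta_task(input_dataset: str, transforms: List[str]) -> List[Task]:
    """Expand a single request into a chain of tasks (a meta-task).

    Each step consumes the previous step's output. All steps except the
    last are marked transient. The naming rule is hypothetical.
    """
    chain: List[Task] = []
    current = input_dataset
    for index, transform in enumerate(transforms):
        output = f"{current}.{transform}"
        is_last = index == len(transforms) - 1
        chain.append(Task(transform, current, output, transient=not is_last))
        current = output
    return chain


@dataclass
class BasketTask:
    """Group of related tasks submitted with one request, e.g. sharing a tag."""
    tag: str
    tasks: List[Task] = field(default_factory=list)


def clone_task(task: Task, **overrides: object) -> Task:
    """Clone a previous task, changing only the given fields."""
    return replace(task, **overrides)
```

For example, `build_meta_task("data12.RAW", ["reco", "merge"])` yields two chained tasks, and `clone_task` reuses one of them with a different input dataset.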
25
JeDi
  Key features
    JeDi will be core component of PanDA
    Generate jobs dynamically from DEfT tasks
      Jobs are defined to match execution environment and specified constraints (e.g. number of cores, duration, file size, dataset size, …)
      Number of events varies per job
      Jobs are not predefined with fixed number of events – key feature
    PanDA responsible for optimal task execution
    PanDA responsible for task completion
    Auto-merging if requested
    Data will be collected by PanDA to optimize job execution and completion (expanded concept of scout jobs)
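Dynamic job definition, as described above, can be sketched as sizing each job to the resource it will run on. This is a minimal sketch, not JeDi code. It assumes a per-event processing time measured beforehand (for instance by scout jobs) and perfect scaling across cores; `Site`, `events_per_job` and `split_task` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """Hypothetical description of an execution environment."""
    name: str
    cores: int
    max_walltime_s: int


def events_per_job(site: Site, sec_per_event: float) -> int:
    """Events one job can process within the wall-time limit.

    Assumes perfect scaling over the site's cores, which is a simplification.
    """
    return max(1, int(site.cores * site.max_walltime_s / sec_per_event))


def split_task(total_events: int, site: Site, sec_per_event: float) -> List[Tuple[int, int]]:
    """Split a task into half-open event ranges [first, last), one per job."""
    size = events_per_job(site, sec_per_event)
    return [
        (first, min(first + size, total_events))
        for first in range(0, total_events, size)
    ]
```

The same task is split differently per site: an 8-core slot with a one-hour limit gets fewer, larger jobs than a single-core slot, so the number of events per job is not fixed in advance.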
26
Common Analysis Framework
  Task force to evaluate suitability of PanDA for a LHC common user analysis framework
  Latest report:
    https://indico.cern.ch/getFile.py/access?contribId=7&sessionId=19&resId=1&materialId=slides&confId=169697
33
Conclusion
  Many updates/improvements planned 2013-2014
    Some applications will be completely re-written
    But based on past 5 years of LHC experience
  Plans and teams are in place
    Will lead to better software running at facilities
    Waiting for current LHC run to end
  Stay tuned for more