Requirements for a Next Generation Framework: ATLAS Experience S. Kama, J. Baines, T. Bold, P. Calafiura, W. Lampl, C. Leggett, D. Malon, G. Stewart, B.


1 Requirements for a Next Generation Framework: ATLAS Experience
S. Kama, J. Baines, T. Bold, P. Calafiura, W. Lampl, C. Leggett, D. Malon, G. Stewart, B. M. Wynne, on behalf of the ATLAS experiment

2 Introduction
- Most of the existing frameworks were designed before the multi-core era
- Early multi-core machines were still exploited through multi-processing, at the cost of increased memory usage
- Ever-increasing core counts, wider vector units and co-processors necessitate rethinking event processing
S. Kama, CHEP 2015

3 Future Framework Requirements Group
- Established to study the framework requirements of ATLAS in the Run 3 time frame and beyond
- Composed of experts from various domains, including
  - Core Software
  - Reconstruction
  - Simulation
  - Trigger
  - Analysis
  - Distributed Computing and HPC
- Started working in spring 2014
- Weekly meetings to study
  - Existing framework support
  - Experience from Run 1 and recent developments from each perspective
  - Future requirements
- Delivered its report in December 2014

4 Main Requirements
- Multi-threaded operation
- Multiple events in flight
- Sub-event contexts (ability to work with regions of interest) and early rejection
- Minimal changes to existing code (a.k.a. person power)
Good to have
- Co-processor support
- Improved I/O

5 Multiple Threads
- Single-threaded event processing is well established
- Running multiple instances of a single-threaded process wastes memory
- Running event processing in multiple threads requires separating global components, such as multi-event services, from local components, such as per-event or sub-event algorithms and tools
[Figure: ideal case, timeline of services and tools; shapes = algorithms, colors = events. Services span multiple events while tools are in event context.]

6 Multiple Events
- Most algorithms are not thread safe
- Having multiple events in flight with algorithm cloning (Gaudi-Hive) hides inherent serial sections (see C. Leggett’s talk)
- Multiple events require event context and dependency tracking
- As a result, the scheduler becomes much more complex than a serial scheduler
[Figure: time flow of algorithm execution; shapes = algorithms, colors = events.]
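The cloning idea on this slide can be illustrated with a minimal sketch. This is not the Gaudi-Hive API: `HitSummer`, `ClonePool` and the event data are hypothetical names made up for the example. Each event in flight borrows its own copy of a non-thread-safe algorithm, and a thread blocks when every clone is busy, which is one way an inherent serial section can stay hidden until the pool runs dry.

```python
import copy
import queue
import threading


class HitSummer:
    """Hypothetical algorithm that is NOT thread safe: it reuses a scratch buffer."""

    def __init__(self):
        self.scratch = []

    def execute(self, hits):
        self.scratch.clear()
        for hit in hits:
            self.scratch.append(2 * hit)
        return sum(self.scratch)


class ClonePool:
    """Hands out independent copies of one algorithm to concurrent events."""

    def __init__(self, prototype, n_clones):
        self._idle = queue.Queue()
        for _ in range(n_clones):
            self._idle.put(copy.deepcopy(prototype))

    def execute(self, hits):
        clone = self._idle.get()  # blocks while every clone is busy
        try:
            return clone.execute(hits)
        finally:
            self._idle.put(clone)  # return the clone for the next event


pool = ClonePool(HitSummer(), n_clones=2)
events = {1: [1, 2], 2: [3], 3: [4, 5, 6], 4: []}  # made-up hit values
results = {}


def process(event_id, hits):
    results[event_id] = pool.execute(hits)


threads = [threading.Thread(target=process, args=item) for item in events.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With two clones and four events, at most two events run the algorithm at once; the others wait for a clone to be returned.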

7 Event Store - Whiteboard
- Algorithms and tools create, update or consume EDM objects
- EDM objects can only be passed between components through the Whiteboard service
- Data from multiple events can exist on the whiteboard at the same time
- Data from the same event or from different events can be accessed from multiple threads simultaneously
- Each EDM object has an event context, associating it with one of the events being processed
[Figure: Whiteboard holding Track seeds, Clone removed seeds and Tracks for several events, with Write, Update and Read accesses.]
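As a rough illustration of the whiteboard described above, the following sketch keys every object by its event context and guards access with a lock so that several threads can use the store at once. The class, its method names and the seed values are assumptions for the example, not the actual ATLAS or Gaudi interface.

```python
import threading


class Whiteboard:
    """Toy event store: every object is filed under (event context, name)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}  # event_id -> {name: object}

    def write(self, event_id, name, obj):
        with self._lock:
            event = self._store.setdefault(event_id, {})
            if name in event:
                raise KeyError(f"{name!r} already exists for event {event_id}")
            event[name] = obj

    def read(self, event_id, name):
        with self._lock:
            return self._store[event_id][name]

    def update(self, event_id, name, func):
        with self._lock:
            event = self._store[event_id]
            event[name] = func(event[name])


def reconstruct(wb, event_id, seeds):
    # Write -> Update -> Read -> Write, mirroring the accesses in the slide's figure.
    wb.write(event_id, "TrackSeeds", seeds)
    wb.update(event_id, "TrackSeeds", lambda s: sorted(set(s)))  # clone removal
    clean = wb.read(event_id, "TrackSeeds")
    wb.write(event_id, "Tracks", [f"track-{s}" for s in clean])


wb = Whiteboard()
events = {1: [3, 1, 3], 2: [7, 7], 3: [5, 2]}  # made-up seed identifiers
threads = [
    threading.Thread(target=reconstruct, args=(wb, event_id, seeds))
    for event_id, seeds in events.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A single global lock keeps the sketch short; a real store would more likely use finer-grained synchronization.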

8 Trigger Processing
- The ATLAS trigger uses a small subset of the data called a Region of Interest (RoI)
- RoIs are identified by the hardware Level 1 trigger around interesting activity, such as a set of hits in the muon chambers and ID, or some high-energy activity in calorimeter cells
- These RoIs are processed in sequences of trigger algorithms called Trigger Chains (TC)
- Each TC is executed stepwise on all suitable RoIs; if a step fails, the following steps are not executed
- If any TC decides the RoI is interesting, the complete event data is read from the detector and processing continues until all chains are executed on the full event data
- An algorithm is executed only once on the same data (caching)
- An event is accepted if at least one chain accepts, or rejected if all chains reject
[Figure: example with a μ of 8 GeV. L1 creates RoIs with 3 different thresholds; the RoI seeds 3 different muon chains. Labels: Calculate pt; Fails prescale; Fails pt cut; RoI passes the chain -> full event data is read; Not executed again, already calculated.]
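The stepwise execution, early rejection and caching rules above can be sketched as follows. The chain names, thresholds and the `refine` step are hypothetical, and prescales are left out; the sketch only shows that a failed step stops its chain and that a shared algorithm runs once per RoI.

```python
def run_chains(chains, roi):
    """Run every chain stepwise on one RoI.

    chains: {chain name: [(algorithm name, algorithm, cut), ...]}
    Returns (names of accepting chains, algorithms actually executed).
    """
    cache = {}     # algorithm name -> result on this RoI
    executed = []  # algorithms that really ran, in order
    accepted = []
    for chain_name, steps in chains.items():
        passed = True
        for alg_name, alg, cut in steps:
            if alg_name not in cache:  # caching: run once per RoI
                cache[alg_name] = alg(roi)
                executed.append(alg_name)
            if not cut(cache[alg_name]):
                passed = False
                break  # early rejection: skip the remaining steps
        if passed:
            accepted.append(chain_name)
    return accepted, executed


def calc_pt(roi):
    return roi["pt"]


def refine(roi):
    return roi["pt"] * 1.01  # placeholder for a more expensive step


chains = {
    "mu10": [("calc_pt", calc_pt, lambda pt: pt >= 10.0),
             ("refine", refine, lambda pt: pt >= 10.0)],
    "mu6": [("calc_pt", calc_pt, lambda pt: pt >= 6.0)],
    "mu4": [("calc_pt", calc_pt, lambda pt: pt >= 4.0)],
}

accepted, executed = run_chains(chains, {"pt": 8.0})
event_accepted = bool(accepted)  # accepted if at least one chain accepts
```

Here the 8 GeV candidate fails the first step of `mu10`, so `refine` never runs, while `mu6` and `mu4` reuse the cached pt.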

9 Trigger Requirements
- RoI and stepwise processing are necessary for limiting the bandwidth from the detector and for efficient CPU utilization through early rejection
- RoIs can be split into smaller chunks or merged into bigger pieces
- If all chains fail at a step, the event is rejected and further processing should stop
- An algorithm should not be executed on the same data more than once if it has already been executed in another chain
- Time-dependent changes in processing have to be handled
[Figure: Whiteboard with containers per RoI (Track seeds, Clone removed seeds, Tracks) and scheduled chains; same chains, different RoIs.]

10 Event Views
- The event view concept is a means of using a subset of the data to accommodate RoIs
- Event views are lightweight accessors to a subset of containers
- Trigger algorithms see only the relevant data and produce output in their own contexts
- For offline or full-event algorithms, the event view covers whole containers
[Figure: three event views (EV1, EV2, EV3) for RoI1, RoI2 and RoI3 over the Whiteboard containers Track Seeds, Clone Removed seeds, Tracks and Vertices, used by Alg A, Alg B and Alg C.]
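One possible reading of the event view concept is sketched below. It assumes each container entry is tagged with the RoI it belongs to; the `EventView` class, its methods and the sample data are illustrative inventions, not the design adopted by ATLAS.

```python
class EventView:
    """Accessor over shared containers, restricted to one RoI.

    roi=None stands for the whole event (offline or full-event algorithms).
    The view holds only a reference to the containers and an RoI tag.
    """

    def __init__(self, containers, roi=None):
        self._containers = containers
        self._roi = roi

    def get(self, name):
        return [
            payload
            for roi, payload in self._containers.get(name, [])
            if self._roi is None or roi == self._roi
        ]

    def record(self, name, payload):
        # Output is tagged with the view's own context.
        self._containers.setdefault(name, []).append((self._roi, payload))


def make_tracks(view):
    """An algorithm sees only the seeds that its view exposes."""
    for seed in view.get("TrackSeeds"):
        view.record("Tracks", "trk-" + seed)


containers = {"TrackSeeds": [(1, "a"), (2, "b"), (1, "c")]}
views = {roi: EventView(containers, roi) for roi in (1, 2, 3)}
for view in views.values():
    make_tracks(view)

full_event = EventView(containers)  # covers whole containers
```

The same `make_tracks` function works unchanged on an RoI view or on `full_event`, which is the property the slide asks for.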

11 Scheduler
- Decides which algorithm to schedule next by looking at the presence of required input in the Whiteboard
- Handles multiple events, event views and Trigger Chains
- Keeps track of the RoI and event context
- Schedules algorithms when their dependencies appear in the whiteboard
- Keeps resources (threads) occupied by scheduling algorithms from the same event or from different events
- Incidents are treated as schedulable updates
- Multiple dependency and control flow graphs are managed with multiple events
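The data-driven scheduling rule above (run an algorithm once its inputs are present in the whiteboard) can be sketched in a few lines. This is a serial toy under stated assumptions: the algorithm names and data are hypothetical, the whiteboard is a plain dictionary, and dispatching ready algorithms to worker threads is omitted.

```python
def schedule(algorithms, whiteboard):
    """Run each algorithm once per event, as soon as its inputs exist.

    algorithms: [(name, input keys, output key, function), ...]
    whiteboard: {event_id: {key: object}}
    Returns the order in which (event_id, algorithm) pairs were executed.
    """
    done = set()
    order = []
    progress = True
    while progress:
        progress = False
        for event_id, store in whiteboard.items():
            for name, inputs, output, func in algorithms:
                if (event_id, name) in done:
                    continue
                if all(key in store for key in inputs):  # dependencies present
                    store[output] = func(*(store[key] for key in inputs))
                    done.add((event_id, name))
                    order.append((event_id, name))
                    progress = True
    return order


# Declared in the "wrong" order on purpose: data dependencies decide the order.
algorithms = [
    ("TrackFit", ["CleanSeeds"], "Tracks",
     lambda seeds: [f"trk-{s}" for s in seeds]),
    ("CloneRemoval", ["TrackSeeds"], "CleanSeeds",
     lambda seeds: sorted(set(seeds))),
]
whiteboard = {1: {"TrackSeeds": [2, 2, 1]}, 2: {"TrackSeeds": [5]}}

order = schedule(algorithms, whiteboard)
```

Work from both events is interleaved: clone removal runs for events 1 and 2 before either track fit becomes ready.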

12 Migration Constraints
- Advanced properties of the framework limit the available developer pool
- Most of the developers of the new framework need to maintain the existing framework during Run 2
- Due to the sheer amount of existing code, old interfaces need to be preserved whenever possible
- Modifications to existing tools and algorithms should aim at removing behavior prohibited by the new design (i.e. cleaning up anti-patterns)
- If needed, new interfaces should be introduced without removing existing functionality
- Both trigger and offline use cases have to be supported
- Prototyping is essential
- Gaudi-Hive is considered as a starting point to minimize the code changes in the existing framework and the effort needed to implement the requirements (see C. Leggett’s talk)

13 Other requirements
- Co-processor support is desirable
- The framework should be able to utilize off-CPU/hybrid resources if they become feasible
- Algorithms are agnostic to where the data comes from
- I/O should be configurable and support
  - Parallel I/O
  - Data access over network/disk/memory
  - CPU- or throughput-optimized access

14 Summary
- The ATLAS FFREQ group investigated the problem from different aspects
  - Identified shortcomings and strong points of the existing software
  - Needed improvements for future architectures and LHC conditions
  - Necessary effort to implement the new framework
- Conclusions
  - Multi-threaded, multi-event processing is required
  - The same framework should handle both offline and trigger use cases
  - The scheduler needs to be smarter and more complex than in the serial case
  - The Whiteboard should handle multiple events in flight transparently to algorithms
- Findings have been presented to the collaboration
- The framework should be ready for Run 3
  - Framework implementations should finish in 2017
  - Prototyping studies have started

15 THANK YOU

