M.Frank CERN/LHCb Event Data Processing  Philosophical changes of future frameworks  Detector description  Miscellaneous application support.

1 M.Frank CERN/LHCb Event Data Processing  Philosophical changes of future frameworks  Detector description  Miscellaneous application support

3 M.Frank CERN/LHCb Event Data Processing Frameworks for the Future
- A journey through the past
- The problems of current frameworks
- Possible solution

4 M.Frank CERN/LHCb Digging in the past
- Single structure for HEP event processing software for the last 50 years
  - Initialize
  - Loop over events
    - Loop over subroutine calls (simulation/reconstruction/analysis)
  - Finalize
- Object orientation (>1995) did not change anything, except that each of the above steps is represented by a loop over “processors”
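The structure described on this slide can be sketched in a few lines. This is only an illustration: the `Processor` class, its method names and the dictionary used as an event are hypothetical and not taken from any of the frameworks mentioned in the talk.

```python
class Processor:
    """Hypothetical minimal processor: one step applied to every event."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def initialize(self):
        self.log.append("init")

    def process(self, event):
        # Record which step touched the event (stand-in for real work).
        event.setdefault("history", []).append(self.name)

    def finalize(self):
        self.log.append("end")


def run(processors, events):
    """Initialize, loop over events, loop over processors, finalize."""
    for p in processors:
        p.initialize()
    for event in events:
        for p in processors:
            p.process(event)
    for p in processors:
        p.finalize()
    return events


chain = [Processor("simulation"), Processor("reconstruction"), Processor("analysis")]
events = run(chain, [{"id": 0}, {"id": 1}])
print(events[1]["history"])
```

Every event passes through the same fixed sequence in a single thread, which is why scaling has traditionally meant starting more processes.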

5 M.Frank CERN/LHCb Digging in the past (2)
- Why was this model so successful? 50 years is a long time!
  - Simple model
  - Intuitive model
  - (Relatively) easy to manage/extend
- Perfect fit to HEP
  - Particle collisions are independent
  - Analysis of “independent” events
- Common approach to achieve scalability
  - Parallelization at the level of processes
  - More events? => Submit more concurrent batch jobs

6 M.Frank CERN/LHCb [Figure: call-and-return style (method calls). Source: RD.Schaffer, ATLAS, LCB workshop 1999, Marseille]

7 M.Frank CERN/LHCb [Figure: pipe-and-filter style. Diagram labels: control framework, input module, app module, output module, event, intermediate data objects. Examples named: BaBar and CDF, ATLAS Object Network, ~ Marlin. Source: RD.Schaffer, ATLAS, LCB workshop 1999, Marseille]

8 M.Frank CERN/LHCb [Figure: pipe-and-filter style + (in)visible blackboard (TES). Diagram labels: control framework, input module, app module, output module, event, memory blackboard (shared data). Examples named: Gaudi, Athena]

9 M.Frank CERN/LHCb [Figure: independent reactive objects. Diagram labels: “event” signals, method invocation. Sources: Lassi Tuura; RD.Schaffer, ATLAS, LCB workshop 1999, Marseille]

10 M.Frank CERN/LHCb Current Framework Trends
- Flexibility is a must: reuse of frameworks
  - Different applications typically improve abstraction where necessary
- Within one experiment, the same event processing framework is used online and offline:
  - Software based triggers are “in”: high level trigger apps
  - Reconstruction (-> trigger verification)
  - Data analysis
  - (Note: I leave simulation out here)

11 M.Frank CERN/LHCb Current Framework Trends
- Frameworks tend to be shared between experiments
  - “The Framework”: BaBar, CDF
  - Gaudi: LHCb, ATLAS, HARP, MINERVA, GLAST, BES3, FIRST@GSI (?)
  - DIRAC (Distributed Infrastructure with Remote Agent Control): LHCb, LCD, Belle (?)
  - Ganga (Gaudi / Athena and Grid Alliance): LHCb, ATLAS, LCD (?)
- Flexibility should not be neglected
  - Different applications typically improve abstraction where necessary

12 M.Frank CERN/LHCb All Current Frameworks
- The architecture was done in the late 1990s
- 15 years ago:
  - Memory is cheap, CPU is expensive
  - Approach: one process, one box
  - Feature size: 250 nm (PIII Katmai, 1999)
- Today: memory / node (core) is significant
  - Multiple core technologies (triggered by the limit to clock speed)
  - Feature size: 32 nm (hexacore/Westmere)
- In 10 years: ???
- Multi core technologies were not considered in the past

13 M.Frank CERN/LHCb Today’s Framework Problems
- Problems are already present
  - Single execution thread per process
  - High resource demand per execution thread
- Example:
  - Current LCG nodes are defined (mostly) by the needs of ATLAS and CMS => 4 GB / core
  - Dual hexa-core: 48 GB of memory for running 12 jobs (without hyper-threading)
  - > ½ - ¾ of memory is read-only: 3*12 = 36 GB
  - ~12 GB actively used
  - lxplus: dual quad-core with hyper-threads enabled, 48 GB memory: 3 GB per hyper-thread
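The arithmetic behind the example can be checked directly. The node figures come from the slide. The last line is an assumption added here for illustration: it supposes that the 12 jobs could share one copy of the read-only memory, which the slide does not state.

```python
# Figures quoted on the slide.
sockets, cores_per_socket = 2, 6           # dual hexa-core node
gb_per_core = 4                            # LCG node sizing
jobs = sockets * cores_per_socket          # one job per core, no hyper-threading
total_gb = jobs * gb_per_core              # 48 GB in the node

read_only_per_job_gb = 3                   # 3/4 of the 4 GB available to each job
read_only_gb = read_only_per_job_gb * jobs # 36 GB of replicated read-only data
active_gb = total_gb - read_only_gb        # 12 GB actively used

# lxplus: dual quad-core with hyper-threading, 48 GB of memory.
hyper_threads = 2 * 4 * 2
gb_per_hyper_thread = 48 / hyper_threads   # 3 GB per hyper-thread

# Assumption (not on the slide): a single shared read-only copy for all jobs.
shared_gb = read_only_per_job_gb + active_gb

print(total_gb, read_only_gb, active_gb, gb_per_hyper_thread, shared_gb)
```

Under that assumption the same workload would need about 15 GB instead of 48 GB, which is the kind of saving the rest of the talk argues for.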

14 M.Frank CERN/LHCb First Conclusions
- Different apps, but same event processing framework
  - Get the online side in early enough or enjoy the pain
- Quite soon we cannot load the CPUs anymore
  - The one process per core model is reaching its limits
- But what to do?
  - Typical ideas apply parallelization at the sub-event processing level
  - Difficult unless there are clearly identified blocks
  - “Then I fit each track with a separate thread”
  - Does not work: active data used, created or modified by threads must be independent => locking hell
  - If this simple approach were really feasible, we would have seen examples by now

15 M.Frank CERN/LHCb Considerations for new frameworks
- If multi-threading is to work, each thread must do a sizable amount of work
  - Otherwise the management overhead is large
- Threads must be aware that altering input data may cause trouble for other threads
  - Every piece of data has exactly 1 writer
  - Either lock data or have a gentlemen’s agreement
- Worker objects must be “const” if reusable
  - To be efficient, the same worker may be used by different threads at the same time
  - Use case: time consuming, but indivisible processing steps
- If there is an existing code base, re-use of existing user code (processors/drivers) is a must
  - User code must be agnostic to framework changes
  - Forget acceptance otherwise
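The "exactly 1 writer" rule can be enforced rather than left to a gentlemen's agreement. The following is a minimal sketch of the locking variant, assuming a hypothetical `WriteOnceStore` class and collection name that are invented for this example.

```python
import threading


class WriteOnceStore:
    """Hypothetical per-event store: each key may be written exactly once."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            if key in self._data:
                # A second writer for the same data is a configuration error.
                raise KeyError(f"{key!r} already has a writer")
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]


store = WriteOnceStore()
store.put("RawHits", (1, 2, 3))        # immutable tuple: readers cannot alter it

try:
    store.put("RawHits", (4,))
    second_write_rejected = False
except KeyError:
    second_write_rejected = True

print(store.get("RawHits"), second_write_rejected)
```

Storing immutable values covers the other half of the rule: once written, input data cannot be changed under the feet of another thread.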

16 M.Frank CERN/LHCb Predictions
- We will see a change of paradigm
  - Parallelism inside one process will have to be applied
- The resource usage will simply require it
  - Memory
  - Number of concurrently open files: causes severe limits for castor / xrootd etc.
  - Number of concurrent database connections, e.g. Oracle
=> Let’s have a look at the resources (memory)

17 M.Frank CERN/LHCb Event Data Size – ttbar Reconstruction [200 events avg.]
Type                 With overlay         Without overlay
TrackerHit           595564 +- 89468      41318 +- 22396
LCGenericObject          64 +- 17             1 +- 1
LCRelation            62574 +- 8075        9929 +- 2112
Track                  2977 +- 248          186 +- 56
Vertex                    3 +- 2              2 +- 1
SimCalorimeterHit    110070 +- 10782      18821 +- 5109
SimTrackerHit        424304 +- 62335      24993 +- 14007
MCParticle            73441 +- 6963        4448 +- 1165
Cluster                 797 +- 105           97 +- 26
CalorimeterHit        44952 +- 5958        9319 +- 1970
ReconstructedPart      1603 +- 176          110 +- 30

18 M.Frank CERN/LHCb Size of Marlin without data

19 M.Frank CERN/LHCb Size of Marlin without data

20 M.Frank CERN/LHCb Comparison LHCb HLT vs. Marlin Rec.
- HLT ~ 500 MB larger
  - Probably more fine grained: alignment constants, geometry description, magnetic field map
  - Probably more code: O(100-200 MB) ??
- Not unreasonable to assume that the SLD code will grow to at least the same amount
- The HLT uses forking to reuse read-only memory sections:
  - 20 * ~1.8 GB ≈ 34 GB > 24 GB machine memory
  - Still a total of 14 GB of free memory left
  - Saved 20+X GB (more than 60 %)
  - Threading can save even more

21 M.Frank CERN/LHCb The Vision
- Make use of current CPU technology
- Try to save hardware resources
- => Usage of multiple threads: ~ 1-2 threads per hardware thread
- Pipelined data processing

22 M.Frank CERN/LHCb Pipelined data processing
- Idea from data flows on hardware boards
  - Connected processing elements (e.g. FPGAs)
  - If data from several input lines is present: process the input data
  - Once finished: send data to the output line(s)
  - Data may then be picked up by the next FPGA
- Such models were also used to simulate DAQ models
  - Foresight: modeling ALICE DAQ, parts of LHCb HLT
  - Alchemy: LHCb DAQ models

23 M.Frank CERN/LHCb Pipelined Data Processing (1) [Figure: pipeline diagram, time axis marked T0 to T7 in “clock cycles”; legend: Input, Processing, Output]

24 M.Frank CERN/LHCb Pipelined Data Processing (2)
- 1st clock cycle
  - Start processing
  - Input first event
[Figure: time axis marked T0 to T1]

25 M.Frank CERN/LHCb Pipelined Data Processing (3)
- 2nd clock cycle
  - 2 separate threads
  - Parallel execution of
    - first Processor
    - Input module
[Figure: time axis marked T0 to T2]

26 M.Frank CERN/LHCb Pipelined Data Processing (4)
- 3rd clock cycle
  - 3 separate threads
  - Parallel execution of
    - second Processor
    - first Processor
    - Input module
- Each line handles its own event
- Each line represents a thread
[Figure: time axis marked T0 to T3]

27 M.Frank CERN/LHCb Pipelined Data Processing (5)
- Filling up threads up to some configurable limit
[Figure: time axis marked T0 to T7]

28 M.Frank CERN/LHCb Pipelined Data Processing (6)
- Output module & cleanup
- Rest: Processing events
[Figure: time axis marked T0 to T8]

29 M.Frank CERN/LHCb Pipelined Data Processing (7) [Figure: time axis marked T0 to T9; one entry marked X]

30 M.Frank CERN/LHCb Pipelined Data Processing (8) [Figure: time axis marked T0 to T12; three entries marked X]
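One possible reading of the clock-cycle walkthrough above is the following small simulation. It is a sketch under simplifying assumptions that are not in the slides: every step takes exactly one clock cycle, one new event is admitted per cycle, and the step names and the function `simulate` are hypothetical.

```python
def simulate(steps, n_events, max_threads):
    """Return, for each clock cycle, a mapping event number -> step being executed."""
    in_flight = {}       # event number -> index of the step it runs this cycle
    next_event = 0
    timeline = []
    while next_event < n_events or in_flight:
        # Admit a new event if a line (thread) is free: the input module runs.
        if next_event < n_events and len(in_flight) < max_threads:
            in_flight[next_event] = 0
            next_event += 1
        timeline.append({ev: steps[i] for ev, i in in_flight.items()})
        # Advance every line by one step; lines past the last step are released.
        in_flight = {ev: i + 1 for ev, i in in_flight.items() if i + 1 < len(steps)}
    return timeline


steps = ["input", "proc1", "proc2", "output"]
full = simulate(steps, n_events=3, max_threads=4)
limited = simulate(steps, n_events=3, max_threads=2)

for cycle, active in enumerate(full, start=1):
    print(cycle, active)
```

In the third cycle of `full`, three lines are busy with the second processor, the first processor and the input module respectively, as on slide (4). With `max_threads=2` the third event has to wait until a line is released, which corresponds to the configurable limit on slide (5).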

31 M.Frank CERN/LHCb Processor Dependencies
- Dependencies may be defined as the required data content of a given event
- Processors with the same rank can be executed in parallel
[Figure: processor list (Input e.g. from file, Processor 1 to 6, Histogramming 1 to 3, Output 1) against time, with the event data content required as input to each processor]

32 M.Frank CERN/LHCb Practical Consequence
- Can keep more threads simultaneously busy
- Hence:
  - Fewer events in memory
  - Better resource usage
- Example
  - First massage raw data for each subdetector (parallel)
  - Then fit track…
[Figure: time axis marked T0 to T7]

33 M.Frank CERN/LHCb Parallel Processing
- There are 2 parallelization concepts
  - Event parallelization: simultaneous processing of multiple events
  - Processor parallelization: for a given event, simultaneous execution of multiple processors
- Both concepts may coexist
- This was never tried before

34 M.Frank CERN/LHCb Requirement
- Such an approach makes no sense if infrastructure size << event data size in memory
  - There must be a benefit in reusing the process infrastructure
  - Otherwise it is simpler to run more processes
- LCD
  - Assume the worst case is reconstruction with overlay: event data size / infrastructure ~ 1
  - But the infrastructure is unrealistically small
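The requirement can be made quantitative with a deliberately crude model. The model is an assumption introduced here, not something stated in the talk: N separate processes each hold the infrastructure plus one event, whereas one multi-threaded process holds the infrastructure once plus N events. The function name `memory_saving` is hypothetical.

```python
def memory_saving(n_workers, infrastructure, event_data):
    """Fraction of memory saved by sharing the infrastructure among n_workers."""
    separate = n_workers * (infrastructure + event_data)   # N processes
    shared = infrastructure + n_workers * event_data       # 1 process, N threads
    return 1 - shared / separate


# Event data size / infrastructure ~ 1, the worst case quoted for LCD.
worst_case = memory_saving(12, infrastructure=1.0, event_data=1.0)

# Infrastructure << event data: sharing it buys almost nothing.
tiny_infrastructure = memory_saving(12, infrastructure=0.01, event_data=1.0)

print(round(worst_case, 3), round(tiny_infrastructure, 3))
```

In this model the saving approaches zero as the infrastructure shrinks relative to the event data, which is the case the slide rules out as not worth the effort.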

35 M.Frank CERN/LHCb Towards a more concrete model
- An abstract view of a Marlin Processor
- Self organization issues
- I/O modeling
- Threading

36 M.Frank CERN/LHCb A (Marlin) Processor
- Processor interface: initialize, process event, process run, finalize
- Diagram: a Processor with Input Collection, Output Collection and Parameter
- A processor chain (Processor 1, Processor 2, Processor 3) is also a processor
- Process self-configuration: given a set of processors, their input data and output data allow the execution sequence to be defined
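The statement that a processor chain is also a processor is the composite pattern. A minimal sketch follows; the class names, the `inputs`/`outputs` attributes and the collection names are hypothetical and do not reproduce the actual Marlin API.

```python
class Processor:
    """Hypothetical base class mirroring the interface listed on the slide."""
    inputs = ()      # names of required input collections
    outputs = ()     # names of produced output collections

    def initialize(self):
        pass

    def process_run(self, run):
        pass

    def process_event(self, event):
        pass

    def finalize(self):
        pass


class Clusterer(Processor):
    inputs, outputs = ("Hits",), ("Clusters",)

    def process_event(self, event):
        event["Clusters"] = [sum(event["Hits"])]


class Fitter(Processor):
    inputs, outputs = ("Clusters",), ("Tracks",)

    def process_event(self, event):
        event["Tracks"] = [c * 2 for c in event["Clusters"]]


class ProcessorChain(Processor):
    """A chain offers the same interface, so it can stand in for a processor."""

    def __init__(self, processors):
        self.processors = list(processors)
        produced, required = set(), []
        for p in self.processors:
            # Inputs not produced earlier in the chain must come from outside.
            required += [c for c in p.inputs if c not in produced and c not in required]
            produced.update(p.outputs)
        self.inputs = tuple(required)
        self.outputs = tuple(sorted(produced))

    def initialize(self):
        for p in self.processors:
            p.initialize()

    def process_run(self, run):
        for p in self.processors:
            p.process_run(run)

    def process_event(self, event):
        for p in self.processors:
            p.process_event(event)

    def finalize(self):
        for p in self.processors:
            p.finalize()


chain = ProcessorChain([Clusterer(), Fitter()])
event = {"Hits": [1, 2, 3]}
chain.process_event(event)
print(chain.inputs, chain.outputs, event["Tracks"])
```

Because the chain derives its own inputs and outputs from its members, it can be nested or scheduled like any single processor.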

37 M.Frank CERN/LHCb Pipelined Data Processing: Configuration and Processor Ranks
- Start with a sea of processors
  - e.g. from some inventory
  - possibly with some configuration hints
[Figure: unconnected boxes, each with In and Out ports: Input Module, Processor 1, Processor 2, Processor 3, Histogram 1, …]

38 M.Frank CERN/LHCb Pipelined Data Processing: Configuration and Processor Ranks
- Then combine them according to required inputs/outputs
- Inputs/outputs define dependencies => solve them
[Figure: the same boxes (Input Module, Processor 1, Processor 2, Processor 3, Histogram 1, …) connected through their In and Out ports and carrying the numbers 1 to 5]
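Solving the dependencies amounts to a level-by-level topological sort. The sketch below is one way to do it, not the method of the talk; the function `rank_processors` and the collection names ("Raw", "Hits", ...) are assumptions made for the example.

```python
def rank_processors(processors, available=()):
    """Assign a rank to each processor from its declared inputs and outputs.

    processors: dict name -> (input collections, output collections)
    available:  collections present before any processor has run
    """
    have = set(available)
    pending = dict(processors)
    ranks = {}
    rank = 1
    while pending:
        # Everything whose inputs already exist can run in this rank.
        ready = [n for n, (ins, _) in pending.items() if set(ins) <= have]
        if not ready:
            raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
        for name in ready:
            ranks[name] = rank
            have.update(pending.pop(name)[1])   # outputs become available next rank
        rank += 1
    return ranks


sea = {
    "Histogram 1":  (("Tracks",), ()),
    "Processor 3":  (("Hits", "Clusters"), ("Tracks",)),
    "Processor 1":  (("Raw",), ("Hits",)),
    "Processor 2":  (("Raw",), ("Clusters",)),
    "Input Module": ((), ("Raw",)),
}
ranks = rank_processors(sea)
print(ranks)
```

Processor 1 and Processor 2 end up with the same rank because neither needs the other's output, so they are candidates for parallel execution within one event.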

39 M.Frank CERN/LHCb … Then model input/outputs as AND gates [Figure: Dataflow Manager, Input, Processor, …; each Processor sits in an Executor (wrapper) with an input port and an output port (multiple instances)]

40 M.Frank CERN/LHCb Concept: Dataflow Manager
- Dataflow manager
  - Knows for each Executor its input data (mandatory) and output data
  - Whenever new data arrives: evaluates which executors fit the new data content
  - Gives the executor to a worker thread
[Figure: Dataflow Manager, Input, Processor, …]
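The AND-gate behaviour can be sketched as a subset test on the data present for an event. This is a single-threaded illustration under assumptions: the classes `Executor` and `DataflowManager`, their methods and the collection names are invented here, and handing work to a thread is left out.

```python
class Executor:
    """Hypothetical wrapper: knows the mandatory inputs and the outputs of a processor."""

    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = frozenset(inputs)
        self.outputs = frozenset(outputs)

    def ready(self, present):
        # AND gate: fires only when every mandatory input is present.
        return self.inputs <= present


class DataflowManager:
    def __init__(self, executors):
        self.executors = list(executors)

    def new_event(self):
        # State is kept per event so that several events can be in flight.
        return {"present": set(), "dispatched": set()}

    def data_arrived(self, state, items):
        """Register new data and return the executors that became runnable."""
        state["present"].update(items)
        runnable = [e for e in self.executors
                    if e.name not in state["dispatched"] and e.ready(state["present"])]
        for e in runnable:
            state["dispatched"].add(e.name)
        return runnable


manager = DataflowManager([
    Executor("calo", ["Raw"], ["Clusters"]),
    Executor("tracker", ["Raw"], ["Hits"]),
    Executor("fit", ["Hits", "Clusters"], ["Tracks"]),
])
state = manager.new_event()
first = [e.name for e in manager.data_arrived(state, {"Raw"})]
second = [e.name for e in manager.data_arrived(state, {"Hits"})]
third = [e.name for e in manager.data_arrived(state, {"Clusters"})]
print(first, second, third)
```

The `fit` executor is released only after both of its inputs have arrived, regardless of the order in which they were produced.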

41 M.Frank CERN/LHCb And sequence the execution units [Figure: Dataflow Manager, Input, Processor, …, and three Executors, each with I/O]

42 M.Frank CERN/LHCb Concept: Executor, Worker
- Executor: formal workload to be given to a worker thread, e.g. a wrapper around a Marlin Processor
- To schedule an executor
  - acquire a worker from the idle queue
  - attach the executor to the worker
  - start the worker
- Once finished
  - put the worker back into the idle queue
  - Executor back to the “sea”
  - Check the work queue for rescheduling
[Figure: Dataflow Manager (Scheduler) with idle queue (Workers), busy queue (Workers with attached Executors) and waiting work]
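The acquire, attach, start and release cycle can be sketched with the standard `queue` and `threading` modules. This is an illustration only: `Worker`, `run_all` and the use of plain callables as executors are assumptions, and the sketch starts a fresh thread per executor rather than keeping long-lived worker threads.

```python
import queue
import threading


class Worker:
    """Hypothetical worker token; holding one grants the right to run an executor."""

    def __init__(self, name):
        self.name = name


def run_all(executors, n_workers):
    """Run (name, callable) executors with at most n_workers running at once."""
    idle = queue.Queue()
    for i in range(n_workers):
        idle.put(Worker(f"worker-{i}"))

    results = {}
    results_lock = threading.Lock()
    threads = []

    def work(worker, name, task):
        try:
            value = task()
            with results_lock:
                results[name] = (worker.name, value)
        finally:
            idle.put(worker)              # put the worker back into the idle queue

    for name, task in executors:
        worker = idle.get()               # acquire: blocks until a worker is idle
        thread = threading.Thread(target=work, args=(worker, name, task))
        thread.start()                    # attach the executor and start the worker
        threads.append(thread)

    for thread in threads:
        thread.join()
    return results


results = run_all([("a", lambda: 1), ("b", lambda: 2), ("c", lambda: 3)], n_workers=2)
print(sorted((name, value) for name, (_, value) in results.items()))
```

The blocking `idle.get()` plays the role of the waiting-work queue: the third executor is not started until one of the two workers has been returned.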

43 M.Frank CERN/LHCb Flexibility
- Can the implementation of such a processing framework be applied to existing code? Depends…
- Processors and framework components must be able to deal with several events in parallel
  - e.g. a single “blackboard” would not do
  - The state of processor instances must not depend on the current event
- The Processor chain must be divisible into “self-contained” units
  - Locking is typically not supported by existing frameworks
  - Spaghetti code is a killer…
- Otherwise: yes, this can be applied to Marlin, Gaudi,…

44 M.Frank CERN/LHCb Example: A Gaudi “Processor” [Figure: Gaudi application framework. Diagram labels: ConcreteAlgorithm (IAlgorithm, IProperty), EventDataService (IDataProviderSvc), DetectorDataService (IDataProviderSvc), HistogramService (IHistogramSvc), MessageService (IMessageSvc), ObjectA, ObjectB]

45 M.Frank CERN/LHCb … Can the model be applied elsewhere?
- An IAlgorithm wrapped in an Executor (thread) with input port and output port
  - Will depend on blackboards
  - Will depend on framework services that modify long-living objects (histos,…) and on tools with event context
- !!! Difficult !!!
- Impossible if coupling constraints between algorithms exist [in LHCb they do]

46 M.Frank CERN/LHCb Conclusions
- I did not address “implementation details” such as
  - Detector description
  - Interactivity
  - …
- HEP data processing will face a change of paradigm
- A possible approach was shown for a multi-threaded event processing framework
  - This approach is a priori not the realization of a framework
  - But: if such a model is to be applied to existing frameworks, certain constraints must be fulfilled

47 M.Frank CERN/LHCb Program of Work
- Build a prototype
- Test with dummy Processors
- Apply to existing LCD reconstruction program
  - worst case: existing user code base
- Measure performance

48 M.Frank CERN/LHCb Backup Slides

49 M.Frank CERN/LHCb Standard event processing
- Sequential processing of single events
  - One execution thread
  - Effectively no parallelization (modulo SSE2, if the compiler is clever)
  - Parallelization only possible at the level of multiple processes
[Figure: for each event: Processor, Processor, …]

50 M.Frank CERN/LHCb Gaudi, Athena

51 M.Frank CERN/LHCb WWZ Collection (from JJB)
ETDCollection                   SimTrackerHit        21
EcalBarrelCollection            SimCalorimeterHit    122
EcalBarrelPreShowerCollection   SimCalorimeterHit    4
EcalEndcapCollection            SimCalorimeterHit    852
EcalEndcapPreShowerCollection   SimCalorimeterHit    9
FTDCollection                   SimTrackerHit        2
HcalBarrelRegCollection         SimCalorimeterHit    469
HcalEndCapRingsCollection       SimCalorimeterHit    1
HcalEndCapsCollection           SimCalorimeterHit    6645
MCParticle                                           111
MuonBarrelCollection            SimCalorimeterHit    139
SETCollection                   SimTrackerHit        6
SITCollection                   SimTrackerHit        11
TPCCollection                   SimTrackerHit        814
VXDCollection                   SimTrackerHit        26
Size on Disk: 3 MB

52 M.Frank CERN/LHCb Pipelined Data Processing [Figure: processor list (Input e.g. from file, Processor 1 to 6, Histogramming 1 to 4, Output 1) against time, with event data content; event number 1 shown]

53 M.Frank CERN/LHCb Pipelined Data Processing [Figure: same processor list against time, with event data content; event numbers 1 and 2 shown]

54 M.Frank CERN/LHCb Pipelined Data Processing [Figure: same processor list against time, with event data content; event numbers 1 to 6 shown]

55 M.Frank CERN/LHCb Pipelined Data Processing [Figure: same processor list against time, with event data content; no event numbers shown]

56 M.Frank CERN/LHCb Pipelined Data Processing: different processing, identical results [Figure: two orderings of the same processor list (Input, Processor 1 to 6, Histogramming 1 to 4, Output 1) against time, with event data content; the second ordering swaps Histogramming 1 with Histogramming 2 and Output 1 with Histogramming 3]

