M.Frank CERN/LHCb Event Data Processing  Philosophical changes of future frameworks  Detector description  Miscellaneous application support


M.Frank CERN/LHCb Event Data Processing Frameworks for the Future  A journey through the past  The problems of current frameworks  Possible solution

M.Frank CERN/LHCb Digging in the past  A single structure for HEP event-processing software for the last 50 years:  Initialize  Loop over events  Loop over subroutine calls (simulation/reconstruction/analysis)  Finalize  Object orientation (>1995) did not change anything, except that each of the above steps is now represented by a loop over “processors”
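
As an illustration only (hypothetical names, not any experiment's actual code), the structure above boils down to something like:

    // Minimal sketch of the classic HEP event-processing structure:
    // initialize, loop over events, loop over "processors", finalize.
    #include <memory>
    #include <vector>

    struct Event { /* per-event data */ };

    struct Processor {                          // hypothetical interface
      virtual ~Processor() = default;
      virtual void initialize() {}
      virtual void process(Event& evt) = 0;     // simulation / reconstruction / analysis step
      virtual void finalize() {}
    };

    void runJob(std::vector<std::unique_ptr<Processor>>& chain, long numEvents) {
      for (auto& p : chain) p->initialize();    // Initialize
      for (long i = 0; i < numEvents; ++i) {    // Loop over events
        Event evt;                              // read the next event
        for (auto& p : chain) p->process(evt);  // Loop over processors
      }
      for (auto& p : chain) p->finalize();      // Finalize
    }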

M.Frank CERN/LHCb Digging in the past (2)  Why was this model so successful?  50 years is a long time!  Simple model  Intuitive model  (relatively) easy to manage/extend  Perfect fit to HEP  Particle collisions are independent  Analysis of “independent” events  Common approach to achieve scalability  Parallelization at the level of processes  More Events? => Submit more concurrent batch jobs

M.Frank CERN/LHCb [Diagram: call-and-return style — modules invoked via method calls. RD.Schaffer, ATLAS LCB workshop 1999, Marseille]

M.Frank CERN/LHCb [Diagram: pipe-and-filter style — input module, app modules and output module connected by a control framework, with the event and intermediate data objects passed from module to module. Used by BaBar and CDF and by the ATLAS Object Network; ~ Marlin. RD.Schaffer, ATLAS LCB workshop 1999, Marseille]

M.Frank CERN/LHCb [Diagram: pipe-and-filter style plus an (in)visible blackboard — input module, app modules and output module under a control framework exchange event data through shared memory, the blackboard (TES). Used by Gaudi and Athena]

M.Frank CERN/LHCb [Diagram: independent reactive objects — objects react to “event” signals and communicate via method invocation. Lassi Tuura / RD.Schaffer, ATLAS LCB workshop 1999, Marseille]

M.Frank CERN/LHCb Current Framework Trends  Flexibility is a must: reuse of frameworks  Different applications typically improve abstraction where necessary  Within one experiment, the same event-processing framework is used online and offline:  Software-based triggers are “in”: high-level trigger apps  Reconstruction (-> trigger verification)  Data analysis  (Note: I leave simulation out here)

M.Frank CERN/LHCb Current Framework Trends  Frameworks tend to be shared between experiments  “The Framework”: BaBar – CDF  Gaudi: LHCb, ATLAS, HARP, MINERVA, GLAST, BES3, (?)  DIRAC (Distributed Infrastructure with Remote Agent Control): LHCb, LCD, Belle (?)  Ganga (Gaudi/Athena and Grid Alliance): LHCb, ATLAS, LCD (?)  Flexibility should not be neglected  Different applications typically improve abstraction where necessary

M.Frank CERN/LHCb All Current Frameworks  The architecture was done in the late 1990s, 15 years ago:  Memory is cheap, CPU is expensive  Approach: one process per box  Feature size: 250 nm (PIII Katmai, 1999)  Today: memory per node (core) is significant  Multiple-core technologies  Feature size: 32 nm (hexa-core Westmere)  In 10 years: ???  Multi-core technologies were not considered in the past; they were triggered by the limit to clock speed

M.Frank CERN/LHCb Today’s Framework Problems  Problems are already present  Single execution thread per process  High resource demand per execution thread  Example:  Current LCG nodes are defined (mostly) by the needs of ATLAS and CMS => 4 GB/core  Dual hexa-core: 48 GB of memory for running 12 jobs (without hyper-threading)  ½–¾ of that memory is read-only: 3 GB × 12 = 36 GB; only ~12 GB is actively used  lxplus: dual quad-core with hyper-threading enabled, 48 GB memory: 3 GB per hyper-thread

M.Frank CERN/LHCb First Conclusions  Different apps, but the same event-processing framework  Get the online in early enough or enjoy the pain  Quite soon we cannot load the CPUs anymore  The one-process-per-core model is getting to its limits  But what to do?  Typical ideas apply parallelization at the sub-event processing level  Difficult unless there are clearly identified blocks  “Then I fit each track with a separate thread”  Does not work: data used/created/modified by threads must be independent => locking hell  If this simple approach were really feasible, we would see examples by now

M.Frank CERN/LHCb Considerations for new frameworks  If multi-threading is to work, each thread must do a sizable amount of work  Otherwise the management overhead is large  Threads must be aware that altering input data may cause trouble for other threads  Every piece of data has exactly one writer  Either lock the data or have a gentlemen’s agreement  Worker objects must be “const” if reusable  To be efficient, the same worker may be used by different threads at the same time  Use case: time-consuming but indivisible processing steps  If there is an existing code base, re-use of existing user code (processors/drivers) is a must  User code must be agnostic to framework changes  Otherwise, forget acceptance
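
A minimal sketch of the “const worker” rule above (illustrative names, not an existing API): the same worker instance may be used by several threads at once, so event processing must not modify member state; all per-event data is passed in and out, and every piece of data has exactly one writer.

    struct EventData { /* read-only input for one event */ };
    struct FitResult { /* output owned by the calling thread */ };

    class TrackFitter {
      double m_chi2Cut;                      // configuration, fixed after construction
    public:
      explicit TrackFitter(double chi2Cut) : m_chi2Cut(chi2Cut) {}

      // const: safe to call concurrently from different threads on the same object
      FitResult processEvent(const EventData& input) const {
        FitResult result;
        // ... use only 'input', the read-only configuration (m_chi2Cut)
        //     and local variables; never touch shared mutable state ...
        (void)input;
        return result;
      }
    };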

M.Frank CERN/LHCb Predictions  We will see a change of paradigm  Parallelism inside one process will have to be applied  The resource usage will simply require it  Memory  The number of concurrently open files causes severe limits for CASTOR / xrootd etc.  The number of concurrent database connections, e.g. Oracle  => Let’s have a look at the resources (memory)

M.Frank CERN/LHCb Event Data Size – ttbar Reconstruction [200-event average]  [Plot: event data size with and without overlay, per collection type: TrackerHit, LCGenericObject, LCRelation, Track, Vertex, SimCalorimeterHit, SimTrackerHit, MCParticle, Cluster, CalorimeterHit, ReconstructedPart]

M.Frank CERN/LHCb Size of Marlin without data

M.Frank CERN/LHCb Size of Marlin without data

M.Frank CERN/LHCb Comparison LHCb HLT vs. Marlin Rec.  HLT ~ 500 MB larger  Probably more fine-grained  Alignment constants  Geometry description  Magnetic-field map  Probably more code  O(? MB)  Not unreasonable to assume that the SLD code will grow to at least the same amount  The HLT uses forking to reuse read-only memory sections: 20 × 1.8 GB = 36 GB > 24 GB of machine memory, yet a total of 14 GB of free memory is still left; this saves 20+X GB (more than 60%)  Threading can save even more
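
A very reduced sketch of the forking idea mentioned above (POSIX fork, illustrative only, not the actual HLT implementation): the large read-only data is loaded once, then worker processes are forked so that those pages are shared copy-on-write between parent and children.

    #include <sys/wait.h>
    #include <unistd.h>
    #include <vector>

    // hypothetical: load geometry, field map, conditions, ... (read-only afterwards)
    void loadReadOnlyInfrastructure() {}
    void runEventLoop() {}                // hypothetical per-process event loop

    int main() {
      loadReadOnlyInfrastructure();       // shared copy-on-write after fork()
      const int nWorkers = 20;
      std::vector<pid_t> children;
      for (int i = 0; i < nWorkers; ++i) {
        pid_t pid = fork();
        if (pid == 0) {                   // child: touches mostly its own event data
          runEventLoop();
          _exit(0);
        }
        children.push_back(pid);
      }
      for (pid_t pid : children)
        waitpid(pid, nullptr, 0);         // parent waits for all workers
      return 0;
    }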

M.Frank CERN/LHCb The Vision  Make use of current CPU technology  Try to save hardware resources  => Use multiple threads, ~1–2 threads per hardware thread  Pipelined data processing

M.Frank CERN/LHCb Pipelined data processing  Idea taken from data flows on hardware boards  Connected processing elements (e.g. FPGAs)  If data from several input lines is present:  process the input data  once finished: send the data to the output line(s)  the data may then be picked up by the next FPGA  Such models were also used to simulate DAQ systems  Foresight: modelling the ALICE DAQ, parts of the LHCb HLT  Alchemy: LHCb DAQ models

M.Frank CERN/LHCb Pipelined Data Processing (1)  [Diagram: timeline of “clock cycles” T0 … T7; each pipeline line consists of input, processing and output stages]

M.Frank CERN/LHCb Pipelined Data Processing (2)  1st clock cycle (T0–T1)  Start processing  Input first event

M.Frank CERN/LHCb Pipelined Data Processing (3)  2nd clock cycle (T0–T2)  2 separate threads  Parallel execution of:  first processor  input module

M.Frank CERN/LHCb Pipelined Data Processing (4)  3rd clock cycle (T0–T3)  3 separate threads  Parallel execution of:  second processor  first processor  input module  Each line handles its own event  Each line represents a thread

M.Frank CERN/LHCb Pipelined Data Processing (5)  (T0–T7)  Filling up threads up to some configurable limit

M.Frank CERN/LHCb Pipelined Data Processing (6)  (T0–T8)  Output module & cleanup  Rest: processing events

M.Frank CERN/LHCb Pipelined Data Processing (7)  [Diagram: clock cycles T0–T9; one pipeline slot completed (X)]

M.Frank CERN/LHCb Pipelined Data Processing (8)  [Diagram: clock cycles T0–T12; several pipeline slots completed (X X X)]

M.Frank CERN/LHCb Processor Dependencies  Dependencies may be defined as the required data content of a given event  Processors with the same rank can be executed in parallel  [Diagram: processor list (Input e.g. from file, Processors 1–6, Histogramming 1–3, Output 1) versus time, with the event data content required as input to each processor]

M.Frank CERN/LHCb Practical Consequence  Can keep more threads simultaneously busy  Hence:  Fewer events in memory  Better resource usage  Example:  First massage the raw data for each subdetector (in parallel)  Then fit the tracks…

M.Frank CERN/LHCb Parallel Processing  There are two parallelization concepts:  Event parallelization: simultaneous processing of multiple events  Processor parallelization: for a given event, simultaneous execution of multiple processors  Both concepts may coexist  This was never tried before

M.Frank CERN/LHCb Requirement  Such an approach makes no sense if the infrastructure size << event data size in memory  There must be a benefit in reusing the process infrastructure  Otherwise it is simpler to run more processes  LCD:  Assume the worst case is reconstruction with overlay: event data size / infrastructure ~ 1  But the infrastructure is unrealistically small

M.Frank CERN/LHCb Towards a more concrete model  An abstract view of a Marlin Processor  Self organization issues  I/O modeling  Threading

M.Frank CERN/LHCb A (Marlin) Processor  Processor interface: initialize, process event, process run, finalize  Each processor consumes input collections and parameters and produces output collections  A processor chain is also a processor  Process self-configuration: given a set of processors, their input data and output data allow the execution sequence to be defined
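
A sketch of such a processor interface with declared data dependencies (assumed names, not the actual Marlin API):

    #include <string>
    #include <vector>

    class Event;   // holds named collections (input and output)

    class Processor {
    public:
      virtual ~Processor() = default;
      virtual void initialize() {}
      virtual void processRun(int /*runNumber*/) {}
      virtual void processEvent(Event& evt) = 0;
      virtual void finalize() {}

      // declared dependencies used for self-configuration of the chain
      const std::vector<std::string>& inputs()  const { return m_inputs;  }
      const std::vector<std::string>& outputs() const { return m_outputs; }
    protected:
      std::vector<std::string> m_inputs;    // e.g. {"TPCCollection"}
      std::vector<std::string> m_outputs;   // e.g. {"Tracks"}
    };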

M.Frank CERN/LHCb Pipelined Data Processing: Configuration and Processor Ranks  Start with a sea of processors  e.g. from some inventory  possibly with some configuration hints  [Diagram: unconnected processors, each with In/Out ports: Input Module, Processor 1, Processor 2, Processor 3, Histogram 1, …]

M.Frank CERN/LHCb Pipelined Data Processing: Configuration and Processor Ranks  Then combine them according to their required inputs/outputs  Inputs/outputs define dependencies => solve them  [Diagram: Input Module, Processors 1–3 and Histogram 1 connected through their In/Out ports]
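
One way to solve these dependencies, sketched under the assumption of the Processor interface above: assign each processor a rank such that all of its inputs are produced at lower ranks; processors with the same rank can then run in parallel.

    #include <map>
    #include <set>
    #include <stdexcept>
    #include <string>
    #include <vector>

    std::map<Processor*, int> rankProcessors(const std::vector<Processor*>& procs,
                                             std::set<std::string> available) {
      std::map<Processor*, int> rank;
      std::set<Processor*> pending(procs.begin(), procs.end());
      for (int level = 0; !pending.empty(); ++level) {
        std::vector<Processor*> ready;
        for (auto* p : pending) {                       // all inputs already available?
          bool ok = true;
          for (const auto& in : p->inputs())
            if (!available.count(in)) { ok = false; break; }
          if (ok) ready.push_back(p);
        }
        if (ready.empty())
          throw std::runtime_error("unresolvable processor dependencies");
        for (auto* p : ready) {
          rank[p] = level;                              // same rank => may run in parallel
          pending.erase(p);
        }
        for (auto* p : ready)                           // outputs become available
          for (const auto& out : p->outputs())
            available.insert(out);
      }
      return rank;
    }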

M.Frank CERN/LHCb … Then model inputs/outputs as AND gates  [Diagram: Dataflow Manager connected to an Input processor and to Executors (wrappers around Processors, possibly with multiple instances), each with input and output ports]

M.Frank CERN/LHCb Concept: Dataflow Manager  The dataflow manager knows, for each executor:  its input data (mandatory)  its output data  Whenever new data arrives:  evaluate which executors fit the new data content  give the executor to a worker thread
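
A sketch of that reaction to newly produced data (assumed names only): whenever a collection becomes available for an event, every executor whose mandatory inputs are now complete is handed to a worker thread.

    #include <functional>
    #include <set>
    #include <string>
    #include <vector>

    struct Executor {                         // wraps one processor call for one event
      std::vector<std::string> inputs;        // mandatory input collections
      std::function<void()> run;              // the wrapped processor invocation
    };

    struct EventContext {
      std::set<std::string> available;        // collections produced so far for this event
      std::set<const Executor*> scheduled;    // executors already handed out
    };

    void onDataAvailable(EventContext& ctx, const std::string& newData,
                         const std::vector<Executor>& executors,
                         const std::function<void(std::function<void()>)>& giveToWorker) {
      ctx.available.insert(newData);
      for (const Executor& e : executors) {
        if (ctx.scheduled.count(&e)) continue;
        bool ready = true;
        for (const auto& in : e.inputs)
          if (!ctx.available.count(in)) { ready = false; break; }
        if (ready) {
          ctx.scheduled.insert(&e);
          giveToWorker(e.run);                // hand the executor to a worker thread
        }
      }
    }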

M.Frank CERN/LHCb … And sequence the execution units  [Diagram: Dataflow Manager sequencing the Input processor and a chain of Executors via their I/O ports]

M.Frank CERN/LHCb Concept: Executor, Worker  Executor: a formal workload to be given to a worker thread, e.g. a wrapper around a Marlin Processor  To schedule an executor:  acquire a worker from the idle queue  attach the executor to the worker  start the worker  Once finished:  put the worker back into the idle queue  return the executor to the “sea”  check the work queue for rescheduling  [Diagram: Dataflow Manager (scheduler) with idle and busy worker queues and waiting work]
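
A very reduced sketch of the worker pool behind this (illustrative, not an existing implementation): executors wait in a queue, idle workers pick them up, execute them and return to the idle state.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class WorkerPool {
      std::queue<std::function<void()>> m_waitingWork;  // executors waiting for a worker
      std::mutex m_mutex;
      std::condition_variable m_cond;
      std::vector<std::thread> m_workers;
      bool m_stop = false;
    public:
      explicit WorkerPool(unsigned nWorkers) {
        for (unsigned i = 0; i < nWorkers; ++i)
          m_workers.emplace_back([this] {
            for (;;) {
              std::function<void()> executor;
              {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cond.wait(lock, [this] { return m_stop || !m_waitingWork.empty(); });
                if (m_stop && m_waitingWork.empty()) return;
                executor = std::move(m_waitingWork.front());   // attach executor to worker
                m_waitingWork.pop();
              }
              executor();                                      // worker busy, then idle again
            }
          });
      }
      void schedule(std::function<void()> executor) {          // hand an executor to the pool
        { std::lock_guard<std::mutex> lock(m_mutex); m_waitingWork.push(std::move(executor)); }
        m_cond.notify_one();
      }
      ~WorkerPool() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_stop = true; }
        m_cond.notify_all();
        for (auto& t : m_workers) t.join();
      }
    };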

M.Frank CERN/LHCb Flexibility  Can the implementation of such a processing framework be applied to existing code?  It depends…  Processors and framework components must be able to deal with several events in parallel  e.g. a single “blackboard” would not do  The state of processor instances must not depend on the current event  The processor chain must be divisible into “self-contained” units  Locking is typically not supported by existing frameworks  Spaghetti code is a killer…  Otherwise: yes; this can be applied to Marlin, Gaudi,…

M.Frank CERN/LHCb Example: A Gaudi “Processor”  [Diagram: a ConcreteAlgorithm (IAlgorithm, IProperty) inside the Gaudi Application Framework, using the EventDataService and DetectorDataService (IDataProviderSvc), the HistogramService (IHistogramSvc) and the MessageService (IMessageSvc), and producing ObjectA/ObjectB]

M.Frank CERN/LHCb … Can the model be applied elsewhere?  [Diagram: an IAlgorithm wrapped by an Executor (thread) with input and output ports]  It will depend on the blackboards  It will depend on framework services that modify long-living objects (histograms, …) and on tools with event context  !!! Difficult !!!  Impossible if coupling constraints between algorithms exist [in LHCb they do]

M.Frank CERN/LHCb Conclusions  I did not address “implementation details” such as:  Detector description  Interactivity  …  HEP data processing will face a change of paradigm  A possible approach was shown for a multi-threaded event-processing framework  This approach is a priori not the realization of a framework  But: if such a model is to be applied to existing frameworks, certain constraints must be fulfilled

M.Frank CERN/LHCb Program of Work  Build a prototype  Test with dummy Processors  Apply to existing LCD reconstruction program  worst case: existing user code base  Measure performance

M.Frank CERN/LHCb Backup Slides

M.Frank CERN/LHCb Standard event processing  Sequential processing of single events  One execution thread  Effectively no parallelization (modulo SSE2, if the compiler is clever)  Parallelization is only possible at the level of multiple processes  [Diagram: for each event, a chain of processors executed one after the other]

M.Frank CERN/LHCb Gaudi, Athena

M.Frank CERN/LHCb WWZ Collection (from JJB)  Size on disk: 3 MB

  Collection                      Type                 Entries
  ETDCollection                   SimTrackerHit             21
  EcalBarrelCollection            SimCalorimeterHit        122
  EcalBarrelPreShowerCollection   SimCalorimeterHit          4
  EcalEndcapCollection            SimCalorimeterHit        852
  EcalEndcapPreShowerCollection   SimCalorimeterHit          9
  FTDCollection                   SimTrackerHit              2
  HcalBarrelRegCollection         SimCalorimeterHit        469
  HcalEndCapRingsCollection       SimCalorimeterHit          1
  HcalEndCapsCollection           SimCalorimeterHit       6645
  MCParticle                      MCParticle               111
  MuonBarrelCollection            SimCalorimeterHit        139
  SETCollection                   SimTrackerHit              6
  SITCollection                   SimTrackerHit             11
  TPCCollection                   SimTrackerHit            814
  VXDCollection                   SimTrackerHit             26

M.Frank CERN/LHCb Pipelined Data Processing (backup)  [Diagram sequence: the processor list (Input e.g. from file, Processors 1–6, Histogramming 1–4, Output 1) plotted against time, with event numbers and the event data content advancing through the pipeline step by step; different processing orders give identical results]