Vector Prototype Status
Philippe Canal (for the VP team)

Components (7/21/14)
– Scheduler
– UGeom
– Physics Simulation

About Scheduler
– Data structures
– Baskets and basket management
– Basket managers (per logical volume, LV)
– Track and basket lifecycle
– Transport (physics and geometry) and track phases
– Scheduler workflow

GeantTrack
– Track identifiers: event, slot (memory management), track ID, PDG, G5 code
– Particle identifiers: PDG, GeantV code, charge, mass, species
– Kinematics: position, direction, momentum, energy
– Status flags: status, N steps, N null steps, boundary flag, pending flag
– Geometry/physics context: process, proposed step, current step, edep, distance to boundary, safety, current path, next path

GeantTrack and GeantTrack_v memory layout [diagram]
– GeantTrack: scalar members fEvent, fEvslot, fParticle, fPDG, …, fXpos, fYpos, fZpos, fXdir, fYdir, fZdir, …, Edep, Pstep, Snext, Safety, plus *fPath and *fNextpath
– GeantTrack_v: SOA of fNtracks, with array members *fEventV, *fEvslotV, *fParticleV, *fPDGV, …, *fXposV, *fYposV, *fZposV, *fXdirV, *fYdirV, *fZdirV, …, *fEdepV, *fPstepV, *fSnextV, *fSafetyV, *fPathV, *fNextpathV, and storage labelled fBuffer
– Example shown: fNtracks=10, padding=32 (vector 1, vector 2)
– GeantTrackPool
– 192 bytes: TO FIX

Track_v operations (overhead)
– Prerequisite for using vectorized methods: tracks must be contiguous at the beginning of the arrays (fEventV, fParticleV, …)
– During transport, tracks stop, leaving holes in the container
– Call forms: Method(fXposV, …, fNtracks) or Method(GeantTrack_v &)
– Operations shown [diagram]: Compact, Move
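The hole-filling step can be illustrated with a minimal Python sketch. It is not the GeantTrack_v implementation: the function name `compact`, the dictionary-of-lists SOA, and the `alive` flag list are all hypothetical stand-ins. The sketch assumes that track order does not need to be preserved, so live tracks from the tail are moved into holes near the front.

```python
def compact(columns, alive):
    """Move live tracks from the tail into holes so that live tracks
    occupy indices [0, n_live). Track order is NOT preserved.

    columns: dict mapping a field name to a list (one entry per track slot);
             a hypothetical stand-in for the SOA arrays.
    alive:   list of bools, False marks a hole left by a stopped track.
    Returns the number of live tracks.
    """
    first = 0                 # scans forward looking for a hole
    last = len(alive) - 1     # scans backward looking for a live track
    while True:
        while first <= last and alive[first]:
            first += 1
        while last >= first and not alive[last]:
            last -= 1
        if first >= last:
            break
        # Copy every field of the tail track into the hole.
        for col in columns.values():
            col[first] = col[last]
        alive[first], alive[last] = True, False
    return first


# Usage: five slots, two holes.
tracks = {"x": [0.0, 1.0, 2.0, 3.0, 4.0], "pdg": [11, 22, 11, 22, 11]}
flags = [True, False, True, False, True]
n_live = compact(tracks, flags)
```

After the call, the first `n_live` entries of every column hold live tracks, which is the contiguity that a vectorized method taking `(array, …, n)` arguments relies on.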

Track_v operations (overhead)
– Track selection according to some criteria
– Tracks have to be copied to a receiver during rescheduling
– Operations shown [diagram]: Reshuffle, Copy
– Concurrency support

GeantBasket
Elementary work unit for GeantV
– Baskets currently hold only tracks that are physically inside a given logical volume
– Input GeantTrack_v array, filled by the scheduler
– Output GeantTrack_v array, filled during transport
Baskets have thread-local access during transport, but concurrent access during scheduling
[Diagram labels: Input, Scheduler, Transport, Physics, Output]

Automatic basket scheduling
– Concurrent track addition, garbage collection, collection of tracks from prioritized events
– Adjustable threshold: T_vol = N_tracks_in_flight / (2 N_threads), rounded to a multiple of 4 (min 4, max 256)
[Diagram labels: Volume BM (fThreshold, *current), empty basket pool, transport queue, GeantScheduler, bottleneck]
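The threshold rule can be written out as a short Python sketch. The function name `basket_threshold` is hypothetical, and the slide does not say whether rounding goes up, down, or to nearest; rounding down is an assumption made here.

```python
def basket_threshold(n_tracks_in_flight, n_threads, lo=4, hi=256):
    """Per-volume basket threshold: tracks in flight / (2 * threads),
    rounded to a multiple of 4 and clamped to [lo, hi].

    Assumption: rounding is downward (the slide only says "rounded to %4").
    """
    raw = n_tracks_in_flight // (2 * n_threads)
    rounded = raw - raw % 4          # nearest multiple of 4 at or below raw
    return max(lo, min(hi, rounded))


# Usage: 1000 tracks in flight on 4 threads gives 125, rounded down to 124.
threshold = basket_threshold(1000, 4)
```

The clamping keeps baskets from becoming too small to vectorize usefully (below 4) or so large that few baskets are available to balance work across threads (above 256).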

Basket lifecycle [diagram]
– Basket states: empty, full, transported
– Elements: Generator, Scheduler, basket pool, basket manager (per TGeoVolume, 1…N volumes, with its current basket), transport queue, Propagator (1…N workers)
– Transitions: AddTrack, priority AddTrack, push on threshold, push on garbage collection, recycle

Track lifecycle [diagram]
– Stages: Input tracks, PhysicsSelect, PropagateTracks, PostStep (continuous), PostStep (discrete), Output tracks
– Per-track data shown: fProcessV[i], fPstepV[i]; fXposV[i], …, fXdir[i], …, fPV[i], fEV[i]
– Track statuses: kNew, kCrossing, kExiting, kPhysics, kKilled (geom), kKilled (phys)

PropagateTracks
PostponedAction:
– kVector: continue in vector mode
– kSingle: call PropagateTracksSingle at the given stage
– kPostpone: copy remaining tracks to output
MarkRemoved + Compact: compact holes and copy these tracks to the output
[Flow diagram, vector loop over stage0, stage1, stage2]
– ComputeTransportLength, FindNextBoundaryAndStep (fSnextV[i], fSafetyV[i])
– Propagate neutrals (kCrossing, kExiting, kPhysics)
– Propagate Safe<Pstep (kPhysics)
– Propagate close to boundary (kCrossing, kExiting)
– Propagate with safety
– MarkRemoved, Compact(output)

Propagation to boundaries
– A purely safety-based algorithm is very slow
– What is the step in a magnetic field that shifts the final particle position by no more than ε with respect to linear propagation?
  – If the proposed step is within the isotropic safety: use safety
  – Otherwise take into account only the safe_step value, in competition with the distance to boundary and the proposed step
– C = 1/R (track curvature), ε = 1 micron, safe_step = 2√(ε/C)
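A minimal Python sketch of this step selection follows. The function names `safe_step` and `choose_step` are hypothetical, lengths are assumed to be in cm (so ε = 1 micron is 1e-4), and "use safety" is read here as "take the full proposed step, since it cannot reach a boundary"; that reading is an assumption, not something the slide states.

```python
import math


def safe_step(curvature, eps=1e-4):
    """Step length 2*sqrt(eps/C) with C = 1/R the track curvature.

    eps defaults to 1e-4, i.e. 1 micron if lengths are in cm (assumed unit).
    A straight track (C = 0) has no curvature limit.
    """
    if curvature <= 0.0:
        return math.inf
    return 2.0 * math.sqrt(eps / curvature)


def choose_step(pstep, snext, safety, curvature, eps=1e-4):
    """Pick the step for one charged track.

    pstep:  step proposed by physics
    snext:  distance to the next boundary
    safety: isotropic safety distance
    """
    if pstep <= safety:
        # Assumed reading of "use safety": no boundary can be reached,
        # so the whole proposed step is taken.
        return pstep
    # Otherwise the curvature-limited step competes with boundary and physics.
    return min(safe_step(curvature, eps), snext, pstep)


# Usage: R = 100 cm (C = 0.01 / cm) limits the step to 0.2 cm.
step = choose_step(pstep=5.0, snext=10.0, safety=1.0, curvature=0.01)
```

Note that ε/C has the dimension of length squared, so the square root must cover the whole ratio for safe_step to be a length.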

Track stages [diagram]
– Stages: Imported, Pending (threshold), Queued for pickup, Being transported, Queued to be dispatched, Scheduled
– Actors and transitions: Generator, Basket manager, Transport queue, Basket transport, Scheduler queue, Scheduler dispatch, Priority dispatch

Scheduler
– Pulls transported baskets, dispatches tracks to basket managers per volume
  – Not anymore!
– Applies policies to:
  – Provide work balancing (concurrency)
  – Keep memory under control
  – Keep the vectors up (most of the time)

Scheduler workflow [flowchart]
– Boxes and decisions shown: Recycle transported baskets; Event done?; Digitize event, ImportTracks; Last event done?; EXIT; Priority is ON?; Last PE done?; Stop priority mode; Queue flushed?; Flush priority baskets; Q size < min; Adjust basket size; Priority = ON, PE range = (last, last+4); Collect prioritized tracks (once); Empty Q?; Garbage collect; Check track counters; Digitize transported events and inject new events into released slot
– PE = prioritized event; PE range = event number range for priority events
– Priority mode: the scheduler puts all tracks from priority events into special baskets and injects them at every loop regardless of their content
– Garbage collect mode (when the queue is empty): inject every basket regardless of its content

Monitoring
Main bottleneck: GeantObjectPool::Borrow/Return

Performance
1000 events with 100 tracks each, measured on a 24-core dual socket E GHz (IVB).

Physics Simulation
Strategy
– Implement tabulated physics
  – Backport to Geant4 as a single process (incorporating all implemented physics)
  – Compare backported physics to regular G4: both physics performance and run-time performance
  – Then compare VP with tabulated physics against G4 with tabulated physics
– Implement vectorized physics
  – Same scheme for verification

Tabulated Physics
– Everything (except decay) is implemented both behind Geant4 (as a TotalPhysicsProcess) and behind VP
– A simple final-state correction is implemented: scaling of the 3-momentum. This is of course not correct, but nothing else can be done for now.
– exampleN03
  – exampleN03 can now be executed with either the tabulated physics (default physics list TABPHYS) or the FTFP_BERT, FTFP_BERT_HP, or QBBC physics lists. The physics list is selected with the -p flag at execution.
  – both production cuts (fixed to 1.0 keV) and tracking cuts are set (in energy) when exampleN03 is executed with one of the original Geant4 physics lists
  – tracking cuts can be set with the -l flag at execution (for both G4 and TABPHYS physics lists)

Geant4+FTFP_BERT vs Geant4+TABPHYS
First results using e− as primaries with energies of 30, 300, 3000, [MeV]:
– production and tracking cuts are the same and set in energy
– we don't have range tables in the tabulated physics
– linLossLimit is set to 100% in Geant4
– fluctuations and decay are switched off in Geant4
Energy grid of tabulation:
– E_p = 30, 300, [MeV]: 1000 bins between 1.0 keV and 3.0 GeV (log scale) and 10 final states
– E_p = 30 [GeV]: 100 bins between 1.0 keV and 1.0 TeV (log scale) and 5 final states
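A logarithmic energy grid of this kind can be sketched in Python as follows. The function names `log_energy_grid` and `find_bin` are hypothetical, energies are assumed to be in MeV, and "1000 bins" is read as 1000 intervals (1001 edges); the actual table layout used by TABPHYS is not described on the slide.

```python
import math


def log_energy_grid(e_min, e_max, n_bins):
    """Return n_bins + 1 bin edges spaced uniformly in log(E)."""
    log_min = math.log(e_min)
    step = (math.log(e_max) - log_min) / n_bins
    return [math.exp(log_min + i * step) for i in range(n_bins + 1)]


def find_bin(energy, e_min, e_max, n_bins):
    """Index of the bin containing `energy`, clamped to the grid range."""
    if energy <= e_min:
        return 0
    if energy >= e_max:
        return n_bins - 1
    frac = math.log(energy / e_min) / math.log(e_max / e_min)
    return min(n_bins - 1, int(frac * n_bins))


# Usage: 1000 bins between 1 keV and 3 GeV, expressed in MeV.
edges = log_energy_grid(1.0e-3, 3.0e3, 1000)
bin_of_30_mev = find_bin(30.0, 1.0e-3, 3.0e3, 1000)
```

With uniform spacing in log(E), the bin index is obtained from a single logarithm rather than a search, which is one reason log-scale grids are a common choice for tabulated cross sections spanning many decades in energy.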

Next steps
– Further data will be generated using Geant4
  – switching on fluctuations
  – setting linLossLimit back to 1-2% and so on, to see these effects
  – decay
– Can start debugging the prototype!
  – First by comparing these simple statistics generated by Geant4+TABPHYS and GeantV+TABPHYS

Geometry
– Vectorized Propagator implemented
– Merge with USolids (ongoing)
  – Repository merged
  – Backward compatible interface
– Shapes
  – Box
  – Paraboloid
  – Trapezoid
  – Parallelepiped
  – Tube
– Coming soon
  – Hyperboloid
  – Polyhedra
  – Orb
  – TRD

Trapezoid

Connecting Geant-V and VecGeom
– Geant-V could already use VecGeom in serial mode
– Geant-V can now use VecGeom in vector mode
  – added missing pieces in Geant-V: some thread-local data to provide workspace
  – completed and tested vector navigation functionality in VecGeom
  – connected the two

First glance at performance
– Did some initial "valgrind --tool=callgrind" benchmarks of Geant-V
– Used the scheduler version with a hard-working scheduler thread (on 4+1 threads)
  – 200 events
  – Ex03 geometry
  – SSE instructions

Major CPU users
– 13%: log
– 13%: geometry navigation
– 5%: memcpy

Current influence of VecGeom on overall performance

Geometry Integration Next Steps
– Performance tuning of vector navigation
  – global-to-local transformation not yet optimal
  – a couple of other ideas
– Comparison to tabulated Geant4 simulation
– Gradually more complicated geometries (we should now put tubes, traps, ...)

Summary
– Progress on all 3 parts in both:
  – Performance
  – Breadth
– Moving along toward silver bullet measurement