Scheduler overview: status & issues
A. Gheata
GeantV CERN meeting, Mar 16, 2015
Outlook
- Data structures and memory
- New vector balancing policy
- New basket management policy
- New prioritization policy
- Issues
GeantTrack_v
- GeantTrack (single track), scalar fields: fEvent, fEvslot, fParticle, fPDG, …, fXpos, fYpos, fZpos, fXdir, fYdir, fZdir, Edep, Pstep, Snext, Safety, plus the pointers *fPath and *fNextpath
- GeantTrack_v (track vector), array pointers into one contiguous buffer: *fEventV, *fEvslotV, *fParticleV, *fPDGV, …, *fXposV, *fYposV, *fZposV, *fXdirV, *fYdirV, *fZdirV, *fEdepV, *fPstepV, *fSnextV, *fSafetyV, *fPathV, *fNextpathV
- The contiguous buffer holds an SOA of fNtracks entries, one vector per field (vector 1, vector 2, …), followed by the fPathV and fNextpathV blocks
- Per track: 192 bytes + 2*sizeof(VolumePath_t)
- sizeof(VolumePath_t) = 136 + 8*max_geom_depth
- Sizeof(GeantTrack_v) = 384 + fNtracks*sizeof(VolumePath_t)
[Figure: memory layout of GeantTrack and GeantTrack_v, offsets 00 40 80 C0, drawn for fNtracks=10, padding=32]
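The SOA-in-one-buffer layout can be illustrated with a reduced sketch. This is not the GeantV implementation: TrackSoA, kAlign, and RoundUp are hypothetical names, only four of the fields are kept, and the 32-byte alignment is an assumption inspired by the "padding=32" label in the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Assumed alignment of each field array (hypothetical, echoing "padding=32").
constexpr std::size_t kAlign = 32;

// Round n up to the next multiple of a.
constexpr std::size_t RoundUp(std::size_t n, std::size_t a) { return (n + a - 1) / a * a; }

// Hypothetical, reduced stand-in for GeantTrack_v: one int array and three
// double arrays carved out of a single contiguous allocation.
struct TrackSoA {
  std::size_t fNtracks;
  std::vector<unsigned char> fStorage; // owns the contiguous buffer
  int *fEventV = nullptr;
  double *fXposV = nullptr;
  double *fYposV = nullptr;
  double *fZposV = nullptr;

  explicit TrackSoA(std::size_t ntracks) : fNtracks(ntracks) {
    assert(ntracks > 0);
    // Each field array is padded so that the next one starts aligned.
    const std::size_t intBytes = RoundUp(ntracks * sizeof(int), kAlign);
    const std::size_t dblBytes = RoundUp(ntracks * sizeof(double), kAlign);
    const std::size_t payload = intBytes + 3 * dblBytes;

    // Over-allocate by kAlign so an aligned start always fits.
    fStorage.resize(payload + kAlign);
    void *p = fStorage.data();
    std::size_t space = fStorage.size();
    void *aligned = std::align(kAlign, payload, p, space);
    assert(aligned != nullptr);

    // Point each field array at its slice of the buffer.
    unsigned char *base = static_cast<unsigned char *>(aligned);
    fEventV = reinterpret_cast<int *>(base);
    fXposV = reinterpret_cast<double *>(base + intBytes);
    fYposV = reinterpret_cast<double *>(base + intBytes + dblBytes);
    fZposV = reinterpret_cast<double *>(base + intBytes + 2 * dblBytes);
  }

  // The pointers refer to fStorage, so copying is disabled in this sketch.
  TrackSoA(const TrackSoA &) = delete;
  TrackSoA &operator=(const TrackSoA &) = delete;
};
```

With this layout, a loop over fXposV[i] touches consecutive, aligned memory, which is what makes vectorized processing of one field across many tracks practical.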
Size considerations
- The size of a GeantTrack_v depends on the track capacity and the maximum geometry depth (15 for CMS)
  - (256, 15) -> size = 176 kBytes
  - (16, 15) -> size = 22 kBytes
- CMS track storage for a capacity of 256 tracks
  - 4173 basket managers
  - 2 baskets, 2 track arrays per basket
  - 4173*4*176 = 2.85 Gbytes (no baskets transported!!!)
- For 16 tracks per basket: 358 Mbytes
- Pending baskets can be cut off on a memory threshold
- Smarter basket policy implemented
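The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical; the formulas are the ones quoted on the GeantTrack_v slide, and only the per-track payload is counted (the fixed header is ignored). Under that assumption the 256-track case reproduces the 176 kBytes quoted above. The 22 kBytes figure for 16 tracks is taken from the slide as given, not derived here.

```cpp
#include <cassert>
#include <cstddef>

// sizeof(VolumePath_t) = 136 + 8*max_geom_depth (from the slide).
constexpr std::size_t VolumePathBytes(std::size_t maxGeomDepth) {
  return 136 + 8 * maxGeomDepth;
}

// Per-track footprint: 192 bytes + 2*sizeof(VolumePath_t) (from the slide).
constexpr std::size_t PerTrackBytes(std::size_t maxGeomDepth) {
  return 192 + 2 * VolumePathBytes(maxGeomDepth);
}

// Payload of one track array; the fixed header is ignored (assumption).
constexpr std::size_t TrackArrayBytes(std::size_t capacity, std::size_t maxGeomDepth) {
  return capacity * PerTrackBytes(maxGeomDepth);
}

// Total pre-allocated memory in kBytes when every basket manager holds
// arraysPerManager track arrays of kBytesPerArray each.
constexpr std::size_t PreallocatedKBytes(std::size_t managers, std::size_t arraysPerManager,
                                         std::size_t kBytesPerArray) {
  return managers * arraysPerManager * kBytesPerArray;
}
```

For example, TrackArrayBytes(256, 15) / 1024 evaluates to 176, and PreallocatedKBytes(4173, 4, 22) / 1024 evaluates to 358, matching the Mbytes figure quoted for 16-track baskets.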
Memory threshold
// Maximum user memory limit [MB]
propagator->fMaxRes = 2000;
GeantBasket
- Elementary work unit for GeantV
- Baskets currently hold only tracks that are physically inside a given logical volume
- Evolution: filter tracks by different criteria, providing locality for any processing stage (when worthwhile)
- Input GeantTrack_v array, filled by the scheduler
- Output GeantTrack_v array, filled during transport
- Baskets have thread-local access during transport, but concurrent access during scheduling (!)
- Recycled to the owner basket manager after re-basketizing
- Mixed baskets: contain tracks located in different volumes
  - Everything is called through the scalar interface
  - This avoids overheads when vectorization is difficult or penalizing
[Diagram labels: Input, Transport (single thread), Physics, Output, AddTrack, Scheduler re-basketizing (multithreaded)]
Basket managers
- One per logical volume (4173 in cms2015)
- 2 baskets each (4 GeantTrack_v arrays)
  - Current: normal scheduling operations
  - Priority: prioritize events
- 1 queue for concurrent replacement and recycling
- Dynamic track content threshold per basket, for pushing to the work queue
  - Low basket flow -> small baskets, high flow -> large baskets
- At any moment at least 2+Nqueued baskets are instantiated and held by one basket manager
  - Plus a variable number of detached baskets being processed
[Diagram labels: TGeoVolume, Basket manager, current, priority, Recycled baskets (bounded queue)]
Vector size balancing
- Ntotal = Nused + Nrecycle
- Concurrent track addition, garbage collection, collection of tracks from prioritized events
- Adjustable threshold, aiming for Nused = Nthreads
[Diagram labels: GeantScheduler, Volume, current BM, fThreshold*, Nused (per volume type), Transport queue]
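One way to picture an adjustable threshold that steers Nused toward Nthreads is the sketch below. It is an assumption, not the scheduler's actual rule: ThresholdPolicy, AdjustThreshold, the halving and doubling steps, and the bounds are all hypothetical. The idea is that too few baskets in flight calls for a lower threshold (smaller baskets are pushed sooner), and too many calls for a higher one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical bounds on the per-basket track threshold.
struct ThresholdPolicy {
  int minThreshold;
  int maxThreshold;
};

// Returns the next threshold given the number of baskets in use (nused)
// and the number of worker threads (nthreads).
// Assumed rule: halve when workers are starved, double when oversubscribed.
int AdjustThreshold(int threshold, int nused, int nthreads, const ThresholdPolicy &policy) {
  assert(policy.minThreshold > 0 && policy.minThreshold <= policy.maxThreshold);
  int next = threshold;
  if (nused < nthreads) {
    next = threshold / 2; // low flow: flush smaller baskets sooner
  } else if (nused > nthreads) {
    next = threshold * 2; // high flow: let baskets grow larger
  }
  // Keep the result inside the allowed range.
  return std::max(policy.minThreshold, std::min(policy.maxThreshold, next));
}
```

Called once per monitoring iteration, such a rule settles when Nused equals Nthreads and otherwise moves the basket size in the direction described for low and high basket flow.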
Monitoring vectorization
=== Thread 1: exiting ===
=== Thread 3: exiting ===
=== Percent of tracks transported in single track mode: 23.9395%
New basket allocation policy
- Monitor the distribution of steps in volumes
- After an initial total number of steps:
  - Sort volumes by activity and sum up bins until a threshold is reached (e.g. 90% of total steps)
  - Activate basket managers only for these volumes, which represent a fraction of the total
- Redo after every 4x the previous number of steps
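The selection step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeantV code: SelectActiveVolumes and NextLearningSteps are hypothetical names, step counts are indexed by volume, and "4x the previous number of steps" is read as multiplying the last learning-phase length by four.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns the indices of the most active volumes whose step counts together
// reach at least `fraction` of all steps, in decreasing order of activity.
std::vector<std::size_t> SelectActiveVolumes(const std::vector<std::uint64_t> &steps,
                                             double fraction) {
  assert(fraction > 0.0 && fraction <= 1.0);

  // Sort volume indices by decreasing step count.
  std::vector<std::size_t> order(steps.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&steps](std::size_t a, std::size_t b) { return steps[a] > steps[b]; });

  const std::uint64_t total = std::accumulate(steps.begin(), steps.end(), std::uint64_t{0});
  const double target = fraction * static_cast<double>(total);

  // Sum up bins until the target fraction of steps is covered.
  std::vector<std::size_t> active;
  std::uint64_t sum = 0;
  for (std::size_t idx : order) {
    if (total == 0 || static_cast<double>(sum) >= target) break;
    active.push_back(idx);
    sum += steps[idx];
  }
  return active;
}

// Assumed reading of "redo after every 4x previous number of steps".
std::uint64_t NextLearningSteps(std::uint64_t previousSteps) { return 4 * previousSteps; }
```

Only the returned volumes would get a basket manager; tracks in all other volumes would have to be handled some other way, for instance through mixed baskets.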
New basket allocation policy
=== Learning phase of 4000000 steps completed ===
Activated 528 volumes accounting for 90.0% of track steps
 * FixedShield102880: 708955 steps
 * HVQX8780: 462821 steps
 * ZDC_EMLayer9b00: 83838 steps
 * BeamTube22b780: 78748 steps
 * OQUA6780: 62597 steps
 * QuadInner3300: 56376 steps
 * ZDC_EMAbsorber9d00: 53672 steps
 * QuadOuter3700: 52155 steps
 * QuadCoil3680: 49086 steps
 * ZDC_EMFiber9e80: 41705 steps
New priority policy
- Monitor the distribution of tracks per event
- Start prioritizing an event when its number of tracks in flight drops to a fraction of the maximum reached (e.g. 1%)
- The remaining tracks are collected by mixed baskets
  - One mixed basket per thread
- Cuts event tails short
  - No need for one priority basket per volume
  - Less basket fragmentation
  - Concurrency should get a bit better
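The trigger described above can be sketched as a small per-event monitor. EventMonitor and its members are hypothetical names, and making the prioritized flag sticky is an assumption; the source only states that prioritization starts once the number of tracks in flight falls to a fraction of the maximum reached.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-event bookkeeping for the tail-cutting priority policy.
struct EventMonitor {
  int maxInFlight = 0;      // largest number of tracks in flight seen so far
  bool prioritized = false; // once set, stays set for the rest of the event

  // Record the current number of tracks in flight and report whether the
  // event is (now) prioritized. `fraction` is the threshold, e.g. 0.01.
  bool Update(int tracksInFlight, double fraction) {
    assert(tracksInFlight >= 0 && fraction > 0.0 && fraction < 1.0);
    maxInFlight = std::max(maxInFlight, tracksInFlight);
    if (!prioritized && tracksInFlight < fraction * maxInFlight) {
      prioritized = true; // the event has entered its tail
    }
    return prioritized;
  }
};
```

With a maximum of 5187 tracks in flight and a 1% threshold, this rule would switch the event to priority mode once fewer than 52 tracks remain.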
Queue with new policy
…
Imported 340 tracks from events 10 to 10. Dispatched 20 baskets.
### Event 0 prioritized at 1 % threshold
Event 0: 281480 tracks transported, max in flight 5187
= digitizing event 0 with 0 tracks
=> Importing event 11
Threads
- WorkloadManager::MainScheduler (1)
  - Method run as a separate thread
  - Monitors the queue and triggers actions
- WorkloadManager::GarbageCollectorThread (1)
  - Woken up by MainScheduler every 50 iterations
  - Garbage collects pending baskets for every basket manager
- WorkloadManager::MonitoringThread (1)
  - Runs as a background thread, activating histograms on demand
  - Work queue, memory, number of baskets per volume, concurrency, number of tracks in flight per event
- WorkloadManager::TransportTracks (Nworkers)
  - Main basket transport method
  - Also calls the re-basketizer (GeantScheduler::AddTracks)
Issues
- Basket management is too memory hungry
  - The pre-allocate-everything policy is not good for CMS
  - We know that ~10% of volumes take >60% of transport
  - Implement a policy to create baskets only for these volumes (DONE)
- The priority algorithm produces too fragmented baskets and takes too long to flush events
  - Preempt the start of the priority regime, independent of queue status (DONE)
  - E.g. when the number of tracks in flight for one event is less than 5% of the maximum ever reached (cutting tails)
  - Keep only prioritized tracks in the same mixed basket and reuse baskets without re-basketizing when the population is low
Issues (2)
- Contention on re-basketizing is high (especially with fat baskets)
  - Read from many, write to one policy
  - Highly optimized using atomics, but still… Amdahl watches
  - Overheads in concurrent queues are non-negligible
  - Currently scaling to ~6 threads
- Can be changed to reading the same basket concurrently, but writing to a thread-local one
- Packaging events per group of threads is always possible
- Concurrency in the new scheduling approach is to be closely monitored (VTune)