Published by Meredith Russell. Modified over 8 years ago.
1
Vector Physics Models Soon Yung Jun US ASCR-HEP@Fermilab January 30, 2015
2
Contents
Emerging technologies and challenges for HEP software
Vector physics models
Vision
3
Computing Challenges: The Tower of Babel?
Exascale, Cloud, BIG Data
Chaos -> Diversification
One model -> Many models
4
TOP500 (Nov. 15, 2014)
[Figure: Rmax and Cores]
Exascale: 2017 ± 2
No changes
2020 (?)
5
Q1: What fraction of software applications run on 10,000 or more cores? (Application Software Report)
1. 0.1%
2. 1%
3. 5%
4. 10%
Is the 1% important? Absolutely! Quantitative differences become qualitative ones (Marx)
Battle of speed -> Fitness for survival
6
The Other Corner of the Battlefield: Caching and Vector
Challenges for high performance HEP computing
–HEP (and most real-world) applications are memory bound
–HTC + HPC
Long ago at Troy: Death of Heroes
Two decades ago: Death of Vector in HEP
Era of Pentium: Microprocessors rule HEP computing
[Figure labels: Microprocessor, Achilles]
7
Q2: What is the primary application running on the WLCG?
1. Reconstruction
2. Simulation
3. Analysis
LHC simulation (Run 1)
–Several 10^7 volumes, 10^10 events
–10^12 sec CPU time using 250,000 cores
–60% of WLCG (expected to reach 65% in LHC Run 2)
Challenges for the High-Luminosity LHC
–Need at least 5x the computing power, most likely with a flat budget
Opportunities
–New architectures, new applications
8
Hardware Side: Emerging Technologies
Two categories
–General purpose CPU (ARM, MIC-native)
–Coprocessors (GPU, MIC-offload)
New metrics
–GFlop/KWatt
–Bandwidth/Latency (FLOPS are free)
Even more questions
–General purpose CPU and coprocessors will be merged
–Mobile chips will rule the world (even the TOP500)
9
Challenges to Software Developers
Evolution? Revolution?
- Traditional?
- Truly?
10
Demonstrators: Coprocessor (GPU/MIC) in HEP
Lattice QCD
Triggers
Reconstruction and Analysis
Simulation
–accelerator
–detector
–physics
–generator
Most of them are memory-bound applications
Hardware is far ahead of software (the ecosystem is biased toward hardware)
Examples of collaborative efforts
–GATE, GAP in RT, GeantV
11
General Purpose CPU (ARM/MIC) in HEP
Power efficiency
–more events/Watt in a big GRID
–ARM: “89% less energy, 94% less space, and 63% less cost”
–porting existing software packages (trigger, reconstruction, etc.)
Vector pipeline
–geometry and particle navigation
–physics processes and models (???)
12
Present to Future
Challenges for high performance HEP computing
–lack of scalable applications in HEP
–no Moore’s law for software, algorithms and applications
–the overwhelming bias in favor of hardware should be rebalanced
New strategies
–mixture of the two (CPU and coprocessors), at the beginning of design
–prioritize the data-intensive operations to be executed by the accelerator
–redesign kernels to keep the accelerators and processors busy
Is this (hybrid with many cores) the right solution?
13
Assumption is Important (at least in Science)
The Sun’s angles are different at different latitudes
–Measured the circumference of the Earth (Eratosthenes, ~200 BC)
–Measured the height of the sky (ancient Chinese)
Two different ways of thinking follow different paths
15
Vector Physics Model
Assumption: particles are independent during tracking
Vectorization of the density of collisions, ψ
Vector strategies: data locality and instruction throughput
–decompose sequential tracking and regroup the pieces by task
–algorithmic vectorization
–parallel data patterns
16
Q3: Typically, what fraction of all essential FORTRAN statements (legacy physics codes) is IF-THEN?
1. 10%
2. 20%
3. 35%
4. 50%
Conditional statements
–implicit loops (do-while)
–conditional coding (case)
–optional coding (skip operations)
Potential solutions
–mask, shuffling, gather/scatter, pack-expand
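The mask idea can be sketched in a few lines. This is a plain-Python, lane-by-lane emulation rather than real SIMD code, and the energy-cut example and the names `deposit_branchy`, `deposit_masked` and `cut` are hypothetical, not taken from the talk. Both arms of the IF-THEN are evaluated for every lane, and a 0/1 mask blends the results so that all lanes follow the same instruction sequence.

```python
def deposit_branchy(energies, cut):
    """Scalar reference: one IF-THEN per particle."""
    out = []
    for e in energies:
        if e < cut:
            out.append(e)        # below the cut: deposit all energy
        else:
            out.append(0.1 * e)  # above the cut: deposit a fixed fraction
    return out


def deposit_masked(energies, cut):
    """Branch-free form: compute both arms, then blend with a 0/1 mask."""
    mask = [float(e < cut) for e in energies]   # comparison result as lane mask
    arm_true = list(energies)                   # value if the condition holds
    arm_false = [0.1 * e for e in energies]     # value otherwise
    return [m * t + (1.0 - m) * f
            for m, t, f in zip(mask, arm_true, arm_false)]
```

The cost is that both arms are always computed, so masking tends to pay off only when the arms are cheap or the lanes rarely diverge.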
17
Pre-requisite
SIMD (single instruction, multiple data) pseudorandom number generator
Data layout: coalesced memory access on vector operands
–SoA (struct of arrays) track (x,p,E,t,…)[i], ordered data arrays
Data locality for the vector particles: share common data
–particle type, geometry and material, physics process
Vector operations
–identical instruction on each component of the vector
–scalar + vector = scalar (do not mix them)
–no conditional branches, no data dependencies
–replace un-vectorizable algorithms by alternatives
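A minimal sketch of the SoA layout, assuming a track carries only the four attributes (x, p, E, t) named on the slide. The class name `TrackSoA` and the helpers `from_aos` and `advance_time` are hypothetical. Each attribute lives in its own contiguous array, so an operation on one attribute touches consecutive memory.

```python
from array import array


class TrackSoA:
    """Struct of arrays: one contiguous array per track attribute."""

    def __init__(self, n):
        self.x = array("d", [0.0] * n)
        self.p = array("d", [0.0] * n)
        self.E = array("d", [0.0] * n)
        self.t = array("d", [0.0] * n)

    @classmethod
    def from_aos(cls, tracks):
        """Convert a list of (x, p, E, t) tuples (array of structs) to SoA."""
        soa = cls(len(tracks))
        for i, (x, p, E, t) in enumerate(tracks):
            soa.x[i], soa.p[i], soa.E[i], soa.t[i] = x, p, E, t
        return soa


def advance_time(soa, dt):
    """Apply the same operation to every element of one contiguous array."""
    for i in range(len(soa.t)):
        soa.t[i] += dt
```

In a compiled language the loop in `advance_time` is the kind of unit-stride loop that a vectorizing compiler can map onto SIMD registers; here it only illustrates the access pattern.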
18
Target
Basic components for physics kernels
–free path analysis: sampling physics process and step length
–collision analysis: energy loss, multiple scattering, secondary production
Choices for vectorized physics models
–tabulated physics (cross section calculation, final state sampling)
–fully vectorized arithmetic algorithms (auto, deep SIMD)
Core techniques and patterns
–conditional branches: mask
–coalesced memory access: gather
–composition and rejection: replaced by alias
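The free path analysis can be illustrated with the textbook scalar recipe: draw the step length from an exponential distribution governed by the total macroscopic cross section, then pick the process in proportion to its share of that total. This is a sketch under those standard assumptions, not the talk's implementation; `sample_step` and the cross section values in the usage note are hypothetical.

```python
import math


def sample_step(sigmas, u1, u2):
    """Return (step_length, process_index) from two uniform numbers in (0, 1).

    sigmas: macroscopic cross sections (1/length), one per physics process.
    """
    total = sum(sigmas)
    # Inverse transform of the exponential law exp(-total * s).
    step = -math.log(u1) / total
    # Choose process i with probability sigmas[i] / total.
    threshold = u2 * total
    cumulative = 0.0
    for i, sigma in enumerate(sigmas):
        cumulative += sigma
        if threshold < cumulative:
            return step, i
    return step, len(sigmas) - 1  # guard against floating-point round-off
```

For example, `sample_step([1.0, 3.0], u1, u2)` selects the second process about three times as often as the first. The early-exit loop over processes is itself a branch, which is one reason a vectorized version would favor tabulated or branch-free selection.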
19
Q4: What is the most used Monte Carlo technique in Geant4?
1. Inverse transform
2. Acceptance and rejection
3. Composition and rejection
4. Vegas algorithm
Problem: conditional branches (do-while) are not vectorizable
–the loop count is non-deterministic
Use effectively vectorizable algorithms
–shuffling (overhead may be significant)
–inverse cumulative pdf method (potential bias)
–Alias method (A.J. Walker, 1974)
20
Sampling Secondary Particles: Alias Method (A.J. Walker)
Replace composition and rejection methods (conditional branches, not vectorizable)
Recast a cross section f(x) as N equally probable events, each with likelihood 1/N = c
Alias table
–a[recipient] = donor
–q[N] = non-alias probability
Sampling x_j with random u_1, u_2
–bin index: N × u_1 = i + α
–sample j = (u_2 < q[i]) ? i : a[i]
–x_j = [α j + (1 − α)(j + 1)] Δx
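A minimal sketch of the alias method for a discrete pdf, using Vose's table construction (a common way to build Walker's tables, not necessarily the one used in the talk). The function names are hypothetical. Bin i is kept when u2 < q[i], which is the reading consistent with q being the non-alias probability. The interpolation of x within the selected bin is left out.

```python
def build_alias_table(pdf):
    """Build alias tables for a discrete pdf (Vose's construction).

    Returns (q, a): q[i] is the non-alias probability of bin i and
    a[i] is the donor bin used when the alias is taken.
    """
    n = len(pdf)
    total = sum(pdf)
    scaled = [p * n / total for p in pdf]   # rescale so the mean height is 1
    q = [1.0] * n
    a = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        recipient = small.pop()
        donor = large.pop()
        q[recipient] = scaled[recipient]
        a[recipient] = donor                 # a[recipient] = donor
        scaled[donor] -= 1.0 - scaled[recipient]
        (small if scaled[donor] < 1.0 else large).append(donor)
    return q, a


def sample_bin(q, a, u1, u2):
    """Pick a bin from two uniform numbers in [0, 1), with no rejection loop."""
    n = len(q)
    i = min(int(n * u1), n - 1)              # equally probable column
    return i if u2 < q[i] else a[i]          # keep the bin or take its alias
```

Every sample costs the same fixed work (one index computation, two table reads, one comparison), which is what makes the method attractive for vector lanes: the single comparison can be expressed as a mask.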
21
Alias Method: Validation and Performance
Differential cross sections of EM processes (e.g., scattering angle)
Sizable performance gains on both CPU and GPU
22
Coalesced Memory Access
Sampling the step length and the associated physics process
–cross section calculations on-the-fly (fully vectorizable, but may be costly)
–tabulated physics (gather operation for table look-ups, bandwidth limited)
Rearrange data to enable contiguously ordered memory accesses
Overhead?: reallocate data to the stack (< gain by vectorization)
Sizable performance gains on both CPU and GPU
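The gather operation behind a table look-up can be sketched as follows. This is a plain-Python emulation, assuming a uniformly binned cross section table with linear interpolation; the binning scheme and the names `gather` and `lookup_cross_section` are assumptions made for illustration.

```python
def gather(table, indices):
    """Gather: out[k] = table[indices[k]] for every lane k."""
    return [table[i] for i in indices]


def lookup_cross_section(table, e_min, e_max, energies):
    """Linear interpolation in a uniformly binned table, one value per lane."""
    nbins = len(table) - 1
    width = (e_max - e_min) / nbins
    # Lane-wise bin index, clamped so that idx + 1 stays inside the table.
    idx = [min(max(int((e - e_min) / width), 0), nbins - 1) for e in energies]
    lo = gather(table, idx)                      # lower bin edge values
    hi = gather(table, [i + 1 for i in idx])     # upper bin edge values
    frac = [(e - e_min) / width - i for e, i in zip(energies, idx)]
    return [l + f * (h - l) for l, h, f in zip(lo, hi, frac)]
```

The arithmetic is identical for every lane, but the two gathers read from scattered table positions. Sorting or grouping particles by energy would make those reads closer to contiguous, which is the rearrangement the slide refers to.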
23
Portability (Template Approach): Scalar, Vector, CUDA, MIC
Common interface for different architectures (Backend)
24
Plan
Implement one fully vectorized EM physics model (Klein-Nishina Compton) and test with GeantV by CHEP2015
–Vector, Scalar, CUDA
–performance evaluation and validation (tabulated and Geant4)
Complete all EM physics
–establish a backend schema for the vector physics package
Extend to hadron physics and explore other algorithms
25
Vision: Ancien Régime to Liberty
Failure of architecture-aware algorithms (and vice versa) is the dawn of new ages.
Scientific advancement is not evolutionary, but rather is a “series of peaceful interludes punctuated by intellectually violent revolutions”, and in those revolutions “one conceptual world view is replaced by another” (Thomas Kuhn).