
1 CS267-April 20th, 2010 Big Bang, Big Iron High Performance Computing and the Cosmic Microwave Background Julian Borrill Computational Cosmology Center, LBL Space Sciences Laboratory, UCB with Chris Cantalupo, Ted Kisner, Radek Stompor, Rajesh Sudarsan and the BOOMERanG, MAXIMA, Planck, EBEX, PolarBear & other experimental collaborations

2 CS267-April 20th, 2010 The Cosmic Microwave Background About 400,000 years after the Big Bang, the expanding Universe cools through the ionization temperature of hydrogen: p⁺ + e⁻ → H. Without free electrons to scatter off, CMB photons free-stream to us today. COSMIC - filling all of space. MICROWAVE - redshifted by the expansion of the Universe from 3000 K to 3 K. BACKGROUND - primordial photons coming from "behind" all astrophysical sources.

3 CS267-April 20th, 2010 CMB Physics Drivers It is the earliest possible photon image of the Universe. Its existence supports a Big Bang over a Steady State cosmology (NP1). Tiny fluctuations in the CMB temperature (NP2) and polarization encode details of
– cosmology: geometry, topology, composition, history
– ultra-high energy physics: fundamental forces, beyond the standard model, inflation & the dark sector (NP3)

4 CS267-April 20th, 2010 The Concordance Cosmology Supernova Cosmology Project (1998): Cosmic Dynamics (Ω_Λ − Ω_m). BOOMERanG & MAXIMA (2000): Cosmic Geometry (Ω_Λ + Ω_m). 70% Dark Energy + 25% Dark Matter + 5% Baryons = 95% Ignorance. What (and why) is the Dark Universe?

5 CS267-April 20th, 2010 Observing The CMB – about 1% of the static on an (untuned) TV.

6 CS267-April 20th, 2010 CMB Satellite Evolution

7 CS267-April 20th, 2010 The Planck Satellite The primary driver for HPC CMB work for the last decade. A joint ESA/NASA satellite mission performing a 2-year+ all-sky survey from L2 at 9 microwave frequencies from 30 to 857 GHz. The biggest data set to date: – O(10^12) observations – O(10^8) sky pixels – O(10^4) spectral multipoles

8 CS267-April 20th, 2010 Beyond Planck EBEX (1x Planck) - Antarctic long-duration balloon flight in 2012. PolarBear (10x Planck) - Atacama desert ground-based 2010-13. QUIET-II (100x Planck) - Atacama desert ground-based 2012-15. CMBpol (1000x Planck) - L2 satellite 2020-ish?

9 CS267-April 20th, 2010 CMB Data Analysis In principle very simple – assume Gaussianity and maximize the likelihood 1. of maps given the data and its noise statistics (analytic – see the sketch below). 2. of power spectra given maps and their noise statistics (iterative). In practice very complex – Foregrounds, asymmetric beams, non-Gaussian noise, etc. – Algorithm & implementation scaling with the evolution of data volume and HPC architecture.
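As a concrete toy illustration of step 1: under Gaussian noise the analytic maximum-likelihood map is the generalized least-squares solution m = (Aᵀ N⁻¹ A)⁻¹ Aᵀ N⁻¹ d, where A is the pointing matrix, N the time-domain noise covariance and d the time-ordered data. The sketch below uses dense numpy arrays and invented toy sizes purely for illustration; real analyses never form these matrices explicitly.

```python
# Toy sketch of the analytic maximum-likelihood map: m = (A^T N^-1 A)^-1 A^T N^-1 d.
# Sizes, noise level, and white-noise assumption are all illustrative choices.
import numpy as np

n_samples, n_pixels = 2000, 50          # toy problem; real runs have 10^11+ samples
rng = np.random.default_rng(0)

# Pointing matrix: each time sample observes exactly one sky pixel.
hit = rng.integers(0, n_pixels, n_samples)
A = np.zeros((n_samples, n_pixels))
A[np.arange(n_samples), hit] = 1.0

sky = rng.standard_normal(n_pixels)     # "true" sky
sigma = 0.1                             # assumed white-noise level
d = A @ sky + sigma * rng.standard_normal(n_samples)

Ninv = np.eye(n_samples) / sigma**2     # inverse noise covariance (diagonal here)
m = np.linalg.solve(A.T @ Ninv @ A, A.T @ Ninv @ d)   # ML map estimate
print("max |m - sky|:", np.max(np.abs(m - sky)))      # small for this toy problem
```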

10 CS267-April 20th, 2010 The CMB Data Challenge Extracting fainter signals (polarization mode, angular resolution) from the data requires: – larger data volumes to provide higher signal-to-noise. – more complex analyses to remove fainter systematic effects. 1000x data increase over the next 15 years – need to continue to scale on the bleeding edge through the next 10 M-foldings!

Experiment    Date    Time Samples   Sky Pixels   Gflop/Map
COBE          1989    10^9           10^3         1
BOOMERanG     2000    10^9           10^5         10^3
WMAP          2001    10^10          10^6         10^4
Planck        2009    10^11          10^7         10^5
PolarBear     2012    10^12          10^6         10^6
QUIET-II      2015    10^13          10^6         10^7
CMBpol        2020+   10^14          10^8         10^8

11 CS267-April 20th, 2010 CMB Data Analysis Evolution Data volume & computational capability dictate the analysis approach.

Date     Data               System              Map                                                    Power Spectrum
2000     B98                Cray T3E x 700      Explicit Maximum Likelihood (matrix invert - N_p^3)    Explicit Maximum Likelihood (matrix Cholesky + tri-solve - N_p^3)
2002     B2K2               IBM SP3 x 3,000     Explicit Maximum Likelihood (matrix invert - N_p^3)    Explicit Maximum Likelihood (matrix invert + multiply - N_p^3)
2003-7   Planck subsets     IBM SP3 x 6,000     PCG Maximum Likelihood (FFT - N_t log N_t)             Monte Carlo (Sim + Map - many N_t)
2007+    Planck full, EBEX  Cray XT4 x 40,000   PCG Maximum Likelihood (FFT - N_t log N_t)             Monte Carlo (SimMap - many N_t)
2010+    Towards CMBpol     Addressing the challenges of 1000x data & the next 10 generations of HPC systems, starting with Hopper, Blue Waters, etc.
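A rough back-of-the-envelope comparison shows why the table's shift from explicit dense-matrix methods to PCG/FFT methods was unavoidable. The numbers below are the orders of magnitude quoted in the tables above, not measured costs.

```python
# Why O(N_p^3) dense-matrix methods gave way to O(N_t log N_t) PCG/FFT methods
# at Planck scale; purely an order-of-magnitude illustration.
import math

N_p = 1e7    # Planck-scale sky pixels
N_t = 1e11   # Planck-scale time samples

explicit_flops = N_p ** 3                   # invert/factor a dense N_p x N_p matrix
pcg_flops_per_iter = N_t * math.log2(N_t)   # FFT-based noise weighting per PCG iteration

print(f"explicit dense solve:     ~{explicit_flops:.0e} flops")       # ~1e21
print(f"PCG/FFT (per iteration):  ~{pcg_flops_per_iter:.0e} flops")   # ~4e12
```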

12 CS267-April 20th, 2010 Scaling In Practice 2000: BOOMERanG-98 temperature map (10^8 samples, 10^5 pixels) calculated on 128 Cray T3E processors; 2005: A single-frequency Planck temperature map (10^10 samples, 10^8 pixels) calculated on 6,000 IBM SP3 processors; 2008: EBEX temperature and polarization maps (10^11 samples, 10^6 pixels) calculated on 15,360 Cray XT4 cores.

13 CS267-April 20th, 2010 Aside: HPC System Evaluation Scientific applications provide realistic benchmarks – Exercise all components of a system, both individually and collectively. – Performance evaluation can be fed back into the application codes. MADbench2 – Based on the MADspec CMB power spectrum estimation code. – Full computational complexity (calculation, communication & I/O). – Scientific complexity removed: reduces lines of code by 90% and runs on self-generated pseudo-data. – Used for the NERSC-5 & -6 procurements. – First friendly-user Franklin system crash (90 minutes after access).

14 CS267-April 20th, 2010 MADbench2 I/O Evaluation IO performance comparison – 6 HPC systems – read & write – unique & shared files. Asynchronous IO experiment – N bytes of asynchronous read/write with simultaneous floating-point work – measure the time spent waiting on IO (sketched below).
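A hedged sketch of the kind of measurement described above, assuming a simple thread-based overlap and an arbitrary file path and payload size; MADbench2 itself is an MPI/C code that does this with MPI-IO, so this only illustrates the idea.

```python
# Write N bytes in a background thread while doing floating-point busy-work,
# then measure how long we still wait on the IO after the work is done.
# Illustration only, not MADbench2's implementation.
import threading, time
import numpy as np

N_BYTES = 64 * 1024 * 1024              # assumed payload size
payload = np.random.bytes(N_BYTES)

def write_chunk(path):
    with open(path, "wb") as f:
        f.write(payload)

t0 = time.perf_counter()
writer = threading.Thread(target=write_chunk, args=("/tmp/madbench_sketch.dat",))
writer.start()

# Simultaneous floating-point work (stand-in for the benchmark's busy-work loop).
a = np.random.rand(1000, 1000)
for _ in range(20):
    a = a @ a / np.linalg.norm(a)

work_done = time.perf_counter()
writer.join()                            # time spent here is time lost waiting on IO
io_wait = time.perf_counter() - work_done
print(f"compute {work_done - t0:.3f}s, extra IO wait {io_wait:.3f}s")
```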

15 CS267-April 20th, 2010 MADmap for Planck Map Making A massively parallel, highly optimized PCG solver for the maximum likelihood map(s) given a time-stream of observations and their noise statistics (a toy sketch follows below). 2005: First Planck-scale map – 75 billion observations mapped to 150 million pixels – First science code to use all 6,000 CPUs of Seaborg. 2007: First full Planck map-set (FFP) – 750 billion observations mapped to 150 million pixels – Using 16,000 cores of Franklin – IO doesn't scale: write-dominated simulations, read-dominated mappings. May 14th 2009: Planck launches!
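A minimal sketch of the PCG map-making idea, not MADmap itself: solve (Aᵀ N⁻¹ A) m = Aᵀ N⁻¹ d with the noise weighting N⁻¹ applied as an FFT-domain filter (stationary noise makes N circulant). The pointing, noise spectrum and problem sizes below are toy assumptions.

```python
# Toy PCG map-making: serial, single detector, arbitrary "1/f + white" spectrum.
# The real code distributes the timestream over many cores and detectors.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(1)
n_t, n_p = 4096, 64
pix = rng.integers(0, n_p, n_t)                    # pointing: sample -> pixel

freq = np.fft.rfftfreq(n_t)
psd = 1.0 + 0.01 / np.maximum(freq, freq[1])       # toy stationary noise spectrum
inv_filter = 1.0 / psd

def N_inv(x):                                      # apply N^-1 via FFT (circulant N)
    return np.fft.irfft(np.fft.rfft(x) * inv_filter, n=n_t)

def P(m):                                          # pointing: map -> timestream
    return m[pix]

def Pt(x):                                         # transpose pointing: timestream -> map
    return np.bincount(pix, weights=x, minlength=n_p)

sky = rng.standard_normal(n_p)
noise = np.fft.irfft(np.fft.rfft(rng.standard_normal(n_t)) * np.sqrt(psd), n=n_t)
d = P(sky) + noise

lhs = LinearOperator((n_p, n_p), matvec=lambda m: Pt(N_inv(P(m))))
rhs = Pt(N_inv(d))
m_hat, info = cg(lhs, rhs)                         # conjugate-gradient solve
print("converged:", info == 0,
      "rms map residual:", np.sqrt(np.mean((m_hat - sky) ** 2)))
```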

16 CS267-April 20th, 2010 Planck First Light Survey

17 CS267-April 20th, 2010 Planck Sim/Map Target By the end of the Planck mission in 2013, we need to be able to simulate and map – O(10^4) realizations of the entire mission (74 detectors x 2.5 years ~ O(10^16) samples) – on O(10^5) cores – in O(10) wall-clock hours. WAIT ~ 1 day : COST ~ 10^6 CPU-hrs
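The arithmetic behind these targets, with an assumed ~150 Hz detector sampling rate used only to make the orders of magnitude concrete (the slide quotes just the totals):

```python
# Order-of-magnitude check of the sim/map target: 10^4 realizations of
# 74 detectors x 2.5 years at an assumed ~150 Hz sampling rate.
seconds = 2.5 * 365.25 * 24 * 3600        # mission length in seconds (~7.9e7)
detectors = 74
sample_rate_hz = 150                      # assumed typical detector sampling rate
realizations = 1e4

samples_per_realization = detectors * seconds * sample_rate_hz   # ~1e12
total_samples = realizations * samples_per_realization           # ~1e16
print(f"{samples_per_realization:.1e} samples/realization, {total_samples:.1e} total")
```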

18 CS267-April 20th, 2010 Peta-Scaling [scaling roadmap figure: milestone runs 12x217, FFP1, M3/GCP, CTP3 and OTFS leading toward the target of 10^4 maps at 9 frequencies over 2.5 years, on 10^5 cores in 10 hours]

19 CS267-April 20th, 2010 On-The-Fly Simulation Remove redundant & non-scaling IO from the traditional simulate/write then read/map cycle. The M3-enabled map-maker's read-TOD requests are translated into runtime (on-the-fly) simulation. Trades cycles & memory for disk & IO. Currently supports – piecewise stationary noise with arbitrary spectra, using truly independent random numbers – symmetric & asymmetric beam-smoothed sky. Can be combined with explicit TOD (e.g. systematics). A minimal illustration of the independent-stream idea follows below.
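A minimal sketch of that idea, assuming numpy's seed-sequence mechanism stands in for whatever generator the real code uses: any process can regenerate any (realization, detector, chunk) of stationary noise on demand, so simulated TOD never needs to touch disk. This illustrates the concept, not the M3/MADmap interface.

```python
# On-the-fly noise simulation: each "read" regenerates its chunk from a stationary
# noise spectrum and a random stream keyed by (realization, detector, chunk).
import numpy as np

def simulate_chunk(realization, detector, chunk, n_samples, psd):
    """Deterministically regenerate one stationary-noise chunk from its indices."""
    rng = np.random.default_rng([realization, detector, chunk])   # independent stream
    white = rng.standard_normal(n_samples)
    return np.fft.irfft(np.fft.rfft(white) * np.sqrt(psd), n=n_samples)

n = 8192
freq = np.fft.rfftfreq(n)
psd = 1.0 + 0.02 / np.maximum(freq, freq[1])     # toy 1/f + white spectrum

# The same indices always yield the same samples, so any process can
# "replay" any part of the simulated mission without disk IO.
a = simulate_chunk(realization=3, detector=17, chunk=42, n_samples=n, psd=psd)
b = simulate_chunk(realization=3, detector=17, chunk=42, n_samples=n, psd=psd)
assert np.array_equal(a, b)
```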

20 CS267-April 20th, 2010 Current Planck State-Of-The-Art: CTP3 1000 realizations each of Planck 1-year, 1-frequency noise & signal. Noise sim/map runs – O(10^14) samples, 2 TB of disk (maps) – 2 hours on 20,000 cores.

21 CS267-April 20th, 2010 MADmap Scaling Profile

22 CS267-April 20th, 2010 Next Generation HPC Systems Large, correlated, CMB data sets exercise all components of HPC systems: – Data volume => disk space, IO, floating point operations. – Data correlation => memory, communication. Different components scale differently over time and across HPC systems – Develop trade-offs and tune to system/concurrency. IO bottleneck has been ubiquitous – now largely solved by replacement with (re-)calculation. For all-sky surveys, communication bottleneck is the current challenge – Even in a perfect world, all-to-all communication won’t scale Challenges & opportunities of next-generation systems: – Multi- & many-core (memory-per-core fork) – GPUs/accelerators (heterogeneous systems)

23 CS267-April 20th, 2010 Ongoing Research Address emerging bottlenecks at very high concurrency in the age of many-core & accelerators: – Hopper – Blue Waters – NERSC-7. Communication: replace inter-core with inter-node communication (one hierarchical-reduction pattern is sketched below). Calculation: re-write/re-tool/re-compile for GPUs et al. Auto-tuning: system- & analysis-specific run-time configuration.
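One generic way to trade inter-core for inter-node communication is a hierarchical reduction: reduce within each node over shared memory first, then reduce across nodes with only one rank per node participating. The mpi4py sketch below illustrates that pattern under those assumptions; it is not the project's actual implementation.

```python
# Hierarchical reduction sketch: intra-node reduce, inter-node reduce among
# node leaders, then broadcast back within the node.
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED)        # ranks sharing a node
leaders = world.Split(color=0 if node.rank == 0 else MPI.UNDEFINED, key=world.rank)

local = np.random.rand(1_000_000)                    # e.g. a per-core partial map

# Step 1: intra-node reduction onto the node leader (cheap, shared memory).
node_sum = np.zeros_like(local)
node.Reduce(local, node_sum, op=MPI.SUM, root=0)

# Step 2: inter-node reduction among leaders only, then broadcast on each node.
if node.rank == 0:
    leaders.Allreduce(MPI.IN_PLACE, node_sum, op=MPI.SUM)
node.Bcast(node_sum, root=0)
```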

24 CS267-April 20th, 2010 Conclusions Roughly 95% of the Universe is known to be unknown – the CMB provides a unique window onto the early years. The CMB data sets we gather and the HPC systems we analyze them on are both evolving. CMB data analysis is a long-term computationally-challenging problem requiring state-of-the-art HPC capabilities. The quality of the science derived from present and future CMB data sets will be determined by the limits on – our computational capability – our ability to exploit it CMB analysis codes can be powerful full-system evaluation tools. We’re always very interested in hiring good computational scientists to do very interesting computational science!

