1 Parallel Research at Illinois
Parallel Everywhere
www.parallel.illinois.edu
2 Parallel @ Illinois
Illiac IV
UPCRC
Cloud Computing Testbed
OpenSparc Center of Excellence
CUDA Center of Excellence
Extreme Scale Computing
3 Taxonomy
Cloud: runs many loosely coupled queries or applications that are often I/O intensive; the focus is on dynamic provisioning, virtualization, cost reduction, etc. Costs are linear and fault tolerance is relatively easy.
Supercomputer: runs a few tightly coupled, compute-intensive applications; the focus is on algorithms, performance, communication, and synchronization. Fault tolerance is hard and cost is nonlinear.
Both can be humongous systems.
4 FROM PETASCALE TO EXASCALE
Talk focuses on supercomputers
5 Current Leader: Jaguar (Cray XT5 at Oak Ridge)
Number of cores: 153,000
Peak performance: 1.645 petaflops
System memory: 362 terabytes
Disk space: 10.7 petabytes
Disk bandwidth: 200+ gigabytes/second
Power (megawatts):
6 Supercomputing in Two Years: the Blue Waters Computing System at Illinois
System Attribute: Blue Waters*
Vendor: IBM
Processor: Power 7
Peak Performance (PF):
Sustained Performance (PF): >1.0
Number of Processor Cores: >200,000
Amount of Memory (PB): >0.8
Amount of Disk Storage (PB): >10
Amount of Archival Storage (PB): >500
External Bandwidth (Gbps): 100-400
* Reference petascale computing system (no accelerators).
$200M
Machine room designed for 20 MW
7 Think of a Supercomputer as a Large Scientific Instrument to Observe the Unobservable
Molecular Science
Weather & Climate Forecasting
Earth Science
Astronomy
Health
8 Performance grows 1,000-fold every 11 years
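As a rough check of this rate (a back-of-the-envelope Python sketch, not part of the talk), a 1,000-fold gain every 11 years corresponds to roughly 87% improvement per year:

# Convert "1,000x every 11 years" into an equivalent annual growth factor.
annual = 1000 ** (1 / 11)               # ~1.874, i.e. roughly 87% per year
print(f"annual growth factor: {annual:.3f}")
# At this rate a ~2 petaflop system (Jaguar-class, 2009) reaches the
# exaflop range after about 11 years, i.e. around 2020.
print(f"factor after 11 years: {annual ** 11:.0f}x")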
9 Exascale in 2020
Extrapolation of current technology:
– 100M-1B threads: the compute power of each thread does not increase!
– 100-500 MW
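The thread count follows from simple arithmetic; the sketch below assumes per-thread performance of 1-10 gigaflop/s (illustrative figures, not numbers from the slide) to show why an exaflop machine lands in the 100M-1B thread range:

# Order-of-magnitude thread counts at exascale, assuming per-thread
# performance stays flat (illustrative values).
exaflop = 1e18                          # 10^18 flop/s
for per_thread in (1e9, 1e10):          # assumed 1-10 gigaflop/s per thread
    print(f"{per_thread:.0e} flop/s per thread -> {exaflop / per_thread:.0e} threads")
# prints 1e+09 and 1e+08 threads, i.e. the 100M-1B range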
10 Main Issues
Can we reduce power consumption?
Can we use so many threads? Can we program them?
Will the machine stay up?
How do we deal with variability & jitter?
How do we control & manage such a system?
11 Low Power Design
Energy consumption might be reduced by an order of magnitude with aggressive technology and architecture changes, but the changes have a price:
– Low-power cores -- more cores
– Aggressive voltage scaling -- more errors
– Aggressive DRAM redesign -- less bandwidth
– Hybrid architecture -- hard to program
This problem also affects large clouds, but to a lesser extent.
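The voltage/power trade-off above can be made concrete with the standard dynamic-power model, P roughly proportional to C * V^2 * f; the numbers below are illustrative, not from the talk:

# Dynamic power scales roughly as C * V^2 * f.
def relative_power(v_scale, f_scale):
    return v_scale ** 2 * f_scale

# Example: run each core at 70% voltage and 50% frequency.
p = relative_power(0.7, 0.5)            # ~0.245x the power per core
print(f"power per core: {p:.2f}x")
# Recovering the lost throughput takes ~2x as many cores, so total power
# is ~0.49x: a real saving, paid for with more cores to program and less
# voltage margin (hence more errors), as the slide notes.
print(f"total power at equal throughput: {p / 0.5:.2f}x")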
12 Scaling Applications to 1B Threads
Weak scaling: use a more powerful machine to solve a larger problem
– Increase application size and keep running time constant; e.g., refine the grid
– The larger problem may not be of interest
– We may want to scale time, not space (molecular dynamics); it is not always easy to work in 4D
– We cannot scale space without scaling time (iterative methods): granularity decreases and communication increases
13 Scaling Iterative Methods
Assume that the number of cores (and compute power) is increased by a factor of k^4.
Space and time scales are refined by a factor of k, so the mesh size increases by a factor of k×k×k and the number of time steps by a factor of k.
The number of mesh cells per core therefore decreases by a factor of k: the local subdomain's linear dimension shrinks by a factor of k^(1/3) and its surface area by a factor of k^(2/3), while its volume shrinks by a factor of k.
The surface-to-volume ratio (the communication-to-computation ratio) increases by a factor of k^(1/3).
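The same factors can be checked numerically; the following short sketch is an illustration of the surface-to-volume argument above, not code from the talk:

# Surface-to-volume scaling for a 3D stencil code under 4D weak scaling:
# k^4 more cores, space and time each refined by a factor of k.
def comm_to_comp_growth(k):
    cells_growth = k ** 3                 # mesh refined by k in each spatial dimension
    cores_growth = k ** 4
    cells_per_core = cells_growth / cores_growth    # shrinks by 1/k
    side = cells_per_core ** (1 / 3)                # local block edge, ~k^(-1/3)
    return (side ** 2) / (side ** 3)                # comm/comp per step, grows like k^(1/3)

for k in (2, 4, 8):
    print(k, round(comm_to_comp_growth(k), 3), round(k ** (1 / 3), 3))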
14 Debugging and Tuning: Observing 1B Threads
Scalable infrastructure to control and instrument 1B threads
Parallel information compression to identify “anomalies”
Need the ability to express “normality” (global correctness and performance assertions)
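As a sketch of what such a global performance assertion might look like (using mpi4py purely for illustration; the talk does not name a tool or an API), every rank can check its step time against the global mean:

# Illustrative global performance assertion with mpi4py.
from mpi4py import MPI

def assert_balanced(local_time, tol=0.25):
    comm = MPI.COMM_WORLD
    mean = comm.allreduce(local_time, op=MPI.SUM) / comm.Get_size()
    # Flag any rank whose step time deviates from the global mean by more than tol.
    if abs(local_time - mean) > tol * mean:
        print(f"rank {comm.Get_rank()}: anomalous step time {local_time:.3f}s "
              f"(global mean {mean:.3f}s)")

if __name__ == "__main__":
    import random, time
    t0 = time.perf_counter()
    time.sleep(random.uniform(0.01, 0.02))   # stand-in for one solver step
    assert_balanced(time.perf_counter() - t0)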
15 Decreasing Mean Time to Failure
Problem:
– More transistors
– Smaller transistors
– Lower voltage
– More manufacturing variance
Current technology: global, synchronized checkpoint; HW error detection
Future technology:
– More HW fault detection
– More efficient checkpoint (OS? compiler? application? see the sketch below)
– Fault-resilient algorithms?
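One of the options listed above, an application-level checkpoint, might look like the following sketch (the file name, interval, and loop are illustrative assumptions, not details from the talk):

# Minimal application-level checkpoint/restart sketch (illustrative).
import os
import pickle

CHECKPOINT = "state.ckpt"               # assumed file name

def save_checkpoint(step, state):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((step, state), f)
    os.replace(tmp, CHECKPOINT)         # atomic rename: a crash never leaves a torn file

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return 0, None                      # no checkpoint yet: start from scratch

step, state = load_checkpoint()         # resume from the last saved step after a failure
while step < 1000:
    # state = advance(state)            # application work would go here
    if step % 100 == 0:                 # checkpoint every 100 steps
        save_checkpoint(step, state)
    step += 1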
16 Handling Variability
Current parallel programming models implicitly assume that all tasks progress at the same rate. Future systems will have increased variability in the rate of progress:
– HW variability: power management, manufacturing variability, error correction
– System variability: jitter
– Application variability: dynamic, irregular computations
Dynamic load balancing overcomes variability, but increases communication and reduces locality.
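Dynamic load balancing is typically implemented by over-decomposition into many more chunks than workers; the sketch below illustrates the idea with plain Python threads and a shared queue (an illustration only; a real runtime would migrate work between nodes):

# Over-decomposition with a shared work queue (illustrative).
import queue
import random
import threading
import time

tasks = queue.Queue()
for chunk in range(64):                 # far more chunks than workers
    tasks.put(chunk)

def worker(wid):
    done = 0
    while True:
        try:
            chunk = tasks.get_nowait()  # idle workers keep pulling work
        except queue.Empty:
            break
        time.sleep(random.uniform(0.001, 0.005))   # variable per-chunk cost
        done += 1
    print(f"worker {wid} processed {done} chunks")

threads = [threading.Thread(target=worker, args=(w,)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Faster workers simply pull more chunks, which absorbs the variability; the cost is that a chunk runs wherever a worker happens to be free, which is exactly the communication and locality penalty the slide points out.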
17 Controlling the System
Observability: Can you probe the state of any HW or SW component?
Situational awareness: When cascading errors occur, how long before you find the root cause?
System analytics: Can you mine logs to identify anomalies?
This problem also affects large clouds, but to a lesser extent.
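A toy version of the log-mining question above (synthetic event counts and an arbitrary z-score threshold, purely illustrative):

# Toy log analytics: flag intervals whose error-event count is a statistical outlier.
from statistics import mean, stdev

events_per_minute = [3, 4, 2, 5, 3, 4, 3, 41, 4, 2, 3]   # synthetic counts
m, s = mean(events_per_minute), stdev(events_per_minute)
for minute, count in enumerate(events_per_minute):
    if s > 0 and (count - m) / s > 2.5:                   # simple z-score rule
        print(f"minute {minute}: {count} events is anomalous "
              f"(mean {m:.1f}, stdev {s:.1f})")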
18 Summary
Supercomputing is fun (again): business as usual will not go far.
Supercomputing is essential for progress on key scientific and societal issues.
You might live through the end of Moore’s law: this will be a fundamental shift in our discipline and our industry.