1
A Summary of Short Course on Parallel Computing
Haibo Chen
Slides adapted from Jim Demmel
2
Where: Berkeley ParLab
Parallel Applications, Parallel Hardware, Parallel Software — for the IT industry (Silicon Valley) and users
Faculty: Krste Asanovic, Ras Bodik, Jim Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Edward Lee, Nelson Morgan, Dave Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Kathy Yelick
3
The Audience
There are 261 registrants: 112 on-site, 149 off-site.
Where are you from? 45 companies; 61 universities and research organizations.
Who are you? 32 software developers or engineers; 24 faculty; 132 students (undergrad, grad, postdoc); other: architect / director / sysadmin / consultant / …
12/31/2018 Jim Demmel
4
Short Course Goals
Teach the basics about parallelism:
- How to program, including a hands-on lab
- Tools you can use now (simple and advanced)
- Tools we hope to build, and ongoing research
5
7 Dwarfs of High Performance Computing (HPC)
Some people might subdivide these categories, and some might combine them; if you are trying to stress a computer architecture, you can define subcategories as well. Graph Traversal = Probabilistic Models; N-Body = Particle Methods, …; Monte Carlo.
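To make one dwarf concrete, here is a minimal sketch (not from the course) of the Monte Carlo pattern: independent random samples with a single reduction at the end, which makes it embarrassingly parallel. The π-estimation example and its parameters are illustrative choices, not the course's code.

```python
# Illustrative Monte Carlo "dwarf": estimate pi by sampling random points
# in the unit square and counting how many land inside the unit circle.
# Each worker's samples are independent, so the work parallelizes trivially.
import random
from multiprocessing import Pool

def count_hits(n_samples: int) -> int:
    """Count random points in [0,1)^2 that fall inside the unit circle."""
    rng = random.Random()
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(n_samples: int = 1_000_000, n_workers: int = 4) -> float:
    """Split the samples across workers; the only communication is one sum."""
    chunk = n_samples // n_workers
    with Pool(n_workers) as pool:
        hits = sum(pool.map(count_hits, [chunk] * n_workers))
    return 4.0 * hits / (chunk * n_workers)
```

The same structure (independent sampling, one reduction) underlies Monte Carlo codes in any of the course's languages: in OpenMP it would be a parallel loop with a reduction clause, and in MPI a local loop followed by `MPI_Reduce`.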
6
A few sample CS267 Class Projects (all posters and video on web pages)
- Content-based image recognition: "Find me other pictures of the person in this picture"
- Faster molecular dynamics, applied to Alzheimer's disease
- Better speech recognition through a faster "inference engine"
- Faster algorithms to tolerate errors in new genome sequencers
- Faster simulation of marine zooplankton populations
- Sharing cell-phone bandwidth for faster transfers and a better gaming experience
- Economic implications of parallel computing

The Alzheimer's project studies the amyloid beta peptide and how it chemically changes to form the A-beta monomers believed to cause the disease. New genome sequencing machines can very quickly sequence the genomes of many patients, in the hope of identifying genes that contribute to cancer, heart disease, diabetes, hypertension, etc., but they generate more errors (a side effect of their speed) than older algorithms can handle. Zooplankton sit at the bottom of the ocean food chain and are sensitive indicators of how climate change can shift the distribution of larger ocean creatures, like the ones we eat. Better cell-phone bandwidth means better music downloads, better video and picture uploads, and better gaming experiences. Which is the undergraduate project? We plan to freely offer a 3-day "bootcamp" version of this course later this summer.
7
Experimental Platform: NERSC Systems
Large-Scale Computing Systems
- Franklin (NERSC-5): Cray XT4; 9,532 compute nodes; 38,128 cores; ~25 Tflop/s on applications; 356 Tflop/s peak
- Hopper (NERSC-6): Cray XE6; 6,384 compute nodes; 153,216 cores; 120 Tflop/s on applications; 1.3 Pflop/s peak
Clusters (140 Tflop/s total)
- Carver: IBM iDataPlex cluster
- PDSF (HEP/NP): ~1K-core cluster
- Magellan: cloud testbed
- GenePool (JGI): ~5K-core cluster
Storage
- HPSS archival storage: 40 PB capacity; 4 tape libraries; 150 TB disk cache
- NERSC Global Filesystem (NGF): uses IBM's GPFS; 1.5 PB capacity; 5.5 GB/s of bandwidth
Analytics
- Euclid (512 GB shared memory)
- Dirac GPU testbed (48 nodes)
8
The TOP10 of the TOP500 (Rank / Site / Manufacturer / Computer / Country / Cores / Rmax [Pflop/s] / Power [MW]):
1. RIKEN Advanced Institute for Computational Science (AICS) / Fujitsu / K Computer, SPARC64 VIIIfx 2.0 GHz, Tofu interconnect / Japan / 548,352 / 8.162 / 9.90
2. National Supercomputer Center in Tianjin / NUDT / Tianhe-1A, NUDT TH MPP, Xeon 6C, NVIDIA, FT-1000 / China / 186,368 / 2.566 / 4.04
3. Oak Ridge National Laboratory / Cray / Jaguar, Cray XT5, HC 2.6 GHz / USA / 224,162 / 1.759 / 6.95
4. National Supercomputing Centre in Shenzhen / Dawning / Nebulae, TC3600 Blade, Intel X5650, NVIDIA Tesla C2050 GPU / China / 120,640 / 1.271 / 2.58
5. GSIC, Tokyo Institute of Technology / NEC/HP / TSUBAME-2, HP ProLiant, Xeon 6C, NVIDIA, Linux/Windows / Japan / 73,278 / 1.192 / 1.40
6. DOE/NNSA/LANL/SNL / Cray / Cielo, Cray XE6, 8C 2.4 GHz / USA / 142,272 / 1.110 / 3.98
7. NASA/Ames Research Center/NAS / SGI / Pleiades, SGI Altix ICE 8200EX/8400EX / USA / 111,104 / 1.088 / 4.10
8. DOE/SC/LBNL/NERSC / Cray / Hopper, Cray XE6, 6C 2.1 GHz / USA / 153,408 / 1.054 / 2.91
9. Commissariat a l'Energie Atomique (CEA) / Bull / Tera 100, Bull bullx super-node S6010/S6030 / France / — / 1.050 / 4.59
10. DOE/NNSA/LANL / IBM / Roadrunner, BladeCenter QS22/LS21 / USA / 122,400 / 1.042 / 2.34

Pleiades was #11 last time. K Computer system configuration: computer cabinets and nodes; 1 node consists of 1 SPARC64 VIIIfx CPU (8 cores/CPU, 2.0 GHz); 6-D mesh/torus interconnect.
9
Computational Research Division
Computational Science: nanoscience, combustion, climate, cosmology & astrophysics, genomics, energy & environment.
Applied Mathematics: mathematical models, adaptive mesh refinement, linear algebra, libraries and frameworks, interface methods.
Computer Science: HPC architecture, OS, and compilers; performance & autotuning; visualization and data management; cloud, grid & distributed computing.
[Figure: roofline model for the NVIDIA C2050 (Fermi) — single- and double-precision peaks (including DP add-only) and device bandwidth, with kernels such as SpMV, 7-point and 27-point stencils, DGEMM, GTC/chargei, GTC/pushi, and RTM/wave eqn. plotted by attainable performance.]
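The roofline figure referenced above can be reduced to one formula: attainable performance is the lesser of the compute peak and bandwidth times arithmetic intensity. A minimal sketch, using the C2050's published peak numbers as assumed inputs (not taken from the slide itself):

```python
# Back-of-the-envelope roofline model. The C2050 figures below are the
# commonly published specs, assumed here for illustration.
PEAK_DP_GFLOPS = 515.0   # NVIDIA C2050 double-precision peak (assumed spec)
PEAK_BW_GBS = 144.0      # C2050 device memory bandwidth (assumed spec)

def attainable_gflops(intensity: float) -> float:
    """Roofline: min(compute roof, bandwidth roof) at a given
    arithmetic intensity (flops per byte moved from memory)."""
    return min(PEAK_DP_GFLOPS, PEAK_BW_GBS * intensity)

# Machine balance: intensity at which the two roofs meet (~3.6 flops/byte).
machine_balance = PEAK_DP_GFLOPS / PEAK_BW_GBS

# SpMV sits far below the balance point, so it is bandwidth-bound:
spmv_gflops = attainable_gflops(0.25)   # 144 * 0.25 = 36 GFlop/s, well under peak
```

This is why the figure places SpMV and the stencils on the sloped (bandwidth) part of the roofline and DGEMM near the flat (compute) roof.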
10
Schedule (1/3): Monday, Aug 15
9:00-9:30am – Introduction and Welcome – Jim Demmel (UCB)
9:30-12:00pm – Introduction to Parallel Architectures and Pthreads – John Kubiatowicz (UCB)
12:00-1:15pm – Lunch (see web page for suggested venues)
1:15-2:15pm – Shared Memory Programming with OpenMP – Tim Mattson (Intel)
2:15-3:00pm – Prototyping Parallel Code: Intel(R) Advisor – Gary Carleton (Intel)
3:00-3:30pm – Break
3:30-4:30pm – Parallel Programming in the .NET Framework – Igor Ostrovsky (Microsoft)
4:30-5:00pm – Break / transition to HP Auditorium, 306 Soda Hall
5:00-6:00pm – Hands-on Lab (rooms announced in HP Auditorium)
6:00pm – Reception in Wozniak Lounge!
Berkeley connections: Gary Carleton, EECS '74; Jim Demmel, EECS '83 PhD; Bryan Catanzaro, EECS '11 PhD.
11
Schedule (2/3): Tuesday, Aug 16
8:45-9:45am – Sources of Parallelism and Locality in Simulation – Jim Demmel (UCB)
9:45-10:45am – Architecting Parallel Software with Design Patterns – Kurt Keutzer (UCB)
10:45-11:15am – Break
11:15-12:15pm – Distributed Memory Programming in MPI and UPC – Kathy Yelick (UCB and LBNL)
12:15-1:30pm – Lunch
1:30-2:30pm – GPU, CUDA, OpenCL Programming – Bryan Catanzaro (NVIDIA Research)
2:30-3:00pm – Break / transition to rooms in Soda Hall
3:00-6:00pm – Hands-on Lab: Microsoft tools exercises; UCB exercises
12
Schedule (3/3): Wednesday, Aug 17
8:45-10:45am – Autotuning of Common Computational Patterns – Jim Demmel (UCB)
10:45-11:15am – Break
11:15-12:15pm – Finding Deadlock Conditions: Intel(R) Inspector XE – Gary Carleton (Intel)
12:15-1:30pm – Lunch
1:30-2:30pm – Cloud Computing – Ben Hindman (UCB)
2:30-3:30pm – Performance Tools – David Skinner (UCB)
3:30-4:00pm – Break
4:00-5:00pm – Building Parallel Applications: Speech, Music, Health, Browser – Gerald Friedland, David Wessel, Tony Keaveny, Ras Bodik (ICSI, UCB)
13
More Information: Course CS267, "Applications of Parallel Computing"
The long version of this short course! Taught every Spring; all lectures are on the web (slides + video), freely available.
Broadcast to UC Berkeley, UC Merced, UC Santa Cruz, and UC Davis in Spring 2009; to be provided nationwide by XSEDE starting Spring 2012.
(Google "CS267", I'm feeling lucky.)
14
Thanks!