High Performance Computing: An Overview. Alan Edelman, Massachusetts Institute of Technology, Applied Mathematics & Computer Science and AI Labs (Interactive Supercomputing, Chief Science Officer)

Not said: many powerful computer owners prefer low profiles

Some historical machines

Earth Simulator was #1, now #30

Moore's Law
The number of people who point out that "Moore's Law" is dead is doubling every year.
Feb 2008: NSF requests $20M for "Science and Engineering Beyond Moore's Law"
–Ten years out, Moore's law itself may be dead
Moore's law has various forms and versions never stated by Moore, but roughly doubling every 18 months to 2 years:
–Number of transistors
–Computational power
–Parallelism!
Still good for a while!  At risk!

AMD Opteron quad-core 8350 (Sept 2007). Eight cores in 2009? 2.0 GHz?

Intel Clovertown and Dunnington. Six cores: later in 2008?

Sun Niagara 2 (1.4 GHz): eight multithreaded UltraSPARC cores, each with an 8 KB data cache and FPU, connected by a crossbar switch to a 4 MB shared L2 (16-way; 179 GB/s fill, 90 GB/s write-through); four 128-bit FBDIMM memory controllers to fully buffered DRAM give 42.7 GB/s read and 21.3 GB/s write bandwidth. 16 cores in 2008?

Accelerators: NVIDIA

SiCortex: Teraflops from Milliwatts

Software: "Give me software leverage and a supercomputer, and I shall solve the world's problems." (with apologies to Archimedes)

What's wrong with this story? I can't get my five-year-old son off my (serial) computer. I have access to the world's fastest machines and have nothing cool to show him!

Engineers and Scientists (the leading indicators)
Mostly work in serial (still!), just like my 5-year-old
Those working in parallel go to conferences and show off speedups
Software: MPI (Message Passing Interface)
–Really thought of as the only choice
–Some say the assembler of parallel computing
–Some say it has allowed code to be portable
–Others say it has held back progress and performance

Old Homework (emphasized for effect)
Download a parallel program from somewhere.
–Make it work
Download another parallel program.
–Now, …, make them work together!

Apples and Oranges
A: row-distributed array (or worse)
B: column-distributed array (or worse)
C = A + B
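
A minimal sketch (not from the slides; mpi4py and NumPy assumed, block layout and names illustrative) of why even this sum is awkward: the addition itself is one local line, but a row-distributed A and a column-distributed B first have to be lined up with an all-to-all redistribution.

    # Sketch: add a row-distributed A to a column-distributed B (mpi4py + NumPy).
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, p = comm.Get_rank(), comm.Get_size()
    n = 4 * p                                            # global n x n, n divisible by p

    A_rows = np.full((n // p, n), float(rank))           # my block of rows of A
    B_cols = np.full((n, n // p), 10.0 * rank)           # my block of columns of B

    # Repack my column block of B into p row-slabs, one destined for each rank,
    # then exchange so that every rank ends up holding its own rows of B.
    send = np.ascontiguousarray(B_cols.reshape(p, n // p, n // p))
    recv = np.empty_like(send)
    comm.Alltoall(send, recv)                            # the real cost of "A + B"

    # recv[j] = my rows of the columns originally owned by rank j; stitch them up.
    B_rows = np.concatenate([recv[j] for j in range(p)], axis=1)

    C_rows = A_rows + B_rows                             # only now is "+" purely local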

MPI Performance vs. Pthreads
Professional performance study by Sam Williams
[Chart: autotuned MPI vs. autotuned Pthreads vs. naïve single thread, on Intel Clovertown and AMD Opteron]
MPI may introduce speed bumps on current architectures

MPI-Based Libraries
Typical sentence: "… we enjoy using parallel computing libraries such as ScaLAPACK."
What else? "… you know, such as ScaLAPACK."
And …? "Well, there is ScaLAPACK (PETSc, SuperLU, MUMPS, Trilinos, …)"
Very few users, still many bugs, immature
Highly optimized libraries? Yes and no

The natural question may not be the most important one: "How do I parallelize x?"
–First question many students ask
–Answer is often either fairly obvious or very difficult
–Can miss the true issues of high performance:
These days people are often good at exploiting locality for performance
People are not very good at hiding communication and anticipating data movement to avoid bottlenecks
People are not very good at interweaving multiple functions to make the best use of resources
–Usually misses the issue of interoperability:
Will my program play nicely with your program?
Will my program really run on your machine?
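
One of those "true issues", hiding communication, in a hypothetical sketch (mpi4py, all names and sizes illustrative): post non-blocking halo exchanges, do the work that needs no remote data while the messages are in flight, and finish only the boundary points afterwards.

    # Sketch: overlap a halo exchange with the interior update of a 1-D stencil.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, p = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % p, (rank + 1) % p         # periodic neighbours

    u = np.random.rand(1_000_000)                        # my slice of the domain
    halo_l, halo_r = np.empty(1), np.empty(1)

    # Post the exchange first: send my edge values, receive my neighbours' edges.
    reqs = [comm.Isend(u[:1],  dest=left,  tag=0),
            comm.Isend(u[-1:], dest=right, tag=1),
            comm.Irecv(halo_l, source=left,  tag=1),
            comm.Irecv(halo_r, source=right, tag=0)]

    # The interior needs no remote data, so it overlaps with the messages above.
    interior = 0.5 * (u[:-2] + u[2:])

    MPI.Request.Waitall(reqs)                            # wait only when we must

    # Only the two boundary points had to wait for communication.
    new_left  = 0.5 * (halo_l[0] + u[1])
    new_right = 0.5 * (u[-2] + halo_r[0])
    u = np.concatenate(([new_left], interior, [new_right]))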

Real computations have dependencies (example: FFT)
[Diagram: FFT dependency graph. Annotation: "Time wasted on the telephone"]
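
A small illustrative sketch (not from the talk) of where the FFT's dependencies come from: each combine step needs both half-size transforms, so a parallel schedule cannot avoid waiting on, or talking to, the other half.

    # Radix-2 Cooley-Tukey FFT written to expose its dependency structure.
    import numpy as np

    def fft(x):
        n = len(x)                                # n must be a power of two
        if n == 1:
            return x
        even = fft(x[0::2])                       # these two sub-transforms are
        odd  = fft(x[1::2])                       # independent: fine in parallel
        w = np.exp(-2j * np.pi * np.arange(n // 2) / n)
        # The combine step depends on BOTH halves: a forced synchronization
        # (and, across processors, communication) point.
        return np.concatenate([even + w * odd, even - w * odd])

    x = np.random.rand(8) + 0j
    assert np.allclose(fft(x), np.fft.fft(x))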

Modern Approaches
Allow users to "wrap up" computations into nice packages, often denoted threads
Express dependencies among threads
Threads need not be bound to a processor
Not really new at all: see Arvind, dataflow, etc.
Industry has not yet caught up with the damage SPMD and MPI have done
See transactional memory, streaming languages, etc.
Advantages:
–Easier on the programmer
–More productivity
–Allows for autotuning
–Can overlap communication with computation
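
A hypothetical sketch of that style using plain Python futures (all function names made up): work is wrapped into tasks, dependencies are stated by which results a task consumes, and the runtime decides where and when things actually run.

    # Tasks plus explicit dependencies instead of a hard-coded SPMD schedule.
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    def load(name):   return np.random.rand(512, 512)            # stand-in for real I/O
    def factor(m):    return np.linalg.cholesky(m @ m.T + 512 * np.eye(512))
    def solve(l, b):  return np.linalg.solve(l, b)

    with ThreadPoolExecutor() as pool:
        a = pool.submit(load, "A")                               # independent tasks:
        b = pool.submit(load, "B")                               # may run concurrently
        l = pool.submit(lambda: factor(a.result()))              # depends only on a
        x = pool.submit(lambda: solve(l.result(), b.result()))   # depends on l and b
        print(x.result().shape)                                  # (512, 512)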

LU Example
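
A hypothetical sketch (not the slide's actual example) of the dependency structure a blocked LU factorization exposes, right-looking and without pivoting for clarity: the block solves of a step depend only on that step's panel, and each trailing-matrix update needs just one L block and one U block, so most of the work is independent tasks a runtime could schedule and overlap.

    # Right-looking blocked LU without pivoting, written to expose its task graph.
    import numpy as np

    def lu_blocked(a, nb=2):
        a = a.astype(float).copy()
        n = a.shape[0]
        for k in range(0, n, nb):
            e = min(k + nb, n)
            # 1. Panel factorization: depends on all earlier updates to this block.
            for j in range(k, e):
                a[j+1:e, j] /= a[j, j]
                a[j+1:e, j+1:e] -= np.outer(a[j+1:e, j], a[j, j+1:e])
            l_kk = np.tril(a[k:e, k:e], -1) + np.eye(e - k)
            u_kk = np.triu(a[k:e, k:e])
            if e < n:
                # 2. Block solves: each depends only on step 1 of this k.
                a[k:e, e:] = np.linalg.solve(l_kk, a[k:e, e:])          # U blocks
                a[e:, k:e] = np.linalg.solve(u_kk.T, a[e:, k:e].T).T    # L blocks
                # 3. Trailing update: block (i, j) needs only L(i, k) and U(k, j),
                #    so these updates are independent tasks that can overlap.
                a[e:, e:] -= a[e:, k:e] @ a[k:e, e:]
        return np.tril(a, -1) + np.eye(n), np.triu(a)

    a = np.random.rand(6, 6) + 6 * np.eye(6)   # diagonally dominant: no pivoting needed
    l, u = lu_blocked(a)
    assert np.allclose(l @ u, a)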

Software: "Give me software leverage and a supercomputer, and I shall solve the world's problems." (with apologies to Archimedes)

New Standards for Quality of Computation
Associative law: (a+b)+c = a+(b+c)
Not true in roundoff
Mostly didn't matter in serial
Parallel computation reorganizes the computation
Lawyers get very upset!
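
A concrete instance of the roundoff point (illustrative, IEEE double precision): the two groupings of the same three numbers give different answers, so a parallel reduction that reorders the sum can legitimately change the result.

    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)    # 1.0
    print(a + (b + c))    # 0.0: c is absorbed when it is added to b first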