18.337 Introduction

News you can use: Hardware
- Multicore chips (2009: mostly 2 cores and 4 cores; 2010: hexa-cores and octo-cores; 2011: twelve cores)
- Servers (often many multicore chips sharing memory)
- Clusters (often several to tens of servers, and many more, not sharing memory)

Performance
- Single-processor speeds are, for now, no longer growing.
- Moore's law still allows more real estate per core (transistor counts double roughly every two years): http://www.intel.com/technology/mooreslaw/index.htm
- People want performance, but it is hard to get; slowdowns are often seen before speedups.
- Flops (floating-point operations per second): gigaflops (10^9), teraflops (10^12), petaflops (10^15).
- Compare matmul with matadd. What's the difference? (A back-of-the-envelope sketch follows.)
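A small C sketch of the matmul/matadd contrast (the framing and numbers here are mine, not from the slides): both move roughly the same amount of data, but matmul does about n times more arithmetic per word moved, which is why matadd is memory-bound while matmul can approach peak flops.

```c
/* Arithmetic intensity of matadd vs. matmul on n x n matrices.
 * matadd: n^2 flops over ~3n^2 words of traffic (read A, B; write C).
 * matmul: 2n^3 flops over the same ~3n^2 words (each datum reused ~n times). */
#include <stdio.h>

int main(void) {
    double n = 1000.0;                      /* n x n matrices */
    double words     = 3.0 * n * n;         /* memory traffic in words */
    double add_flops = n * n;               /* one add per output entry */
    double mul_flops = 2.0 * n * n * n;     /* n multiplies + n adds per entry */

    printf("matadd: %.2f flops per word moved\n", add_flops / words);
    printf("matmul: %.2f flops per word moved\n", mul_flops / words);
    return 0;
}
```

For n = 1000 this prints roughly 0.33 versus 667, which is the whole story: matadd is limited by time to memory, matmul by raw arithmetic (if the cache is used well).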

Some historical machines

Earth Simulator was #1

Some interesting hardware
- Nvidia
- Cell Processor
- SiCortex – "Teraflops from Milliwatts": http://www.sicortex.com/products/sc648 and http://www.gizmag.com/mit-cycling-human-powered-computation/8503/

Programming MPI: The Message Passing Interface
- A low-level, "lowest common denominator" language that the world has stuck with for nearly 20 years.
- Can deliver performance, but can be a hindrance as well.
- Some say there are those who will pay for a 2x speedup, so just make it easy; the reality is that many want at least 10x, and more, for a qualitative difference in results.
- People forget that serial performance can depend on many bottlenecks, including time to memory.
- Performance (and large problems) are the reason for parallel computing, but it is difficult to get the "ease of use" vs. "performance" trade-off right. (A minimal MPI sketch follows.)
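To make the "low level" point concrete, here is a minimal MPI sketch in C (assuming any standard MPI implementation such as MPICH or Open MPI; compile with mpicc and run with mpirun -np 4). Every process runs the same program, and all data movement is explicit.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes?  */

    double local = (double)rank;            /* each rank contributes a value */
    double total = 0.0;

    /* Explicit collective: sum every rank's value onto rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %g\n", size - 1, total);

    MPI_Finalize();
    return 0;
}
```

Even this toy program shows the trade-off: the programmer controls exactly what is communicated (which is where the performance comes from), but every detail of the data movement is the programmer's problem.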

Places to Look
- Best current news: http://www.hpcwire.com/
- Huge conference: http://sc11.supercomputing.org/

Architecture Diagrams from Sam Williams (formerly at Berkeley)
- Bottom-up performance engineering: understanding hardware's implications on performance, up to the software.
- Top-down: measuring software and tweaking, sometimes aware and sometimes unaware of the hardware.

http://www.cs.berkeley.edu/~samw/research/talks/sc07.pdf

Want to delve into hard numerical algorithms
- Examples: FFTs and sparse linear algebra.
- At the MIT level: the potentially "not quite right" question is "How do you parallelize these operations?" Rather, what issues arise, and why is getting performance hard?
- Why is n x n matmul easy? Almost a cliché? (A sketch of why follows.)
- Comfort level in this class to delve in?
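A hypothetical OpenMP sketch (my illustration, not the course's code) of why dense n x n matmul is the "easy", almost clichéd case: the n^2 output entries are independent, each costs the same amount of work, and the memory access pattern is regular, so a single parallel loop over rows already scales well.

```c
/* Compile with: cc -fopenmp matmul_omp.c */
void matmul_parallel(int n, const double *A, const double *B, double *C) {
    #pragma omp parallel for              /* rows of C are independent */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}
```

FFTs and sparse linear algebra are harder not because they lack parallelism but because of what this loop does not have: irregular or strided memory access and global data movement, which is exactly where the interesting performance issues arise.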

Old Homework (emphasized for effect)
- Download a parallel program from somewhere. Make it work.
- Download another parallel program.
- Now, …, make them work together!

SIMD
- SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data. (Think of the addition of two vectors: one add instruction applies to every element of the vector.)
- The term was coined with one element per processor in mind, but with today's deep memories and hefty processors, large chunks of the vectors would be added on one processor.
- The term was coined with broadcasting of an instruction in mind, hence the "single instruction", but today's machines are usually more flexible.
- The term was coined with A+B and elementwise AxB in mind, so nobody really knows for sure whether matmul or FFT is SIMD or not, but these operations can certainly be built from SIMD operations.
- Today it is not unusual to refer to a SIMD operation (sometimes, but not always, historically synonymous with data-parallel operations, though this feels wrong to me) when the software appears to run "lock-step", with every processor executing the same instruction.
- Usage: "I hear that machine is particularly fast when the program primarily consists of SIMD operations."
- Graphics processors such as NVIDIA's seem to run fastest on SIMD-type operations, but current research (and old research too) pushes the limits of SIMD. (A small vector-add sketch follows.)
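A minimal vector-add sketch in C using SSE intrinsics (an assumption on my part: any x86 compiler with <immintrin.h>; modern compilers will often auto-vectorize the plain scalar loop into the same thing). One add instruction here processes four floats at once, which is the "single instruction, multiple data" idea in miniature.

```c
#include <immintrin.h>

void vec_add(int n, const float *a, const float *b, float *c) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);              /* load 4 floats   */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));     /* 4 adds at once  */
    }
    for (; i < n; i++)                                /* scalar tail     */
        c[i] = a[i] + b[i];
}
```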

SIMD summary
- SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data.
- One may also refer to a SIMD operation (sometimes, but not always, historically synonymous with a Data Parallel Operation) when the software appears to run "lock-step", with every processor executing the same instructions.

The Cloud
- Problems with HPC systems are not what you think: users wrote codes that nobody could use, and systems were hard to install.
- The Interactive Supercomputing Experience.
- What the cloud could do, and what the limitations are.