18.337 Introduction

News you can use: Hardware
– Multicore chips (2009: mostly 2 cores and 4 cores, but doubling) (cores = processors)
– Servers (often 2 or 4 multicores sharing memory)
– Clusters (often several, to tens, and many more servers not sharing memory)
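To make the core counts above concrete, here is a minimal sketch, assuming only Python and its standard library (not part of the original slides), that asks the operating system how many cores it sees and spreads a trivially parallel sum over them with a process pool. A cluster, by contrast, would need message passing, since its nodes do not share memory.

```python
# Minimal sketch: how many cores does this machine have, and can we use them all?
# Assumes only the Python standard library; illustrative, not a benchmark.
import os
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    ncores = os.cpu_count()                    # e.g. 2 or 4 on a 2009-era multicore chip
    n = 10_000_000
    step = n // ncores
    chunks = [(i * step, (i + 1) * step) for i in range(ncores)]
    chunks[-1] = (chunks[-1][0], n)            # last chunk picks up the remainder
    with Pool(ncores) as pool:                 # one worker process per core
        total = sum(pool.map(partial_sum, chunks))
    print(f"{ncores} cores, sum 0..{n-1} = {total}")
```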

Performance
– Single-processor speeds are, for now, no longer growing.
– Moore's law still allows for more real estate per core (transistors double nearly every two years): http://www.intel.com/technology/mooreslaw/index.htm
– People want performance, but it is hard to get.
– Slowdowns are often seen before speedups.
– Flops (floating point ops / second): Gigaflops (10^9), Teraflops (10^12), Petaflops (10^15)
– Compare matmul with matadd. What's the difference?
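One way to look at the matmul vs. matadd question: both touch O(n^2) data, but matmul does about 2n^3 flops while matadd does n^2, so matmul can reuse data from cache and approach peak flop rates while matadd is limited by memory bandwidth. Below is a rough timing sketch, assuming NumPy (my choice, not prescribed by the slides); the numbers depend entirely on the BLAS library and the machine.

```python
# Rough sketch: achieved flop rates for matmul vs. matadd (assumes NumPy).
# Results vary with the BLAS library, cache sizes, and n.
import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C = A @ B                       # ~2*n^3 flops on ~n^2 data: reuses cache, compute-bound
t_mul = time.perf_counter() - t0

t0 = time.perf_counter()
D = A + B                       # ~n^2 flops on ~n^2 data: memory-bound
t_add = time.perf_counter() - t0

print(f"matmul: {2 * n**3 / t_mul / 1e9:.1f} Gflop/s")
print(f"matadd: {n**2 / t_add / 1e9:.1f} Gflop/s")
```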

Some historical machines

Earth Simulator was #1

Some interesting hardware
– Nvidia
– Cell Processor
– SiCortex – "Teraflops from Milliwatts": http://www.sicortex.com/products/sc648
– http://www.gizmag.com/mit-cycling-human-powered-computation/8503/

Programming
– MPI: The Message Passing Interface. A low-level, "lowest common denominator" language that the world has stuck with for nearly 20 years.
– Can deliver performance, but can be a hindrance as well.
– Some say there are those who will pay for a 2x speedup, just make it easy.
– The reality is that many want at least 10x, and more, for a qualitative difference in results.
– People forget that serial performance can depend on many bottlenecks, including time to memory.
– Performance (and large problems) are the reason for parallel computing, but it is difficult to get the "ease of use" vs. "performance" trade-off right.
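For flavor, here is a minimal message-passing sketch using mpi4py, a Python binding chosen here only for brevity (an assumption on my part; the slides don't prescribe a language, and MPI is most often used from C or Fortran). Each rank contributes its rank number and rank 0 reports the sum. Run it under mpiexec, e.g. "mpiexec -n 4 python rank_sum.py" (the filename is hypothetical).

```python
# Minimal MPI sketch using mpi4py (assumed installed).
# Run with something like: mpiexec -n 4 python rank_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # this process's id, 0 .. size-1
size = comm.Get_size()     # total number of processes

# Every rank contributes its rank number; rank 0 receives the sum.
total = comm.reduce(rank, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} ranks, sum of ranks = {total}")
```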

Places to Look
– Best current news: http://www.hpcwire.com/
– Huge conference: http://sc09.supercomputing.org/
– MIT home-grown software, now Interactive Supercomputing (Star-P for MATLAB®, Python, and R): http://www.interactivesupercomputing.com

Architecture diagrams from Sam Williams @ Berkeley
– Bottom-up performance engineering: understanding hardware's implications on performance, up to the software
– Top-down: measuring software and tweaking, sometimes aware and sometimes unaware of the hardware

http://www.cs.berkeley.edu/~samw/research/talks/sc07.pdf

Want to delve into hard numerical algorithms
– Examples: FFTs and sparse linear algebra
– At the MIT level: the potential "not quite right" question is "How do you parallelize these operations?" Rather: what issues arise, and why is getting performance hard?
– Why is n×n matmul easy? Almost cliché?
– Comfort level in this class to delve in?
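To see why n×n matmul is the "easy" case while sparse linear algebra is hard to make fast, here is a hedged sketch, assuming NumPy and SciPy (again my choice, not the slides'): dense matmul reuses every entry many times and is compute-bound, while sparse matrix-vector multiply does about two flops per stored nonzero with little reuse and is memory-bound, so it runs far below peak no matter how it is parallelized.

```python
# Sketch: dense matmul vs. sparse matrix-vector multiply (assumes NumPy and SciPy).
# The point is arithmetic intensity, not the exact numbers.
import time
import numpy as np
import scipy.sparse as sp

n = 3000
A = np.random.rand(n, n)
t0 = time.perf_counter()
A @ A                                   # ~2*n^3 flops on ~n^2 data: compute-bound
t_dense = time.perf_counter() - t0
print(f"dense matmul:  {2 * n**3 / t_dense / 1e9:6.1f} Gflop/s")

m = 100_000
S = sp.random(m, m, density=1e-4, format="csr")   # ~1 million nonzeros
x = np.random.rand(m)
t0 = time.perf_counter()
S @ x                                   # ~2 flops per stored nonzero, little data reuse
t_sparse = time.perf_counter() - t0
print(f"sparse matvec: {2 * S.nnz / t_sparse / 1e9:6.2f} Gflop/s")
```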

Old Homework (emphasized for effect)
– Download a parallel program from somewhere. Make it work.
– Download another parallel program.
– Now, …, make them work together!

SIMD
– SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data. (Think of the addition of two vectors: one add instruction applies to every element of the vector.)
– The term was coined with one element per processor in mind, but with today's deep memories and hefty processors, large chunks of the vectors would be added on one processor.
– The term was coined with the broadcasting of an instruction in mind, hence the "single instruction", but today's machines are usually more flexible.
– The term was coined with A+B and elementwise A×B in mind, so nobody really knows for sure whether matmul or FFT is SIMD or not, but these operations can certainly be built from SIMD operations.
– Today, it is not unusual to refer to a SIMD operation (sometimes, but not always, historically synonymous with data-parallel operations, though this feels wrong to me) when the software appears to run "lock-step" with every processor executing the same instruction.
– Usage: "I hear that machine is particularly fast when the program primarily consists of SIMD operations."
– Graphics processors such as NVIDIA's seem to run fastest on SIMD-type operations, but current research (and old research too) pushes the limits of SIMD.
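As an illustration of the "lock-step, same instruction on every element" idea, here is a small sketch assuming NumPy (an assumption; the slides name no language): the data-parallel form replaces an explicit loop with one whole-array operation, which NumPy dispatches to vectorized elementwise kernels that on most CPUs map onto SIMD instructions.

```python
# Sketch of the data-parallel / SIMD style (assumes NumPy).
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Explicit loop: one scalar add at a time ("one element per processor" in the original
# SIMD mental model, but executed serially here).
c_loop = np.empty_like(a)
for i in range(a.size):
    c_loop[i] = a[i] + b[i]

# Data-parallel form: a single add applied to every element; NumPy's elementwise
# kernels are typically vectorized, i.e. they use the CPU's SIMD instructions.
c_vec = a + b

assert np.allclose(c_loop, c_vec)
```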

SIMD summary SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data.  One may also refer to a SIMD operation (sometimes but not always historically synonymous with a Data Parallel Operation) when the software appears to run “lock-step” with every processor executing the same instructions.