18.337 Introduction


Introduction

News you can use

Hardware:
–Multicore chips (2009: mostly 2 cores and 4 cores, but doubling) (cores = processors); a small sketch of putting these cores to work follows.
–Servers (often 2 or 4 multicores sharing memory)
–Clusters (often several to tens of servers, sometimes many more, not sharing memory)
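The slide only names the hardware; as a hedged illustration (my addition, not part of the original deck), the sketch below uses Python's standard multiprocessing module to count the cores on one multicore chip and spread independent tasks across them. The work function and the task sizes are invented for the example.

    # Minimal sketch (assumed example): count the cores on a multicore chip
    # and spread an embarrassingly parallel job across them with a process pool.
    import os
    from multiprocessing import Pool

    def work(x):
        # Stand-in for a compute-heavy kernel.
        return sum(i * i for i in range(x))

    if __name__ == "__main__":
        ncores = os.cpu_count()            # cores visible on this machine
        print(f"Running on {ncores} cores")
        with Pool(processes=ncores) as pool:
            results = pool.map(work, [10**6] * ncores)   # one task per core
        print(sum(results))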

Performance

Single-processor speeds are, for now, no longer growing. Moore's law still allows more real estate per core (transistors double roughly every two years).
–People want performance, but it is hard to get.
–Slowdowns are often seen before speedups.
Flops (floating-point operations per second):
–Gigaflops (10^9), Teraflops (10^12), Petaflops (10^15)
Compare matmul with matadd. What's the difference? (A small sketch follows.)
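To make the matmul-versus-matadd comparison concrete (this example is mine, not from the slides): adding two n×n matrices costs about n^2 flops on 2n^2 data, while multiplying them costs about 2n^3 flops on the same data, so matmul does far more arithmetic per byte moved. A rough timing sketch in Python/NumPy:

    # Rough sketch (assumed example): compare the achieved flop rates of
    # matrix addition and matrix multiplication on the same operands.
    import time
    import numpy as np

    n = 2000
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    t0 = time.perf_counter()
    C = A + B                      # matadd: ~n^2 flops
    t_add = time.perf_counter() - t0

    t0 = time.perf_counter()
    D = A @ B                      # matmul: ~2*n^3 flops
    t_mul = time.perf_counter() - t0

    print(f"matadd: {n**2 / t_add / 1e9:.2f} Gflop/s")
    print(f"matmul: {2 * n**3 / t_mul / 1e9:.2f} Gflop/s")

The difference is arithmetic intensity: matmul reuses each entry about n times and can run near peak, while matadd does one flop per element and is limited by memory bandwidth.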

Some historical machines

The Earth Simulator was #1 on the Top500 list; it is now #30.

Some interesting hardware

–NVIDIA
–Cell processor
–SiCortex – "Teraflops from Milliwatts"

Programming

MPI: The Message Passing Interface (a minimal example follows below)
–A low-level, "lowest common denominator" language that the world has stuck with for nearly 20 years.
–Can deliver performance, but can be a hindrance as well.
Some say there are those who will pay for a 2x speedup, just make it easy. The reality is that many want at least 10x, and more, for a qualitative difference in results.
People forget that serial performance can depend on many bottlenecks, including time to memory.
Performance (and large problems) are the reason for parallel computing, but it is difficult to get the "ease of use" vs. "performance" trade-off right.
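For readers who have not seen MPI, here is a minimal hedged sketch (my addition, not from the slides) of the message-passing style, written with the mpi4py bindings for Python; the same pattern appears in C with MPI_Send/MPI_Recv.

    # Minimal MPI sketch (assumed example): rank 0 sends a message, rank 1 receives.
    # Run with, e.g.:  mpiexec -n 2 python mpi_hello.py   (filename is hypothetical)
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        data = {"greeting": "hello", "from": rank}
        comm.send(data, dest=1, tag=11)       # explicit point-to-point send
        print(f"rank 0 of {size} sent {data}")
    elif rank == 1:
        data = comm.recv(source=0, tag=11)    # matching receive
        print(f"rank 1 of {size} received {data}")

Every data movement is spelled out by the programmer, which is where both the performance and the "hindrance" in the slide come from.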

Places to Look

–Best current news:
–Huge conference:
–MIT home-grown software, now Interactive Supercomputing (Star-P for MATLAB®, Python, and R)

Architecture

Diagrams from Sam (Berkeley).
–Bottom-up performance engineering: understanding the hardware's implications on performance, up through the software.
–Top-down: measuring the software and tweaking, sometimes aware and sometimes unaware of the hardware (a small measurement sketch follows).
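A hedged sketch of the top-down side (my example, not from the slides): measure first, tweak, then measure again before believing the tweak helped.

    # Sketch of top-down performance work (assumed example): time a kernel
    # before and after a change instead of guessing.
    import timeit
    import numpy as np

    x = np.random.rand(1_000_000)

    def slow_sum():
        total = 0.0
        for v in x:            # pure-Python loop: lots of interpreter overhead
            total += v
        return total

    def fast_sum():
        return x.sum()         # vectorized: the loop runs in compiled code

    print("loop  :", timeit.timeit(slow_sum, number=3))
    print("vector:", timeit.timeit(fast_sum, number=3))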

Want to delve into hard numerical algorithms

Examples:
–FFTs and sparse linear algebra (see the sketch after this slide)
At the MIT level:
–A potential "not quite right" question: how do you parallelize these operations?
–Rather: what issues arise, and why is getting performance hard? Why is n×n matmul easy? Almost cliché?
–Comfort level in this class to delve in?
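As a hedged illustration of why these kernels are harder than dense matmul (my example, not from the deck): a sparse matrix-vector product does only about two flops per stored nonzero and jumps through memory via an index array, so it is bandwidth- and latency-bound rather than compute-bound.

    # Sketch (assumed example): a sparse matrix-vector product in CSR form.
    # Each nonzero triggers an indexed load of x, so memory access is irregular
    # and there is little arithmetic to hide it behind -- unlike dense matmul.
    import numpy as np
    import scipy.sparse as sp

    n = 10_000
    A = sp.random(n, n, density=1e-3, format="csr", random_state=0)
    x = np.random.rand(n)

    y = A @ x                      # ~2 flops per stored nonzero

    # The same product written against the CSR arrays, to expose the indexing:
    indptr, indices, data = A.indptr, A.indices, A.data
    y_manual = np.zeros(n)
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y_manual[i] += data[k] * x[indices[k]]   # gather through indices[k]

    assert np.allclose(y, y_manual)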

Old Homework (emphasized for effect)

Download a parallel program from somewhere.
–Make it work.
Download another parallel program.
–Now, …, make them work together!

SIMD

SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data. (Think of the addition of two vectors: one add instruction applies to every element of the vector. A small sketch follows this slide.)
–The term was coined with one element per processor in mind, but with today's deep memories and hefty processors, large chunks of the vectors would be added on one processor.
–The term was coined with the broadcasting of an instruction in mind, hence the "single instruction," but today's machines are usually more flexible.
–The term was coined with A+B and elementwise A×B in mind, so nobody really knows for sure whether matmul or FFT is SIMD or not, but these operations can certainly be built from SIMD operations.
Today, it is not unusual to refer to a SIMD operation (sometimes, but not always, historically synonymous with "data-parallel operation," though this feels wrong to me) when the software appears to run "lock-step," with every processor executing the same instruction.
–Usage: "I hear that machine is particularly fast when the program primarily consists of SIMD operations."
–Graphics processors such as NVIDIA's seem to run fastest on SIMD-type operations, but current research (and old research too) pushes the limits of SIMD.
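A small hedged sketch (mine, not from the slides) of the canonical operation the slide uses to define SIMD: one "add" applied across whole vectors. In NumPy the elementwise add is dispatched to compiled code that typically uses the CPU's SIMD units; the explicit loop does the same arithmetic one element at a time.

    # Sketch (assumed example): the vector add A + B from the SIMD definition.
    # The NumPy expression applies one conceptual "add instruction" to every
    # element; the loop below computes the same thing one element at a time.
    import numpy as np

    n = 1_000_000
    A = np.random.rand(n)
    B = np.random.rand(n)

    C = A + B                      # data-parallel / SIMD-style: whole vectors at once

    C_loop = np.empty(n)
    for i in range(n):
        C_loop[i] = A[i] + B[i]    # one element per "instruction" in Python

    assert np.allclose(C, C_loop)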

SIMD summary

SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data. One may also refer to a SIMD operation (sometimes, but not always, historically synonymous with a data-parallel operation) when the software appears to run "lock-step," with every processor executing the same instructions.