This module was created with support from NSF under grant # DUE 1141022. Module developed Fall 2014 by Apan Qasem. Parallel Computing Fundamentals. Course TBD.




Presentation transcript:

This module was created with support from NSF under grant # DUE 1141022. Module developed Fall 2014 by Apan Qasem. Parallel Computing Fundamentals. Course TBD. Lecture TBD. Term TBD.

Why Study Parallel Computing? [Figure: processor clock speeds and microarchitectural features across generations: 16 MHz, 25 MHz, full-speed level 2 cache, instruction pipeline, longer issue pipeline, double-speed arithmetic; more responsibility on software. Source: Scientific American 2005, "A Split at the Core"; extended by Apan Qasem.]

Why Study Parallel Computing? Parallelism is mainstream. "We are dedicating all of our future product development to multicore designs. … This is a sea change in computing." (Paul Otellini, President, Intel, 2004) "Requires fundamental change in almost all layers of abstraction." (Dirk Meyer, CEO, AMD, 2006) "In future, all software will be parallel." (Andrew Chien, CTO, Intel, ~2007)

Why Study Parallel Computing? Parallelism is ubiquitous.

Basic Computer Architecture. [Diagram of basic computer architecture; image source: Gaddis, Starting Out with C++.]

Program Execution: Tuning Level. The C program below is compiled to assembly, and the resulting code is executed by the processor:

int main() {
  int x, y, result;
  x = 17;
  y = 13;
  result = x + y;
  return result;
}

Compiled (x86-64) assembly:

        .text
        .globl _main
_main:
LFB2:
        pushq   %rbp
LCFI0:
        movq    %rsp, %rbp
LCFI1:
        movl    $17, -4(%rbp)
        movl    $13, -8(%rbp)
        movl    -8(%rbp), %eax
        addl    -4(%rbp), %eax
        movl    %eax, -12(%rbp)
        movl    -12(%rbp), %eax
        leave
        ret

code → compile → execute

Program Execution: Tuning Level. The processor executes one instruction at a time*, and instruction execution follows program order:

int main() {
  int x, y, result;
  x = 17;
  y = 13;
  result = x + y;
  return result;
}

Execution order: x = 17; then y = 13; then result = x + y; then return result;

Program Execution: Tuning Level (Parallel). In the parallel version, the two independent assignment statements x = 17; and y = 13; will execute in parallel:

int main() {
  int x, y, result;
  x = 17;
  y = 13;
  result = x + y;
  return result;
}
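The slides do not show how this parallel execution would be expressed in code; as a minimal sketch (assuming OpenMP, which is introduced later in this module, and gcc's -fopenmp flag), the two independent assignments can be placed in separate sections that the runtime may run on different threads:

#include <stdio.h>

/* compile with: gcc -fopenmp example.c */
int main() {
  int x, y, result;

  #pragma omp parallel sections
  {
    #pragma omp section
    x = 17;          /* does not depend on y */

    #pragma omp section
    y = 13;          /* does not depend on x */
  }
  /* implicit barrier: both sections have finished here */

  result = x + y;    /* depends on both x and y, so it runs after the parallel region */
  printf("result = %d\n", result);
  return result;
}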

Parallel Program Execution. The assignments x = 17; and y = 13; are placed on different processors, while result = x + y; and return result; run afterwards; we cannot arbitrarily assign instructions to processors.

int main() {
  int x, y, result;
  x = 17;
  y = 13;
  result = x + y;
  return result;
}

Dependencies in Parallel Code. If statement s_i needs the value produced by statement s_j, then s_i is said to be dependent on s_j. All dependencies in the program must be preserved: if s_i is dependent on s_j, then we need to ensure that s_j completes execution before s_i. For example:

  s_j:  y = 17;
        …
  s_i:  x = y + foo();

The responsibility for preserving dependencies lies with the programmer (and software).
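As an illustration (not from the slides), the dependence above can be made explicit when the two statements run as parallel tasks; this sketch assumes OpenMP 4.0 task depend clauses and gives foo() a trivial stand-in body:

#include <stdio.h>

/* trivial stand-in for the slide's foo() */
int foo() { return 5; }

int main() {
  int x = 0, y = 0;

  #pragma omp parallel
  #pragma omp single
  {
    /* s_j: produces y */
    #pragma omp task depend(out: y)
    y = 17;

    /* s_i: consumes y; the depend clauses make the runtime finish s_j before s_i */
    #pragma omp task depend(in: y)
    x = y + foo();
  }  /* barrier at the end of single/parallel: all tasks have completed */

  printf("x = %d\n", x);
  return 0;
}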

Running Quicksort in Parallel. The two recursive calls, quickSort(values, left, pivotNew - 1); and quickSort(values, pivotNew + 1, right);, work on disjoint parts of the array and can run in parallel:

void quickSort(int values[], int left, int right) {
  if (left < right) {
    int pivot = (left + right) / 2;
    int pivotNew = partition(values, left, right, pivot);
    quickSort(values, left, pivotNew - 1);
    quickSort(values, pivotNew + 1, right);
  }
}

To get benefit from parallelism we want to run "big chunks" of code in parallel; we also need to balance the load.
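The slides do not show the parallel version itself; the sketch below is one possible realization using OpenMP tasks, with a size cutoff so that only "big chunks" are spawned as tasks (which also helps balance the load). The partition() body and the CUTOFF value are illustrative assumptions, not part of the slides.

/* Lomuto-style partition; the slides only reference partition() by name */
int partition(int values[], int left, int right, int pivotIndex) {
  int pivotValue = values[pivotIndex];
  int tmp = values[pivotIndex]; values[pivotIndex] = values[right]; values[right] = tmp;
  int store = left;
  for (int i = left; i < right; i++) {
    if (values[i] < pivotValue) {
      tmp = values[i]; values[i] = values[store]; values[store] = tmp;
      store++;
    }
  }
  tmp = values[store]; values[store] = values[right]; values[right] = tmp;
  return store;
}

#define CUTOFF 1000   /* below this size, recurse sequentially (assumed value) */

void quickSortPar(int values[], int left, int right) {
  if (left < right) {
    int pivot = (left + right) / 2;
    int pivotNew = partition(values, left, right, pivot);

    if (right - left > CUTOFF) {
      /* spawn the two halves as independent tasks ("big chunks") */
      #pragma omp task
      quickSortPar(values, left, pivotNew - 1);

      #pragma omp task
      quickSortPar(values, pivotNew + 1, right);

      #pragma omp taskwait   /* wait for both halves before returning */
    } else {
      quickSortPar(values, left, pivotNew - 1);
      quickSortPar(values, pivotNew + 1, right);
    }
  }
}

/* typical call site: one thread seeds the recursion, the whole team runs the tasks */
void parallelSort(int values[], int n) {
  #pragma omp parallel
  #pragma omp single
  quickSortPar(values, 0, n - 1);
}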

Parallel Programming Tools. Most languages have extensions that support parallel programming, typically as a collection of APIs: pthreads, OpenMP, MPI, Java threads. Some APIs will perform some of the dependence checks for you. Some languages are specifically designed for parallel programming: Cilk, Charm++, Chapel.
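As a minimal taste of one of the APIs listed above, the sketch below uses pthreads to create two threads and wait for them to finish; the worker function and thread count are illustrative, not from the slides.

#include <pthread.h>
#include <stdio.h>

/* work done by each thread; the argument identifies the thread's "chunk" */
void *worker(void *arg) {
  long id = (long) arg;
  printf("hello from thread %ld\n", id);
  return NULL;
}

/* compile with: gcc -pthread example.c */
int main() {
  pthread_t t0, t1;

  /* create two threads that run worker() concurrently */
  pthread_create(&t0, NULL, worker, (void *) 0L);
  pthread_create(&t1, NULL, worker, (void *) 1L);

  /* wait for both threads before main() returns */
  pthread_join(t0, NULL);
  pthread_join(t1, NULL);
  return 0;
}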

Parallel Performance. In the ideal case, the more processors we add to the system, the higher the speedup: if a sequential program runs in T seconds on one processor, then the parallel program should run in T/N seconds on N processors. In reality this almost never happens, because almost all parallel programs have some parts that must run sequentially. This observation is known as Amdahl's Law: the amount of speedup obtained is limited by the amount of parallelism available in the program. [Photo of Gene Amdahl; image source: Wikipedia.]
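The slides state Amdahl's Law informally; in its standard form (not shown on the slides), with s the fraction of the program that must run sequentially and N the number of processors:

\[
\text{speedup}(N) \;=\; \frac{T}{\,sT + \frac{(1-s)T}{N}\,} \;=\; \frac{1}{\,s + \frac{1-s}{N}\,},
\qquad
\lim_{N \to \infty} \text{speedup}(N) \;=\; \frac{1}{s}.
\]

For example, if 10% of a program is sequential (s = 0.1), the speedup can never exceed 10x no matter how many processors are added, and with N = 8 processors it is 1 / (0.1 + 0.9/8) ≈ 4.7x.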

[Plot: maximum theoretical speedup in relation to the number of processors.] Reality is often different!