Advanced Computer Networks Lecture 1 - Parallelization


Scale increases complexity

[Figure: systems arranged along an axis of increasing scale and "More challenges": Single-core machine → Multicore server → Cluster → Large-scale distributed system → Wide-area network]
– Multicore server: true concurrency
– Cluster: network, message passing, more failure modes (faulty nodes, ...)
– Large-scale distributed / wide-area systems: even more failure modes; incentives, laws, ...

Parallelization

The bubble sort below works fine on one core. Can we make it faster on multiple cores?
– Difficult - need to find something for the other cores to do
– There are other sorting algorithms where this is much easier
– Not all algorithms are equally parallelizable

    void bubblesort(int nums[]) {
        boolean done = false;
        while (!done) {
            done = true;
            for (int i = 1; i < nums.length; i++) {
                if (nums[i-1] > nums[i]) {
                    int tmp = nums[i-1];   // swap the two elements in place
                    nums[i-1] = nums[i];
                    nums[i] = tmp;
                    done = false;
                }
            }
        }
    }
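To illustrate the last point, here is a hedged sketch (my own example, not from the lecture) of a sort that parallelizes much more naturally: merge sort splits the array into independent halves, so different cores can sort them at the same time. The class name, threshold, and use of Java's Fork/Join framework are my choices for illustration.

    import java.util.Arrays;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    // Sketch: parallel merge sort with Fork/Join (threshold chosen arbitrarily)
    public class ParallelMergeSort extends RecursiveAction {
        private static final int THRESHOLD = 1 << 13; // below this, sort sequentially
        private final int[] nums;
        private final int lo, hi; // sorts nums[lo..hi)

        ParallelMergeSort(int[] nums, int lo, int hi) {
            this.nums = nums; this.lo = lo; this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                Arrays.sort(nums, lo, hi);                    // small piece: one core is fine
                return;
            }
            int mid = (lo + hi) >>> 1;
            invokeAll(new ParallelMergeSort(nums, lo, mid),   // the two halves are
                      new ParallelMergeSort(nums, mid, hi));  // independent tasks
            merge(mid);
        }

        private void merge(int mid) {
            int[] tmp = Arrays.copyOfRange(nums, lo, hi);
            int i = 0, j = mid - lo, k = lo;
            while (i < mid - lo && j < hi - lo)
                nums[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
            while (i < mid - lo) nums[k++] = tmp[i++];
            while (j < hi - lo)  nums[k++] = tmp[j++];
        }

        public static void main(String[] args) {
            int[] nums = new java.util.Random(42).ints(1_000_000).toArray();
            ForkJoinPool.commonPool().invoke(new ParallelMergeSort(nums, 0, nums.length));
            System.out.println("first, last: " + nums[0] + ", " + nums[nums.length - 1]);
        }
    }

Unlike bubble sort, where every pass depends on the result of the previous one, the two recursive calls here touch disjoint parts of the array, which is exactly what makes the work easy to hand to other cores.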

Parallelization

If we increase the number of processors, will the speed also increase?
– Yes, but (in almost all cases) only up to a point

[Figure: numbers sorted per second vs. cores used, with an "Ideal" line and a lower "Expected" curve]

Speedup = (completion time with one core) / (completion time with n cores)
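As a concrete, hedged example of measuring this ratio (my own sketch, not from the slides), the code below times the same CPU-bound computation once sequentially and once with Java's parallel streams and reports the speedup; the workload and problem size are arbitrary.

    import java.util.stream.IntStream;

    public class SpeedupDemo {
        // Some CPU-bound work: sum of square roots of 0..n-1
        static double sequentialSum(int n) {
            double sum = 0;
            for (int i = 0; i < n; i++) sum += Math.sqrt(i);
            return sum;
        }

        public static void main(String[] args) {
            final int N = 50_000_000;

            long start = System.nanoTime();
            double s1 = sequentialSum(N);                 // one core
            double t1 = (System.nanoTime() - start) / 1e9;

            start = System.nanoTime();
            double sn = IntStream.range(0, N).parallel()  // all available cores
                                 .mapToDouble(Math::sqrt).sum();
            double tn = (System.nanoTime() - start) / 1e9;

            System.out.printf("sums: %.1f / %.1f%n", s1, sn);
            System.out.printf("T(1)=%.3fs  T(n)=%.3fs  speedup=%.2f%n", t1, tn, t1 / tn);
        }
    }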

Amdahl's law

Usually, not all parts of the algorithm can be parallelized. Let f be the fraction of the running time that can be parallelized, and let S be the speedup of that part. Then the overall speedup is:

    1 / ((1 - f) + f / S)

More generally, if the task consists of parts with runtime fractions f_i and per-part speedups S_i, the new running time is the fraction sum_i (f_i / S_i) of the original, and the overall speedup is the reciprocal of that sum.

[Figure: timeline in which sequential parts run on a single core, while the parallel part in between is spread across cores #1-#6]
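A minimal sketch (mine, not from the slides) of the two-part form of the law, showing how the overall speedup saturates as cores are added:

    // Amdahl's law as code: f is the parallelizable fraction,
    // n is the number of cores applied to that part.
    public class Amdahl {
        static double speedup(double f, int n) {
            return 1.0 / ((1.0 - f) + f / n);
        }

        public static void main(String[] args) {
            double f = 0.95; // assume 95% of the work parallelizes
            for (int n : new int[] {1, 2, 4, 8, 16, 64, 1024}) {
                System.out.printf("cores=%4d  speedup=%6.2f%n", n, speedup(f, n));
            }
            // Even with unlimited cores, speedup is capped at 1/(1-f) = 20 for f = 0.95.
        }
    }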

Amdahl's law

We are given a sequential task which is split into four consecutive parts P1, P2, P3 and P4, whose shares of the runtime are 11%, 18%, 23% and 48% respectively. Then we are told that P1 does not speed up, so S1 = 1, while P2 speeds up 5×, P3 speeds up 20×, and P4 speeds up 1.6×. The new running time, as a fraction of the original sequential running time, is:

    0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6 = 0.11 + 0.036 + 0.0115 + 0.3 = 0.4575

Amdahl's law

That is a little less than half the original running time. The overall speedup is 1 / 0.4575 ≈ 2.186, or a little more than double the original speed.
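The arithmetic above is easy to check in a few lines (a throwaway sketch, not part of the lecture):

    public class AmdahlExample {
        public static void main(String[] args) {
            double[] fraction = {0.11, 0.18, 0.23, 0.48}; // runtime share of P1..P4
            double[] speedup  = {1.0,  5.0,  20.0, 1.6};  // per-part speedups
            double newTime = 0;
            for (int i = 0; i < fraction.length; i++) newTime += fraction[i] / speedup[i];
            System.out.printf("new running time = %.4f of the original%n", newTime); // 0.4575
            System.out.printf("overall speedup  = %.3f%n", 1.0 / newTime);           // ~2.186
        }
    }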

Is more parallelism always better?

Increasing parallelism beyond a certain point can cause performance to decrease!
– Example: we need to send a message to each core to tell it what to do, and those messages back and forth take time

[Figure: numbers sorted per second vs. cores, with "Ideal", "Expected", and "Reality (often)" curves; the "Reality" curve rises to a sweet spot and then falls off]
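One way to see why a sweet spot appears is a toy cost model (my own assumption, not from the slides): total time = sequential work + parallel work divided by n + a per-core coordination cost that grows with n.

    public class SweetSpot {
        public static void main(String[] args) {
            double tSeq = 1.0;    // seconds of inherently sequential work (assumed)
            double tPar = 100.0;  // seconds of parallelizable work (assumed)
            double c    = 0.05;   // per-core coordination overhead in seconds (assumed)

            int best = 1;
            double bestTime = Double.MAX_VALUE;
            for (int n = 1; n <= 256; n++) {
                double t = tSeq + tPar / n + c * n;   // T(n) under this model
                if (t < bestTime) { bestTime = t; best = n; }
            }
            // Beyond the sweet spot, adding cores makes the total time worse again.
            System.out.printf("sweet spot: %d cores, completion time %.2fs%n", best, bestTime);
        }
    }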

Parallelization

What size of task should we assign to each core? Frequent coordination creates overhead:
– Need to send messages back and forth, wait for other cores...
– Result: cores spend most of their time communicating
– Bad: ask each core to sort three numbers
– Good: ask each core to sort a million numbers
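As a hedged illustration of coarse-grained work assignment (my own sketch; class name and sizes are made up), the code below hands each core one large chunk to sort instead of many tiny tasks, so the coordination cost is paid only once per core:

    import java.util.Arrays;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CoarseGrainedSort {
        public static void main(String[] args) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();
            int[] nums = new java.util.Random(1).ints(8_000_000).toArray();

            ExecutorService pool = Executors.newFixedThreadPool(cores);
            CountDownLatch done = new CountDownLatch(cores);
            int chunk = (nums.length + cores - 1) / cores;   // roughly a million numbers per core
            for (int c = 0; c < cores; c++) {
                final int from = Math.min(nums.length, c * chunk);
                final int to   = Math.min(nums.length, from + chunk);
                pool.submit(() -> {
                    Arrays.sort(nums, from, to);   // each core sorts its own large chunk
                    done.countDown();
                });
            }
            done.await();
            pool.shutdown();
            System.out.println("sorted " + cores + " chunks of about " + chunk + " numbers each");
        }
    }

A real parallel sort would still need to merge the per-core results, but the point here is the granularity: one big task per core keeps communication rare relative to computation.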