Parallel Processors Todd Charlton, Eric Uriostique

Current Technology It is hard to find a single-core processor anymore: cell phones, laptops, and similar devices all ship with multiple cores. Large systems can contain 512 or more processors.

The Motivation Divide and conquer: higher throughput and lower power consumption. Recall the dynamic power equation: P = CV²f (capacitance × voltage² × frequency).

The Motivation We need more performance on the same power budget. How? Remember: P = CV²f. Scale voltage and frequency to 80%: P = C × (0.8V)² × (0.8f) ≈ 0.51 × CV²f, so each core burns about half the power. Add an additional core: two such cores fit in the original power budget and deliver 2 × 0.8 = 1.6× speedup on parallel work.

The Motivation How about reducing power consumption while keeping the same performance? Remember: P = CV²f. Scale voltage and frequency to 50%: P = C × (0.5V)² × (0.5f) = 0.125 × CV²f, dropping each core to 12.5% of the original power. Add an additional core: two cores at half speed match the original performance at 2 × 12.5% = 25% of the original power consumption.
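To check the arithmetic on these two slides, here is a minimal C sketch (not from the original deck) that evaluates the dynamic power model P = CV²f under a voltage/frequency scale factor; normalizing C, V, and f to 1 is an assumption for illustration.

```c
#include <stdio.h>

/* Dynamic power model: P = C * V^2 * f.
 * C, V, and f are normalized to 1, so results are fractions of the original. */
static double power(double scale) {
    double v = scale, f = scale;   /* voltage and frequency scaled together */
    return v * v * f;              /* capacitance C normalized to 1 */
}

int main(void) {
    double scales[] = { 0.8, 0.5 };
    for (int i = 0; i < 2; i++) {
        double s = scales[i];
        double per_core = power(s);    /* power of one scaled core */
        printf("scale %.1f: per-core power %.3f, two-core power %.3f, "
               "two-core speedup %.1fx\n",
               s, per_core, 2.0 * per_core, 2.0 * s);
    }
    return 0;   /* prints ~0.512 / 1.024 / 1.6x, then 0.125 / 0.250 / 1.0x */
}
```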

Amdahl’s Law “Speed-up is limited by the amount of work that can be done in parallel.” Credit: watermint.org
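The slide quotes the law but not the formula. For a parallel fraction p of the work and N processors, Amdahl’s law gives Speedup(N) = 1 / ((1 − p) + p/N). A minimal C sketch with illustrative numbers (p = 0.95 is an assumption, not from the slides):

```c
#include <stdio.h>

/* Amdahl's law: speedup on n processors when fraction p of the work is parallel. */
static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* Even with 95% of the work parallel, speedup saturates near 20x. */
    for (int n = 1; n <= 1024; n *= 4)
        printf("N = %4d: speedup = %6.2fx\n", n, amdahl(0.95, n));
    return 0;
}
```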

Ways To Parallelize 1. Multi-threading: run multiple threads of your application on one chip; more elegant. 2. Multi-processing: flash serial code onto separate chips; no worrying about scheduling!

Let’s Multi-Thread One Application Example: counting maize pixels in an image, partitioned across 2 processors or 4 processors, with each thread scanning its own region of the image (see the sketch below).
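A minimal sketch of how the maize-pixel count might be multi-threaded with OpenMP (one of the C/C++ libraries the deck names later); the image size, the is_maize test, and all identifiers are illustrative assumptions, not the authors’ code.

```c
#include <stdio.h>
#include <omp.h>

#define W 640
#define H 480

/* Hypothetical test: treat any nonzero byte as a "maize" (yellow) pixel. */
static int is_maize(unsigned char px) { return px != 0; }

int main(void) {
    static unsigned char img[H][W];   /* stand-in for a real image */
    img[100][200] = 1;                /* plant a couple of maize pixels */
    img[300][400] = 1;

    long count = 0;
    /* Each thread counts its own strip of rows; the reduction sums them. */
    #pragma omp parallel for reduction(+:count)
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            if (is_maize(img[y][x]))
                count++;

    printf("maize pixels: %ld (threads available: %d)\n",
           count, omp_get_max_threads());
    return 0;   /* compile with: gcc -fopenmp */
}
```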

Multi-Threading in µProcessors The Parallax Propeller processor multi-threads on 8 cores: one application runs across all 8. It uses its own high-level language (Spin) and a form of assembly, and it is the processor in the CMUcam4.

Problems with Multi-Threading Steep learning curve: you must learn the language and its parallel constructs. Parallel slowdown: it takes a lot of time to set up a new thread, so if that thread does not have much work to do, the speedup is not worth the overhead.

Multi-Threading Libraries You cannot simply write serial code and expect it to take advantage of parallel processing; libraries help you express the parallelism: Intel’s Threading Building Blocks (TBB), OpenMP, Boost.Thread, and pthreads. All of these are C/C++ libraries (see the pthreads sketch below).
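As a taste of the lowest-level option in that list, here is a minimal pthreads sketch that splits a sum across two threads; the names and the two-way split are assumptions for illustration.

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static int data[N];

struct range { int lo, hi; long sum; };

/* Each thread sums its own half of the array into its own struct. */
static void *worker(void *arg) {
    struct range *r = arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++)
        r->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;

    struct range halves[2] = { { 0, N / 2, 0 }, { N / 2, N, 0 } };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &halves[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);   /* wait for both partial sums */

    printf("total = %ld\n", halves[0].sum + halves[1].sum);
    return 0;   /* compile with: gcc -pthread */
}
```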

Multi-Processing: BeagleBone The processor has a 720 MHz ARM Cortex-A8, a 3D graphics accelerator, an ARM Cortex-M3 for power management, and 2 Programmable Real-time Unit (PRU) RISC CPUs. The PRUs share the memory space with the A8.

Shared Memory Space

Multi-Processing: Custom with Message Passing Designate a processor for each frequent task, and have it send messages to a “boss” processor as necessary. Since every processor’s workload is minimal, slower, lower-power chips can be used while keeping the same overall system performance. A sketch of the idea follows below.

Message Passing
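A minimal single-file C sketch of the boss/worker message-passing idea; the message format and the ring-buffer queue are assumptions standing in for the real transport (a UART, mailbox, or shared-memory buffer on actual hardware).

```c
#include <stdio.h>

/* Hypothetical message from a worker chip to the "boss". */
struct msg {
    int worker_id;   /* which dedicated processor sent it */
    int event;       /* task-specific event code          */
    int payload;     /* e.g., a sensor reading            */
};

#define QLEN 8

/* Tiny ring buffer standing in for the real transport. */
static struct msg queue[QLEN];
static int head, tail;

static int send_msg(struct msg m) {
    if ((tail + 1) % QLEN == head) return 0;   /* queue full: delivery is NOT instant */
    queue[tail] = m;
    tail = (tail + 1) % QLEN;
    return 1;
}

static int recv_msg(struct msg *out) {
    if (head == tail) return 0;                /* nothing waiting yet */
    *out = queue[head];
    head = (head + 1) % QLEN;
    return 1;
}

int main(void) {
    /* Two dedicated workers report in; the boss drains the queue when it can. */
    send_msg((struct msg){ .worker_id = 1, .event = 1, .payload = 42 });
    send_msg((struct msg){ .worker_id = 2, .event = 1, .payload = 7 });

    struct msg m;
    while (recv_msg(&m))
        printf("boss: worker %d, event %d, payload %d\n",
               m.worker_id, m.event, m.payload);
    return 0;
}
```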

Problems with Multi-Processing Shared memory space: boards like this are hard to find and configure. Message passing: you can’t assume messages are received immediately.

Recap Go parallel if you want higher throughput or lower power. Two ways: multi-threading (e.g., Spin on the Propeller) to speed up one application, and multi-processing (e.g., the BeagleBone) to do more tasks at the same time. Don’t forget Amdahl’s Law!

Questions?