Multi-core: What is parallel programming? Classification of parallel architectures (dimension of instruction, dimension of data). Memory models.


Multi-core

 What is parallel programming?
 Classification of parallel architectures
   Dimension of instruction
   Dimension of data
 Memory models for parallel programming
   Distributed memory model
   Shared memory model
 Multi-core architectures
 Steps in parallelization

Jaruloj Chongstitvatana, Parallel Programming: Introduction

CONCURRENT: two or more actions progress at the same time.
PARALLEL: two or more actions execute simultaneously.
The problem: how to write a program that executes correctly on two or more processors.
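A minimal sketch of two actions executing simultaneously, using C++11 `std::thread`; `run_both` is a hypothetical helper named here for illustration, not part of the slides.

```cpp
// Sketch: two actions run at the same time, each possibly on its own core.
#include <thread>

int run_both() {
    int x = 0, y = 0;
    std::thread t1([&] { x = 1; });   // action 1
    std::thread t2([&] { y = 2; });   // action 2
    t1.join();                        // correctness: wait for both actions
    t2.join();                        // to finish before using x and y
    return x + y;
}
```

The two lambdas touch disjoint variables, so no synchronization beyond `join` is needed; the next slides show when that stops being true.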

 Design considerations
   Correctness
   Efficiency
   Simplicity
   Scalability

 Single Instruction, Single Data: SISD
 Single Instruction, Multiple Data: SIMD
 Multiple Instruction, Single Data: MISD
 Multiple Instruction, Multiple Data: MIMD

 SISD: no parallelism. (Diagram: a single instruction stream operates on a single data stream.)
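The SISD case is just ordinary sequential code; a small sketch, with `sequential_sum` a name chosen here for illustration:

```cpp
// SISD sketch: one instruction stream, one data stream, no parallelism.
#include <vector>

int sequential_sum(const std::vector<int>& v) {
    int s = 0;
    for (int x : v) s += x;   // instructions execute strictly one after another
    return s;
}
```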

 SIMD: one instruction stream operates on multiple data streams.
   Vector machines
   Graphics processors

 MISD: multiple instruction streams operate on a single data stream.
   No practical application (at this time)

 MIMD: multiple instruction streams operate on multiple data streams.
   Multi-core architectures
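A sketch of the MIMD case: each thread (standing in for a core) runs its own instruction stream on its own data. The function name `mimd_demo` is chosen here for illustration.

```cpp
// MIMD sketch: different instructions (sum vs. product) on different data,
// running at the same time, as cores of a multi-core processor would.
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

int mimd_demo() {
    std::vector<int> a{1, 2, 3}, b{4, 5, 6};
    int sum = 0, product = 1;
    std::thread t1([&] { sum = std::accumulate(a.begin(), a.end(), 0); });
    std::thread t2([&] { product = std::accumulate(b.begin(), b.end(), 1,
                                                   std::multiplies<int>()); });
    t1.join();
    t2.join();
    return sum + product;   // 6 + 120
}
```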

SHARED MEMORY MODEL
 Multiple processors connect to one shared memory.
 All processors can access the same memory locations.
DISTRIBUTED MEMORY MODEL
 Each processor connects to its own private memory.
(Diagrams: processors on a common bus to one memory; versus processors each paired with a local memory.)
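In the shared memory model, because every processor can reach the same location, concurrent updates must be serialized. A minimal sketch with threads standing in for processors; `shared_counter` is a hypothetical helper, not from the slides.

```cpp
// Shared-memory sketch: several threads update one shared location.
// A mutex serializes access so no increment is lost.
#include <mutex>
#include <thread>
#include <vector>

int shared_counter(int n_threads, int increments) {
    int counter = 0;          // the one shared memory location
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> lock(m);  // exclusive access
                ++counter;
            }
        });
    for (auto& w : workers) w.join();
    return counter;
}
```

In a distributed memory model the same computation would instead exchange partial counts as messages, since no thread could reach another's memory.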

SHARED MEMORY
 Memory access is a bottleneck
 No data transfer needed
 Easy to share data
DISTRIBUTED MEMORY
 Better memory-access scalability
 Data must be transferred from non-local memory
 Easy to keep data private

Examples of multi-core processors:
 AMD multi-core Opteron
 Sun UltraSPARC T1
 IBM Cell Broadband Engine (CBE)
 Intel Core 2 Duo

(Architecture diagrams of the processors above. Source: C. Hughes & T. Hughes, Professional Multicore Programming: Design and Implementation for C++ Developers, Wiley, 2008.)

Starting from sequential code:
 Identify possible concurrency
 Design and implement
 Test for correctness
 Tune for performance
   Tuning may affect correctness
   If tuning is impossible, consider a redesign
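The steps above can be sketched on a sequential sum: the two halves of the range are independent (the identified concurrency), so one half moves to a second thread and the partial results are combined. `parallel_sum` is a name chosen here for illustration.

```cpp
// Parallelization sketch: split a sequential sum into two independent
// halves, compute one in a second thread, combine the partial results.
#include <numeric>
#include <thread>
#include <vector>

long parallel_sum(const std::vector<int>& v) {
    auto mid = v.begin() + v.size() / 2;
    long left = 0;
    std::thread t([&] { left = std::accumulate(v.begin(), mid, 0L); });
    long right = std::accumulate(mid, v.end(), 0L);  // main thread's share
    t.join();
    return left + right;
}
```

Testing for correctness here means checking the result against the original sequential sum over the same input.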

Pitfalls:
 Data race
 Deadlock
Note: results might differ slightly between runs, because round-off error depends on the order in which floating-point operations are performed.
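The round-off note can be demonstrated directly: floating-point addition is not associative, so summing in a different order (as parallel partial sums do) can change the result slightly. `order_matters` is a hypothetical name for this sketch.

```cpp
// Round-off sketch: the same three values summed in two different orders,
// as two different parallel schedules might do, give different doubles.
bool order_matters() {
    double a = 0.1, b = 0.2, c = 0.3;
    double left_to_right = (a + b) + c;   // one schedule
    double reordered     = a + (b + c);   // another schedule
    return left_to_right != reordered;    // true: the results differ slightly
}
```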