RAM, PRAM, and LogP models

Why models? A machine model is an abstraction that describes the operation of a machine and associates a value (cost) with each machine operation. We need models because they make it easier to analyze and develop algorithms, hide machine implementation details so that general results apply to a broad class of machines, let us analyze the achievable complexity bounds (time, space, etc.), and let us analyze the maximum available parallelism. Conversely, algorithms are designed and evaluated with a particular model in mind, so models and algorithms are directly related.

RAM (random access machine) model. Memory consists of an infinite array of memory cells. Instructions are executed sequentially, one at a time. All instructions (load/store, arithmetic, logic) take unit time. The running time of an algorithm is the number of instructions executed; its memory requirement is the number of memory cells it uses.

RAM (random access machine) model. The RAM model is the basis of algorithm analysis for sequential algorithms, although it is not perfect: memory is not infinite, not all memory accesses take the same time, not all arithmetic operations take the same time, and instruction pipelining is not taken into consideration. Even so, the RAM model (with asymptotic analysis) often gives relatively realistic results.

PRAM (Parallel RAM). An unbounded collection of processors, each with an infinite number of registers. An unbounded collection of shared memory cells. All processors can access any memory cell in unit time (when there is no memory conflict). All processors execute PRAM instructions synchronously (some processors may be idle). Each PRAM instruction executes in a three-phase cycle: read from a shared memory cell (if needed), compute, then write to a shared memory cell (if needed).

PRAM (Parallel RAM). The only way processors exchange data is through the shared memory. Parallel time complexity: the number of synchronous steps in the algorithm. Space complexity: the number of shared memory cells used. Parallelism: the number of processors used.

PRAM. All processors work in a synchronous manner (with infinite shared memory and infinite local memory). The question for any task becomes: how many synchronous steps does it take to complete?

PRAM – further refinement. PRAMs are further classified by how read conflicts are resolved. Exclusive Read (ER): processors may simultaneously read only from distinct memory locations, never the same location. But what if two processors want to read from the same location? Concurrent Read (CR): processors may simultaneously read from the same memory location.

PRAM – further refinement. PRAMs are also classified by how write conflicts are resolved. Exclusive Write (EW): processors may simultaneously write only to distinct memory locations, never the same location. Concurrent Write (CW): processors may simultaneously write to the same memory location, with the conflict resolved by one of: Common CW: concurrent writes to the same location are allowed only if all processors write the same value. Random CW: one of the written values is picked at random. Priority CW: processors have priorities, and the value from the highest-priority processor wins.

PRAM model variations: EREW, CREW, CRCW (common), CRCW (random), CRCW (priority). Which model is closest to actual SMP machines? Model A is computationally stronger than model B if and only if any algorithm written for B runs unchanged on A. We can prove that EREW <= CREW <= CRCW (common) <= CRCW (random).

PRAM algorithm example. SUM: add N numbers stored in M[0, 1, ..., N-1]. The sequential SUM algorithm has O(N) complexity: for (i=0; i<N; i++) sum = sum + M[i]; What would a PRAM SUM algorithm look like?

PRAM SUM algorithm. Which PRAM model does it require?

PRAM SUM algorithm complexity Time complexity? Number of processors needed? Speedup (vs. sequential program)?

Parallel search algorithm. A P-processor PRAM holds N unsorted numbers (P <= N). Does x occur among the N numbers? Processor p_0 holds x initially and must know the answer at the end.

Parallel search algorithm. PRAM algorithm: Step 1: inform every processor what x is. Step 2: every processor checks its N/P numbers and sets a flag. Step 3: check whether any flag is set to 1. Costs per model: EREW: O(log P) for step 1, O(N/P) for step 2, O(log P) for step 3. CREW: O(1) for step 1, O(N/P) for step 2, O(log P) for step 3. CRCW (common): O(1) for step 1, O(N/P) for step 2, O(1) for step 3.

PRAM strengths. It is a natural extension of RAM, simple and easy to understand, and it hides communication and synchronization issues. It can serve as a benchmark: if an algorithm performs badly in the PRAM model, it will perform badly on real machines (although a good PRAM algorithm may still be impractical). It is also useful for analyzing threaded algorithms on SMP/multicore machines.

PRAM weaknesses. Model inaccuracies: unbounded local memory (registers), all operations take unit time, and processors run in lockstep. Unaccounted costs: non-local memory access, latency, bandwidth, and memory access contention.

PRAM variations. Bounded-memory PRAM, PRAM(m): in a given step, only m memory accesses can be serviced. Bounded number of processors: any problem that can be solved by a p-processor PRAM in t steps can be solved by a p'-processor PRAM in t' = O(tp/p') steps. LPRAM: accessing global memory costs L units; any algorithm that runs on a p-processor PRAM runs on an LPRAM with a slowdown of at most a factor of L. BPRAM: the first message costs L units and each subsequent message costs B units.

PRAM summary. The RAM model is widely used, and PRAM is simple and easy to understand, but PRAM has rarely reached beyond the algorithms community. It is becoming more important as threaded programming grows more popular. The BSP (bulk synchronous parallel) model is a successor to PRAM: it lets processors progress asynchronously and models latency and limited bandwidth.

LogP model. The PRAM model assumes shared memory, but a common MPP organization is a complete machine connected by a network, with each node pairing a processor P with its own memory M. LogP attempts to capture the characteristics of such an organization.

Deriving the LogP model. Processing: a powerful microprocessor with large DRAM and cache gives the parameter P. Communication: significant latency gives L; limited per-processor bandwidth gives g; significant overhead, on both the sending and receiving ends, gives o; network capacity is limited. There is no consensus on topology, so the model should not exploit network structure, and no consensus on programming model, so the model should not enforce one.

LogP parameters. L (latency): the latency of sending a (small) message between modules across the interconnection network. o (overhead): the overhead felt by a processor when sending or receiving a message. g (gap): the minimum gap between successive sends or receives on one processor (the reciprocal of per-processor bandwidth). P: the number of processor/memory modules. The network has limited volume: at most L/g messages can be in flight to or from any one processor.

Using the model. Sending n messages from one processor to another takes time 2o + L + g(n-1): the sending processor spends o·n cycles on overhead and has (g-o)(n-1) + L compute cycles available to overlap with the communication. Sending n messages from one processor to many takes the same time. When many processors send n messages to one, all but L/g of the senders block, so fewer compute cycles are available.

Using the model. Two processors sending n words to each other also take 2o + L + g(n-1). The model assumes no network contention, so it can underestimate the communication time.

LogP philosophy. Think about: the mapping of a task onto P processors; the computation within each processor, its cost, and its balance; and the communication between processors, its cost, and its balance. You are given a characterization of processor and network performance; do not think about what happens inside the network.

Develop an optimal broadcast algorithm based on the LogP model: broadcast a single datum from one processor to the other P-1 processors.

Strengths of the LogP model. It is simple, with only four parameters. It readily guides algorithm development, especially for interprocessor communication algorithms, and it has been used to analyze many collective communication algorithms.

Weaknesses of the LogP model. It is accurate only at a very low level (near the machine-instruction level) and inaccurate for more practical communication systems with layered protocols (e.g., TCP/IP). This has spawned many variations, the LogP family of models: LogGP, LogGPC, pLogP, etc., which make the model more accurate but also more complex.