Data Structures and Algorithms in Parallel Computing – Lecture 1

Parallel computing
A form of computation in which many calculations are carried out simultaneously
Divide and conquer
– Split the problem and solve each sub-problem in parallel
– Pay the communication cost

A bit of history
1958
– S. Gill discusses parallel programming
– J. Cocke and D. Slotnick discuss parallel numerical computing
1967
– Amdahl's law is introduced: defines the speed-up achievable through parallelism
1969
– Honeywell introduces the first symmetric multiprocessor; it allowed for up to 8 parallel processors
July 2015
– China's Tianhe-2 is the fastest computer in the world at 33.86 petaflops

Classification
Bit-level parallelism
– Increase the word size to reduce the number of instructions
– 2 instructions to add a 16-bit number on an 8-bit processor (see the sketch after this list)
– 1 instruction to add a 16-bit number on a 16-bit processor
Instruction-level parallelism
– Hardware level
– Software level
– Example:
  1. e = a + b
  2. f = c + d
  3. m = e * f
  Statement 3 depends on statements 1 and 2, both of which can be executed in parallel
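To make the bit-level arithmetic concrete, here is a minimal Python sketch (the function name and constants are mine, not the lecture's) that emulates a 16-bit addition using only 8-bit operations: the low bytes are added first, and the carry is propagated into the high-byte addition, which is why the narrower processor needs two instructions.

    def add16_on_8bit(a, b):
        """Emulate a 16-bit add with two 8-bit adds plus a carry."""
        lo = (a & 0xFF) + (b & 0xFF)       # first 8-bit add: low bytes
        carry = lo >> 8                     # carry out of the low byte
        hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry  # second 8-bit add
        return ((hi & 0xFF) << 8) | (lo & 0xFF)

    assert add16_on_8bit(0x12FF, 0x0001) == 0x1300  # carry propagates into the high byte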

Classification (2)
Data parallelism
– Big Data
  Volume, Velocity, Variety, Veracity
  Does not fit in memory
– Split the data among different processors
  Each processor executes the same code on a different piece of the data
– MapReduce
Task parallelism
– Distribute tasks across processors and execute them in parallel
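A hedged sketch of data parallelism in the MapReduce style (the data and chunking are made up for illustration), using Python's standard multiprocessing pool: every worker runs the same code on its own piece of the data, and the partial results are then reduced.

    from multiprocessing import Pool

    def partial_sum(chunk):
        # "Map" phase: the same code runs on every piece of the data
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]    # split the data among 4 workers
        with Pool(processes=4) as pool:
            partials = pool.map(partial_sum, chunks)
        total = sum(partials)                      # "Reduce" phase
        assert total == sum(data)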

Architecture classification
Flynn's taxonomy (1966)
– Single Instruction, Single Data stream (SISD)
  No parallelism: uniprocessor PCs
– Single Instruction, Multiple Data streams (SIMD)
  Data parallelism: GPUs
– Multiple Instruction streams, Single Data stream (MISD)
  Fault-tolerant systems
– Multiple Instruction streams, Multiple Data streams (MIMD)
  Different tasks handle different data streams: distributed computing

Architecture classification (2)
MIMD can be further divided:
– Single Program, Multiple Data (SPMD)
  Autonomous processors execute the same program asynchronously
– Multiple Program, Multiple Data (MPMD)
  Autonomous processors execute different programs
  Manager/worker strategy

Memory models
Shared memory
– Multiple programs access the same memory
– Example: Cray machines
Distributed shared memory
– Memory is physically distributed
– Programs access the same address space
Distributed memory
– Each processor has its own private memory
– Example: grid computing

Need for speed
Amdahl's law
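The slide only names the law; its standard statement (my rendering, with the usual notation) is

    S(n) = \frac{1}{(1 - p) + p/n}

where p is the fraction of the program that can be parallelized and n is the number of processors. Even with unlimited processors the speed-up is bounded by 1/(1 - p): with p = 0.9 it can never exceed 10.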

Algorithm design
How do we transform a sequential algorithm into a parallel one?
Example: compute the sum of n numbers
– The numbers are stored in an array A
1. Pair A[i] with A[i+1]
2. Add each pair on a separate machine
   – We need n/2 machines
   – We obtain a new sequence of n/2 numbers
3. Repeat from step 1
4. After log2 n iterations we are left with a sequence of 1 number: the sum (see the sketch below)
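A minimal sequential simulation of this pairwise scheme (the names are mine): each round halves the sequence, and on a real machine all additions within a round would run in parallel, so ceil(log2 n) rounds suffice.

    def parallel_sum(values):
        """Simulate pairwise-reduction summation round by round."""
        seq = list(values)
        while len(seq) > 1:
            if len(seq) % 2:                # odd length: pad so every element has a partner
                seq.append(0)
            # one parallel round: each pair could be added on a separate machine
            seq = [seq[i] + seq[i + 1] for i in range(0, len(seq), 2)]
        return seq[0]

    assert parallel_sum(range(1, 9)) == 36  # 8 numbers -> 3 rounds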

Modeling parallel computations
No consensus on the right model
Parallel Random-Access Machine (PRAM)
– Ignores many of the computer architecture details
– Captures enough detail for reasonable accuracy
– Each CPU operation, including arithmetic and logical operations and memory accesses, requires 1 time step

Multiprocessor model
Local memory
– Each processor has its own local memory
– Processors are attached to a local network
Modular memory
– M memory modules
Parallel RAM (PRAM)
– Shared memory
– No real machine lives up to the ideal of unit-time access to a shared memory

Network limitations
Communication bottlenecks
Bus topology
– Processors take turns accessing the bus
2-dimensional mesh
– Remote accesses are done by routing messages
– Appears in local memory machines
Multistage network
– Used to connect one set of input switches to another set of output switches
– Originally designed for telephone networks
– Appears in modular memory machines
– Processors are attached to the input switches and memory modules to the output switches

Network limitations (2)
Algorithms designed for one topology may not work for another
Algorithms that take the network topology into account are more complicated than those designed for simpler models such as the PRAM

Model routing capabilities
An alternative to modeling the topology
Consider:
– Bandwidth
  The rate at which a processor can inject data into the network
– Latency
  The time needed to traverse the network

Model routing capabilities (2)
Existing models:
– Postal model
  Models only latency
– Bulk Synchronous Parallel (BSP)
  Adds g, i.e., the minimum ratio of computation steps to communication steps
– LogP
  Adds o, i.e., the overhead of a processor upon sending/receiving a message
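For reference, the standard textbook cost formulas behind these models (they are not spelled out on the slide): in BSP, a superstep in which each processor does at most w local operations and sends or receives at most h messages costs

    T = w + g \cdot h + l

where l is the cost of the barrier synchronization; in LogP, delivering one small message between two processors takes L + 2o (the network latency plus the send and receive overheads).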

Primitive operations
Basic operations that the processors and the network can perform
– All processors can perform the same local instructions as the single processor in the RAM model
– Processors can also issue non-local memory requests
  For message passing
  For global operations

Restrictions on operations
Restrictions on the operations can exist
– E.g., two processors may not write to the same memory location at the same time
Exclusive vs. concurrent access
– Exclusive read, exclusive write (EREW)
– Concurrent read, concurrent write (CRCW)
– Concurrent read, exclusive write (CREW)
Resolving concurrent writes (a small simulation follows below)
– Random picking
– Priority picking
– Queued access: queued read, queued write
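An illustrative sketch (not from the lecture) of the priority rule for concurrent writes: when several processors target the same cell in one step, the lowest-numbered processor wins.

    def crcw_priority_write(requests, memory):
        """Simulate one CRCW step; requests is a list of (processor_id, address, value)."""
        winners = {}
        for pid, addr, value in requests:
            # priority rule: the write from the lowest-numbered processor survives
            if addr not in winners or pid < winners[addr][0]:
                winners[addr] = (pid, value)
        for addr, (_, value) in winners.items():
            memory[addr] = value

    mem = {}
    crcw_priority_write([(2, 0, "b"), (0, 0, "a"), (1, 5, "c")], mem)
    assert mem == {0: "a", 5: "c"}  # processor 0 beats processor 2 at address 0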

Examples of operations
– Reads and writes to non-local memory or other processors
– Synchronization
– Broadcast messages to processors
– Gather messages from processors

Work-depth model
Focus on the algorithm instead of on the multiprocessor model
The cost of an algorithm is determined by the number of operations and the dependencies among them; the parallelism is P = W/D
– where W is the total number of operations (work)
– and D is the longest chain of dependencies among them (depth)
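A worked instance (my numbers, using the summation algorithm from earlier): summing n numbers pairwise performs W = n - 1 = O(n) additions over D = O(log n) rounds, so the parallelism is

    P = \frac{W}{D} = O\!\left(\frac{n}{\log n}\right)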

Types of work-depth models
Vector model
– A sequence of steps, each operating on a vector
Circuit model
– Nodes (operations) and directed arcs (communication)
– Input arcs provide the input to the whole circuit
– Output arcs return the final output values of the circuit
– No directed cycles allowed
Language model

Circuit model example
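The original slide here is only a figure; a typical instance (my reconstruction, not the original drawing) is the balanced binary addition tree summing four inputs: x1 + x2 and x3 + x4 are computed by two nodes at the first level, and a third node adds their results at the second level. The circuit has W = 3 operation nodes and depth D = 2; in general, summing n inputs this way gives W = n - 1 and D = log2 n.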

Importance of cost
Cost can be applied to multiprocessor models too
– The work is equal to the number of processors times the time required for the algorithm to finish
– The depth is equal to the total time required to execute the algorithm
  Depth matters when the answer is time-critical, e.g., weather forecasting, real-time planning
A parallel algorithm is work-efficient if asymptotically it requires at most a constant factor more work than the best known sequential algorithm

What's next?
Parallel algorithmic techniques
– Divide and conquer
– Randomization
– Parallel pointer techniques
Graphs
– Breadth-first search
– Connected components
– PageRank
– Single-source shortest path
– Vertex-centric vs. subgraph-centric models
Sorting
– Quicksort
– Radix sort
Computational geometry
– Closest pair
– Planar convex hull
Numerical algorithms
– Matrix operations
– Fourier transform

Evaluation
100% of the grade comes from projects & assignments
– 7 assignments, each requiring the implementation of a parallel algorithm
– Passing grade if at least 2 are completed