Parallel Algorithms - Introduction
Advanced Algorithms & Data Structures, Lecture Theme 11
Prof. Dr. Th. Ottmann, Summer Semester 2006

2 A simple parallel algorithm: adding n numbers in parallel.

3 A simple parallel algorithm: Example for 8 numbers. We start with 4 processors, and each of them adds 2 items in the first step. The number of items is halved at every subsequent step; hence log n steps are required for adding n numbers. The processor requirement is O(n). We have omitted many details from our description of the algorithm: How do we allocate tasks to processors? Where is the input stored? How do the processors access the input as well as intermediate results? We do not ask these questions while designing sequential algorithms. (A simulation of the scheme is sketched below.)
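A minimal Python sketch of this scheme, simulating each parallel step sequentially (the function name and the pad-with-zero convention are mine, not the lecture's):

    def parallel_sum(items):
        # Tree-based parallel sum. On a real machine the list
        # comprehension below would run concurrently on len(items)//2
        # processors; here we simulate one parallel step per iteration.
        items = list(items)
        while len(items) > 1:
            if len(items) % 2:
                items.append(0)  # pad so the items pair up evenly
            # One parallel step: processor i adds items[2i] and items[2i+1].
            items = [items[2 * i] + items[2 * i + 1]
                     for i in range(len(items) // 2)]
        return items[0]

    print(parallel_sum(range(1, 9)))  # 36, reached in log2(8) = 3 steps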

4 How do we analyze a parallel algorithm? A parallel algorithm is analyzed mainly in terms of its time, processor, and work complexities.
Time complexity T(n): how many time steps are needed?
Processor complexity P(n): how many processors are used?
Work complexity W(n): what is the total work done by all the processors? Note that W(n) ≤ T(n) · P(n).
For our example: T(n) = O(log n), P(n) = O(n), W(n) = O(n log n).

5 How do we judge efficiency? Consider two parallel algorithms A1 and A2 for the same problem: A1 does W1(n) work in T1(n) time, and A2 does W2(n) work in T2(n) time. We say A1 is more efficient than A2 if W1(n) = o(W2(n)), regardless of their time complexities. For example, W1(n) = O(n) and W2(n) = O(n log n). If W1(n) and W2(n) are asymptotically the same, then A1 is more efficient than A2 if T1(n) = o(T2(n)). For example, W1(n) = W2(n) = O(n), but T1(n) = O(log n) and T2(n) = O(log² n).

6 How do we judge efficiency? It is difficult to give a more formal definition of efficiency. Consider the following situation: for A1, W1(n) = O(n log n) and T1(n) = O(n); for A2, W2(n) = O(n log² n) and T2(n) = O(log n). It is difficult to say which one is the better algorithm: though A1 is more efficient in terms of work, A2 runs much faster. Both algorithms are interesting, and one may be better than the other depending on the specific parallel machine.

7 Optimal parallel algorithms. Consider a problem, and let T(n) be the worst-case upper bound on the time of a serial algorithm for an input of length n. Assume also that T(n) is a lower bound for solving the problem; hence we cannot have a better serial upper bound. Consider a parallel algorithm for the same problem that does W(n) work in Tpar(n) time. The parallel algorithm is work optimal if W(n) = O(T(n)). It is work-time optimal if, in addition, Tpar(n) cannot be improved. (See the symbolic restatement below.)
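Restated compactly in symbols (this summary is mine, not the lecture's):

    \text{work optimal:} \qquad W(n) = O(T(n))
    \text{work-time optimal:} \qquad W(n) = O(T(n)) \ \text{and} \ T_{\mathrm{par}}(n) \ \text{minimal among work-optimal algorithms}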

8 A simple parallel algorithm: adding n numbers in parallel.

9 A work-optimal algorithm for adding n numbers.
Step 1: Use only n/log n processors and assign log n numbers to each processor. Each processor adds its log n numbers sequentially in O(log n) time.
Step 2: Only n/log n numbers are left. We now execute our original algorithm on these n/log n numbers.
Now T(n) = O(log n) and W(n) = O((n/log n) × log n) = O(n). (A sketch of the two-step scheme follows.)
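A Python sketch of the two-step scheme, reusing parallel_sum from the earlier sketch (the chunking below is illustrative; a real implementation would assign each chunk to one processor):

    import math

    def work_optimal_sum(items):
        # Step 1: n/log n "processors" each sum about log n items
        # sequentially, leaving n/log n partial sums.
        # Step 2: run the tree-based algorithm on the partial sums.
        # Total work: O(n) for step 1 plus O(n) for step 2.
        items = list(items)
        n = len(items)
        chunk = max(1, math.ceil(math.log2(n)))  # about log n items each
        partials = [sum(items[i:i + chunk]) for i in range(0, n, chunk)]
        return parallel_sum(partials)

    print(work_optimal_sum(range(1, 17)))  # 136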

10 Why is parallel computing important? We can justify the importance of parallel computing for two reasons: very large application domains, and the physical limitations of VLSI circuits. Though computers are getting faster and faster, user demand for solving very large problems is growing at a still faster rate. Some examples include weather forecasting, simulation of protein folding, and computational physics.

11 Physical limitations of VLSI circuits. The Pentium III processor uses 180 nanometer (nm) technology, i.e., a circuit element like a transistor can be etched within 180 × 10⁻⁹ m. The Pentium IV processor uses 160 nm technology. Intel has recently trialed processors made using 65 nm technology.


13 How many transistors can we pack? The Pentium III has about 42 million transistors and the Pentium IV about 55 million. The number of transistors on a chip approximately doubles every 18 months (Moore's Law). There are now 100 transistors for every ant on Earth (as Moore himself remarked in a recent lecture).

14 Physical limitations of VLSI circuits. All semiconductor devices are Si based. It is fairly safe to assume that a circuit element will take at least a single Si atom; the covalent bond in Si has a length of approximately 0.2 nm. Hence we will reach the limit of miniaturization very soon. The upper bound on the speed of electronic signals is 3 × 10⁸ m/sec, the speed of light. Hence communication between two adjacent transistors will take approximately 10⁻¹⁸ sec. If we assume that a floating-point operation involves the switching of at least a few thousand transistors, such an operation will take about 10⁻¹⁵ sec in the limit. Hence we are looking at 1000-teraflop machines at the peak of this technology. (1 flop = one floating-point operation; 1 TFLOPS = 10¹² flops per second.) This is a very optimistic scenario.
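The slide's arithmetic, made explicit as a back-of-the-envelope calculation (the 3 × 10⁻¹⁰ m signal distance and the factor of a few thousand transistors are rough assumptions, not exact figures):

    t_{\mathrm{signal}} \approx \frac{3 \times 10^{-10}\,\mathrm{m}}{3 \times 10^{8}\,\mathrm{m/s}} = 10^{-18}\,\mathrm{s}, \qquad
    t_{\mathrm{flop}} \approx 10^{3} \cdot t_{\mathrm{signal}} = 10^{-15}\,\mathrm{s}, \qquad
    \frac{1}{t_{\mathrm{flop}}} = 10^{15}\ \mathrm{flops/s} = 1000\ \mathrm{TFLOPS}.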

15 Other problems. The most difficult problem is controlling power dissipation: 75 watts is considered the maximum power output of a processor. As we pack in more transistors, the power output goes up and better cooling becomes necessary. Intel cooled its 8 GHz demo processor using liquid nitrogen!

16 The advantages of parallel computing. Parallel computing offers the possibility of overcoming such physical limits by solving problems in parallel. In principle, thousands, even millions of processors can be used to solve a problem in parallel, and today's fastest parallel computers have already reached teraflop speeds. Today's microprocessors already use several parallel-processing techniques, such as instruction-level parallelism and pipelined instruction fetching. Intel uses hyper-threading in the Pentium IV mainly because the processor is clocked at 3 GHz while the memory bus operates at only a few hundred MHz.


18 Problems in parallel computing. The sequential, or uni-processor, computing model is based on von Neumann's stored-program model: a program is written, compiled, and stored in memory, and it is executed by bringing one instruction at a time to the CPU.

19 Problems in parallel computing. Programs are written with this model in mind, so there is a close match between the software and the hardware on which it runs; the theoretical RAM model captures these concepts nicely. In contrast, there are many different models of parallel computing, and each model is programmed in a different way, so an algorithm designer has to keep a specific model in mind when designing an algorithm. Most parallel machines are suitable only for solving specific types of problems. Designing operating systems for them is also a major issue.

20 The PRAM model: n processors are connected to a shared memory.

21 The PRAM model. Each processor should be able to access any memory location in each clock cycle. Hence there may be conflicts in memory access, and the memory-management hardware needs to be very complex: we need some kind of hardware to connect the processors to the individual locations in the shared memory. The SB-PRAM, developed at Saarland University by Prof. Wolfgang Paul's group, is one such machine. (A toy simulation of the synchronous access discipline follows.)
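To make the synchronous, shared-memory semantics concrete, here is a toy EREW-style simulation of the summation on a PRAM, in Python (the snapshot stands in for the lock-step guarantee; all names are mine, not the lecture's):

    def pram_sum_step(mem, stride, num_procs):
        # One synchronous EREW-PRAM step: every processor reads from a
        # snapshot of the old memory, then all writes take effect together.
        # Processor i reads mem[2*stride*i] and mem[2*stride*i + stride];
        # these addresses are distinct, so reads and writes are exclusive.
        old = list(mem)
        for i in range(num_procs):
            a = 2 * stride * i
            b = a + stride
            if b < len(old):
                mem[a] = old[a] + old[b]

    mem = list(range(1, 9))          # shared memory holding 8 inputs
    stride, procs = 1, len(mem) // 2
    while stride < len(mem):
        pram_sum_step(mem, stride, procs)
        stride, procs = stride * 2, (procs + 1) // 2
    print(mem[0])                    # 36 after log2(8) = 3 steps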

22 The PRAM model: a more realistic PRAM model.