Pipelining, Parallelism, and Simplified Circuits Discrete Math April 13, 2006 Harding University Jonathan White.


Outline
- What pipelining is
  - Benefits
  - Downsides
  - How modern processors use pipelining
- Parallelism
  - Threads
  - Circuits
  - Pros/Cons

Pipelining
- Definition:
  - Pipelining is an implementation technique where multiple instructions are overlapped in execution on a processor.
- Each stage completes part of an instruction, in parallel with the other stages.
- The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.

Pipelining Laundry Example
- 4 loads of laundry need to be washed, dried, and folded.
  - 30 minutes to wash, 40 minutes to dry, and 20 minutes to fold.
  - We have 1 washer, 1 dryer, and 1 folding station.
- What's the most efficient way to get the 4 loads of laundry done?

Non-Pipelined Laundry
- Wash, dry, fold.
- Then wash, dry, fold.
- Then wash, dry, fold...
- Takes a total of 6 hours (4 loads at 90 minutes each); nothing is done in parallel.

Pipelined Laundry
- A better idea would be to start the next load washing while the first is drying.
  - Then, while the first load is being folded, the second load would dry and a new load could be put in the washer.
- Using this method, the laundry takes 3.5 hours instead of 6: it would be done at 9:30, assuming a 6:00 start.
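The two laundry totals can be checked with a short simulation. This is only a sketch: the function names are made up for illustration, and it assumes a finished load can wait in a basket until the next machine is free.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = [("wash", 30), ("dry", 40), ("fold", 20)]

def sequential_minutes(loads, stages=STAGES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(minutes for _, minutes in stages)

def pipelined_minutes(loads, stages=STAGES):
    """A load enters a stage as soon as it has left the previous stage
    and that stage's single machine is free."""
    free_at = [0] * len(stages)      # time at which each machine becomes free
    finished = 0
    for _ in range(loads):
        ready = 0                    # time this load is ready for its next stage
        for i, (_, minutes) in enumerate(stages):
            start = max(ready, free_at[i])
            ready = start + minutes
            free_at[i] = ready
        finished = ready
    return finished

print(sequential_minutes(4))  # 360 minutes = 6 hours
print(pipelined_minutes(4))   # 210 minutes = 3.5 hours
```

The 40-minute dryer is the bottleneck: once the pipe is full, one load finishes every 40 minutes, no matter how fast the washer and folding station are.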

Processors
- Computers, like laundry, typically perform the exact same steps for every instruction:
  - Fetch an instruction from memory
  - Decode the instruction
  - Execute the instruction
  - Access memory to read or write data
  - Write the result back (in the classic five-stage design, to a register)

Example of a Basic Non-Pipelined Instruction

Example of a Pipelined Architecture
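As a rough stand-in for a pipeline diagram, the sketch below prints which stage each instruction occupies in each clock cycle. The stage abbreviations (IF, ID, EX, MEM, WB) are the conventional names for the five steps and are an assumption here, as is the function name.

```python
# Conventional names for fetch, decode, execute, memory access, write-back.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction and one column per clock cycle.
    Instruction i enters the first stage in cycle i (no stalls assumed)."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name
        rows.append(cells)
    return rows

for number, row in enumerate(pipeline_chart(4), start=1):
    print(f"instr {number}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading down any column from the fifth cycle onward shows all five stages busy at once, each working on a different instruction.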

Pipelining Aspects
- The length of the longest step dictates the length of every pipeline stage.
  - So, the slowest resource affects the entire process.
- What's the slowest of a processor's 5 steps?
- Pipelining improves performance by increasing instruction throughput, as opposed to decreasing the execution time of any individual instruction.
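A small calculation makes the throughput-versus-latency point concrete. The stage times below are hypothetical values chosen for illustration, not measurements of any real processor, and the function name is made up.

```python
def pipeline_timing(stage_times):
    """Compare one instruction's latency with and without pipelining.
    The clock cycle must be long enough for the slowest stage."""
    cycle = max(stage_times)
    return {
        "unpipelined_latency": sum(stage_times),       # one instruction, start to finish
        "cycle": cycle,                                # pipelined: one result per cycle
        "pipelined_latency": cycle * len(stage_times), # every stage padded to the cycle
    }

# Hypothetical stage times in picoseconds: fetch, decode, execute, memory, write-back.
timing = pipeline_timing([200, 100, 200, 200, 100])
print(timing)
```

With these numbers a single instruction takes longer in the pipeline (1000 ps instead of 800 ps), yet a finished instruction leaves the pipe every 200 ps once it is full.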

Pipeline Video

Pipelining Benefits
- For the right instruction set, pipelining ideally increases performance linearly with the number of pipeline stages.
  - Instruction sets are now designed to be pipelined.
- RISC vs. CISC architectures
  - Pipelining is easy to do with only a few additions.
- Pipelining makes efficient use of resources.
  - Circuits consume similar amounts of power whether performing calculations or just waiting.
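The "linear in the number of stages" claim is an ideal limit. A sketch of the usual back-of-the-envelope formula, assuming perfectly balanced stages, no stalls, and a made-up function name:

```python
def ideal_speedup(num_instructions, num_stages):
    """Speedup of a k-stage pipeline over an unpipelined design.
    Unpipelined: every instruction takes k cycles.
    Pipelined: k cycles to fill the pipe, then one instruction per cycle."""
    unpipelined_cycles = num_instructions * num_stages
    pipelined_cycles = num_stages + num_instructions - 1
    return unpipelined_cycles / pipelined_cycles

for n in (1, 10, 1000):
    print(n, round(ideal_speedup(n, 5), 2))
```

For a single instruction there is no speedup at all; the speedup approaches the stage count only when many instructions flow through back to back.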

Pipelining Downsides
- Pipelining requires additional hardware.
  - Every instruction must be able to pass through each of the stages (for example, some instructions require the ALU in more than one step).
  - Registers are needed to hold data between cycles.
  - More ALUs are required (for example, 1 ALU is needed just to increment the program counter).
  - Branch prediction and collision avoidance units are required.
- Often, you will have to stall or clear the pipeline when you've written code that causes a hazard. Here the second statement needs X before the first has finished producing it:
    X = Y + 4
    Z = X + 1
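The X/Z example is a read-after-write dependency: the second statement reads a value the first has not finished writing. A minimal sketch of how such hazards could be spotted in a list of instructions follows; the instruction format, the function name, and the `window` parameter (how many following instructions are still "too close") are all assumptions made for illustration.

```python
def raw_hazards(program, window=2):
    """Find read-after-write dependencies between nearby instructions.
    `program` is a list of (destination, sources) tuples.
    Returns (producer_index, consumer_index, name) triples."""
    hazards = []
    for i, (dest, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            later_dest, later_sources = program[j]
            if dest in later_sources:
                hazards.append((i, j, dest))
            if later_dest == dest:
                break  # value is overwritten; later readers depend on instruction j
    return hazards

program = [
    ("X", ("Y",)),   # X = Y + 4
    ("Z", ("X",)),   # Z = X + 1  (reads X right after it is written)
]
print(raw_hazards(program))
```

Real hardware performs a comparable check between pipeline registers and then either stalls the dependent instruction or forwards the result to it.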

Branch Prediction
- How many times will this loop execute?
    for (int x = 0; x < 100; x++) {
        do something....
    }
- It would be nice for the processor to be able to predict that this code will be executed more than once...
- Some processors simply assume a branch will never be taken.
- Also, compilers will often reorder instructions (and many processors execute them out of order) to avoid stalling the pipe.
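To see why the prediction strategy matters for the 100-iteration loop, the sketch below counts mispredictions for three simple schemes. It assumes the loop compiles to one conditional branch that is taken each time the body runs again and not taken on exit; the real layout depends on the compiler. The function names and the starting counter value are illustrative.

```python
def static_mispredictions(outcomes, predict_taken):
    """Always guess the same direction."""
    return sum(1 for taken in outcomes if taken != predict_taken)

def two_bit_mispredictions(outcomes, counter=1):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken.
    The starting value 1 ("weakly not taken") is an arbitrary choice."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# Branch outcomes for the loop: body runs 100 times, then the loop exits.
loop_branch = [True] * 100 + [False]

print(static_mispredictions(loop_branch, predict_taken=False))  # 100
print(static_mispredictions(loop_branch, predict_taken=True))   # 1
print(two_bit_mispredictions(loop_branch))                      # 2
```

Under these assumptions, "never taken" is wrong on almost every pass through the loop, while even a tiny amount of history gets nearly all of them right.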

More benefits of Pipelining
- The parallelism is invisible to the programmer.

Modern processors
- Pentium 4s have a pipeline of up to about 30 stages.
  - If the pipeline gets too long, there is too much overhead (flushing 30 stages is easier than flushing 300).
- However, new processors like the Cell processor in the PlayStation 3 are moving to multicore architectures.
  - The pipeline is much shorter: between 5 and 10 stages.
  - Multicore processors work best for workloads made up of many threads that are easily separable.
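The cost of flushing a deep pipeline can be estimated with a simple cycles-per-instruction model. Everything numeric below is hypothetical (the branch frequency, the misprediction rate, and the worst-case assumption that a flush costs one cycle per stage), and the function name is made up.

```python
def effective_cpi(depth, branch_fraction=0.2, mispredict_rate=0.05):
    """Average cycles per instruction when every mispredicted branch
    flushes the whole pipeline (assumed penalty: `depth` cycles)."""
    return 1 + branch_fraction * mispredict_rate * depth

for depth in (5, 30, 300):
    print(depth, round(effective_cpi(depth), 2))
```

With these made-up rates, a 5-stage pipe loses about 5% to flushes, a 30-stage pipe about 30%, and a 300-stage pipe would spend most of its time refilling.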

Other Levels of Parallelism
- Threads
  - A way for an application to split itself into 2 or more separate tasks.
  - Example: MS Word
- Logic circuits
  - These are naturally parallel.

Pros of Parallelism
- The average throughput is greatly increased.
  - Very little time is wasted.
- A lot of things are naturally parallel.

Cons of Parallelism
- Requires more overhead.
  - More power, more components.
  - For threaded computer programs, either the kernel or your program must do some work to switch between individual threads.
- At some point, more parallelism actually makes things slower.
  - You spend too much time switching between tasks instead of doing actual work.
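The "more parallelism eventually makes things slower" point can be illustrated with a toy cost model. It is an assumption, not a measurement: the useful work divides evenly among the threads, while switching overhead grows in proportion to the number of threads. The function names and numbers are hypothetical.

```python
def run_time(work, overhead_per_thread, threads):
    """Toy model: work is split evenly, overhead grows with thread count."""
    return work / threads + overhead_per_thread * threads

def best_thread_count(work, overhead_per_thread, max_threads):
    """Thread count with the lowest modeled run time."""
    return min(range(1, max_threads + 1),
               key=lambda n: run_time(work, overhead_per_thread, n))

# Hypothetical units: 100 units of work, 1 unit of overhead per thread.
for n in (1, 10, 100):
    print(n, run_time(100, 1, n))
print("best:", best_thread_count(100, 1, 100))
```

In this model 10 threads finish in 20 units, while 100 threads are no faster than a single thread because nearly all the time goes to overhead.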