Pipelining, Parallelism, and Simplified Circuits
Discrete Math, April 13, 2006
Harding University
Jonathan White
Outline
- What Pipelining is
- Benefits
- Downsides
- How modern processors use Pipelining
- Parallelism
- Threads
- Circuits
- Pros/Cons
Pipelining
Definition: Pipelining is an implementation technique in which multiple instructions are overlapped in execution on a processor.
- Each stage completes part of an instruction in parallel.
- The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.
Pipelining Laundry Example
- 4 loads of laundry need to be washed, dried, and folded.
- It takes 30 minutes to wash, 40 minutes to dry, and 20 minutes to fold.
- We have 1 washer, 1 dryer, and 1 folding station.
- What's the most efficient way to get the 4 loads of laundry done?
Non-Pipelined Laundry
- Wash, dry, fold. Then wash, dry, fold. Then wash, dry, fold...
- Each load takes 30 + 40 + 20 = 90 minutes, so the 4 loads take a total of 360 minutes (6 hours); nothing is done in parallel.
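The 6-hour total is just the sum of the three step times multiplied by the number of loads. A minimal sketch, assuming each load finishes all three steps before the next load starts (the function name `sequential_time` is hypothetical):

```python
def sequential_time(stage_minutes, loads):
    """Total minutes when each load finishes every step before the next load starts."""
    return sum(stage_minutes) * loads

# Wash 30, dry 40, fold 20 minutes; 4 loads.
total = sequential_time([30, 40, 20], 4)
print(total, "minutes =", total / 60, "hours")  # 360 minutes = 6.0 hours
```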
Pipelined Laundry
- A better idea is to start the next load washing while the first is drying.
- Then, while the first load is being folded, the second load dries and a new load goes into the washer.
- Using this method, the laundry would be done at 9:30 (starting at 6:00): 30 + 4 × 40 + 20 = 210 minutes, or 3.5 hours instead of 6. The 40-minute dryer is the bottleneck, so once it is running, a load finishes every 40 minutes.
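The pipelined schedule can be checked with a small simulation. This is a sketch under stated assumptions: there is exactly one machine per step, and a load moves to its next step as soon as it is done with the current one and that machine is free. The function name `pipelined_finish_time` is hypothetical.

```python
def pipelined_finish_time(stage_minutes, loads):
    """Finish time (minutes) when each step is a single shared machine.

    A load can start a step only when (a) it has finished its previous step
    and (b) the previous load has freed that step's machine.
    """
    # finish[s] holds the time the most recent load left step s.
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time this load is ready for its next step
        for s, duration in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]

minutes = pipelined_finish_time([30, 40, 20], 4)
print(minutes, "minutes =", minutes / 60, "hours")  # 210 minutes = 3.5 hours
```

Because the dryer is the slowest step, each load after the first waits for it, which is why loads finish 40 minutes apart rather than 30 or 20.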
Processors
Computers, like laundry, typically perform the same steps for every instruction:
- Fetch an instruction from memory
- Decode the instruction
- Execute the instruction
- Access memory (read data for a load, write data for a store)
- Write the result back to a register
Example of a Basic Non-Pipelined Instruction
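Without pipelining, each instruction goes through all five steps before the next one is fetched. A minimal sketch, assuming every step takes one clock cycle and using the conventional abbreviations IF, ID, EX, MEM, WB for the five steps (the function name is hypothetical):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory access, write back

def non_pipelined_schedule(num_instructions):
    """Return {(instruction, stage): cycle} when each instruction runs all
    five stages before the next instruction is fetched."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            schedule[(i, stage)] = i * len(STAGES) + s
    return schedule

sched = non_pipelined_schedule(3)
print(max(sched.values()) + 1)  # 15 cycles for 3 instructions
```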
Example of a Pipelined Architecture
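In a pipeline, a new instruction enters the first stage every cycle, so the instructions overlap diagonally. A minimal sketch of that cycle-by-cycle picture, assuming an ideal pipeline with no stalls, one cycle per stage, and the conventional stage abbreviations (the function name is hypothetical):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory access, write back

def pipeline_diagram(num_instructions):
    """Return one text row per instruction; column c shows the stage that
    instruction occupies in clock cycle c (ideal pipeline, no stalls)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        cells = []
        for cycle in range(total_cycles):
            s = cycle - i  # instruction i enters stage 0 in cycle i
            cells.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        rows.append(" ".join(f"{c:>3}" for c in cells))
    return rows

for row in pipeline_diagram(3):
    print(row)
```

Three instructions finish in 3 + 5 - 1 = 7 cycles instead of 15, even though each one still spends 5 cycles in the pipe.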
Pipelining Aspects
- The longest step dictates the length of every pipeline stage, so the slowest resource affects the entire process.
- What's the slowest of a processor's 5 steps?
- Pipelining improves performance by increasing instruction throughput, not by decreasing the execution time of any individual instruction.
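Both points can be seen with a little arithmetic. The sketch below assumes hypothetical stage latencies in picoseconds and an ideal pipeline whose clock period equals the slowest stage; the function name `pipeline_timing` is also hypothetical.

```python
def pipeline_timing(stage_ps, num_instructions):
    """Compare a non-pipelined and an ideal pipelined processor.

    stage_ps: latency of each stage in picoseconds.
    Returns (non_pipelined_total, pipelined_total, pipelined_latency).
    """
    single = sum(stage_ps)          # one instruction, no pipelining
    cycle = max(stage_ps)           # slowest stage sets the clock period
    k = len(stage_ps)
    non_pipelined_total = single * num_instructions
    # First instruction needs k cycles; each later one finishes 1 cycle after.
    pipelined_total = (k + num_instructions - 1) * cycle
    pipelined_latency = k * cycle   # every instruction still passes all k stages
    return non_pipelined_total, pipelined_total, pipelined_latency

# Hypothetical latencies: fetch, decode, execute, memory, write back.
print(pipeline_timing([200, 100, 200, 200, 100], 1000))
# (800000, 200800, 1000)
```

Each instruction now takes 1000 ps instead of 800 ps, because the short stages are padded out to the 200 ps clock, yet 1000 instructions finish almost 4 times sooner.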
Pipelining Benefits
- For the right instruction set, pipelining ideally increases performance linearly with the number of pipeline stages.
  - Instruction sets are now designed to be pipelined (RISC vs. CISC architectures).
  - Pipelining is easy to do with only a few hardware additions.
- Pipelining makes efficient use of resources.
  - Circuits consume similar amounts of power whether performing calculations or just waiting.
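The "linear" speedup is a limit, not something reached immediately. Assuming k perfectly balanced stages of one time unit each and no stalls, n instructions take k·n units without pipelining and k + n - 1 units with it. A minimal sketch (the function name is hypothetical):

```python
def ideal_speedup(stages, instructions):
    """Speedup of an ideal pipeline with balanced stages over the same work
    done without pipelining (each stage takes one time unit)."""
    unpipelined = stages * instructions
    pipelined = stages + instructions - 1
    return unpipelined / pipelined

for n in (1, 10, 1000):
    print(n, round(ideal_speedup(5, n), 2))
# 1 1.0
# 10 3.57
# 1000 4.98
```

As the number of instructions grows, the speedup approaches the number of stages (5 here) but never exceeds it.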
Pipelining Downsides
- Pipelining requires additional hardware.
  - Every instruction must be able to be performed in each of the stages; e.g., some instructions require the ALU in more than one step.
  - Registers are needed to hold data between cycles.
  - More ALUs are required. For example, one ALU (adder) is needed just to increment the program counter.
  - Branch prediction and collision (hazard) avoidance units are required.
- Often, the pipeline has to be stalled or cleared when code causes a hazard, such as an instruction that needs the result of the one just before it:
      X = Y + 4
      Z = X + 1
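The hazard in `X = Y + 4; Z = X + 1` is a read-after-write (RAW) dependency: the second instruction reads X before the first has written it back. A minimal sketch of how such a dependency can be detected, assuming a hypothetical encoding of each instruction as `(destination, [sources])` and a hypothetical `window` for how many following instructions could read a stale value (the real distance depends on the pipeline and whether it forwards results):

```python
def raw_hazards(program, window=2):
    """Find read-after-write (RAW) hazards.

    program: list of (dest, [sources]) tuples, one per instruction.
    window: how many following instructions may read a value before it
    has been written back (hypothetical; depends on the pipeline).
    Returns (producer_index, consumer_index, register) triples.
    """
    hazards = []
    for i, (dest, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            dest_j, sources_j = program[j]
            if dest in sources_j:
                hazards.append((i, j, dest))
            if dest_j == dest:  # value overwritten; later reads don't depend on i
                break
    return hazards

# X = Y + 4 ; Z = X + 1   (the second instruction reads X)
program = [("X", ["Y"]), ("Z", ["X"])]
print(raw_hazards(program))  # [(0, 1, 'X')]
```

When a hazard like this is found, the hardware either stalls the later instruction until X is ready or forwards the result directly from the ALU.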
Branch Prediction
- How many times will this loop execute? (100 times.)
      for (int x = 0; x < 100; x++) {
          /* do something... */
      }
- It would be nice for the processor to be able to predict that this code will be executed more than once...
- Some processors simply assume a branch will never be taken.
- Also, compilers often reorder instructions (instruction scheduling) to avoid stalling the pipeline.
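To see why "assume never taken" does badly on a loop, we can count wrong guesses for the loop's branch. This sketch assumes the loop is compiled with its test at the bottom, so the backward branch is taken 99 times and then falls through once. It compares static "not taken" prediction with a 2-bit saturating counter, a common dynamic scheme swapped in here for illustration; the class and function names are hypothetical.

```python
def count_mispredictions(outcomes, predictor):
    """Run a predictor over a sequence of branch outcomes (True = taken)."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

class AlwaysNotTaken:
    """Static prediction: assume the branch is never taken."""
    def predict(self):
        return False

    def update(self, taken):
        pass

class TwoBitCounter:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict
    taken, so it takes two wrong guesses in a row to change its prediction."""
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

# Backward loop branch for 100 iterations: taken 99 times, then falls through.
loop = [True] * 99 + [False]
print(count_mispredictions(loop, AlwaysNotTaken()))  # 99
print(count_mispredictions(loop, TwoBitCounter()))   # 3
```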
More benefits of Pipelining
- The parallelism is invisible to the programmer.
Modern processors
- Pentium 4s have a 30-stage pipeline.
- If the pipeline gets too long, there is too much overhead: flushing a 30-stage pipeline throws away far more work than flushing a short one.
- However, new processors like the Cell processor in the PlayStation 3 are moving to multicore architectures.
  - The pipeline is much shorter: between 5 and 10 stages.
  - Multicore processors work best for workloads that split into many threads that are easily separable.
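The flushing overhead can be put into a rough formula: every mispredicted branch throws away about as many cycles as there are stages before the branch is resolved. A minimal sketch, assuming an otherwise ideal pipeline (1 cycle per instruction), hypothetical branch and misprediction rates, and a flush penalty that grows with pipeline depth. It ignores the higher clock rate that a deeper pipeline allows.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction for an otherwise ideal pipeline
    (CPI = 1) when each mispredicted branch costs `flush_penalty` cycles."""
    return 1 + branch_fraction * mispredict_rate * flush_penalty

# Hypothetical workload: 20% branches, 10% of them mispredicted.
print(effective_cpi(0.20, 0.10, 5))   # short pipeline, roughly 1.1
print(effective_cpi(0.20, 0.10, 30))  # long pipeline, roughly 1.6
```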
Other Levels of Parallelism
- Threads
  - A way for an application to split itself into 2 separate tasks (e.g., MS Word).
- Logic circuits
  - These are naturally parallel.
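Here is a minimal sketch of a program splitting itself into 2 separate tasks, using Python's standard `threading` module. The summation task and the function name `parallel_sum` are hypothetical. In CPython, the global interpreter lock keeps these two threads from executing Python code at the same time, so the example shows the structure of a threaded program, not a speedup.

```python
import threading

def parallel_sum(numbers):
    """Split a summation into two tasks, run each on its own thread,
    and combine the partial results."""
    mid = len(numbers) // 2
    halves = [numbers[:mid], numbers[mid:]]
    partial = [0, 0]

    def worker(index):
        partial[index] = sum(halves[index])  # each thread writes its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both tasks before combining
    return partial[0] + partial[1]

print(parallel_sum(list(range(1, 101))))  # 5050
```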
Pros of Parallelism
- The average throughput is greatly increased.
  - Very little time is wasted.
- A lot of things are naturally parallel.
Cons of Parallelism
- Requires more overhead.
  - More power, more components.
  - For threaded computer programs, either the kernel or your program must do some work to switch between individual threads.
- At some point, more parallelism actually makes things slower.
  - You spend too much time switching between tasks instead of doing actual work.
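The point where more parallelism starts to hurt can be illustrated with a toy model. It assumes the work divides evenly across n tasks and that each task adds a fixed switching cost. The numbers and function names are hypothetical, and real overheads are rarely this simple.

```python
def modeled_time(work, switch_cost, tasks):
    """Toy model: work splits evenly across tasks, but each task adds a
    fixed switching/coordination cost."""
    return work / tasks + switch_cost * tasks

def best_task_count(work, switch_cost, max_tasks):
    """Number of tasks (1..max_tasks) with the smallest modeled time."""
    return min(range(1, max_tasks + 1),
               key=lambda n: modeled_time(work, switch_cost, n))

for n in (1, 10, 50):
    print(n, modeled_time(100, 1, n))
print(best_task_count(100, 1, 50))  # 10
```

In this model, going from 1 to 10 tasks cuts the time from 101 to 20 units, but going on to 50 tasks raises it again to 52, because switching now dominates the actual work.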