Instruction-Level Parallelism and Superscalar Processors
By Kevin Morfin
What are Superscalar Processors?
A superscalar processor is a processor in which multiple independent instruction pipelines are used.
- Each pipeline consists of multiple stages, which allows the pipeline to handle multiple instructions at one time.
- A superscalar processor fetches multiple instructions at one time and attempts to find nearby instructions that are independent of one another and can execute in parallel.
- The superscalar approach can be used in either a RISC or a CISC architecture.
- Superscalar design is now the standard method for implementing high-performance microprocessors.
- There are multiple functional units, each implemented as a pipeline, which support parallel execution of instructions.
What is Instruction-Level Parallelism?
Superscalar processors exploit what is called instruction-level parallelism. Instruction-level parallelism is the degree to which the instructions of a program can be executed in parallel.
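One rough way to put a number on this degree is to divide the instruction count by the length of the longest chain of true dependencies. The sketch below assumes every instruction takes one cycle and that resources are unlimited; the function name `ilp` and the register names are made up for illustration.

```python
def ilp(instructions):
    """Estimate instruction-level parallelism for a straight-line sequence.

    instructions: list of (dest, sources) tuples in program order.
    Assumes unit latency and unlimited functional units, and looks only at
    true (read-after-write) dependencies.
    """
    if not instructions:
        return 0.0
    depth = []        # depth[i] = earliest cycle in which instruction i can run
    last_writer = {}  # register -> index of the latest instruction writing it
    for i, (dest, sources) in enumerate(instructions):
        d = 1
        for reg in sources:
            if reg in last_writer:
                # Must wait one cycle past the producer of this operand.
                d = max(d, depth[last_writer[reg]] + 1)
        depth.append(d)
        last_writer[dest] = i
    return len(instructions) / max(depth)


independent = [("r1", ["a"]), ("r2", ["b"]), ("r3", ["c"])]
chained = [("r1", ["a"]), ("r2", ["r1"]), ("r3", ["r2"])]
print(ilp(independent))  # 3.0: all three can execute in the same cycle
print(ilp(chained))      # 1.0: each instruction waits for the previous one
```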
Superpipelining
An alternative to superscalar machines is superpipelining. Superpipelining performs more than one pipeline stage per clock cycle. For comparison, a base pipeline issues only one instruction per clock cycle and, in the example considered here, has four stages. Superpipelining relies on the fact that the function performed in each stage can be split into two nonoverlapping parts, each of which executes in half a clock cycle.
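The difference between the approaches can be sketched with idealized cycle counts. The formulas below assume a k-stage pipeline, n independent instructions, no stalls, and a degree m of 2; the function names are illustrative, and real processors will not reach these ideal figures.

```python
import math


def base_cycles(n, k=4):
    """Base pipeline: one instruction issued per clock cycle."""
    return k + (n - 1)


def superpipelined_cycles(n, k=4, m=2):
    """Superpipelined: one instruction issued every 1/m of a clock cycle."""
    return k + (n - 1) / m


def superscalar_cycles(n, k=4, m=2):
    """Superscalar: up to m instructions issued together each clock cycle."""
    return k + math.ceil(n / m) - 1


n = 6
print(base_cycles(n))            # 9
print(superpipelined_cycles(n))  # 6.5
print(superscalar_cycles(n))     # 6
```

Under these assumptions both alternatives finish well ahead of the base pipeline, with the superpipelined version trailing the superscalar one by half a cycle at start-up.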
Limitations to Parallelism
- True data dependency
- Procedural dependency
- Resource conflict
- Output dependency
- Antidependency
True Data Dependency
A true data dependency occurs when an instruction needs data produced by a previous instruction in order to execute. The second instruction is delayed for as many clock cycles as required to remove the dependency. Example: consider the following code.
ADD EAX, ECX ; load register EAX with the contents of ECX plus EAX
MOV EBX, EAX ; load EBX with the contents of EAX
The MOV cannot execute until the ADD has produced the new value of EAX.
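The check a processor has to make here can be sketched by comparing what the first instruction writes with what the second one reads. The dictionary layout and the function name `has_true_dependency` are assumptions made for illustration only.

```python
def has_true_dependency(first, second):
    """Return True if `second` reads a register that `first` writes."""
    return bool(first["writes"] & second["reads"])


# ADD EAX, ECX reads EAX and ECX and writes EAX.
add = {"reads": {"EAX", "ECX"}, "writes": {"EAX"}}
# MOV EBX, EAX reads EAX and writes EBX.
mov = {"reads": {"EAX"}, "writes": {"EBX"}}

print(has_true_dependency(add, mov))  # True: MOV must wait for ADD
print(has_true_dependency(mov, add))  # False: ADD does not read EBX
```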
Procedural Dependencies
A procedural dependency arises when there is a branch in the instruction stream. The instructions following the branch have a procedural dependency on the branch and cannot be executed until the branch is executed. Another type of procedural dependency arises when variable-length instructions are used. Because the length of an instruction is not known in advance, the instruction must be at least partially decoded before the following instruction can be fetched.
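The variable-length case can be illustrated with a toy encoding in which the first byte of every instruction holds its total length in bytes. This encoding is invented for the example and is not how x86 or any real instruction set works; the point is only that each instruction boundary depends on decoding the one before it.

```python
def instruction_starts(code):
    """Return the byte offset at which each instruction begins.

    Toy encoding (an assumption): the first byte of an instruction is its
    total length in bytes, including that length byte.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = code[pc]  # partial decode: only the length is needed
        if length < 1:
            raise ValueError(f"invalid length byte at offset {pc}")
        pc += length       # only now is the next instruction's address known
    return starts


program = bytes([2, 0xAA, 1, 3, 0xBB, 0xCC])
print(instruction_starts(program))  # [0, 2, 3]
```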
Resource Conflict
A resource conflict is a competition between two or more instructions for the same resource at the same time (memories, caches, buses, register file ports, functional units). It is somewhat like a data dependency, but it can be overcome by duplicating resources.
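The effect of duplicating resources can be sketched with a very small model. It assumes the instructions are independent of one another, each occupies one functional unit for exactly one cycle, and issue width is unlimited; the unit names and the function `cycles_needed` are hypothetical.

```python
import math
from collections import Counter


def cycles_needed(instructions, units):
    """Cycles to execute independent one-cycle instructions.

    instructions: list of functional-unit kinds, one entry per instruction.
    units: dict mapping each kind to the number of units available.
    """
    demand = Counter(instructions)
    # The busiest kind of unit determines how long the group takes.
    return max(
        (math.ceil(count / units[kind]) for kind, count in demand.items()),
        default=0,
    )


work = ["alu", "alu", "load"]
print(cycles_needed(work, {"alu": 1, "load": 1}))  # 2: the two ALU ops conflict
print(cycles_needed(work, {"alu": 2, "load": 1}))  # 1: a second ALU removes the conflict
```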
Output Dependency
Consider the code:
I1: R3 ← R3 op R5
I2: R4 ← R3 + 1
I3: R3 ← R5 + 1
I4: R7 ← R3 op R4
There is no true data dependency between I1 and I3, but both write R3. If I3 completes before I1, then the wrong contents of R3 will be fetched for I4.
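The hazard can be made concrete by running the four instructions twice, once in program order and once with I3's write to R3 landing before I1's. The sketch assumes that `op` is multiplication and picks arbitrary starting register values; neither comes from the original example.

```python
def op(a, b):
    # Assumption: the unspecified "op" is taken to be multiplication.
    return a * b


def run_in_order(regs):
    r = dict(regs)
    r["R3"] = op(r["R3"], r["R5"])  # I1
    r["R4"] = r["R3"] + 1           # I2
    r["R3"] = r["R5"] + 1           # I3
    r["R7"] = op(r["R3"], r["R4"])  # I4
    return r


def run_with_i3_completing_first(regs):
    r = dict(regs)
    i1_result = op(r["R3"], r["R5"])  # I1 computes, but its write is delayed
    r["R3"] = r["R5"] + 1             # I3 completes and writes R3 early
    r["R3"] = i1_result               # I1's late write overwrites I3's result
    r["R4"] = r["R3"] + 1             # I2 still sees I1's value, as intended
    r["R7"] = op(r["R3"], r["R4"])    # I4 wanted I3's value but gets I1's
    return r


start = {"R3": 2, "R5": 3}
print(run_in_order(start)["R7"])                  # 28
print(run_with_i3_completing_first(start)["R7"])  # 42
```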
Antidependency
Consider the following code again:
I1: R3 ← R3 op R5
I2: R4 ← R3 + 1
I3: R3 ← R5 + 1
I4: R7 ← R3 op R4
The constraint in an antidependency is similar to that of a true data dependency, but reversed. Instead of the first instruction producing a value that the second instruction uses, the second instruction destroys a value that the first instruction uses. Here, I3 must not complete before I2 has fetched its operands, because I3 updates R3, which is a source operand of I2.
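All three register dependencies in this sequence can be found mechanically by tracking, for each register, the last instruction that wrote it and the instructions that have read it since. This is a minimal sketch; the tuple layout and the function name `find_dependencies` are illustrative choices.

```python
def find_dependencies(program):
    """Return (earlier, later, kind, register) tuples for a code sequence.

    program: list of (name, dest, sources) in program order.
    kind is "true", "anti", or "output".
    """
    deps = []
    last_writer = {}  # register -> name of the latest instruction writing it
    readers = {}      # register -> instructions that read it since that write
    for name, dest, sources in program:
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], name, "true", reg))
        for reader in readers.get(dest, []):
            deps.append((reader, name, "anti", dest))
        if dest in last_writer:
            deps.append((last_writer[dest], name, "output", dest))
        for reg in sources:
            readers.setdefault(reg, []).append(name)
        last_writer[dest] = name
        readers[dest] = []  # earlier readers saw the old value, not this one
    return deps


program = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]
for dep in find_dependencies(program):
    print(dep)
```

For this sequence the sketch reports the antidependency between I2 and I3 on R3 and the output dependency between I1 and I3 on R3, along with the true dependencies that feed I2 and I4.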
Pentium 4
The original Pentium had a modest superscalar component consisting of two integer execution units. The Pentium Pro introduced a full-blown superscalar design.
- The processor fetches instructions from memory in the order of the static program.
- Each instruction is translated into one or more fixed-length RISC instructions known as micro-operations.
- The processor executes the micro-operations on a superscalar pipeline organization, which allows the micro-operations to execute out of order.
- Finally, the processor commits the result of each micro-operation to the processor's register set in the order of the original program flow.
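The last step, committing results in program order even though execution finishes out of order, can be sketched with a toy model. This is not the Pentium 4's actual mechanism; it only assumes that micro-operations are numbered in program order starting at 0, and the function name is made up.

```python
def retire_in_order(completion_order):
    """Show which micro-operations retire after each completion event.

    completion_order: sequence numbers (0, 1, 2, ... in program order)
    listed in the order the micro-operations finish executing.
    """
    done = set()
    next_to_retire = 0
    batches = []
    for seq in completion_order:
        done.add(seq)
        batch = []
        # A finished micro-operation retires only once all older ones have.
        while next_to_retire in done:
            batch.append(next_to_retire)
            next_to_retire += 1
        batches.append(batch)
    return batches


print(retire_in_order([2, 0, 1, 3]))  # [[], [0], [1, 2], [3]]
```

Micro-operation 2 finishes first but has to wait; it retires only after 0 and 1 have also finished.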
Pipeline Used by the Pentium 4
The Pentium 4 has an outer CISC shell and an inner RISC core. The micro-operations pass through a pipeline with at least 20 stages.
The ARM Cortex-A8
The ARM Cortex-A8 is a RISC-based superscalar design. It implements a 13-stage pipeline.
References
Stallings, W. (2010). Computer Organization and Architecture. Upper Saddle River, NJ: Prentice Hall.
Stallings, W. (2010). Computer Organization and Architecture. Retrieved November 20, 2010, from William Stallings' website: