EENG449b/Savvides Lec 5.1 1/27/04 January 27, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer Systems Lecture 5 FP Pipelining & Dynamically Scheduled Pipelines and Overview of ARM Architecture Part I
Floating-Point Support in Pipelines
Floating-point operations take more than 1 or 2 cycles to complete
–Structural hazards
–Data hazards
Multiple functional units required
–Loads, stores and integer ALUs
–FP and integer multiplier
–FP adder that handles FP add, subtract and conversion
–FP and integer divider
Initiation interval: the number of cycles that must elapse between issuing two operations of a given type
Multiple FUs and Latencies
Functional unit                        Latency   Initiation interval
Integer ALU                            0         1
Data memory (integer and FP loads)     1         1
FP add                                 3         1
FP multiply                            6         1
FP divide                              24        25
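To make the two columns concrete, here is a minimal Python sketch that computes when back-to-back independent operations can issue to one functional unit, and when a dependent instruction could first issue behind each of them. The `UNITS` dictionary copies the table above. The name `schedule_same_unit` is hypothetical, and the sketch assumes latency means the number of intervening cycles between a producer and its consumer, so a dependent instruction can issue `latency + 1` cycles after the producer.

```python
# (latency, initiation interval) per functional unit, copied from the table above.
UNITS = {
    "int_alu": (0, 1),
    "data_mem": (1, 1),
    "fp_add": (3, 1),
    "fp_mul": (6, 1),
    "fp_div": (24, 25),
}


def schedule_same_unit(unit, count, first_issue=0):
    """Return a list of (issue_cycle, earliest_dependent_issue_cycle) pairs for
    `count` independent operations sent back to back to the same unit."""
    latency, interval = UNITS[unit]
    schedule = []
    for i in range(count):
        # Consecutive operations on one unit are spaced by its initiation interval.
        issue = first_issue + i * interval
        # Assumption: a consumer can issue once `latency` cycles have intervened.
        schedule.append((issue, issue + latency + 1))
    return schedule


if __name__ == "__main__":
    for unit in ("fp_mul", "fp_div"):
        print(unit, schedule_same_unit(unit, 3))
```

The fully pipelined multiplier accepts a new operation every cycle even though each result takes several cycles, while the divider, with an initiation interval of 25, holds off the next divide until the previous one has finished.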
Support for Multiple Outstanding Operations
Additional pipeline registers needed
Hazards in Longer Pipelines
1. The divide unit is not fully pipelined, so structural hazards can occur.
2. Instructions have varying running times, so the number of register writes required in a cycle can be larger than 1.
3. WAW hazards are possible, since instructions no longer reach WB in order.
4. Instructions can complete in a different order than they were issued, causing problems with exceptions.
5. Because of the longer latency of operations, stalls for RAW hazards will be more frequent.
FP Pipeline Hazards Example (Figure A.34)
Simultaneous writeback
–Stall an instruction in the ID stage
–Stall the instruction when it tries to enter WB
Checks for Detecting Hazards
Three checks to be performed before a multicycle instruction can issue in the ID stage:
Check for structural hazards
–The required functional unit is not busy and a register write port is available when needed
Check for a RAW data hazard
–Wait until the source registers are not listed as pending destinations
Check for a WAW data hazard
–Determine whether an instruction that has already issued has the same destination as this instruction. If so, stall the instruction in ID.
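The three checks can be sketched as a single issue test. This is a simplified Python illustration, not the actual interlock logic: the `IssueState` fields, the `can_issue` function and the idea of reserving the write port by cycle number are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IssueState:
    """Hypothetical bookkeeping kept by the ID stage."""
    busy_units: set = field(default_factory=set)     # non-pipelined units in use
    pending_dests: set = field(default_factory=set)  # registers still awaiting a write
    wb_reserved: set = field(default_factory=set)    # cycles whose write port is taken


def can_issue(state, unit, srcs, dest, wb_cycle):
    """Return (ok, reason) after applying the three ID-stage checks in order."""
    # 1. Structural: the functional unit and the register write port must be free.
    if unit in state.busy_units:
        return False, "structural: unit busy"
    if wb_cycle in state.wb_reserved:
        return False, "structural: write port taken"
    # 2. RAW: no source register may still be a pending destination.
    if any(reg in state.pending_dests for reg in srcs):
        return False, "RAW"
    # 3. WAW: no in-flight instruction may target the same destination.
    if dest in state.pending_dests:
        return False, "WAW"
    return True, "issue"


if __name__ == "__main__":
    # A DIV.D writing F0 is in flight and occupies the divider.
    state = IssueState(busy_units={"fp_div"}, pending_dests={"F0"})
    print(can_issue(state, "fp_add", ["F0", "F8"], "F10", wb_cycle=6))
    print(can_issue(state, "fp_add", ["F8", "F14"], "F12", wb_cycle=6))
```

In the usage example the first add reads F0 and is held back by the RAW check, while the second add has no conflict and may issue.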
MIPS R4000 Pipeline
Decompose the 5-stage pipeline into a deeper 8-stage pipeline (superpipelining)
–Achieves higher clock rates => better performance
Extra stages come from decomposing memory accesses
Longer pipelines increase the amount of forwarding and the branch delays
Branch Delay Cycles
Branch outcome needs 3 cycles
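A rough way to see what a 3-cycle branch delay costs is the usual stall formula, CPI = base CPI + branch fraction × penalty. The Python sketch below is illustrative only: the function name `cpi_with_branches` and the 20% branch fraction are hypothetical, and it pessimistically assumes every branch pays the full 3 cycles, with no help from delay slots or prediction.

```python
def cpi_with_branches(base_cpi, branch_fraction, penalty_cycles):
    """Effective CPI when each branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: ideal CPI of 1 and 20% branches (assumed numbers).
    print(cpi_with_branches(base_cpi=1.0, branch_fraction=0.2, penalty_cycles=3))
```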
Dynamically Scheduled Pipelines
Simple pipelines result in hazards that require stalling.
Static scheduling: compilers rearrange instructions to avoid stalls.
Dynamic scheduling: the processor executes instructions out of order to minimize stalls.
Dynamic scheduling requires splitting the ID stage into two stages:
–Issue: decode instructions, check for structural hazards
–Read operands: wait until there are no data hazards, then read operands
–Also need to know when each instruction begins and ends execution
Requires a lot more bookkeeping!
More when we discuss Tomasulo’s algorithm in chapter 3…
Scoreboarding
Scoreboarding: a technique that allows out-of-order execution when resources are available and there are no data dependencies; it originated in the CDC 6600 in the mid-1960s.
The scoreboard is fully responsible for instruction execution and hazard detection
–Requires changes in the number of functional units and the latency of operations
–Needs to keep track of the status of all instructions in execution
Scoreboarding II
More Hazards
WAR and WAW hazards are now possible!

WAR! if SUB.D executes first (it overwrites F8 before ADD.D has read it):
  DIV.D F0, F2, F4
  ADD.D F10, F0, F8
  SUB.D F8, F8, F14

WAW! if SUB.D executes first (ADD.D's later write to F10 then clobbers SUB.D's result):
  DIV.D F0, F2, F4
  ADD.D F10, F0, F8
  SUB.D F10, F8, F14
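The two sequences above can be checked mechanically by comparing register names. The following Python sketch is a minimal illustration; the `hazards` function and the `(dest, src1, src2)` tuple encoding are assumptions made for the example, not part of the lecture.

```python
def hazards(earlier, later):
    """Name the hazards between two instructions in program order.

    Each instruction is a (dest, src1, src2) tuple of register names.
    RAW means `later` must wait for `earlier`; WAR and WAW only become
    a problem if `later` is allowed to run ahead of `earlier`.
    """
    e_dest, *e_srcs = earlier
    l_dest, *l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")  # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("WAR")  # later writes what earlier still has to read
    if l_dest == e_dest:
        found.add("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    div = ("F0", "F2", "F4")        # DIV.D F0, F2, F4
    add = ("F10", "F0", "F8")       # ADD.D F10, F0, F8
    sub_war = ("F8", "F8", "F14")   # SUB.D F8, F8, F14
    sub_waw = ("F10", "F8", "F14")  # SUB.D F10, F8, F14
    print(hazards(div, add))
    print(hazards(add, sub_war))
    print(hazards(add, sub_waw))
```

ADD.D depends on DIV.D through F0 (RAW), which is why it stalls long enough for an independent SUB.D to get ahead of it in both examples.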
Refer to Figures A.52–A.54 for example scoreboard tables
Scoreboarding is limited by:
–Amount of parallelism among instructions
–The number of scoreboard entries
–The number and types of functional units
–Presence of antidependencies and output dependencies
Announcements
Example on page 44 of the textbook is wrong
–CPI for FPSQR not included in the computation of CPI…
–Everything after that is affected…
Midterm I: Thursday, Feb. 19
–Chapters 1, 2, Appendix A and microcontroller material from class
Readings for next class and project-related material are posted on the class website
ARM Architecture Part I
Where is ARM Today?
Not the case when you have loads and stores!
Microcontroller View
Price/Performance/Peripheral Tradeoffs
For many consumer electronics, cost is an issue
–ARM7TDMI cores have less HW and cost less
–With today’s prices you can get an ARM7-based chip for < $5.00
Power Tradeoffs
–Power performance is given in Watts/MIPS, but
–Lifetime is a bandwidth vs. throughput issue
»Bandwidth vs. throughput of battery life
ML675001/67Q5002/67Q5003
Features
–ARM7TDMI
–ROM-less (ML675001)
–256KB MCP Flash (ML67Q5002)
–512KB MCP Flash (ML67Q5003)
–8KB Unified Cache
–32KB RAM
–Interrupts FIQ
–I2C (1-ch x master)
–DMA (2-ch)
–Timers (7 x 16-bit)
–WDT (16-bit)
–PWM (2 x 16-bit)
–UART (2-ch) / SIO (1-ch)
–GPIO (5 x 8-bit)
–ADC (4-ch x 10-bit)
–Up to 66MHz
–-40 ~ +85 C
Package
–144 LFBGA
–144 QFP
Next Time
Power Metrics
Dynamic Voltage Scaling
Microcontroller Programming Cycle