Team Antelope Final Presentation


1 Team Antelope Final Presentation
What doesn’t kill you makes you stronger
"The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair."
James Zirkle, John Lange, Peter Johnson, Chris

2 Processor Overview
"Despite all my rage I'm still just a rat in a cage" --Bullet With Butterfly Wings
5 stage pipeline
10 nanosecond clock
128 bit memory
Split caches
Write back policy
CLZ and Multiply simplified to 1 clock cycle
MicroSequencer used to handle complex operations

3 Who did what
James: Register File, Integration
Jack: Cache, Memory, ALU
Peter: Shifter, hazard detection unit
Chris: Multiplier, CLZ, interrupts

4 ALU
“Quidquid latine dictum sit, altum videtur”
Handles all 16 data processing instructions
Determines PSR flag values
4 bit carry look ahead units, combined into 16 blocks
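A minimal Python sketch of the carry look-ahead idea mentioned on this slide. The function names `cla4` and `add32` are hypothetical, and chaining eight 4-bit units is an assumption made for illustration; the slide does not say exactly how the team grouped its units.

```python
def cla4(a, b, cin):
    """Add two 4-bit values with carry look-ahead; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: both bits set
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: exactly one bit set
    c = [cin & 1]
    for i in range(4):
        # Written as a recurrence here; hardware expands each carry into
        # two-level logic over g, p and cin so no carry waits on a ripple.
        c.append(g[i] | (p[i] & c[i]))
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]


def add32(a, b, cin=0):
    """Chain eight 4-bit units into a 32-bit adder (assumed grouping)."""
    result, carry = 0, cin
    for k in range(8):
        nibble_a = (a >> (4 * k)) & 0xF
        nibble_b = (b >> (4 * k)) & 0xF
        s, carry = cla4(nibble_a, nibble_b, carry)
        result |= s << (4 * k)
    return result, carry
```

The carry out of `add32` is the value that would feed the PSR carry flag for an addition.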

5 Shifter
32 bit Barrel Shifter
Logical Shift Left/Right, Arithmetic Shift Right, Rotate Right, Rotate Right Extended
Special cases (LSR #0 encodes LSR #32, etc.)
Generates result by combining individual bit shifters

6 Barrel Shifter
Added 32-bit shifters
Result propagated through bit shifters

7 16 Bit-Right Shifter

8 32-Bit Barrel Shifter Carry In / Carry Out
-Carry in only used in RRX (rotate right extended) operations
-Carry out always computed, even though not needed in rotate operations

9 Carry Out Logic: Two Options
Separate logic computes Cout early using input and shift amount
Pros:
-Cout signal ready much earlier, no need for propagation
-Simpler bit shifter designs
Cons:
-Many more gates needed

10 Carry Out Logic: Two Options
Individual bit shifters compute and propagate Cout signal
Pros:
-Simpler overall design
-Fewer logic gates
Cons:
-Takes longer for Cout to be ready (propagation delay)
-More complicated bit shifters

11 Carry out: Conclusion
Went ahead and implemented Cout logic in the bit shifters
-Don’t really need the signal to be ready any earlier than the rest of the shifter output, especially not at the additional gate cost
-Each shifter computes Cout for its own shift amount and passes it on, or leaves Cout alone if it is disabled
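The chosen carry-out scheme can be sketched in Python for the logical-shift-right case. The stage order (16, 8, 4, 2, 1) and the names `lsr_stage` and `barrel_lsr` are assumptions for illustration; only the idea that each stage either computes Cout for its own shift amount or passes the incoming Cout along comes from the slides.

```python
MASK32 = 0xFFFFFFFF


def lsr_stage(value, cout, amount, enabled):
    """One fixed-amount right shifter. A disabled stage leaves value and Cout alone."""
    if not enabled:
        return value, cout
    # The last bit shifted out of this stage becomes the new carry out.
    new_cout = (value >> (amount - 1)) & 1
    return (value >> amount) & MASK32, new_cout


def barrel_lsr(value, shift, cin=0):
    """Logical shift right by 0..31 using five cascaded stages."""
    value &= MASK32
    cout = cin
    for amount in (16, 8, 4, 2, 1):
        value, cout = lsr_stage(value, cout, amount, bool(shift & amount))
    return value, cout
```

Because the last enabled stage overwrites Cout, the final carry equals bit `shift - 1` of the original input, which is what a dedicated early-Cout circuit would have produced with more gates.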

12 Complete Shifter

13 Multiplier (MUL/MLA)
32 additions in parallel
Logarithmic time result
2^5 = 32, so time equals 5 adds
Multiply w/accumulate inserted at the end with a multiplexor
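A rough Python model of the logarithmic-depth idea: 32 partial products are summed pairwise, so the tree is five adder levels deep, and the accumulate value is selected at the end. The function name `multiply`, its parameter names, and the returned level count are illustrative assumptions, not the team's design.

```python
MASK32 = 0xFFFFFFFF


def multiply(rm, rs, rn=0, accumulate=False):
    """Return (32-bit result, number of adder levels used in the tree)."""
    # One partial product per bit of rs: rm shifted left by i, or zero.
    terms = [((rm << i) & MASK32) if (rs >> i) & 1 else 0 for i in range(32)]
    levels = 0
    while len(terms) > 1:
        # Each level adds neighbouring pairs; in hardware these adds run in parallel.
        terms = [(terms[i] + terms[i + 1]) & MASK32 for i in range(0, len(terms), 2)]
        levels += 1
    product = terms[0]
    # The multiplexor at the end picks the plain product (MUL) or product + rn (MLA).
    result = (product + rn) & MASK32 if accumulate else product
    return result, levels
```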

14

15 Count leading zeros (CLZ)
Output equals number of leading zeros on the input (Ex:  )
First step: 
Then, add one: 
Lastly, convert to binary.
With a 32-bit input, output will have a 6-digit maximum.
Timing: Only four gate delays.
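For reference, a small behavioural model of CLZ in Python. This is a binary-search formulation chosen for clarity, not the gate-level method described on the slide, and the name `clz32` is hypothetical.

```python
def clz32(x):
    """Count leading zeros of a 32-bit value; returns 32 for an all-zero input."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    n = 0
    for width in (16, 8, 4, 2, 1):
        # If the top `width` bits are all zero, count them and shift them away.
        if x >> (32 - width) == 0:
            n += width
            x = (x << width) & 0xFFFFFFFF
    return n
```

The largest result, 32, is 100000 in binary, which is why the output needs six bits.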

16

17 Register File
37 total registers
Different modes select between different registers.
Registers r0-r7 and the PC (r15) are common to all modes
PSR mode bits select between different register banks
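A sketch of how mode-based bank selection might be modelled, assuming the standard ARM banking scheme (FIQ banks r8-r14; IRQ, Supervisor, Abort and Undefined bank r13-r14). The table `BANKED` and the function names are hypothetical; the count of 37 matches the slide.

```python
# Registers that each mode replaces with its own private copy.
BANKED = {
    "usr": set(),
    "sys": set(),
    "fiq": set(range(8, 15)),  # r8-r14
    "irq": {13, 14},
    "svc": {13, 14},
    "abt": {13, 14},
    "und": {13, 14},
}


def physical_register(mode, n):
    """Map an architectural register number (0-15) to a (bank, number) pair."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    return (mode, n) if n in BANKED[mode] else ("usr", n)


def total_registers():
    """16 shared + banked copies + CPSR + one SPSR per exception mode."""
    general = 16 + sum(len(regs) for regs in BANKED.values())
    status = 1 + sum(1 for mode in BANKED if mode not in ("usr", "sys"))
    return general + status
```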

18 Register File, Continued
3 normal (r0-r15) register outputs
1 input that can access r0-r15
An input and an output dedicated to the PC
An input and an output dedicated to the SPSR

19 Pipeline Design and Component Integration
“The manual for a Ferrari 250 states that replacing the timing chain is a five-step process. Step one is the simple (?) instruction: ‘Invert motor on bench.’”

20 Pipeline Selection
Selected a 5 stage pipeline design
Fetch: Instruction is retrieved from memory
Decode: Instruction is processed, control signals sent
Execute: ALU, Shift, Multiply and CLZ operations
Memory: Data cache/memory access
Writeback: Results are written back to the register file
Fetch->Decode->Execute->Memory->Writeback

21 Advantages
Breaks datapath into logical operational blocks.
Slower stages can be broken up to increase the clock speed.
Results in higher throughput.

22 Disadvantages
More time consuming to implement.
Data hazards appear, so forwarding and stalls must be implemented in certain circumstances.
This further complicates the design.

23 Fetch Decode Execute Memory Writeback

24 Pipelined Datapath Construction
“Purpose—to drive you to insanity”
Implemented simple single stage datapath first.
Used D flip-flops to break up the datapath into the 5 different stages.
Added memory and cache.
Stall the pipeline by holding the clock.

25 Fetch Stage
Consists of the Instruction Cache
Runs almost every cycle.
Stalled independently of the rest of the stages while the Sequencer is running.

26 Execute Stage
Contains:
Shifter
ALU
Multiplier
CLZ unit
Conditional Execution unit
PSR and PSR control

27 Memory, Writeback
Memory: contains the interface to Data Cache
Writeback: writes back to registers

28 Decode Stage, Continued
Stage contains:
Register File
Sequencer
Branching logic
32 bit shift extender
32 bit full adder
PC is output from the register file straight into the Instruction Cache address

29 Decode Stage
Modular design: each instruction type has one module that is connected to a mux.
PLA takes the instruction and outputs a 4 bit select signal that selects between all modules.
Control is contained in a 32 bit bus that is piped through the entire processor.

30 Current Processor Implementation

31 Hazards: Read after Write
1. FETCH DEC EXEC DATA WB
2. FETCH DEC STALL

32 Hazards: Branch
Pipeline diagram (instructions 1 to 4):
FETCH DEC EXEC DATA WB
FETCH DEC EXEC DATA
(Branch Target) FETCH DEC EXEC DATA WB

33 Hazard Checking Logic
Checks whether Rd (the destination register) is read by either of the next 2 instructions
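The check described here can be sketched as a small Python function. The `Instr` record and the name `raw_hazards` are hypothetical; the window of two following instructions comes from the slide.

```python
from collections import namedtuple

# rd is the destination register (or None); sources are the registers read.
Instr = namedtuple("Instr", "name rd sources")


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each read-after-write hazard."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.rd is None:
            continue
        # Only the next `window` instructions can read a value still in the pipeline.
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.rd in program[j].sources:
                hazards.append((i, j, producer.rd))
    return hazards
```

Each reported pair is a case where the hardware would either forward the result or stall the consumer.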

34 Data Forwarding FETCH DECODE EXEC Result DATA BUFFER Data WRITEBACK

35 Overview

36 CONDITION EVALUATE CPSR Flags

37 Interrupt Handler
Component must handle the following seven cases:
Reset (Highest Priority)
Data Abort
FIQ
IRQ
Prefetch Abort
Undefined Instruction
Software Interrupt (SWI) (Lowest Priority)

38 Implementation
One ROM file handles memory addresses: a 3-bit input leads to a 32-bit address for the PC.
A second ROM file handles CPSR alterations: a 4-bit input leads to the lower 8 bits of the CPSR.
Priorities of the interrupts are handled with CLZ functionality.
Lastly, no interrupts leads to “Active = 0”.
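A Python sketch of priority selection using a leading-zero count, followed by a lookup in a vector-address table. The priority order follows the interrupt handler slide and the addresses are the standard ARM exception vectors; the bit layout of `pending` and all names here are assumptions for illustration.

```python
# Highest priority first, matching the order on the interrupt handler slide.
PRIORITY = ["reset", "data_abort", "fiq", "irq", "prefetch_abort", "undefined", "swi"]

# Standard ARM exception vector addresses (the contents of the first ROM).
VECTOR_ROM = {
    "reset": 0x00,
    "undefined": 0x04,
    "swi": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}


def select_interrupt(pending):
    """pending is a 7-bit word: bit 6 = reset ... bit 0 = swi.

    Returns (active, name, vector_address).
    """
    pending &= 0x7F
    if pending == 0:
        return 0, None, None  # no interrupts: Active = 0
    # Leading zeros within the 7-bit field give the index of the winner.
    index = 7 - pending.bit_length()
    name = PRIORITY[index]
    return 1, name, VECTOR_ROM[name]
```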

39

40 Memory
"Memory is like an orgasm. It's a lot better if you don't have to fake it." -- Seymour Cray
128 bit wide Main Memory
32 bit Split cache system: Data and Instruction
Data Cache operates with Write Back Policy
2 State Machines in charge of Memory Control

41 Main Memory Control
"It wasn't very sporting, but what the hell." - Chuck Yeager on shooting down a landing Me-262
Simulates memory latency with a delay component
Implemented with a state machine
Enters a wait state while holding for memory to finish
Operation order: Data first, Instruction second
Signals when data is valid, and when operation is finished

42 Memory State Machine

43 Caches
“I'm just here for moral support. Ignore the gun.”
128 bit lines separated into 32 bit blocks
Hits determined by using high address bits, as well as a valid bit
Write strategy uses Dirty bit to signal when to write to memory
On reset valid and dirty bits are cleared
Can operate in 128, 32, and 8 bit modes
Necessary for memory and processor interface
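A minimal write-back cache model in Python, assuming a direct-mapped organisation with a handful of lines (the slides do not state the associativity or size). The class name `Cache`, its fields, and the dictionary used as main memory are all hypothetical.

```python
class Cache:
    """Direct-mapped cache with 128-bit lines holding four 32-bit words."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory  # dict: line-aligned byte address -> list of 4 words
        self.num_lines = num_lines
        self.lines = [
            {"valid": False, "dirty": False, "tag": 0, "words": [0] * 4}
            for _ in range(num_lines)
        ]
        self.writebacks = 0

    def _lookup(self, addr):
        word = (addr >> 2) & 0x3               # which 32-bit block in the line
        index = (addr >> 4) % self.num_lines   # which cache line
        tag = (addr >> 4) // self.num_lines    # high address bits
        line = self.lines[index]
        hit = line["valid"] and line["tag"] == tag
        if not hit:
            if line["valid"] and line["dirty"]:
                # Write-back policy: memory is updated only on eviction.
                old_base = (line["tag"] * self.num_lines + index) << 4
                self.memory[old_base] = list(line["words"])
                self.writebacks += 1
            base = addr & ~0xF
            line.update(valid=True, dirty=False, tag=tag,
                        words=list(self.memory.get(base, [0] * 4)))
        return line, word, hit

    def read(self, addr):
        line, word, hit = self._lookup(addr)
        return line["words"][word], hit

    def write(self, addr, value):
        line, word, hit = self._lookup(addr)
        line["words"][word] = value & 0xFFFFFFFF
        line["dirty"] = True
        return hit
```

A write only marks the line dirty; main memory sees the new value when a different address maps to the same line and forces an eviction.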

44 Cache Reset
"A day without killing... is like a day without sunshine" -John Wayne
Cache reset controlled by two signals, RESET and MEM_CLEAR
When MEM_CLEAR is pulsed a sequencer is engaged
Adder attached to a flip-flop
Cycles through addresses, setting values to 0
Asserts pipeline hold signal while running
RESET clears all the state machines back to initial state

45 Memory System Control
"He spoke, I had no clue, it was a mutual relationship."
Implemented with a state machine
Interfaces I-Cache, D-Cache, Main Memory, and Pipeline
During operation, pipeline hold signal is asserted
Autonomous operation, requires no special datapath control
Took so much time that it made my girlfriend jealous

46 Memory Control Overview

47 Memory Control FSMs

48 Memory Control FSMs

49 Memory Control FSMs

50 Interrupts “The nice thing about standards is that there are so many of them to choose from.”

51 Sequencer
Built to handle complex operations: interrupts, block load/store
Is basically a clocked ROM file.
Has a start address and a start signal
Runs through a sequence of instructions in the ROM file until the sequence signals it is done.
One instruction per cycle is injected into the instruction stream while the Fetch stage is stalled.
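A toy Python model of a clocked-ROM sequencer. The ROM layout (each entry pairing an instruction with a done flag) and the class and method names are assumptions; the slide only says the sequence itself signals when it is finished.

```python
class Sequencer:
    """Steps through a ROM, emitting one instruction per clock until done."""

    def __init__(self, rom):
        self.rom = rom  # list of (instruction, done_flag) entries
        self.address = 0
        self.running = False

    def start(self, start_address):
        """Model the start signal: latch the start address and begin running."""
        self.address = start_address
        self.running = True

    def clock(self):
        """Advance one cycle; returns (instruction_or_None, fetch_stalled)."""
        if not self.running:
            return None, False
        instruction, done = self.rom[self.address]
        self.address += 1
        if done:
            self.running = False  # the last entry of a sequence ends the run
        return instruction, True
```

While `fetch_stalled` is true, the pipeline would take its next instruction from the sequencer instead of from the Fetch stage.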

52 Instructions: Data Processing, Multiply and CLZ
These instructions move linearly through the pipeline and don’t require stalls, as they are all single cycle in our implementation.
They present some data hazards, but the hazard detection and forwarding logic maintains linear execution.

53 Branch
On decode, branch immediately adds the PC to the shifted offset and updates the PC.
No stall necessary, since PC is updated before the next instruction is fetched.
Branch w/link has r14 updated when branch finishes moving through the entire pipeline.
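The address calculation can be illustrated in Python using the standard ARM branch encoding: a signed 24-bit offset, shifted left by two, added to a PC that reads as the branch instruction's address plus 8. The function name `branch_target` is hypothetical.

```python
def branch_target(instr_address, instruction):
    """Compute the target of an ARM B/BL instruction word at instr_address."""
    offset = instruction & 0x00FFFFFF
    if offset & 0x00800000:
        offset -= 0x01000000  # sign-extend the 24-bit field
    pc = instr_address + 8    # architectural PC is two instructions ahead
    return (pc + (offset << 2)) & 0xFFFFFFFF
```

For example, the word `0xEAFFFFFE` at any address branches back to itself, since the offset of -2 words cancels the +8 bias.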

54 LDR, STR
Used asynchronous logic to make LDR and STR single cycle.
During the first part of the clock cycle, the updated base register is written, the writeback register is changed, and the value is loaded from memory into that register.
Simplifies load and store logic greatly.

55 Multicycle Instructions
Multiple Register Transfer
Swap
Implemented with our sequencer: each of these instructions translates into a sequence of single cycle instructions.
These instructions are modified to correspond with the specific multicycle instruction.

56 Where are we now?
"Time commitment--eternity." --CTEC
All 5 stages and Memory/Cache integrated.
Working: Data Processing, Multiply, CLZ, Shifting, Load, Store, Branch, MRS, MSR
Not yet fully functional:
Load/Store Multiple
Swap
Conditional execution (in regards to branch)
Interrupts

