1
RAMP Gold: Architecture and Timing Model
Andrew Waterman, Zhangxi Tan, Rimas Avizienis, Yunsup Lee, David Patterson, Krste Asanović
Parallel Computing Laboratory
University of California, Berkeley
2
RAMP Gold Overview
Tiled CMP simulator
ISA: SPARC V8
  – (ARM/Thumb-2 later?)
Split timing and function (both on FPGA)
Host-multithreaded
Runs on V5LX110T (XUP)
[Figure labels: Par Lab InfiniCore; Functional Model Pipeline, Arch State; Timing Model Pipeline, Timing State]
3
RAMP Gold Target Machine
[Figure: SPARC V8 cores (64 total) with I$ and D$, a shared L2$ / interconnect, and DRAM]
4
RAMP Gold v1 Target Features
64 single-issue, in-order SPARC V8 processors
  – Simple, 5-stage pipeline
  – FPU
Cache timing model
  – Configurable size, line size, associativity, miss penalty, shared/private
  – Change parameters without resynthesis
5
RAMP Gold Architecture
Mapping the target machine directly to an FPGA is inefficient
Solution: split timing and functionality + multithreading
  – The timing logic decides how many target cycles an instruction sequence should take
  – Simulating the functionality of an instruction might take multiple host cycles
6
Function/Timing Split Advantages
Flexibility
  – Can configure target at runtime
  – Synthesize design once, change target model parameters at will
Efficient FPGA resource usage
  – Example 1: model a 2-cycle FPU in 10 host cycles
  – Example 2: model a 16MB L2$ using only 256KB host BRAM to store tags/metadata
Enables multithreading
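Example 2 is an arithmetic claim: a timing model only needs tags and metadata, not the cached data itself. The sketch below shows how such a tag-store size could be estimated. The slide does not give the line size, associativity, address width, or number of state bits, so every parameter here is an assumption chosen for illustration, and `tag_store_bytes` is a hypothetical helper, not part of RAMP Gold.

```python
def tag_store_bytes(cache_bytes, line_bytes, ways, addr_bits=32, state_bits=0):
    """Host storage (bytes) for tags plus per-line state of a modeled cache.

    Assumes power-of-two sizes. All parameter values are illustrative
    assumptions; the slide only states 16MB target L2$ and 256KB host BRAM.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    total_bits = lines * (tag_bits + state_bits)
    return (total_bits + 7) // 8               # round up to whole bytes


KiB = 1 << 10
MiB = 1 << 20

# One assumed configuration: 64B lines, direct-mapped, 32-bit addresses,
# no state bits. The data array would be 16 MiB; the tags are far smaller.
print(tag_store_bytes(16 * MiB, 64, ways=1) // KiB, "KiB of tags")
```

Under these assumed parameters the tag store is a small fraction of the 16MB data array. Other assumptions (higher associativity, added state bits) give different totals, so this is not a reconstruction of the actual RAMP Gold configuration.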
7
Split Timing and Function
Functional model executes ISA correctly
Timing model determines how long a program takes to run
[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, L1 D$ FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM)]
8
Split Timing and Function
Functional model executes ISA correctly
Timing model determines how long a program takes to run
[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM)]
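The division of labor on this slide can be sketched in software. In the minimal example below, a functional model updates architectural state and a separate timing model charges target cycles. The toy two-operation instruction set, the register names, and the latency table are all assumptions made for illustration. They are not SPARC V8 and not the RAMP Gold implementation.

```python
# Assumed target latencies (in target cycles); purely illustrative.
TARGET_LATENCY = {"add": 1, "mul": 3}


def functional_step(regs, inst):
    """Functional model: update architectural state, with no notion of time."""
    op, rd, ra, rb = inst
    if op == "add":
        regs[rd] = regs[ra] + regs[rb]
    elif op == "mul":
        regs[rd] = regs[ra] * regs[rb]
    else:
        raise ValueError(f"unknown op: {op}")


def timing_step(inst, latency):
    """Timing model: decide how many target cycles the instruction takes."""
    return latency[inst[0]]


def simulate(program, regs, latency=TARGET_LATENCY):
    """Run both models side by side; return total target cycles."""
    cycles = 0
    for inst in program:
        functional_step(regs, inst)
        cycles += timing_step(inst, latency)
    return cycles


program = [("add", "r3", "r1", "r2"), ("mul", "r3", "r3", "r2")]
regs = {"r1": 2, "r2": 3, "r3": 0}
print(simulate(program, regs), regs["r3"])
```

Passing a different `latency` table changes the cycle count but not the register values, which is the property the split is meant to provide.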
9
TM + FM from 30,000 ft
[Figure: block diagram of the CPU Timing Model, L1 D$ Timing Model, CPU Functional Model, Memory Timing Model, and Memory Functional Model. Signal labels: instruction; ld/st address; store data; load data; stall; instruction complete]
10
TM + FM from 3,000 ft
[Figure: more detailed block diagram. Block labels: CPU TM, CPU FM, L1 D$ TM, Memory Timing Model, Memory Functional Model. Stage labels: IF, CTRL, DEC, EX, MEM, WB, TM1, TM2. Signal labels: instruction; ld/st address, store data; load data; stall; instruction complete]
11
Example: Target Load Miss
[Figure: the 3,000 ft block diagram annotated with step numbers 1 through 7 tracing a target load miss]
12
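Timing-Driven Host Pipeline
[Figure: host pipeline diagram. Block labels: CPU/D$ Timing Model, CPU Functional Model, Target Memory TM/FM, L1 D$ TM, Store Buffer, Load Result Buffer. Stage labels: TS, IF, DE, EX, MEM1, MEM2, WB, TM1, TM2, TM3. Signal labels: {TID,INST}, {TID,ADDR}. Example threads T0, T1, T2 issuing ADD, LD, ST, LD, ADD]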
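The {TID,INST} labels suggest that instructions from different target threads are tagged with a thread ID and interleaved through a single host pipeline. The sketch below shows one way such interleaving could work. Round-robin issue is an assumption, since the slides do not state the scheduling policy, and `interleave` is a hypothetical helper.

```python
from collections import deque


def interleave(threads):
    """Issue one instruction per host cycle, rotating over target threads.

    `threads` maps a thread ID to its list of instructions. Returns the
    issue order as (tid, inst) pairs. Round-robin is an assumed policy.
    """
    queues = {tid: deque(insts) for tid, insts in threads.items()}
    ready = deque(tid for tid, q in queues.items() if q)  # skip empty threads
    issued = []
    while ready:
        tid = ready.popleft()
        issued.append((tid, queues[tid].popleft()))
        if queues[tid]:
            ready.append(tid)  # thread still has work; back of the line
    return issued


for tid, inst in interleave({0: ["ADD", "LD"], 1: ["LD"], 2: ["ST", "ADD"]}):
    print(f"{{T{tid},{inst}}}")
```

Because consecutive pipeline slots belong to different threads, a long-latency operation in one thread need not leave the host pipeline idle.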
13
Cache Modeling
The cache model maintains tag, state, protocol bits internally
Whenever the functional model issues a memory operation, the cache model determines how many target cycles to stall
[Figure: address fields tag, index, offset; stored tag, state entries; comparators (=) across the associativity; hit/miss output]
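A tag-only cache timing model of the kind described here can be sketched as follows. It stores no data, only tags, and returns the number of target cycles to stall for each access. LRU replacement, the single fixed miss penalty, and the class name `CacheTimingModel` are assumptions for illustration. The slide does not specify the replacement policy, and coherence state and protocol bits are omitted.

```python
class CacheTimingModel:
    """Tag-only set-associative cache model; returns stall cycles per access.

    Assumes LRU replacement and a fixed miss penalty (both illustrative).
    """

    def __init__(self, size_bytes, line_bytes, ways, miss_penalty):
        self.line_bytes = line_bytes
        self.ways = ways
        self.miss_penalty = miss_penalty
        self.num_sets = size_bytes // (line_bytes * ways)
        # One list of tags per set, most recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        """Look up `addr`; return 0 on a hit, the miss penalty on a miss."""
        line = addr // self.line_bytes       # drop the offset bits
        index = line % self.num_sets         # index selects the set
        tag = line // self.num_sets          # remaining high bits
        tags = self.sets[index]
        if tag in tags:                      # hit: refresh LRU position
            tags.remove(tag)
            tags.insert(0, tag)
            return 0
        if len(tags) == self.ways:           # set full: evict the LRU tag
            tags.pop()
        tags.insert(0, tag)
        return self.miss_penalty


l1 = CacheTimingModel(size_bytes=32 * 1024, line_bytes=64, ways=2, miss_penalty=20)
print([l1.access(a) for a in (0, 32, 16384, 32768, 0)])
```

Since size, line size, associativity, and miss penalty are ordinary constructor arguments, they can change between runs without rebuilding anything, which loosely mirrors the "change parameters without resynthesis" goal.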
14
Multithreaded, Pipelined Cache TM
[Figure: pipelined tag lookup. Labels: Address, Index, tag, state, comparators (=), hit?]
15
Quick & Dirty Validation
32KB, 2-way L1 D$, 64B lines
256KB, 4-way L2$, 64B lines
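For reference, the geometry implied by these two configurations can be worked out directly. The helper below is a hypothetical illustration that assumes power-of-two sizes. It derives the number of sets and the index and offset widths from the size, associativity, and line size listed above.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, index_bits, offset_bits) for a power-of-two cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, index_bits, offset_bits


KiB = 1024
print("L1 D$:", cache_geometry(32 * KiB, ways=2, line_bytes=64))
print("L2$:  ", cache_geometry(256 * KiB, ways=4, line_bytes=64))
```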
16
Status
Functional + simple timing model work in HW
  – Running real programs (e.g. SPLASH2)
Near-term future work
  – Move from the current “functional-first + stall” configuration to the timing-driven design described here
  – More interesting memory system timing model
  – Functional potpourri (FDIV, MMU, …)
17
DEMO
Run OCEAN with different L1 D$ parameters
18
Questions?
Thank you!