1 SS and Pipelining: The Sequel Data Forwarding Caches Branch Prediction Michele Co, September 24, 2001.

Slides:



Advertisements
Similar presentations
ECE ECE1773 Spring ‘02 © A. Moshovos (Toronto) Simplescalar’s out-of-order simulator (v3) ECE1773 Andreas Moshovos Visit for additional.
Advertisements

Morgan Kaufmann Publishers The Processor
Pipelining and Control Hazards Oct
SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
Chapter 12 CPU Structure and Function. CPU Sequence Fetch instructions Interpret instructions Fetch data Process data Write data.
Computer Organization and Architecture
Branch Prediction in SimpleScalar
1 Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
Chapter 12 Pipelining Strategies Performance Hazards.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Goal: Reduce the Penalty of Control Hazards
Chapter 12 CPU Structure and Function. Example Register Organizations.
Branch Prediction Dimitris Karteris Rafael Pasvantidιs.
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
Arvind and Joel Emer Computer Science and Artificial Intelligence Laboratory M.I.T. Branch Prediction.
CH12 CPU Structure and Function
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Computer Architecture Pipeline
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands.
1 SimpleScalar3.0: Selected Topics aka Hopefully enough to get you started Michele Co September 10, 2001 Department of Computer Science University of Virginia.
CDA 5155 Week 3 Branch Prediction Superscalar Execution.
Simulator Outline of MIPS Simulator project  Write a simulator for the MIPS five-stage pipeline that does the following: Implements a subset of.
Instruction-Level Parallelism and Its Dynamic Exploitation
Lecture: Out-of-order Processors
CS 352H: Computer Systems Architecture
Computer Organization CS224
Data Prefetching Smruti R. Sarangi.
CS2100 Computer Organization
Computer Structure Advanced Branch Prediction
Computer Architecture Advanced Branch Prediction
ELEN 468 Advanced Logic Design
PowerPC 604 Superscalar Microprocessor
Single Clock Datapath With Control
Decompression After The Cache For Compressed Code Execution
Pipeline Implementation (4.6)
Morgan Kaufmann Publishers The Processor
Computer Architecture Lecture 3
Pipelining review.
The processor: Pipelining and Branching
TIME C1 C2 C3 C4 C5 C6 C7 C8 C9 I1 branch decode exec mem wb bubble
Lecture: Branch Prediction
Lecture: Out-of-order Processors
The Processor Lecture 3.6: Control Hazards
Control unit extension for data hazards
Lecture 20: OOO, Memory Hierarchy
Data Prefetching Smruti R. Sarangi.
Computer Architecture
Control unit extension for data hazards
Dynamic Hardware Prediction
Control unit extension for data hazards
Conceptual execution on a processor which exploits ILP
Computer Structure Advanced Branch Prediction
Presentation transcript:

1 SS and Pipelining: The Sequel Data Forwarding Caches Branch Prediction Michele Co, September 24, 2001

2 Outline Overview: The Big Picture SimpleScalar Code needed to do this

3 Data Forwarding IFIDEXMEMWB

4 Data Forwarding Before: –Hazard detect: Need operands from ex_mem_s or mem_wb_s? – stall Now: –What’s still a hazard? ex_mem_s –load? still a hazard, wait till it reaches mem_wb_s –others? not a hazard anymore due to forwarding mem_wb_s –nothing is really a hazard here anymore due to forwarding

5 Caches IFIDEXMEMWB I-cacheD-cache

6 Caches Instruction Cache –probed on every fetch in IF Data Cache –probed in MEM if performing a load/store Getting them into sim-pipe2.c –declare vars for structures –sim_reg_options() –sim_check_options() –sim_reg_stats() –cache_access() in appropriate stages

7 SimpleScalar Code for Caches /* level 1 instruction cache, entry level instruction cache */ static struct cache_t *cache_il1 = NULL; /* level 2 instruction cache */ static struct cache_t *cache_il2 = NULL; /* level 1 data cache, entry level data cache */ static struct cache_t *cache_dl1 = NULL; /* level 2 data cache */ static struct cache_t *cache_dl2 = NULL;

8 sim_reg_options() Register all commandline options –level 1 data and instruction caches sizes/configurations latency –memory width latency –for our purposes set -mem:lat to 10 0 as a default

9 sim_reg_options() From sim-outorder.c : /* cache options */ opt_reg_string(odb, "-cache:dl1", "l1 data cache config, i.e., { |none}", &cache_dl1_opt, "dl1:256:32:1:l", /* print */TRUE, NULL); opt_reg_string(odb, "-cache:il1", "l1 inst cache config, i.e., { |dl1|dl2|none}", &cache_il1_opt, "il1:256:32:1:l", /* print */TRUE, NULL); /* cache latency options */ opt_reg_int(odb, "-cache:dl1lat", "l1 data cache hit latency (in cycles)", &cache_dl1_lat, /* default */1, /* print */TRUE, /* format */NULL); /* memory options */ opt_reg_int_list(odb, "-mem:lat", "memory access latency ( )", mem_lat, mem_nelt, &mem_nelt, mem_lat, /* print */TRUE, /* format */NULL, /* !accrue */FALSE); opt_reg_int(odb, "-mem:width", "memory access bus width (in bytes)", &mem_bus_width, /* default */8, /* print */TRUE, /* format */NULL);

10 sim_check_options Check if options are specified correctly by user –cache options if ok, go ahead and create the cache using cache_create() –memory latency options if negative call fatal

11 sim_check_options() /* use a level 1 D-cache? */ if (!mystricmp(cache_dl1_opt, "none")) { cache_dl1 = NULL; /* the level 2 D-cache cannot be defined */ if (strcmp(cache_dl2_opt, "none")) fatal("the l1 data cache must defined if the l2 cache is defined"); cache_dl2 = NULL; } else /* dl1 is defined */ { if (sscanf(cache_dl1_opt, "%[^:]:%d:%d:%d:%c", name, &nsets, &bsize, &assoc, &c) != 5) fatal("bad l1 D-cache parms: : : : : "); cache_dl1 = cache_create(name, nsets, bsize, /* balloc */FALSE, /* usize */0, assoc, cache_char2policy(c), dl1_access_fn, /* hit latency */1); /* is the level 2 D-cache defined? */ if (!mystricmp(cache_dl2_opt, "none")) cache_dl2 = NULL; else { if (sscanf(cache_dl2_opt, "%[^:]:%d:%d:%d:%c", name, &nsets, &bsize, &assoc, &c) != 5) fatal("bad l2 D-cache parms: " " : : : : "); cache_dl2 = cache_create(name, nsets, bsize, /* balloc */FALSE, /* usize */0, assoc, cache_char2policy(c), dl2_access_fn, /* hit latency */1); }

12 sim_reg_stats() register appropriate cache stats depending on the cache config you’re using: if (cache_dl1) cache_reg_stats(cache_dl1, sdb); if (cache_dl2) cache_reg_stats(cache_dl2, sdb);

13 Accessing the cache unsigned int/* latency of access in cycles */ cache_access(struct cache_t *cp,/* cache to access */ enum mem_cmd cmd,/* access type, Read or Write */ md_addr_t addr,/* address of access */ void *vp,/* ptr to buffer for input/output */ int nbytes,/* number of bytes to access */ tick_t now,/* time of access */ byte_t **udata,/* for return of user data ptr */ md_addr_t *repl_addr);/* for address of replaced block */ EXAMPLE: /* read cache */ if(cache_dl1) cache_lat = cache_access(cache_dl1, Read, ex_mem_s.addr, NULL, ex_mem_s.ls_size, sim_cycle, NULL, NULL);

14 Branch Prediction IFIDEXMEMWB Branch predictor bpred_lookup()bpred_update()

15 Branch Prediction Bimodal branch predictor (sim-bpred.c) –declarations –sim_reg_options() –sim_check_options() –sim_reg_stats() –accessing and updating the predictor Also a macro to add target PC /* target program counter */ #undef SET_TPC #define SET_TPC(EXPR) (target_PC = (EXPR))

16 Declarations /* branch predictor type {nottaken|taken|perfect|bimod|2lev} */ static char *pred_type; /* branch predictor */ static struct bpred_t *pred; /* total number of branches executed */ static counter_t sim_num_branches = 0; /* bimodal predictor config ( ) */ static int bimod_nelt = 1; static int bimod_config[1] = { /* bimod tbl size */2048 }; /* BTB predictor config ( ) */ static int btb_nelt = 2; static int btb_config[2] = { /* nsets */512, /* assoc */4 }; /* return address stack (RAS) size */ static int ras_size = 8;

17 sim_reg_options() Register commandline options for specifying the configuration and type of branch predictor (PHT, BTB, RAS)

18 sim_reg_options() /* branch predictor options */ opt_reg_string(odb, "-bpred", "branch predictor type {nottaken|taken|bimod|2lev|comb}", &pred_type, /* default */"bimod", /* print */TRUE, /* format */NULL); opt_reg_int_list(odb, "-bpred:bimod", "bimodal predictor config ( )", bimod_config, bimod_nelt, &bimod_nelt, /* default */bimod_config, /* print */TRUE, /* format */NULL, /* !accrue */FALSE); opt_reg_int(odb, "-bpred:ras", "return address stack size (0 for no return stack)", &ras_size, /* default */ras_size, /* print */TRUE, /* format */NULL); opt_reg_int_list(odb, "-bpred:btb", "BTB config ( )", btb_config, btb_nelt, &btb_nelt, /* default */btb_config, /* print */TRUE, /* format */NULL, /* !accrue */FALSE);

19 sim_check_options() Make sure that the user specifies the “right” kind of information

20 sim_check_options() /* check branch predictor options */ if (!mystricmp(pred_type, "bimod")) { if (bimod_nelt != 1) fatal("bad bimod predictor config ( )"); if (btb_nelt != 2) fatal("bad btb config ( )"); /* bimodal predictor, bpred_create() checks BTB_SIZE */ pred = bpred_create(BPred2bit, /* bimod table size */bimod_config[0], /* 2lev l1 size */0, /* 2lev l2 size */0, /* meta table size */0, /* history reg size */0, /* history xor address */0, /* btb sets */btb_config[0], /* btb assoc */btb_config[1], /* ret-addr stack size */ras_size); } else fatal("We don't support predictor type `%s' at this time", pred_type);

21 sim_reg_stats() register counters for number of branches executed register bpred provided stats: /* register branch prediction related stats */ stat_reg_counter(sdb, "sim_num_branches", "total number of branches executed", &sim_num_branches, /* initial value */0, /* format */NULL); stat_reg_formula(sdb, "sim_IPB", "instruction per branch", "sim_num_insn / sim_num_branches", /* format */NULL); if (pred) bpred_reg_stats(pred, sdb);

22 Accessing the branch predictor md_addr_t/* predicted branch target addr */ bpred_lookup(struct bpred_t *pred,/* branch predictor instance */ md_addr_t baddr,/* branch address */ md_addr_t btarget,/* branch target if taken */ enum md_opcode op,/* opcode of instruction */ int is_call,/* is fn call */ int is_return,/* is fn return */ struct bpred_update_t *dir_update_ptr, /* pred state pointer */ int *stack_recover_idx); /* Non-speculative top-of-stack used on mispredict recovery */

23 Updating the predictor void bpred_update(struct bpred_t *pred,/* branch predictor instance */ md_addr_t baddr,/* branch address */ md_addr_t btarget,/* resolved branch target */ int taken,/* non-zero if branch was taken */ int pred_taken,/* non-zero if branch was pred taken */ int correct,/* was earlier prediction correct? */ enum md_opcode op,/* opcode of instruction */ struct bpred_update_t *dir_update_ptr); /* pred state pointer */