Project Tutorial COMP4611 Tutorial 7 29, 30 Oct 2014.

Slides:



Advertisements
Similar presentations
Branch prediction Titov Alexander MDSP November, 2009.
Advertisements

Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
Pipelining and Control Hazards Oct
1 SS and Pipelining: The Sequel Data Forwarding Caches Branch Prediction Michele Co, September 24, 2001.
Dynamic Branch Prediction
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
Branch Prediction in SimpleScalar
SimpleScalar CS401. A Computer Architecture Simulator Primer What is an architectural simulator? – Tool that reproduces the behavior of a computing device.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)
EECS 470 Branch Prediction Lecture 6 Coverage: Chapter 3.
WCED: June 7, 2003 Matt Ramsay, Chris Feucht, & Mikko Lipasti University of Wisconsin-MadisonSlide 1 of 26 Exploring Efficient SMT Branch Predictor Design.
1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlation Branches from a Larger Global History CSE 340 Project Presentation.
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
VLSI Project Neural Networks based Branch Prediction Alexander ZlotnikMarcel Apfelbaum Supervised by: Michael Behar, Spring 2005.
EECC551 - Shaaban #1 lec # 5 Fall Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
Branch Target Buffers BPB: Tag + Prediction
EECC551 - Shaaban #1 lec # 5 Winter Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
Computer Architecture Instruction Level Parallelism Dr. Esam Al-Qaralleh.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
Branch Prediction Dimitris Karteris Rafael Pasvantidιs.
Dynamic Branch Prediction
EENG449b/Savvides Lec /25/05 March 24, 2005 Prof. Andreas Savvides Spring g449b EENG 449bG/CPSC 439bG.
CIS 429/529 Winter 2007 Branch Prediction.1 Branch Prediction, Multiple Issue.
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline Debajit Bhattacharya Ali JavadiAbhari ELE 475 Final Project 9 th May, 2012.
1 Introduction to SimpleScalar (Based on SimpleScalar Tutorial) CPSC 614 Texas A&M University.
Analysis of Branch Predictors
ACSAC’04 Choice Predictor for Free Mongkol Ekpanyapong Pinar Korkmaz Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Institute.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
1 SimpleScalar3.0: Selected Topics aka Hopefully enough to get you started Michele Co September 10, 2001 Department of Computer Science University of Virginia.
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
Dynamic Branch Prediction
CSL718 : Pipelined Processors
Lecture: Out-of-order Processors
COSC6385 Advanced Computer Architecture Lecture 9. Branch Prediction
CS203 – Advanced Computer Architecture
Computer Structure Advanced Branch Prediction
Dynamic Branch Prediction
Lecture: Branch Prediction
COSC3330 Computer Architecture Lecture 15. Branch Prediction
Introduction to SimpleScalar (Based on SimpleScalar Tutorial)
Introduction to SimpleScalar (Based on SimpleScalar Tutorial)
CMSC 611: Advanced Computer Architecture
Operation System Program 4
TIME C1 C2 C3 C4 C5 C6 C7 C8 C9 I1 branch decode exec mem wb bubble
So far we have dealt with control hazards in instruction pipelines by:
Branch statistics Branches occur every 4-6 instructions (16-25%) in integer programs; somewhat less frequently in scientific ones Unconditional branches.
Lecture: Branch Prediction
Lecture: Out-of-order Processors
Dynamic Branch Prediction
So far we have dealt with control hazards in instruction pipelines by:
Lecture 10: Branch Prediction and Instruction Delivery
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
Pipelining: dynamic branch prediction Prof. Eric Rotenberg
Adapted from the slides of Prof
Dynamic Hardware Prediction
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
So far we have dealt with control hazards in instruction pipelines by:
Aliasing and Anti-Aliasing in Branch History Table Prediction
rePLay: A Hardware Framework for Dynamic Optimization
So far we have dealt with control hazards in instruction pipelines by:
Gang Luo, Hongfei Guo {gangluo,
Lecture 7: Branch Prediction, Dynamic ILP
Presentation transcript:

Project Tutorial COMP4611 Tutorial 7 29, 30 Oct 2014

Outline Objectives Background Information Project Task I Task description Project Task II Skeleton code Project Task III Deliverables Submission & Grading

Objectives To have hands-on experiments with the branch prediction using SimpleScalar To design your own branch predictor for higher prediction accuracy

Background Default Branch predictors in SimpleScalar Taken or not-taken 2-bit saturating counter predictor 2-level adaptive predictor (correlating predictor)

Background Specifying the branch predictor type Option: -bpred <type> <type> nottaken always predict not taken taken always predict taken bimod bimodal predictor, using a branch target buffer (BTB) with 2-bit saturating counters 2lev 2-level adaptive predictor comb combined predictor (bimodal and 2-level adaptive, the technique uses bimod and 2-level adaptive predictors and selects the one which is performing best for each branch) Default is a bimodal predictor with 2048 entries

Background – static predictors Configuring taken/nottaken predictor Option: -bpred taken, -bpred nottaken Command line example for “-bpred taken”: ./sim-bpred -bpred taken benchmarks/cc1.alpha -O benchmarks/1stmt.1 Command line example for “-bpred nottaken”: ./sim-bpred -bpred nottaken benchmarks/cc1.alpha -O benchmarks/1stmt.i

Simulating Not-Taken Predictor Branch prediction accuracy: bpred_dir_rate = total number of direction predicted hits total number of branches executed

Background – Bimodal Predictor(aka 2-bit predictor) Configuring the bimodal predictor -bpred:bimod <size> <size>: the size of direct-mapped branch target buffer (BTB) Command line example: ./sim-bpred -bpred:bimod 2048 benchmarks/cc1.alpha -O benchmarks/1stmt.i T NT 11 10 00 Predict: Taken 01 Predict: Not Taken

Background – Bimodal Predictor(aka 2-bit predictor) The figure shows a table of counters indexed by the low order address bits in the program counter. Each counter is two bits long. For each taken branch, the appropriate counter is incremented. Likewise for each not-taken branch, the appropriate counter is decremented. In addition, the counter is saturating: the counter is not decremented past zero, nor is it incremented past three. The most significant bit determines the prediction.

Background – 2-level adaptive predictor Configuring 2-level adaptive predictor -bpred:2lev <l1size> <l2size> <hist_size> <xor> <l1size>: the number of entries in the first-level table <l2size>: the number of entries in the second-level table <hist_size>: the history width <xor>: xor the history and the address in the second level of the predictor Command line example: ./sim-bpred –bpred 2lev -bpred:2lev 1 1024 8 0 benchmarks/cc1.alpha -O benchmarks/1stmt.i branch address pattern history 2-bit predictors l1size l2size hist_size

Background – 2-level adaptive predictor First step to map instruction to a table of branching history Second step to map each history pattern into another table of 2 bit predictor e.g. <l1size> = 1, <hist_size> = 2, <l2size> = 4, <xor> = 0. Suppose the branch sequence is 001001001… there are 2^2 = 4 possible branching history pattern Entry number 00 in history will map to 2-bit predictor of 11 state, because branch is always taken after 2 consecutive not taken in the sequence Same logic, 01 in history will map to 2-bit predictor of 00 state, because branch is always NOT taken after first not taken followed by taken in the sequence

Benchmark for the Project Go (Alpha) http://www.eecs.umich.edu/~taustin/eecs573_public/instruct-progs.tar.gz The benchmark program Go is an Artificial Intelligent algorithm for playing the game Go against itself. The input file for the benchmark is 2stone9. It runs 550 million instructions.

Project Task I Evaluation of bimodal branch predictor Varied number of table entries (512, 1024, 2048) Benchmark: Go (Alpha) Input file for Go: 2stone9.in Branch prediction accuracy (bpred_dir_rate) and command lines to be included in the project report Command line example: ./sim-bpred -redir:prog results/go-2bit-512-2-17.progout -redir:sim results/go-2bit-512-2-17.simout -bpred:bimod 512 benchmarks/go.alpha 2 17 benchmarks/2stone9.in

Project Task II Write a 3-bit branch predictor on SimpleScalar Implementation Evaluation Guideline Skeleton code at http://course.cse.ust.hk/comp4611/project/comp4611_project_2014_fall.tar.gz Extract the package and compile it by typing “make config-alpha” and then “make” Fill in the missing code in “bpred.c ” and recompile

Code Glimpse Workflow of branch prediction in SimpleScalar main() sim_check_options() bpred_create() bpred_dir_create() allocate BTB sim_main() bpred_lookup() bpred_update() pred->dir_hits *(dir_update_ptr->pdir1)

Code Glimpse Source code that implements the branch predictor is in simplesim-3.0/ sim-bpred.c: simulating a program with configured branch prediction instance bpred.c: implementing the logic of several branch predictors bpred.h: defining the structure of several branch predictors main.c: simulating a program on SimpleScalar

Code Glimpse – important data structures bpred.h bpred_class: branch predictor types (enumeration) enum bpred_class { BPredComb, /* combined predictor (McFarling) */ BPred2Level, /* 2-level correlating pred w/2-bit counters */ BPred2bit, /* 2-bit saturating cntr pred (dir mapped) */ BPredTaken, /* static predict taken */ BPredNotTaken, /* static predict not taken */ BPred_NUM }; bpred_t: branch predictor structure bpred_dir_t: branch direction structure bpred_update_t: branch state update structure (containing predictor state counter) bpred_btb_ent_t: entry structure in a BTB

Code Glimpse Predictor’s state counter struct bpred_update_t { char *pdir1; /* direction-1 predictor counter */ char *pdir2; /* direction-2 predictor counter */ char *pmeta; /* meta predictor counter */ struct { /* predicted directions */ unsigned int ras : 1; /* RAS used */ unsigned int bimod : 1; /* bimodal predictor */ unsigned int twolev : 1; /* 2-level predictor */ unsigned int meta : 1; /* meta predictor (0..bimod / 1..2lev) */ } dir; };

Code Glimpse – Important Interfaces bpred.c bpred_create(): create a new branch predictor instance bpred_dir_create(): create a branch direction instance bpred_lookup(): predict a branch target bpred_dir_lookup(): predict a branch direction bpred_update(): update an entry in BTB

Project Task II A 3-bit branch predictor has 8 states in total NT NT 111 110 101 100 T T T T Predict Taken NT NT NT NT NT 000 001 010 011 T T T Predict Not Taken

Project Task II Command line interface for your 3-bit predictor -bpred:tribit <size> <size>: the size of direct-mapped BTB Command line example: ./sim-bpred -v -redir:prog results/go-3bit-2048-2-17.progout -redir:sim results/go-3bit-2048-2-17.simout -bpred:tribit 2048 benchmarks/go.alpha 2 17 benchmarks/5stone21.in Command must include verbose option –v

Skeleton Code branch_lookup() in bpred.c /* comp4611 3-bit predict saturating cntr pred (dir mapped) */ if (pbtb == NULL) { if (pred->class != BPred3bit) { return ((*(dir_update_ptr->pdir1) >= 2)? /* taken */ 1 : /* not taken */ 0); } else { // code to be filled in here …… /************************************************************/

Skeleton Code branch_update() in bpred.c /* comp4611 3-bit predict saturating cntr pred (dir mapped) */ if (dir_update_ptr->pdir1) { if (pred->class != BPred3bit) { ……. } else { if (taken) { // code to be filled in here else { /* not taken */ /****************************************************/

Evaluation 3-bit predictor with the table size as 2048 Benchmark: Go (Alpha) Parameters for Go: 2 17 Input file for Go: 2stone9.in Branch prediction accuracy and command line to be included in the project report Output trace files (using the verbose option: –v) are the redirected program and simulation output should be saved in the “results” directory can be larger than a few GBs and make sure you have sufficient disk storage for them in your PC

Project Task III Design and implement your own predictor Use existing predictors (e.g. 2-level) or create your own predictor to achieve higher accuracy than the 2-bit predictor Evaluate your predictor using Go (Alpha) Parameters for Go: 2 17 Input file for Go : 2stone9.in Restricted additional peak memory usage at 16MB Use “peak_win.sh” or “peak_linux.sh” script to keep track of peak memory usage of the SimpleScalar. Use 3bit predictor at table size of 1 as reference Any designs exceeding restriction of 16MB will not be graded

Project Task III Table size at 1 for 3bit-predictor, memory peak at 4584KB Command used : ./peak_win.sh ./sim-bpred.exe -bpred tribit -bpred:tribit 1 benchmarks/go.alpha 2 17 benchmarks/2stone9.in > /dev/null

Project Task III Table size at 1048576 for 3bit-predictor, memory peak at 5616KB 5616KB – 4584KB = around 0.9MB additional memory used Command used : ./peak_win.sh ./sim-bpred.exe -bpred tribit -bpred:tribit 1048576 benchmarks/go.alpha 2 17 benchmarks/2stone9.in > /dev/null

Project Task III For those who are using Cygwin in Windows, please use “peak_win.sh” to measure peak memory usage For those who are using Linux/VM, please use “peak_linux.sh” to measure peak memory usage Note: The peak memory usage maybe different across different platforms. Don’t worry, we will compare your predictor memory usage with standard 3bit predictor with table size at 1 in the same platform.

Project Task III Branch prediction accuracy, peak memory and command line must be included in the project report Output trace files (using option –v) the redirected program (“go-own.progout”) and simulation output (“go-own.simout”) should be saved in the “results” directory can be larger than a few GBs and make sure you have sufficient disk storage for them in your PC these trace files will be collect separately later

Deliverables Put the following into a directory that is created using your group ID: comp4611_proj_<groupID>/ Project report (no longer than 2 pages) Evaluation result (2-bit, 3-bit, your own predictor) Description of your own predictor Source code Code for 3-bit predictor: bpred.c, saved as “3bit/bpred.c” Code for your own predictor: including bpred.h, bpred.c, sim-bpred.c , readme (specify your command line format) and other relevant files, saved under “own/” Zip the folder into “comp4611_proj_<groupID>.zip” Submit the zip file to CASS Output trace files Output trace files for 3-bit predictor Output trace files for your own predictor To be submitted separately (not CASS)

Deliverables Sample List of files to be submitted to CASS(comp4611_proj_<groupID>.zip) : comp4611_proj_<groupID>/report.doc comp4611_proj_<groupID>/3bit/bpred.c comp4611_proj_<groupID>/own/bpred.c comp4611_proj_<groupID>/own/bpred.h comp4611_proj_<groupID>/own/sim-bpred.c comp4611_proj_<groupID>/own/readme comp4611_proj_<groupID>/own/… Sample List of output trace files to be submitted separately (Not to CASS): go-3bit-2048-2-17.simout go-3bit-2048-2-17.progout go-own.simout go-own.progout

Grading Scheme 2-bit predictor (15%) Correctness 3-bit predictor (35%) Your own predictor (40%) If correct, valid and under 16MB additional memory restriction score = max{0, (prediction accuracy – 0.8) * 200 } Project report (10%) Completeness Clarity

Project Task III Brainstorming What are the design philosophy of the predictors we have learnt? What are their advantages and disadvantages? How researchers tackle these flaws? How would you tackle these flaws? Some ideas: Static prediction Saturating counter Two-level adaptive predictor Local branch prediction Global branch prediction Hybrid predictor … More can be found: http://en.wikipedia.org/wiki/Branch_predictor

Submission Guideline Submit your source code and report to CASS Report should contain your group ID, each group member’s name, UST ID, email on the first page Package the code and report files in one zip file as “comp4611_proj_groupID.zip” Deadline: 23:59 Nov 30, 2014 Submit the hardcopy of your report to the homework box On the first page of your report , there should be your group ID, and each group member’s name, UST ID, email Submission of your output trace files will be informed later

References SimpleScalar LLC: www.simplescalar.com Introduction to SimpleScalar: www.ecs.umass.edu/ece/koren/architecture/Simplescalar/SimpleScalar_introduction.htm SimpleScalar Tool Set: http://www.ece.uah.edu/~lacasa/tutorials/ss/ss.htm

Appendix Branch direction structure struct bpred_dir_t { enum bpred_class class; /* type of predictor */ union { struct { unsigned int size; /* number of entries in direct-mapped table */ unsigned char *table; /* prediction state table */ } bimod; …… } config; };

Appendix sim-bpred.c sim_main: execute each conditional branch instruction sim_reg_options: register command options sim_check_options: determine branch predictor type and create a branch predictor instance pred_type: define the type of branch predictor *_nelt & *_config[]: configure the parameters for each branch predictor

Appendix sim_main() in sim-bpred.c if (MD_OP_FLAGS(op) & F_CTRL) { if (pred) { pred_PC = bpred_lookup(pred, …, &update_rec, …); if (!pred_PC) { pred_PC = regs.regs_PC + sizeof(md_inst_t); } bpred_update(pred, …, &update_rec, …); …… regs.regs_PC = regs.regs_NPC; regs.regs_NPC += sizeof(md_inst_t);

Appendix main() in main.c opt_reg_flag(sim_odb, …); …… sim_odb = opt_new(orphan_fn); opt_reg_flag(sim_odb, …); …… sim_reg_options(sim_odb); opt_process_options(sim_odb, argc, argv); sim_check_options(sim_odb, argc, argv); sim_reg_stats(sim_sdb); sim_main();