FAT predictor Sabareesh Ganapathy, Prasanna Venkatesh Srinivasan, Maribel Monica.


UNIVERSITY OF WISCONSIN-MADISON
What is FAT?
FAT is a frequency-analysis-based branch predictor integrated with TAGE. Frequency analysis studies the frequency-domain characteristics of a branch's outcome history to predict the branch as taken or not taken.
Historical context: Frequency analysis using the Fourier transform was explored in FAB [1], which statically profiled branch frequency characteristics across different workloads. FAT, in contrast, is a dynamic branch predictor.
[1] M. Kampe, P. Stenstrom and M. Dubois, "The FAB predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches", Proceedings of the Eighth International Symposium on High Performance Computer Architecture (HPCA), 2002.

Underlying philosophy
History repeats itself!
[Figure with labels: May '15, December '15, May '16]

Frequency Analysis
Local history (LHR) table: 256 entries, 128 bits each, PC-based address.
Frequency (FAB) table: 256 entries, 4-way set associative, addressed by a hash of the PC and the global history.
[Figure: the PC indexes the LHR table (bits b0, b1, ..., bn); a hash of the PC and ghist selects a set of four FAB entries (FABentry0 to FABentry3) in the frequency table. Each FAB entry holds the fields TAG1, TAG2, IFFT, Time Count, TH and Confid.]
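A minimal sketch of how a PC-indexed local history table of this size could be maintained. The table dimensions come from the slide; the index function (low-order PC bits) and the names `lhr_index` and `update_lhr` are assumptions made for illustration, not the authors' implementation.

```python
# Sizes taken from the slide: 256 entries of 128 bits each.
LHR_ENTRIES = 256
LHR_BITS = 128
LHR_MASK = (1 << LHR_BITS) - 1


def lhr_index(pc):
    """Hypothetical index function: use the low-order PC bits."""
    return pc % LHR_ENTRIES


def update_lhr(table, pc, taken):
    """Shift the newest outcome (1 = taken) into this branch's history."""
    i = lhr_index(pc)
    table[i] = ((table[i] << 1) | int(taken)) & LHR_MASK


table = [0] * LHR_ENTRIES
for outcome in (True, True, False, True):
    update_lhr(table, 0x4004F0, outcome)

# Total storage implied by the slide's dimensions, in bits.
storage_bits = LHR_ENTRIES * LHR_BITS
```

With these dimensions the local history table alone accounts for 256 x 128 = 32768 bits (32 Kbits) of storage.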

TAGE predictor
TAGE: one bimodal predictor and 12 tagged TAGE tables (through T12) are used.
Minimum history length = 4; maximum global history length = 640.
The folded history and the PC are hashed to compute the index and the tag for each TAGE entry.
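The slide gives only the minimum and maximum history lengths and the number of tables. Assuming the lengths follow a geometric series between those bounds, as the name TAGE (TAgged GEometric history length) suggests, they could be derived as in the sketch below. The function name and the rounding rule are illustrative assumptions; the lengths the authors actually used are not stated.

```python
def geometric_history_lengths(l_min, l_max, n_tables):
    """History length per tagged table, assuming a geometric series."""
    ratio = (l_max / l_min) ** (1.0 / (n_tables - 1))
    # Round to the nearest integer number of history bits.
    return [int(l_min * ratio ** i + 0.5) for i in range(n_tables)]


# Parameters from the slide: 12 tagged tables, lengths from 4 to 640.
lengths = geometric_history_lengths(4, 640, 12)
print(lengths)
```

A geometric series spends most tables on short histories, where most branches are captured, while still reaching very long histories with the last few tables.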

FAB + TAGE algorithm
[Flowchart of the combined FAB + TAGE algorithm with Yes/No decision branches]

Update FAB
[Flowchart] Decision: Time_count == 3? (Yes / No)
Increment time_count.
Calculate FFT from LHR.
Remove DC component.
Normalize FFT.
Filter the top N_f frequency components.
Compute IFFT from the filtered array and store.
Compute TH and store.
Update threshold: Threshold = *Freq_sum / (number of frequency components N_f), where Freq_sum = sum of absolute values of the filtered array.
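The transform steps in this update can be sketched as follows, using a plain DFT from the standard library in place of FFTW. Several details are assumptions, because the slide does not specify them: normalization divides by the largest magnitude, the scaling coefficient in the threshold formula is missing from the transcript and is therefore exposed as a free parameter `alpha`, and the function names are hypothetical. How the stored waveform and threshold are compared at prediction time is not covered here.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a real sequence (O(N^2) reference)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Real part of the inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def fab_update(history, n_f, alpha):
    """Filter a local history to its top n_f frequency components.

    alpha is a placeholder for the coefficient that is missing from the
    slide's threshold formula; its real value is unknown.
    """
    spectrum = dft(history)
    spectrum[0] = 0  # remove the DC component
    peak = max(abs(c) for c in spectrum)
    if peak > 0:  # assumed normalization: scale the largest bin to 1
        spectrum = [c / peak for c in spectrum]
    order = sorted(range(len(spectrum)),
                   key=lambda k: abs(spectrum[k]), reverse=True)
    keep = set(order[:n_f])
    filtered = [c if k in keep else 0 for k, c in enumerate(spectrum)]
    freq_sum = sum(abs(c) for c in filtered)
    threshold = alpha * freq_sum / n_f
    return idft_real(filtered), threshold


# A strictly alternating taken / not-taken history of 16 outcomes.
history = [1, 0] * 8
waveform, threshold = fab_update(history, n_f=2, alpha=0.5)
```

For the alternating history, the only non-DC energy sits at the highest frequency bin, so the filtered waveform alternates in sign in step with the original outcomes.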

Infrastructure
The CBP-2016 infrastructure was used. It provides branch traces for server and mobile benchmarks.
The main program decodes the instructions and passes only conditional branches to the predictor.
The predictor function was written for our custom predictor.
The FFTW library was used to compute the FFT and DCT transforms in the C++ program.

MATLAB Analysis
A Perl script was written to scan the edge sequence in the trace files. Dependencies between edges were determined, and the local history was found for all branches allocated in the FAB table.
The local history extracted by the script was then used in MATLAB.
The FAB predictor was modelled in MATLAB, and analysis was performed to determine the threshold and the number of frequency components required for correct prediction.

Regression analysis in CBP infrastructure
Parameters such as the number of local history register bits and the number of frequency components were varied to observe the effect on the misprediction rate.
The FAB predictor was modified to use the Discrete Cosine Transform instead of the Discrete Fourier Transform.
The DC component was used in prediction when the local history register was all 1's.

[Results chart. X axis label legend with columns Technique: FFT, DCT (no opt), DCT (opt); rows: Time count, LHR length, LHR table entries. Surviving values for LHR table entries: 2^14, 2^20, 2^14, 2^10]

[Figure: # Frequency Components]

HW Budget and Implementation
Configuration                                      Storage
1 Bimodal + 12 TAGE tables                         250 Kbits
1 Bimodal + 11 TAGE + 1 LHR table + 1 FAB table    370 Kbits
FAB entry fields: TAG1, TAG2, IFFT/IDCT, Time Count, Threshold, Confidence.
Implementing a frequency transform (DCT/FFT) in hardware is complex. A stochastic implementation of the DCT was explored.

Stochastic Logic
A real value x in the range 0 to 1 is represented by a sequence of random bits; the value is the fraction of bits that are 1.
a = 1,1,0,1,0,1,1,1 (a = 6/8)
b = 1,1,0,0,1,0,1,0 (b = 4/8)
c = 1,1,0,0,0,0,1,0 (c = 3/8)
Simple logic and fault-tolerant characteristics.
Applicable to frequency transforms and image processing.
Slide content derived from Marc Riedel's circuits course at UMinn.
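The three bit streams on this slide are consistent with stochastic multiplication: c is the bitwise AND of a and b, and its value 3/8 equals 6/8 x 4/8. The slide does not name the gate, so treating it as an AND gate is an inference from the numbers. A short sketch with hypothetical helper names:

```python
def stream_value(bits):
    """Value encoded by a stochastic bit stream: the fraction of 1s."""
    return sum(bits) / len(bits)


def stochastic_multiply(x, y):
    """Multiply two stochastic streams with a bitwise AND."""
    return [p & q for p, q in zip(x, y)]


# Bit streams copied from the slide.
a = [1, 1, 0, 1, 0, 1, 1, 1]  # 6/8
b = [1, 1, 0, 0, 1, 0, 1, 0]  # 4/8
c = stochastic_multiply(a, b)

print(c, stream_value(c))
```

In general the AND of two streams yields the product only in expectation, and only when the streams are statistically independent; these particular streams happen to give the exact product.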

Stochastic DCT
Xc(k) = (1/N) Σ_n x(n) cos(2πkn/N), k = 0 ... N-1.
Steps were taken for finding the top frequency components, thresholding and the IDCT.
Results are comparable to the DCT computed with the FFTW library (the result degrades by 5%).
[Block diagram: Branch History, Angle Mapper (0 to pi/4), SNG, Cos(x), Cos(2x), Cos(4x), Cos(8x), Multiplier and Adder, DCT output]
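As a deterministic reference for the formula above, the sketch below evaluates Xc(k) directly in floating point. It is not the stochastic hardware version, and it assumes the sum runs over n = 0 ... N-1; the function name is hypothetical. For real input this expression equals the real part of the DFT divided by N.

```python
import math


def cosine_transform(x):
    """Xc(k) = (1/N) * sum over n of x(n) * cos(2*pi*k*n/N)."""
    n = len(x)
    return [sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]


# An all-taken history has energy only in the DC term (k = 0).
all_taken = [1] * 8
coeffs = cosine_transform(all_taken)

# A history with period 4 puts energy at k = N/4 and its mirror.
periodic = [1, 0, 0, 0] * 2
periodic_coeffs = cosine_transform(periodic)
```

The all-taken case shows why such a history carries no information outside the DC component: every other coefficient is zero up to rounding error.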

FTAGE
Uses the filtered local history as the tag and index for a pattern history table.
[Block diagram. FILTER: Local History, FFT, choose top 10, IFFT + thresholding, Filtered History. FTABLE: entries hold a TAG and the 128-bit filtered history. History folding of the filtered history, combined with the PC, addresses the FTAGE-TABLE, whose entries hold TAG, CTR and Confid and are used for prediction.]
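One way the history folding step could work is sketched below: the 128-bit filtered history is cut into chunks that are XORed together, and the result is combined with the PC. This mirrors the folded-history idea used in TAGE, but the index and tag widths (10 and 12 bits), the XOR with the PC and the function names are all assumptions; the slide does not give these details.

```python
def fold(history, history_bits, out_bits):
    """XOR-fold a history_bits-wide value down to out_bits bits."""
    mask = (1 << out_bits) - 1
    folded = 0
    for shift in range(0, history_bits, out_bits):
        folded ^= (history >> shift) & mask
    return folded


def ftage_index_and_tag(pc, filtered_history, index_bits=10, tag_bits=12):
    """Hypothetical index and tag from the PC and 128-bit filtered history."""
    index = (pc ^ fold(filtered_history, 128, index_bits)) & ((1 << index_bits) - 1)
    tag = (pc ^ fold(filtered_history, 128, tag_bits)) & ((1 << tag_bits) - 1)
    return index, tag


index, tag = ftage_index_and_tag(0x4004F0, (1 << 127) | 0xABCD)
```

Using different fold widths for the index and the tag keeps the two values from being trivially correlated, which is the usual reason for folding the same history twice.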

Results
"The best way to predict the future is to invent it." (Alan Kay)

FTAGE: Future Work
An IIR filter can be used for filtering.
The top ten frequency components were measured for a number of traces.

References
[1] M. Kampe, P. Stenstrom and M. Dubois, "The FAB predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches", Proceedings of the Eighth International Symposium on High Performance Computer Architecture (HPCA), 2002.
[2] A. Seznec and P. Michaud, "A case for (partially) Tagged Geometric history length branch prediction", Journal of Instruction Level Parallelism, Feb
[3] Weikang Qian, Xin Li, Marc D. Riedel, Kia Bazargan, and David J. Lilja, "An Architecture for Fault-Tolerant Computation with Stochastic Logic", IEEE Transactions on Computers, Vol 60, pp
[4] Xiaowei Qin, Shenglong Shang, Adong Fan, "Low-complexity FPGA Implementation of Sine/Cosine Generator Based on Stochastic Computation".
