Power and Frequency Analysis for Data and Control Independence in Embedded Processors
Farzad Samie, Sharif University of Technology
Amirali Baniasadi, University of Victoria

This Work
Goal: power and frequency analysis for control independent and data independent instructions in embedded processors.
Motivation:
  Embedded processors are becoming complex.
  Modern embedded processors use speculation.
  Mis-speculation causes performance and power penalties.
  Power is a major concern in embedded processors.
  Save power and gain performance.

This Work (cont.)
Our approach: reduce the energy and time wasted on mispredictions.
How? Identify and bypass Control Independent (CI) and Data Independent (DI) instructions.
  CIs: instructions that execute regardless of the branch outcome.
  CI-DIs: CI instructions that execute with the same operands on both paths.
Key result: up to 12% processor energy reduction.
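The two definitions above can be sketched as a trace comparison. This is a minimal illustration, not the authors' mechanism: the `DynInst` record, the `classify_wrong_path` function, and the idea of matching instructions by program counter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DynInst:
    """Hypothetical record of one executed instruction."""
    pc: int                     # instruction address
    operands: Tuple[int, ...]   # source operand values seen at execution


def classify_wrong_path(wrong_path: Iterable[DynInst],
                        right_path: Iterable[DynInst]) -> Dict[int, str]:
    """Label each wrong-path instruction as CI-DI, CI-DD, or wrong-path-only."""
    # Keep the first right-path instance of every PC after recovery.
    right_by_pc: Dict[int, DynInst] = {}
    for inst in right_path:
        right_by_pc.setdefault(inst.pc, inst)

    labels: Dict[int, str] = {}
    for inst in wrong_path:
        match = right_by_pc.get(inst.pc)
        if match is None:
            labels[inst.pc] = "wrong-path-only"   # never re-executed
        elif match.operands == inst.operands:
            labels[inst.pc] = "CI-DI"             # same instruction, same operands
        else:
            labels[inst.pc] = "CI-DD"             # same instruction, new operands
    return labels


# Made-up traces: 0x20 and 0x24 appear on both paths.
wrong = [DynInst(0x10, (1, 2)), DynInst(0x20, (3, 4)), DynInst(0x24, (5, 0))]
right = [DynInst(0x18, (7, 7)), DynInst(0x20, (3, 4)), DynInst(0x24, (9, 0))]
labels = classify_wrong_path(wrong, right)
```

Here the instruction at 0x20 sees identical operands on both paths and is labeled CI-DI, while 0x24 is re-executed with a different operand and is labeled CI-DD.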

Background: Branch Prediction
[Figure: a branch predictor takes the branch history and the program counter as inputs and produces a predicted direction and a predicted target address.]
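The predictor's interface (branch history and program counter in, direction and target out) can be illustrated with a small model. The slides do not say which predictor was simulated; the gshare-style indexing, the 2-bit counters, the dictionary used as a target buffer, and the class name `GsharePredictor` are assumptions chosen only to make the sketch concrete.

```python
from typing import Dict, Optional, Tuple


class GsharePredictor:
    """Toy gshare predictor with a target buffer (illustrative only)."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # 2-bit counters, weakly not-taken
        self.history = 0                         # global branch history register
        self.btb: Dict[int, int] = {}            # last seen target per branch PC

    def _index(self, pc: int) -> int:
        # XOR the program counter with the branch history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> Tuple[bool, Optional[int]]:
        """Return (predicted direction, predicted target or None)."""
        taken = self.counters[self._index(pc)] >= 2
        return taken, (self.btb.get(pc) if taken else None)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train with the resolved outcome of the branch at pc."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GsharePredictor()
cold = predictor.predict(0x1000)          # untrained prediction
for _ in range(20):                       # a branch that is always taken
    predictor.update(0x1000, True, 0x2000)
warm = predictor.predict(0x1000)
```

Before training the sketch predicts not-taken with no target; after the history register fills with taken outcomes it predicts taken with the recorded target.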

Background (cont.)
[Figure: instruction stream I1 to I12 around a branch instruction, with Not taken and Taken paths. After misprediction detection, the wrong path is squashed and the right path is fetched. I7, I8, and I9 appear on both paths and are marked as Control Independent Instructions (CIs).]

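Background (cont.)
[Figure: example code around the branch If (R4 = 0), with Not taken and Taken paths. Instructions shown, in extraction order: R1 ← R1 + R2; R4 ← R1; R2 ← R4 - R1; R5 ← R2 - R3; R3 ← 0; R5 ← R4 + 1; R1 ← R1 - 1; R3 ← 0; R4 ← R6 + R4; R1 ← R4 + R1; R5 ← R5 - 2; R3 ← R3 - R4. The control independent instructions R1 ← R1 - 1, R5 ← R2 - R3, and R5 ← R4 + 1 are labeled Data Independent (CI-DI), Data Dependent (CI-DD), and Data Independent (CI-DI).]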
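One way to separate CI-DI from CI-DD instructions in an example like this is to track which registers may hold different values depending on the path taken. The sketch below is a simplified, conservative register-level analysis, not the method used in the study; the `Inst` tuple, the `split_ci` function, and the sample instructions are hypothetical and do not reproduce the figure.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Inst(NamedTuple):
    """Hypothetical register-transfer instruction: dest <- f(srcs)."""
    dest: str
    srcs: Tuple[str, ...]


def split_ci(not_taken: Iterable[Inst], taken: Iterable[Inst],
             ci_region: Iterable[Inst]) -> List[Tuple[Inst, str]]:
    """Label each control independent instruction as CI-DI or CI-DD."""
    # Any register written on either side of the branch may differ by path.
    tainted = {i.dest for i in not_taken} | {i.dest for i in taken}
    labeled: List[Tuple[Inst, str]] = []
    for inst in ci_region:
        if any(src in tainted for src in inst.srcs):
            labeled.append((inst, "CI-DD"))
            tainted.add(inst.dest)       # its result is path dependent too
        else:
            labeled.append((inst, "CI-DI"))
            tainted.discard(inst.dest)   # overwritten with a path-independent value
    return labeled


# Made-up program: the not-taken path writes R2 and R5, the taken path writes R3.
not_taken_path = [Inst("R2", ("R4", "R1")), Inst("R5", ("R2", "R3"))]
taken_path = [Inst("R3", ())]
after_join = [
    Inst("R1", ("R1",)),        # reads only R1
    Inst("R5", ("R2", "R3")),   # reads path-dependent R2 and R3
    Inst("R5", ("R4",)),        # overwrites R5 from untouched R4
    Inst("R7", ("R5",)),        # reads the freshly overwritten R5
]
kinds = [kind for _, kind in split_ci(not_taken_path, taken_path, after_join)]
```

The analysis is conservative: a register written with the same value on both paths is still treated as path dependent.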

CI-DI vs. CI-DD
Bypassing CI-DIs saves more energy: there is no need to read operands or execute again.
Bypassing CI-DIs provides higher performance: no time is wasted reading operands or executing.
[Figure: pipeline stages Fetch, Issue, Dispatch, Execute, and Write Back, with the stages bypassed for CI-DD and for CI-DI instructions.]
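The energy argument can be made concrete with a toy per-stage model. Everything numeric here is hypothetical: the uniform stage costs are placeholders, and the assumption that a bypassed CI-DD instruction reuses only fetch and dispatch work while a bypassed CI-DI instruction reuses every stage is an illustrative reading of the slide, not a measured result.

```python
# Hypothetical energy cost per pipeline stage, in arbitrary units.
STAGE_ENERGY = {
    "fetch": 1.0,
    "dispatch": 1.0,
    "issue": 1.0,       # includes reading operands
    "execute": 1.0,
    "writeback": 1.0,
}

# Assumed set of stages whose work is reused when an instruction is bypassed.
REUSED_STAGES = {
    "CI-DD": {"fetch", "dispatch"},   # must still read operands and execute
    "CI-DI": set(STAGE_ENERGY),       # the earlier result is reused as is
}


def saved_energy(kind: str) -> float:
    """Energy saved by bypassing one instruction of the given kind."""
    return sum(STAGE_ENERGY[stage] for stage in REUSED_STAGES[kind])


savings = {kind: saved_energy(kind) for kind in REUSED_STAGES}
```

With these placeholder costs, bypassing a CI-DI instruction saves more than bypassing a CI-DD instruction, which is the qualitative point of the slide; real values would come from a power model such as Wattch.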

Methodology
  Modified SimpleScalar simulator
  Wattch for power measurement
  MiBench embedded benchmark suite

Distribution
Wrong path: 12%, CI: 5%, CI-DI: 2%

CI Power Reduction in Different Units
Maximum: branch predictor unit. Minimum: instruction cache.

CI Power Reduction in Stages
Rijndael: low misprediction rate → few wrong-path instructions → few CIs

Power Sensitivity to RUU Size
[Figure panels: CI, CI-DI.]
Higher power dissipation for bigger RUU sizes.

Power Sensitivity to Execution Bandwidth
[Figure panels: CI, CI-DI.]
Higher power dissipation for wider execution bandwidth.

Power Sensitivity to Branch Predictor Size
Little sensitivity to branch predictor size.

Related Work
  Rotenberg et al.: studied control independence in superscalar processors, HPCA 1999.
  Collins et al.: suggested a mechanism to predict the re-convergent point, MICRO 2004.
  Lam and Wilson: studied the impact of CIs on instruction-level parallelism, ISCA 1992.
  Gandhi et al.: selective branch misprediction recovery, HPCA 2004.

Conclusion
  CIs are categorized into CI-DI and CI-DD.
  Potential power saving from bypassing CI and CI-DI instructions: up to 12%.
  High sensitivity to RUU size.
  High sensitivity to execution bandwidth.
  Little sensitivity to branch predictor size.

Questions? Thank you.