OOE vs. EPIC
Hridesh Rajan, Zhendong Yu, Weilin Zhong


Outline
- Introduction
  - EPIC
  - OOE
- Comparison
  - ILP, Power Consumption, Code Size, Performance, Compiler techniques
- Performance Evaluation
- Conclusion

Introduction - EPIC
- EPIC (Explicitly Parallel Instruction Computing)
  - An evolution of VLIW
  - Can be considered more a "philosophy" than an "architecture".

Introduction - EPIC (2)
- Instruction example (IA-64):
  - A long instruction contains multiple operations and a template specifying the dependencies between instructions.
  - Layout: | Op1 | Op2 | Op3 | template |
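The layout above can be sketched in code. In the IA-64 encoding a bundle is 128 bits: a 5-bit template in the low-order bits followed by three 41-bit instruction slots. The function names `pack_bundle` and `unpack_bundle` are made up for this illustration, and the slot contents are treated as opaque integers rather than real opcodes.

```python
TEMPLATE_BITS = 5    # template field width in an IA-64 bundle
SLOT_BITS = 41       # width of each of the three instruction slots
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
    return (template
            | slot0 << TEMPLATE_BITS
            | slot1 << (TEMPLATE_BITS + SLOT_BITS)
            | slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS))


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots


# Hypothetical field values, chosen only to show the round trip.
example_bundle = pack_bundle(0x10, 1, 2, 3)
```

The hardware reads the template first: it tells the issue logic which execution unit each slot needs and where groups of independent instructions end, so no dependency checking between the slots is required at run time.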

Introduction - OOE
- OOE (Out-of-Order Execution) superscalar
  - Dependencies between instructions are not expressed explicitly in the code; the hardware has to discover them at run time.

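Comparison - Complexity
- OOE:
  - Complexity = Complexity(branch prediction) + Complexity(register renaming) + Complexity(dependency checking) + Complexity(alias detection)
- EPIC:
  - Complexity = Complexity(NaT) + Complexity(ALAT) + Complexity(CFM) + Complexity(RSE)
  - NaT: "Not a Thing" bits; ALAT: Advanced Load Address Table; CFM: Current Frame Marker; RSE: Register Stack Engine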

Comparison – power consumption
- OOE:
  - Less power consumption
- EPIC:
  - More power consumption

Code Size
- OOE:
  - Compact code (more branches)
- EPIC:
  - Sparse code (code bloat)
- It depends on the compiler

Comparison - ILP
- OOE (disadvantages)
  - Parallelism is found only at the level of machine instructions that can be issued in a single cycle.
  - Limited ILP, and the ILP is not evenly distributed
  - Data dependency, control dependency
  - Resource dependency
    - # of registers
    - # of ports to registers and memory
    - # of parallel instruction decoders
    - # of function units
    - # of data paths between various CPU components

Comparison – ILP (2)
- OOE (advantages)
  - Executes along the predicted path
  - Dynamic adjustment of the instruction schedule based on the actual execution path and cache miss results
    - It can work around stalls by running independent instructions
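A toy model can make the "deals with stalls" point concrete. This is a rough sketch, not a model of any real pipeline: it assumes one instruction issues per cycle, every register is written only once (as if already renamed), and every source is either a program input or produced by an earlier instruction. The name `simulate` and the sample program are made up for illustration.

```python
def simulate(program, out_of_order):
    """Return the cycle at which the last result becomes available.

    program is a list of (dest, sources, latency) tuples in program order.
    """
    # Results of instructions that have not issued yet are never ready.
    ready_at = {dest: float("inf") for dest, _, _ in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        # In-order issue may only look at the oldest pending instruction;
        # out-of-order issue may pick the oldest *ready* one in the window.
        window = pending if out_of_order else pending[:1]
        for index, (dest, sources, latency) in enumerate(window):
            # Registers never written by the program are inputs, ready at 0.
            if all(ready_at.get(src, 0) <= cycle for src in sources):
                ready_at[dest] = cycle + latency
                finish = max(finish, ready_at[dest])
                del pending[index]
                break
        cycle += 1
    return finish


program = [
    ("r1", ("a",), 10),       # load that misses in the cache (10 cycles)
    ("r2", ("r1",), 1),       # needs the loaded value
    ("r3", ("b",), 1),        # independent of the load
    ("r4", ("r3",), 1),       # independent of the load
    ("r5", ("r2", "r4"), 1),  # joins both chains
]

in_order_cycles = simulate(program, out_of_order=False)
out_of_order_cycles = simulate(program, out_of_order=True)
```

With this program the in-order run finishes at cycle 14 and the out-of-order run at cycle 12, because `r3` and `r4` execute while the load is still outstanding. A static schedule could make the same reordering, but only if the compiler had known in advance that the load would miss.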

Comparison - ILP (3)
- EPIC (disadvantages)
  - Dynamic path tends to be longer
  - Static decisions made by the compiler
    - What if the program stalls?
  - Recovery code
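Recovery code arises from speculation: IA-64 lets the compiler hoist a load above the branch that guards it (`ld.s`), defers any fault by marking the result NaT, and later checks the result (`chk.s`), branching to recovery code if needed. The sketch below imitates that flow in software; the dictionary standing in for memory and all function names are assumptions made for illustration.

```python
NAT = object()  # stand-in for a register whose NaT ("Not a Thing") bit is set


def speculative_load(memory, address):
    """Imitates ld.s: return the value, or NaT instead of faulting."""
    return memory.get(address, NAT)


def check(value, recovery):
    """Imitates chk.s: run the recovery code only if the value is NaT."""
    return recovery() if value is NAT else value


def load_if_valid(memory, address, condition):
    # The load is hoisted above the branch, so it starts early.
    value = speculative_load(memory, address)
    if condition:
        # Recovery code repeats the load non-speculatively; a bad address
        # now raises the fault that was deferred earlier.
        return check(value, lambda: memory[address])
    # The value was never needed, so a deferred fault is silently dropped.
    return None


memory = {0x10: 42}
```

The extra check and the recovery block are code the original program did not contain, which is one reason the dynamic path and the code size tend to grow.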

Comparison – ILP (4)
- EPIC (advantages)
  - Massive resources
    - Larger register sets, more function units, etc.
  - Predication reduces branch penalties
  - Speculation reduces the cost of cache misses
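Predication replaces a branch with guarded operations: both sides are issued and a predicate decides which result is kept. A minimal sketch of this if-conversion, using plain integers; the function names are made up for illustration, and the arithmetic select stands in for the hardware discarding the result of the operation whose predicate is false.

```python
def abs_diff_branch(a, b):
    """Branching version: the processor has to predict the comparison."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """If-converted version: no branch, both operations are issued."""
    p = a > b            # compare sets a predicate ...
    not_p = not p        # ... and its complement
    t1 = a - b           # operation guarded by p
    t2 = b - a           # operation guarded by not_p
    # Only the operation whose predicate is true contributes to the result.
    return int(p) * t1 + int(not_p) * t2
```

The predicated version does more work per call, but it cannot be mispredicted, which pays off when the branch is hard to predict and the guarded blocks are short.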

Role of Compiler vs. Hardware
- OOE:
  - Parallelism detection and scheduling: Hardware
  - More powerful hardware, less powerful compiler
- EPIC:
  - Parallelism detection and scheduling: Compiler/Hardware
  - More powerful compiler, less powerful hardware

Comparison - frequency
- OOE:
  - High frequency
- EPIC:
  - Low frequency due to:
    - Focus on CPI
    - Performing compares and dependent branches in the same cycle
    - Predicated execution
    - Power consumption

Performance
- Methodologies in performance comparison
  - CPI, CPU frequency, and the tradeoffs.
- However, Itanium does not show great improvement over Alpha or Pentium 4.
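The tradeoff between CPI and frequency follows from the standard relation CPU time = instruction count × CPI / frequency. A minimal sketch; the two configurations below use made-up numbers for illustration and are not measurements of any real processor.

```python
def cpu_time_seconds(instruction_count, cpi, frequency_hz):
    """CPU time = instruction count * cycles per instruction / clock rate."""
    return instruction_count * cpi / frequency_hz


# Hypothetical wide machine: low CPI, low clock rate.
wide_low_clock = cpu_time_seconds(1_000_000_000, cpi=0.5, frequency_hz=800e6)

# Hypothetical narrow machine: higher CPI, twice the clock rate.
narrow_high_clock = cpu_time_seconds(1_000_000_000, cpi=1.0, frequency_hz=1.6e9)
```

Both configurations need 0.625 s, so halving the CPI buys nothing if it costs half the clock rate. Code bloat shifts the balance further, since it raises the instruction count on the EPIC side of the comparison.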

Conclusion
- EPIC seems to be a good alternative to OOE (can OOE use EPIC techniques?)
- But there is no clear evidence of a performance gain.
- Tradeoffs are always there. It depends on what kind of processor behavior we need.
- Time will tell.

References
- M. Hopkins, "A Critical Look at IA-64"
- W. S. Worley, J. Huck, "Is Out-of-Order Out of Date?"
- M. S. Schlansker, B. R. Rau, "EPIC: An Architecture for Instruction-Level Parallel Processors"

Thank you! Questions?