IA64 Compiler Optimizations. Alex Bobrek, Jonathan Bradbury.



Outline
- EPIC Style ISA
- Predication
- Register Model
- Speculative Control Flow
- Data Speculation

Explicitly Parallel Instruction Computing (EPIC) Style ISA
- IA-64 is just one implementation of EPIC
- Main idea behind EPIC:
  - Make hardware simpler
  - Make compiler smarter
- Similar to VLIW
  - Three instructions per "bundle"
  - Each bundle includes a template that describes dependencies
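As a rough illustration of the bundle idea, the sketch below packs a template and three instruction slots into one 128-bit value and unpacks them again. The field widths (a 5-bit template in the low bits followed by three 41-bit slots) follow the published IA-64 encoding; the names `pack_bundle` and `unpack_bundle` are invented for this example, and slot contents are treated as opaque integers rather than real opcodes.

```python
TEMPLATE_BITS = 5                      # template field in the low bits
SLOT_BITS = 41                         # each of the three instruction slots
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into a 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots
```

Because 5 + 3 × 41 = 128, the four fields fill the bundle exactly, which is why the hardware can locate all three instructions without decoding any of them first.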

Predication
- Conditional execution of an instruction based on a predicate register
- Reduces branching
  - Increases ILP
- Allows instructions to be moved across branches

Predication (Cont.)

Source:
    if (a < 5) then b = c + d; else b = c - d;

Traditional:
          cmp  ra, #5
          bge  L1
          add  rc, rd, rb
          jmp  L2
    L1:   sub  rc, rd, rb
    L2:

Predicated:
          cmp.lt p1, p2 = ra, #5
    (p1)  add  rc, rd, rb
    (p2)  sub  rc, rd, rb
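The if-conversion on this slide can be mimicked in a few lines of Python. This is only a behavioral sketch: it models the compare as writing a predicate and its complement (called `p1` and `p2` here), computes both arms unconditionally, and lets each result commit only when its guarding predicate is true. The function names are hypothetical.

```python
def branchy(a, c, d):
    """Traditional form: control flow selects one of the two paths."""
    if a < 5:
        b = c + d
    else:
        b = c - d
    return b


def predicated(a, c, d):
    """Predicated form: both operations are issued, one is committed."""
    # Models a compare that writes a predicate and its complement.
    p1 = a < 5
    p2 = not p1
    b = None
    # Both results are computed up front, as if both instructions executed;
    # the guard below models the write-back being suppressed when the
    # predicate is false, not a branch in the instruction stream.
    for pred, result in ((p1, c + d), (p2, c - d)):
        if pred:
            b = result
    return b
```

Since exactly one of the two predicates is true, exactly one write reaches `b`, and the two functions agree for every input.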

Register Model
- IA-64 has 128 integer registers
  - R0-R31 are always program visible
  - R32-R127 are stacked
- Each procedure can have its own variable-sized stack frame
- Register Stack Engine (RSE) handles spills in hardware
- Compiler doesn't have to worry about managing fills/spills
  - Done dynamically in hardware
  - Allows for shorter critical path length through code
- OS has to flush the register stack on a context switch
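A toy model may help show what "spills handled in hardware" means. The sketch below tracks only frame sizes: each call allocates a frame from the 96 stacked registers, and when the frames no longer fit, the oldest registers are counted as spilled to a backing store; on return they are filled back as needed. It is a simplification under stated assumptions: the class and method names are invented, real frames overlap (a caller's outputs become the callee's inputs), and the real RSE can spill and fill in the background rather than only at call and return.

```python
NUM_STACKED = 96  # r32-r127


class RegisterStack:
    """Counts stacked registers held in hardware versus spilled to memory."""

    def __init__(self, physical=NUM_STACKED):
        self.physical = physical
        self.frames = []    # frame sizes, outermost caller first
        self.resident = 0   # stacked registers currently in the register file
        self.spilled = 0    # registers saved to the backing store

    def call(self, frame_size):
        """Allocate a frame; spill the oldest registers if it does not fit."""
        if not 0 <= frame_size <= self.physical:
            raise ValueError("frame cannot exceed the stacked register file")
        self.frames.append(frame_size)
        self.resident += frame_size
        overflow = self.resident - self.physical
        if overflow > 0:
            self.spilled += overflow
            self.resident -= overflow

    def ret(self):
        """Release the current frame; fill the caller's frame if needed."""
        self.resident -= self.frames.pop()
        if self.frames:
            missing = self.frames[-1] - self.resident
            if missing > 0:
                self.spilled -= missing
                self.resident += missing
```

For example, three nested calls with 40-register frames need 120 registers, so 24 of the oldest are spilled; they are filled back only when the outermost frame becomes current again. None of this bookkeeping appears in the compiled code.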

Speculative Control Flow
- Allows loads to be moved outside of a basic block even if the address is not known to be safe
- Speculative load (ld.s) instruction
- Speculation check (chk.s) instruction
[Figure: side-by-side code comparison, Traditional vs. IA-64]

Speculative Control Flow (cont.)
- Every register has a NaT ("Not a Thing") bit, which is set if the speculative load causes an exception
- NaT bits are propagated through all instructions that use the speculated value
- The chk.s instruction branches to recovery code if the speculation failed
- Deferral models allow for tradeoffs between OS and hardware exception handling
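The NaT mechanism can be sketched as a value that carries a poison flag. In the hypothetical model below, memory is a plain dictionary and a missing address stands in for a faulting load; `ld_s`, `add`, and `chk_s` are illustrative Python functions named after the instructions, not an emulation of their exact semantics.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    value: int
    nat: bool = False   # set when the value comes from a deferred exception


def ld_s(memory, addr):
    """Speculative load: defer a fault by returning a register with NaT set."""
    if addr not in memory:          # assumption: missing key models a fault
        return Reg(0, nat=True)
    return Reg(memory[addr])


def add(x, y):
    """Any use of a NaT operand produces a NaT result."""
    return Reg(x.value + y.value, nat=x.nat or y.nat)


def chk_s(reg, recovery):
    """Run the recovery code if speculation failed, else keep the result."""
    return recovery() if reg.nat else reg
```

The point of the propagation is that a single `chk_s` at the original location of the load covers the whole chain of dependent instructions that was hoisted with it; the recovery code then redoes the load and its dependents non-speculatively.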

Data Speculation
- Allows loads to be moved ahead of stores if the compiler is unsure whether the addresses are the same
- Advanced load (ld.a)
- Alias check (chk.a)
[Figure: side-by-side code comparison, Traditional vs. IA-64]

Data Speculation (cont.)
- Implemented using an Advanced Load Address Table (ALAT)
  - All speculative load addresses are stored in the table
  - All stores remove entries with the same address
  - On chk.a, if the address is still in the table, speculation was successful
- The chk.a branches to recovery code if speculation has not succeeded
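The three ALAT rules on this slide translate into a small sketch. The model below is an assumption-laden simplification: entries are keyed by a target register name and hold a full address, memory is a dictionary, and the class and helper names are invented for the example. Real hardware has a fixed number of entries and compares partial addresses and access sizes.

```python
class ALAT:
    """Toy Advanced Load Address Table: register name -> loaded address."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, reg, memory, addr):
        """Advanced load: read the value and record the address."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """A store removes every entry that matches its address."""
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def chk_a(self, reg):
        """Return True if the entry survived, i.e. speculation succeeded."""
        return reg in self.entries


def hoisted_load(memory, store_addr, load_addr, value):
    """Load moved above a possibly aliasing store, with a recovery path."""
    alat = ALAT()
    r1 = alat.ld_a("r1", memory, load_addr)   # load issued early
    alat.store(memory, store_addr, value)     # may or may not alias
    if not alat.chk_a("r1"):                  # check failed: recovery code
        r1 = memory[load_addr]                # redo the load after the store
    return r1
```

When the two addresses differ, the early value is used as is; when they match, the check fails and the reload picks up the stored value, so the result is the same as if the load had never been moved.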

Interesting Research Questions
- How do the new IA-64 instructions impact existing compiler optimizations (partial redundancy elimination, liveness analysis, ...)?
- Does the EPIC approach outperform the runtime optimizations of traditional superscalar processors?
- What other hardware aspects can be exposed to the compiler?
