CS5222 Advanced Computer Architecture Part 3: VLIW Architecture

Slides:



Advertisements
Similar presentations
Instruction Level Parallelism
Advertisements

Instruction Set Design
Computer Architecture Instruction-Level Parallel Processors
Superscalar and VLIW Architectures Miodrag Bolic CEG3151.
Instruction-Level Parallel Processors {Objective: executing two or more instructions in parallel} 4.1 Evolution and overview of ILP-processors 4.2 Dependencies.
RISC and Pipelining Prof. Sin-Min Lee Department of Computer Science.
HW 2 is out! Due 9/25!. CS 6290 Static Exploitation of ILP.
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Dynamic Branch Prediction (Sec 4.3) Control dependences become a limiting factor in exploiting ILP So far, we’ve discussed only static branch prediction.
Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
ENGS 116 Lecture 101 ILP: Software Approaches Vincent H. Berk October 12 th Reading for today: , 4.1 Reading for Friday: 4.2 – 4.6 Homework #2:
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture VLIW Steve Ko Computer Sciences and Engineering University at Buffalo.
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? ILP: VLIW Architectures Marco D. Santambrogio:
CS252 Graduate Computer Architecture Spring 2014 Lecture 9: VLIW Architectures Krste Asanovic
Dynamic Branch PredictionCS510 Computer ArchitecturesLecture Lecture 10 Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining.
Pipelining 5. Two Approaches for Multiple Issue Superscalar –Issue a variable number of instructions per clock –Instructions are scheduled either statically.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
Instruction Level Parallelism (ILP) Colin Stevens.
ELEC Fall 05 1 Very- Long Instruction Word (VLIW) Computer Architecture Fan Wang Department of Electrical and Computer Engineering Auburn.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 3.
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
Instruction Set Architecture (ISA) for Low Power Hillary Grimes III Department of Electrical and Computer Engineering Auburn University.
RISC. Rational Behind RISC Few of the complex instructions were used –data movement – 45% –ALU ops – 25% –branching – 30% Cheaper memory VLSI technology.
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)
Chapter One Introduction to Pipelined Processors.
Intel Architecture. Changes in architecture Software architecture: –Front end (Feature changes such as adding more graphics, changing the background colors,
Basics and Architectures
Very Long Instruction Word (VLIW) Architecture. VLIW Machine It consists of many functional units connected to a large central register file Each functional.
TECH 6 VLIW Architectures {Very Long Instruction Word}
Hybrid-Scheduling: A Compile-Time Approach for Energy–Efficient Superscalar Processors Madhavi Valluri and Lizy John Laboratory for Computer Architecture.
INTRODUCTION Crusoe processor is 128 bit microprocessor which is build for mobile computing devices where low power consumption is required. Crusoe processor.
Page 1 Trace Caches Michele Co CS 451. Page 2 Motivation  High performance superscalar processors  High instruction throughput  Exploit ILP –Wider.
1 Advanced Computer Architecture Dynamic Instruction Level Parallelism Lecture 2.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
Super computers Parallel Processing By Lecturer: Aisha Dawood.
Advanced Processor Technology Architectural families of modern computers are CISC RISC Superscalar VLIW Super pipelined Vector processors Symbolic processors.
CIS 662 – Computer Architecture – Fall Class 16 – 11/09/04 1 Compiler Techniques for ILP  So far we have explored dynamic hardware techniques for.
Ted Pedersen – CS 3011 – Chapter 10 1 A brief history of computer architectures CISC – complex instruction set computing –Intel x86, VAX –Evolved from.
Pipelining and Parallelism Mark Staveley
CS5222 Adv. Comp. Arch. Part 0 Page.1 Chi C.H. Fall 2003 NUS CS5222 Advanced Computer Architecture Part 0: Course Introduction Fall Term, 2003/2004 Chi.
CS5222 Adv. Comp. Arch. Part 0 Page.1 Chi C.H. Fall 2004 NUS CS5222 Advanced Computer Architecture Part 0: Course Introduction Fall Term, 2004/2005 Chi.
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
EKT303/4 Superscalar vs Super-pipelined.
COMPUTER ORGANIZATIONS CSNB123 NSMS2013 Ver.1Systems and Networking1.
3/12/2013Computer Engg, IIT(BHU)1 CONCEPTS-1. Pipelining Pipelining is used to increase the speed of processing It uses temporal parallelism In pipelining,
Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining Xiuzhen Cheng
1 Adapted from UC Berkeley CS252 S01 Lecture 17: Reducing Cache Miss Penalty and Reducing Cache Hit Time Hardware prefetching and stream buffer, software.
CISC. What is it?  CISC - Complex Instruction Set Computer  CISC is a design philosophy that:  1) uses microcode instruction sets  2) uses larger.
CS 352H: Computer Systems Architecture
Topics to be covered Instruction Execution Characteristics
Advanced Architectures
Computer Architecture
VLIW Architecture FK Boachie..
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue
5.2 Eleven Advanced Optimizations of Cache Performance
/ Computer Architecture and Design
Instruction Scheduling for Instruction-Level Parallelism
Superscalar Processors & VLIW Processors
CS 704 Advanced Computer Architecture
Coe818 Advanced Computer Architecture
How to improve (decrease) CPI
CC423: Advanced Computer Architecture ILP: Part V – Multiple Issue
Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.
Superscalar and VLIW Architectures
Lecture 4: Instruction Set Design/Pipelining
Lecture 5: Pipeline Wrap-up, Static ILP
Instruction Level Parallelism
Presentation transcript:

CS5222 Advanced Computer Architecture Part 3: VLIW Architecture Fall Term, 2004/2005 Chi Chi Hung (email: chich@comp.nus.edu.sg) Building S/17, Rm 5-13 Phone: 6874-2832 Give qualifications of instructors: DAP teaching computer architecture at Berkeley since 1977 Co-athor of textbook used in class Best known for being one of pioneers of RISC currently author of article on future of microprocessors in SciAm Sept 1995 RY took 152 as student, TAed 152,instructor in 152 undergrad and grad work at Berkeley joined NextGen to design fact 80x86 microprocessors one of architects of UltraSPARC fastest SPARC mper shipping this Fall

Basic Working Principles of VLIW Aim at speeding up computation by exploiting instruction- level parallelism. Same hardware core as superscalar processors, having multiple execution units (EUs) working in parallel. An instruction is consisted of multiple operations; typical word length from 52 bits to 1 Kbits. All operations in an instruction are executed in a lock-step mode. One or multiple register files for FX and FP data. Rely on compiler to find parallelism and schedule dependency free program code.

Basic VLIW Approach

Register File Structure for VLIW What is the challenge to register file in VLIW? R/W ports

Differences Between VLIW & Superscalar Architecture (I)

Differences Between VLIW & Superscalar Architecture (II) Instruction formulation: Superscalar: Receive conventional instructions conceived for seq. processors. VLIW: Receive (very) long instruction words, each comprising a field (or opcode) for each execution unit. Instruction word length depends (a) number of execution units, and (b) code length to control each unit (such as opcode length, register names, …). Typical word length is 64 – 1024 bits, much longer than conventional machine word length.

Differences Between VLIW & Superscalar Architecture (III) Instruction scheduling: Superscalar: Done dynamically at run-time by the hardware. Data dependency is checked and resolved in hardware. Need a lookahead hardware window for instruction fetch.

Differences Between VLIW & Superscalar Architecture (IV) Instruction scheduling (cont’d): VLIW: Static scheduling done at compile-time by the compiler. Advantages: Reduce hardware complexity. Tasks such as decoding, data dependency detection, instruction issue, …, etc. becoming simple. Potentially higher clock rate. Higher degree of parallelism with global program information.

Differences Between VLIW & Superscalar Architecture (V) Instruction scheduling (cont’d): VLIW: Disadvantages Higher complexity of the compiler. Compiler optimization needs to consider technology dependent parameters such as latencies and load-use time of cache. (Question: What happens to the software if the hardware is updated?) Non-deterministic problem of cache misses, resulting in worst case assumption for code scheduling. In case of un-filled opcodes in a (V)LIW, memory space and instruction bandwidth are wasted.

Development history of Proposed/Commercial VLIWs

Case Study of VLIW: Trace 200 Family (I)

Case Study of VLIW: Trace 200 Family (II) Only two branches might be used in Trace 7/2000

Code Expansion in VLIW It is found that code in VLIW is expanded roughly by a factor of three. For “long” VLIW, more opcode fields will be emptied. This will result in wasting bandwidth and storage space. Can you propose a solution for it?

END