INTRODUCTION TO MULTISCALAR ARCHITECTURE


When and how? The Multiscalar architecture is a concept developed by Manoj Franklin, then a doctoral student in the Computer Sciences Department at the University of Wisconsin-Madison. He presented the idea in his 1993 PhD thesis.

What is multiscalar architecture? The Multiscalar architecture uses a distributed processor organization and task-level speculation to exploit high degrees of instruction-level parallelism (ILP) in sequential programs, without relying on improvements in clock speed.
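As a rough illustration (not from the slides), task-level parallelism starts by partitioning a sequential program into contiguous tasks; multiscalar tasks are really regions of the control-flow graph, but the simplest case, splitting a loop's iterations into contiguous chunks, conveys the idea. All names here are invented for illustration.

```python
# Illustrative sketch only: split a sequential loop's iterations into
# contiguous "tasks" that separate processing units could execute,
# while sequential semantics are preserved at commit time.
def partition_into_tasks(n_iters, n_tasks):
    """Split iterations 0..n_iters-1 into up to n_tasks contiguous tasks."""
    per = -(-n_iters // n_tasks)  # ceiling division
    return [range(i, min(i + per, n_iters)) for i in range(0, n_iters, per)]

print([list(t) for t in partition_into_tasks(7, 3)])  # [[0, 1, 2], [3, 4, 5], [6]]
```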

But how is it better than scalar or superscalar architectures?

Scalar Processors Instruction Queue Execution Unit addu $20, $20, 16 ld $23, SYMVAL -16($20) move $17, $21 beq $17, $0, SKIPINNER ld $8, LELE($17)

SuperScalar Processors Instruction Queue Execution Unit addu $20, $20, 16 ld $23, SYMVAL -16($20) move $17, $21 beq $17, $0, SKIPINNER ld $8, LELE($17)

Multiscalar Architecture

Multiscalar Architecture A sequencer feeds a queue of processing units arranged in a unidirectional ring; each unit has an instruction cache, a processing element, and a register file. An interconnect links the units to data banks; each bank has an address resolution buffer and a data cache.

Modern microprocessors achieve high performance by exploiting instruction-level parallelism (ILP) in sequential programs. They establish a large dynamic window of instructions and employ wide-issue organizations to extract ILP and execute multiple instructions simultaneously. Larger windows allow more dynamic instructions to be examined, exposing more independent instructions for wider processors to execute. However, the large centralized hardware structures required for bigger windows and wider issue are hard to engineer at high clock speeds because wire delays grow quadratically, limiting overall performance.
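The effect of window size on extractable ILP can be sketched with a toy dependence check (an assumption-laden illustration, not from the slides): instructions whose source registers are not written earlier in the window are independent and could issue together.

```python
# Toy model: each instruction is (dest_register, [source_registers]).
# An instruction is "ready" if no earlier instruction in the window
# writes one of its sources (i.e., no RAW hazard inside the window).
def independent_in_window(instrs, window):
    """Return dests of instructions in the first `window` entries that are ready."""
    ready, written = [], set()
    for dest, srcs in instrs[:window]:
        if not (set(srcs) & written):
            ready.append(dest)
        written.add(dest)
    return ready

# Two independent dependence chains: a wider window exposes the second chain.
prog = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r0"]), ("r4", ["r3"])]
print(independent_in_window(prog, 2))  # ['r1']
print(independent_in_window(prog, 4))  # ['r1', 'r3']
```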

[Figure: a static program partitioned into tasks A, B, and C (instructions A1-A3, B1-B3, C1-C3), with points 1-4 in the dynamic stream defining windows 1-3.]

Multiscalar Programs A multiscalar program contains the code for the tasks plus small changes to the existing ISA (adding a specification of tasks, not a major overhaul) that describe the structure of the CFG and its tasks and the communication between tasks.

So what is multiscalar processing? The idea is to connect multiple sequential processors in a decoupled, decentralized manner to achieve overall multiple issue, so it can be viewed as multiple superscalar processors operating simultaneously. The fundamental performance issues are control-flow speculation, data communication, data-dependence speculation, load imbalance, and task overhead.
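A minimal sketch of the assignment side of this idea (assuming round-robin dispatch and tasks modeled as plain Python callables; in hardware the tasks would execute overlapped): tasks are handed to processing units around the ring, and results are committed in original program order.

```python
from collections import deque

# Hand tasks to PUs round-robin around a unidirectional ring; commit results
# in sequential program order regardless of which PU ran each task.
def multiscalar_run(tasks, num_pus=3):
    ring = deque(range(num_pus))         # unidirectional ring of PU ids
    results = []
    for i, task in enumerate(tasks):
        pu = ring[0]                     # head of the ring takes the next task
        ring.rotate(-1)                  # advance the ring
        results.append((i, pu, task()))  # hardware would overlap these
    return [val for _, _, val in sorted(results)]

tasks = [lambda i=i: i * i for i in range(5)]
print(multiscalar_run(tasks))  # [0, 1, 4, 9, 16]
```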

Continuation . . . In a Multiscalar processor, sequential programs are partitioned into sequential tasks. [Figure: tasks A, B, and C assigned to three PUs, which share a memory disambiguation unit, the Address Resolution Buffer (ARB).]

Continuation . . . The above figure shows a static program partitioned into three tasks, with three points of search in the dynamic stream and three corresponding windows. Execution proceeds by assigning tasks to PUs (processing units). After a task is assigned for execution, one of its possible successors is predicted as the next task. This is similar to branch prediction in superscalar machines, i.e., control-flow speculation is used.

Address Resolution Buffer (ARB) The ARB plays a key role in multiscalar processing. The basic idea behind the ARB is to allow out-of-order issue and execution of memory references while providing a hardware mechanism to order the references sequentially.
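A heavily simplified sketch of that idea (not the actual ARB design; the class and its bookkeeping are invented): track speculative loads and stores by sequence number, forward values from logically earlier stores, and flag a violation when a store discovers that a logically later load already executed with stale data.

```python
# Minimal ARB-like tracker: seq numbers give the sequential program order,
# even though load/store calls may arrive out of that order.
class MiniARB:
    def __init__(self):
        self.loads = {}   # addr -> set of seq numbers that loaded it
        self.stores = {}  # addr -> {seq: value}

    def load(self, seq, addr, memory):
        self.loads.setdefault(addr, set()).add(seq)
        # Forward from the latest logically earlier store, else from memory.
        earlier = [s for s in self.stores.get(addr, {}) if s < seq]
        return self.stores[addr][max(earlier)] if earlier else memory.get(addr)

    def store(self, seq, addr, value):
        self.stores.setdefault(addr, {})[seq] = value
        # Violation: a logically later load already ran and missed this store.
        return any(l > seq for l in self.loads.get(addr, set()))

mem = {100: 7}
arb = MiniARB()
print(arb.load(3, 100, mem))  # 7: load (seq 3) runs early, reads memory
print(arb.store(1, 100, 9))   # True: logically earlier store -> violation
```

On a violation, a real multiscalar processor squashes the offending task and its successors and re-executes them.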

Thank You! References used for this presentation: The original PhD thesis of Dr. Manoj Franklin. The IEEE journal article "Task Selection for a Multiscalar Processor" by T. N. Vijaykumar (School of Electrical and Computer Engineering, Purdue University) and Gurindar S. Sohi (Computer Sciences Department, University of Wisconsin-Madison). The presentation "Multiscalar Processors" by Matthew Misler, Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar, University of Wisconsin-Madison.