CSE 8383 - Advanced Computer Architecture Week-4 Week of Feb 2, 2004 engr.smu.edu/~rewini/8383.

Contents
- Reservation Table
- Latency Analysis
- State Diagrams
- MAL and its bounds
- Delay Insertion
- Throughput
- Group Work
- Introduction to Multiprocessors

Reservation Table
A reservation table displays the time-space flow of data through the pipeline for one function evaluation.
A static pipeline is specified by a single reservation table.
A dynamic pipeline may be specified by multiple reservation tables.

Static Pipeline
Reservation table for a four-stage static pipeline (time runs left to right):

        1    2    3    4
  S1    X
  S2         X
  S3              X
  S4                   X

Dynamic Pipeline
(figure) Two reservation tables over the same three stages: function X uses S1 in three cycles, S2 in two cycles, and S3 in three cycles; function Y uses S1 in two cycles, S2 in one cycle, and S3 in three cycles.

Reservation Table (Cont.)
The number of columns in a reservation table is called the evaluation time of the given function.
The checkmarks in a row correspond to the time instants (cycles) in which a particular stage is used.
Multiple checkmarks in a row → repeated usage of the same stage in different cycles.

Reservation Table (Cont.)
Contiguous checkmarks → extended usage of a stage over more than one cycle.
Multiple checkmarks in one column → multiple stages are used in parallel.
A dynamic pipeline may allow different initiations to follow a mix of reservation tables.

Reservation Table
(figure) An example with four stages A, B, C, and D: A is used in three cycles, B and C in two cycles each, and D in one cycle.

Latency Analysis
The number of cycles between two initiations is the latency between them. A latency of k → the two initiations are separated by k cycles.
Collision → a resource conflict between two initiations.
Latencies that cause collisions → forbidden latencies.
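To make the definitions concrete, here is a minimal Python sketch that derives forbidden latencies from a reservation table. The table used (`reservation_table`) is an illustrative assumption for the three-stage function X, chosen so that it yields the forbidden latencies 2, 4, 5, 7 quoted later; it is not necessarily the exact table drawn on the slides.

```python
# Sketch: deriving forbidden latencies from a reservation table.
# Any distance between two checkmarks in the same row is a forbidden latency.
from itertools import combinations

# Illustrative table (stage -> cycles in which it is used); an assumption,
# not necessarily the exact table shown on the slide.
reservation_table = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}

def forbidden_latencies(table):
    forbidden = set()
    for cycles in table.values():
        for a, b in combinations(sorted(cycles), 2):
            forbidden.add(b - a)   # two marks in one row, b - a cycles apart
    return sorted(forbidden)

print(forbidden_latencies(reservation_table))   # -> [2, 4, 5, 7]
```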

Collision with latencies 2 and 5 in evaluating X
(figure) Space-time diagrams for two initiations X1 and X2: with a latency of 2 or a latency of 5, the two initiations need the same stage in the same cycle, i.e., they collide.

Latency Analysis (cont.)
Latency sequence → a sequence of permissible latencies between successive initiations.
Latency cycle → a latency sequence that repeats the same subsequence (cycle) indefinitely.
Example: the latency cycle (1, 8) produces the latency sequence 1, 8, 1, 8, 1, 8, ...

Latency Analysis (cont.)
Average latency (of a latency cycle) → the sum of all latencies divided by the number of latencies along the cycle.
Constant cycle → a latency cycle with a single latency value.
Objective → obtain the shortest average latency between initiations without causing collisions.

Latency Cycle (1,8)
(figure) Successive initiations X1, X2, X3, ... separated alternately by 1 and 8 cycles.
Average latency = (1 + 8) / 2 = 4.5

Latency Cycle (6)
(figure) Successive initiations separated by a constant latency of 6 cycles.
Average latency = 6

Collision Vector
C = (C_m, C_m-1, ..., C_2, C_1)
C_i = 1 if latency i causes a collision (forbidden); C_i = 0 if latency i is permissible.
C_m = 1 always, where m is the maximum forbidden latency.
The maximum forbidden latency satisfies m <= n - 1, where n is the number of columns in the reservation table.

Collision Vector (X after X)
Forbidden latencies: 2, 4, 5, 7
Collision vector = (1011010)

Collision Vector (Y after Y)
Forbidden latencies: 2, 4
Collision vector = (1010)
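The collision vector follows mechanically from the forbidden latencies. A small Python sketch (the helper name `collision_vector` is just for illustration):

```python
# Sketch: collision vector C = (C_m, ..., C_1) from the forbidden latencies,
# where m is the maximum forbidden latency and C_i = 1 iff latency i is forbidden.
def collision_vector(forbidden):
    m = max(forbidden)
    # Bit string written C_m ... C_1 (leftmost bit corresponds to latency m).
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))

print(collision_vector({2, 4, 5, 7}))   # X after X -> "1011010"
print(collision_vector({2, 4}))         # Y after Y -> "1010"
```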

State Diagram
A state diagram specifies the permissible state transitions among successive initiations.
The collision vector corresponds to the initial state at time t = 1 (the initial collision vector).
The next state comes at time t + p, where p is a permissible latency in the range 1 <= p < m.

Right Shift Register
The next state can be obtained with the help of an m-bit right-shift register. A 1 shifted out indicates a latency that would cause a collision; a 0 shifted out means it is safe to allow an initiation. Each 1-bit shift corresponds to an increase in the latency by 1.

The next state
The next state is obtained by bitwise ORing the shifted collision vector with the initial collision vector. For an initiation with permissible latency p, the current collision vector is shifted right by p bits (one 1-bit right shift per elapsed cycle) and the result is ORed with the initial C.V. to give the C.V. of the next state.
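A small Python sketch of this transition rule. The helper name `next_state` and the encoding of states as bit strings written C_m...C_1 are illustrative assumptions:

```python
# Sketch: next-state computation for the collision-vector shift register.
# States are m-bit strings written C_m ... C_1 (leftmost bit = C_m).
def next_state(state, p, initial_cv):
    m = len(initial_cv)
    assert p > m or state[-p] == "0", "latency p would cause a collision"
    shifted = int(state, 2) >> p                 # p single-bit right shifts (p elapsed cycles)
    return format(shifted | int(initial_cv, 2), f"0{m}b")   # OR with the initial C.V.

cv_x = "1011010"                    # initial collision vector for X (from the slides)
print(next_state(cv_x, 1, cv_x))    # -> "1111111"
print(next_state(cv_x, 3, cv_x))    # -> "1011011"
```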

State Diagram for X
(figure) States are collision vectors and arcs are labeled with permissible latencies; from the initial state 1011010 the outgoing arcs carry latencies 1, 3, 6, and 8+.

Cycles
Simple cycles → cycles in which each state appears only once: (3), (6), (8), (1, 8), (3, 8), and (6, 8).
Greedy cycles → simple cycles whose edges are all made with the minimum latencies from their respective starting states: (1, 8) and (3). One of them gives the MAL.

MAL
MAL: Minimum Average Latency.
At least one of the greedy cycles will lead to the MAL.
For the state diagram of Y, the MAL is 3 (see the diagram).
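A sketch that traces greedy cycles through the state diagram and reports their average latencies. The function names and the convention of taking latency m + 1 when no smaller latency is safe are illustrative assumptions; the collision vector for X is the one quoted above.

```python
# Sketch: tracing a greedy cycle (always take the smallest permissible latency)
# until a state repeats, then reporting the cycle and its average latency.
def next_state(state, p, initial_cv):
    m = len(initial_cv)
    return format((int(state, 2) >> p) | int(initial_cv, 2), f"0{m}b")

def greedy_cycle(start, initial_cv):
    m = len(initial_cv)
    path, latencies, state = [], [], start
    while state not in path:
        path.append(state)
        # smallest permissible latency: lowest i with C_i == 0, else m + 1
        p = next((i for i in range(1, m + 1) if state[-i] == "0"), m + 1)
        latencies.append(p)
        state = next_state(state, p, initial_cv)
    cycle = latencies[path.index(state):]
    return cycle, sum(cycle) / len(cycle)

cv_x = "1011010"                                        # initial collision vector for X
print(greedy_cycle(cv_x, cv_x))                         # -> ([1, 8], 4.5)
print(greedy_cycle(next_state(cv_x, 3, cv_x), cv_x))    # -> ([3], 3.0), so MAL = 3 for X
```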

State Diagram for Y
(figure) From the initial state 1010 the outgoing arcs carry latencies 1, 3, and 5+.

Bounds on the MAL (Shar, 1972)
- The MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table.
- The MAL is lower than or equal to the average latency of any greedy cycle in the state diagram.
- The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1; this is also an upper bound on the MAL.
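A quick check of these bounds for the X example, as a sketch; the per-row mark count is taken from the illustrative table assumed earlier, not from the slides themselves.

```python
# Sketch: Shar's bounds checked for the X example.
max_marks_in_a_row = 3                      # assumed from the illustrative X table
ones_in_initial_cv = "1011010".count("1")   # 4
mal = 3                                     # from the greedy cycle (3)
assert max_marks_in_a_row <= mal <= ones_in_initial_cv + 1   # 3 <= 3 <= 5
```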

Delay Insertion
The purpose is to modify the reservation table, yielding a new collision vector.
This may lead to a modified state diagram, which may produce greedy cycles meeting the lower bound on the MAL.

Example
(figure) A three-stage pipeline, S1 → S2 → S3, whose output is taken after S3; feedback paths allow the stages to be reused across cycles.

Example (Cont.)
Reservation table (figure): each of S1, S2, and S3 is used in two cycles.
Forbidden latencies: 1, 2, 4
C.V. = (1011)

Example (Cont.)
State diagram (figure): from the initial collision vector 1011 the only permissible latency is 3 (any latency of 5 or more is also safe), so MAL = 3.

Example (Cont.)
(figure) The same pipeline with two non-compute delay stages, D1 and D2, inserted; the output is still taken after S3.

Example (Cont.)
Modified reservation table (figure): S1, S2, and S3 are each used in two cycles, and the delay stages D1 and D2 in one cycle each.
Forbidden latencies: 2, 6
C.V. = (100010)
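A sketch showing the effect of the inserted delays; it reuses the greedy-cycle helpers from the earlier sketch and depends only on the forbidden latencies quoted above (1, 2, 4 before insertion; 2, 6 after). The greedy cycle (3, 1) for the modified table has average latency 2, which matches the lower bound of two checkmarks per stage row.

```python
# Sketch: greedy cycles before and after delay insertion.
def next_state(state, p, initial_cv):
    m = len(initial_cv)
    return format((int(state, 2) >> p) | int(initial_cv, 2), f"0{m}b")

def greedy_cycle(start, initial_cv):
    m = len(initial_cv)
    path, latencies, state = [], [], start
    while state not in path:
        path.append(state)
        p = next((i for i in range(1, m + 1) if state[-i] == "0"), m + 1)
        latencies.append(p)
        state = next_state(state, p, initial_cv)
    cycle = latencies[path.index(state):]
    return cycle, sum(cycle) / len(cycle)

print(greedy_cycle("1011", "1011"))       # original pipeline     -> ([3], 3.0)
print(greedy_cycle("100010", "100010"))   # with delays inserted  -> ([3, 1], 2.0)
```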

Group Activity 1 Find the State Diagram

Pipeline Throughput
The average number of task initiations per clock cycle; it is the inverse of the MAL.
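A tiny worked example; the MAL of 3 and the 20 ns clock period are assumptions taken to match the group activity that follows.

```python
# Sketch: throughput = 1 / (MAL * clock period), in initiations per second.
mal = 3                   # assumed MAL, in clock cycles
clock_period = 20e-9      # assumed clock period: 20 ns
print(1 / (mal * clock_period))   # -> ~1.67e7 initiations per second
```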

Group Activity
Given the reservation table shown (S1 used in two cycles, S2 and S3 in one cycle each), find: the collision vector (C.V.), the state diagram, the simple cycles, the greedy cycles, the MAL, and the throughput (t = 20 ns).

Multiprocessors

Introduction
Uniprocessor systems are not capable of delivering solutions to some problems in reasonable time. Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution.
Speed-up versus quality-up.

Architecture Background
Three major components: processors, memory modules, and an interconnection network.

Parallel and Distributed Computers
- MIMD shared memory: bus based, switch based, CC-NUMA
- MIMD distributed memory
- SIMD computers
- Clusters
- Grid computing

MIMD Shared Memory Systems
(figure) Processors (P) connected through an interconnection network to a set of shared memory modules (M).

Bus-Based and Switch-Based Shared Memory Systems
(figure) In the bus-based organization, processors with private caches (P/C) share a global memory over a common bus; in the switch-based organization, the processors are connected to multiple memory modules (M) through switches.

Cache Coherent NUMA
(figure) Each node contains a processor (P), a cache (C), and a local memory module (M); the nodes communicate through an interconnection network.

MIMD Distributed Memory Systems
(figure) Each processor (P) has its own local memory (M); the processor-memory nodes communicate through an interconnection network.

SIMD Computers
(figure) An array of processor-memory pairs (P/M) connected by some interconnection network and driven by a von Neumann control computer.

Clusters
(figure) Each node contains a processor (P), cache (C), memory (M), I/O, and its own OS; the nodes are connected by an interconnection network, with middleware and a programming environment layered on top.

Grids Grids are geographically distributed platforms for computation. They provide dependable, consistent, pervasive, and inexpensive access to high end computational capabilities.

Interconnection Network Taxonomy
- Static networks: 1-D, 2-D, hypercube (HC)
- Dynamic networks:
  - Bus-based: single, multiple
  - Switch-based: single-stage (SS), multistage (MS), crossbar