Vector/Array Processors (CSCI 4717/5717 Computer Architecture)


CSCI 4717/5717 Computer Architecture
Topic: Vector/Array Processors
Reading: Stallings, Section 18.7

Vector/Array Computing
Optimized for calculation rather than multitasking and I/O
Design focus is to perform parallel mathematical operations on a vector or array of data elements
A scalar processor would need to handle one element at a time
Limited market: research, government agencies, meteorology

Vector/Array Computing (continued)
Target applications:
  – Data-intensive scientific research, such as aerodynamics, seismology, meteorology, and continuous field simulation
  – Specialized (high-performance) graphics applications
Applicable because of the ever-increasing need for improved resolution and model capabilities

Array Processor
Alternative to a supercomputer
Configured as a peripheral to a mainframe or minicomputer
The array processor is responsible only for running the vector portion of the problem
The Sony PlayStation 3 uses a processor consisting of one scalar processor and eight vector processors, developed by IBM, Toshiba and Sony. (Source:

Vector/Array Operation
The power of vector computing comes from special processing instructions (Single Instruction, Multiple Data, or SIMD)
Code executes in lock step: a single instruction is issued to a large number of identical processors (or ALUs), each with a large register set and each working on a different data element
A single master CPU keeps control of the entire process

Speed-Up Not Linear
As with any parallel processing architecture, the realized speedup of a vector processor is not linear because of:
  – Overhead for managing parallel computations
  – Bottlenecks in communication and storage
  – Application load that does not always match the number of available processors
These problems have a growing effect as the number of processors increases
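The effect described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the function names, the linear per-processor overhead, and the default cost values are all hypothetical and are not taken from the slides or from any measured machine.

```python
import math


def parallel_time(work_items, processors, time_per_item=1.0,
                  overhead_per_processor=0.5):
    """Toy model (assumed, not measured) of run time on `processors` units.

    The ceiling models load that does not divide evenly among processors;
    the overhead term models management cost that grows with their number.
    """
    rounds = math.ceil(work_items / processors)
    return rounds * time_per_item + overhead_per_processor * processors


def speedup(work_items, processors, time_per_item=1.0,
            overhead_per_processor=0.5):
    """Speedup relative to a scalar processor with no parallel overhead."""
    scalar_time = work_items * time_per_item
    return scalar_time / parallel_time(work_items, processors,
                                       time_per_item, overhead_per_processor)
```

Under these assumed costs, 100 work items on 10 processors give a speedup well below 10, and moving to 64 processors makes things worse, because two rounds are still needed and the overhead term dominates.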

Data Pipelining
The sequential nature of instructions allows for an instruction pipeline
Vector computing tends to have well-organized data as well, which allows the data to be pipelined too
A single instruction decode serves the whole vector
Pipeline stages fetch data, process data, and store the result in a register

Data Pipelining (continued)
Example: To add an array of numbers, the processor must have the following information:
  – A single "add" instruction
  – The start address of the data
  – The end address of the data
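The three pieces of information listed above can be sketched as a single call. This is a minimal illustration, assuming that "add an array" means summing its elements; the names `OPERATIONS` and `run_vector_instruction` and the list standing in for memory are hypothetical. Python executes the loop sequentially, so the sketch shows only the control information (one decode, then a stream of operands), not the overlap of pipeline stages.

```python
import operator

# Hypothetical one-entry instruction set for the sketch.
OPERATIONS = {"add": operator.add}


def run_vector_instruction(memory, opcode, start, end):
    """Apply one instruction to memory[start..end] (inclusive addresses)."""
    operation = OPERATIONS[opcode]       # single decode for the whole array
    accumulator = 0                      # stands in for the result register
    for address in range(start, end + 1):
        operand = memory[address]                      # fetch data
        accumulator = operation(accumulator, operand)  # process data
    return accumulator                   # result left in the "register"
```

For example, `run_vector_instruction([5, 1, 2, 3, 4, 9], "add", 1, 4)` sums the four elements at addresses 1 through 4.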

Vector/Array Programming
The programming goal is to divide a large dataset into independent sets that can be operated on in parallel
Requires a deep understanding of the algorithm being applied to the data
Steps:
  – Distribute data to the different processors
  – Initiate parallel processing
  – Bring everything back together when parallel processing is complete

Vector/Array Programming (continued)
Example: Count the number of times a specific value appears in a large array
  – Begin by breaking up the array into smaller arrays, one for each array processor
  – Each array processor, in parallel, counts the number of occurrences of the value
  – The final sum is then computed by adding the results from all of the processors
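The counting example can be sketched in a few lines. This is an illustration only: worker threads stand in for array processors, and the names `split_into_chunks` and `parallel_count` are hypothetical. In standard CPython builds the threads do not actually run the counting in parallel, so the sketch shows the structure of the algorithm (split, count, sum), not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, parts):
    """Break `data` into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for index in range(parts):
        end = start + size + (1 if index < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_count(data, target, processors=4):
    """Count occurrences of `target` using one worker per chunk."""
    chunks = split_into_chunks(data, processors)       # distribute the data
    with ThreadPoolExecutor(max_workers=processors) as pool:
        # Each worker counts occurrences in its own chunk.
        partial_counts = list(pool.map(lambda chunk: chunk.count(target),
                                       chunks))
    return sum(partial_counts)                         # combine the results
```

The result always matches a plain `data.count(target)`; only the way the work is divided differs.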

Vector/Array Applications
Which of the following applications would be better served by a vector or array computer than by an SMP, cluster, or scalar processor? What component of the problem is parallel?
  – Web search indexing
  – Generating the Fibonacci sequence: f(i) = f(i-1) + f(i-2)
  – Weather prediction
  – Image processing for a game
  – Web site server
  – Photoshop-type image processing

Scalar Programming
This slide and the ones that follow are based on the multiplication of two 100 x 100 matrices A and B (N = 100)

      DO 100 I = 1,N
        DO 100 J = 1,N
          C(I,J) = 0.0
          DO 100 K = 1,N
            C(I,J) = C(I,J) + A(I,K)*B(K,J)
  100 CONTINUE
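For readers who want to run the scalar version, here is a direct transcription into Python. It is a sketch, assuming square matrices stored as lists of rows and 0-based indexing; the function name `matmul_scalar` is hypothetical.

```python
def matmul_scalar(a, b):
    """Multiply two n x n matrices one element at a time, as a scalar CPU would."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):                  # DO 100 I = 1,N
        for j in range(n):              # DO 100 J = 1,N
            c[i][j] = 0.0               # C(I,J) = 0.0
            for k in range(n):          # DO 100 K = 1,N
                c[i][j] = c[i][j] + a[i][k] * b[k][j]
    return c
```

Every one of the n^3 multiply-add steps is issued as its own instruction, which is the cost the vector form is meant to reduce.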

Vector Programming
The notation (J = 1,N) indicates that operations on all indices J are to be carried out on N processors as a single operation

      DO 100 I = 1,N
        C(I,J) = 0.0 (J = 1,N)
        DO 100 K = 1,N
          C(I,J) = C(I,J) + A(I,K)*B(K,J) (J = 1,N)
  100 CONTINUE
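The vector form can be imitated in Python by treating a whole row as one operand. This is a sketch under assumptions: `vector_scale_add` is a hypothetical stand-in for a single vector instruction (on real hardware all J would be handled in one operation, whereas here a list comprehension loops over them), and matrices are square lists of rows.

```python
def vector_scale_add(row_c, scalar, row_b):
    """Stand-in for one vector operation: row_c[j] + scalar * row_b[j] for all j."""
    return [c + scalar * b for c, b in zip(row_c, row_b)]


def matmul_vector(a, b):
    """Multiply two n x n matrices, handling all column indices J per step."""
    n = len(a)
    c = []
    for i in range(n):                          # DO 100 I = 1,N
        row = [0.0] * n                         # C(I,J) = 0.0 (J = 1,N)
        for k in range(n):                      # DO 100 K = 1,N
            # C(I,J) = C(I,J) + A(I,K)*B(K,J) (J = 1,N)
            row = vector_scale_add(row, a[i][k], b[k])
        c.append(row)
    return c
```

Only two explicit loops remain, so the number of issued "instructions" drops from roughly n^3 to roughly n^2, each acting on n elements.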

Fork/Join Parallel Programming
One method of parallel programming is fork/join
Programs start as a single process known as the master thread
The "fork" operation marks the beginning of sections of the program that are to be executed in parallel
The "join" operation terminates the parallel threads created by "fork" and brings the program back to a single master thread

Fork/Join Method (continued)

      DO 50 J = 1,N - 1
        FORK 100
   50 CONTINUE
      J = N
  100 DO 200 I = 1,N
        C(I,J) = 0.0
        DO 200 K = 1,N
          C(I,J) = C(I,J) + A(I,K)*B(K,J)
  200 CONTINUE
      JOIN N
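The same fork/join structure can be sketched with Python threads. This is an illustration, not the course's implementation: `threading.Thread.start` plays the role of FORK, `join` brings execution back to the master thread, and the names `compute_column` and `matmul_fork_join` are hypothetical. Each thread writes a different column of C, so no locking is needed; in standard CPython builds the threads will not give a real speedup.

```python
import threading


def compute_column(a, b, c, j):
    """Compute column j of C = A x B (the loop body starting at label 100)."""
    n = len(a)
    for i in range(n):
        c[i][j] = 0.0
        for k in range(n):
            c[i][j] = c[i][j] + a[i][k] * b[k][j]


def matmul_fork_join(a, b):
    """Fork one thread per column except the last, which the master computes."""
    n = len(a)
    c = [[None] * n for _ in range(n)]
    workers = []
    for j in range(n - 1):                       # DO 50 J = 1,N - 1
        worker = threading.Thread(target=compute_column, args=(a, b, c, j))
        worker.start()                           # fork
        workers.append(worker)
    compute_column(a, b, c, n - 1)               # J = N, done by the master
    for worker in workers:                       # join all forked threads
        worker.join()
    return c
```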

Neural Networks

What?! A Blank Slide?! It must be over!!!