STARAN Parallel Processor System Hardware, by Kenneth E. Batcher. Presented by Manoj K. Yarlagadda.

Presentation transcript:


Presentation Topics
– Parallel Processors
– Why Parallelism?
– Why Parallelism Now?
– Evolution of STARAN
– STARAN Configuration Diagram
– Multi-Dimensional Access (MDA)
– STARAN Block Diagram

Parallel Processors
– Interconnection Networks
– SIMD Computers
– MIMD Computers
– Other Architectures: Dataflow and Neural Network

SIMD and MIMD
SIMD: There are N data streams, one per processor, so different data can be used in each processor.
MIMD: Each processor operates under the control of an instruction stream issued by its own control unit.

Why Parallelism?
Even though the CPU-memory connection is a bottleneck, we are still greatly interested in processor speed-up. Parallelism can be used in the following areas:
– Simulations of complex physical systems (e.g., weather forecasting, molecular modeling)
– Image processing
– Massive data processing (e.g., seismic data)
– Large databases

Why Parallelism Now?
Parallel processors have been available for decades, but only recent technological changes have made them feasible:
– Evolution of ICs to current VLSI (or VVLSI)
– Dramatic reduction in power requirements
– Decreased cost of production
– Increased speed of processors
– Increased reliability of processors
Current SIMD machines have up to 65,536 PEs!

EVOLUTION OF STARAN
High cost of semiconductor memory and logic elements.
The versions of the associative processor (AP):
1) Built for the USAF by Goodyear Aerospace Corporation (June 1969, Akron, Ohio).
2) The same machine, updated to include a large instruction memory, was loaned by USAF in …
3) The lessons learned in programming and testing the USAF AP model resulted in a new design called STARAN S, which was committed to production in 1971.

…Contd
4) Demonstrations in May 1972 at the TRANSPO exhibit in Washington, D.C., and in June 1972 at Boston.
The initial uses of APs would be weighted toward real-time applications involving interfaces with a wide variety of sensors, conventional computers, signal processors, interactive displays, and mass storage devices. To accommodate all such interfaces, the STARAN was divided into:

STARAN Configuration Diagram
– Standardized main frame unit
– Custom interface unit
a) A variety of I/O options includes:
– Direct memory access (DMA)
– Buffered I/O channels
– External function channels
– A unique interface called Parallel I/O

MDA MEMORIES
The memory for such an associative processor could be a simple random-access memory with the data rotated 90 degrees, so that it is accessed by bit-slices instead of by words.
The MDA memory is treated as a square array of bits: 256 words with 256 bits in each word.
To accommodate both bit-slice accesses for associative processing and word-slice accesses for STARAN input/output, the data are stored in an MDA (multi-dimensional access) memory.

…Contd
– It has read/write buses for parallel access to a large number (256) of memory bits.
– A write mask bus allows selective writing of bits.
– Memory accesses (read and write) are controlled by address and access-mode control inputs.

Bit-Slice & Word access modes
– Bit-slice: used to access one bit of all words in parallel.
– Word-slice: used for I/O operations; accesses all bits of one word in parallel.

…Contd
The MDA memory structure is not limited to a square array of 256 by 256.
– One can access 32 consecutive bytes of a record in parallel.
– One can access the corresponding bytes of all records.
– One can access a bit from each byte in parallel.
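
The two access modes and the write mask described on the MDA slides can be illustrated with a minimal Python sketch. The class name `MDAMemory` and its method names are hypothetical, chosen only for illustration; the model is a plain 2-D bit array and makes no attempt to reproduce how the real hardware scrambles addresses to provide both modes.

```python
N = 256  # words and bits per word, as in the 256 x 256 array on the slides


class MDAMemory:
    """Toy model (hypothetical): a square bit array readable by word or by bit-slice."""

    def __init__(self, n=N):
        self.n = n
        self.bits = [[0] * n for _ in range(n)]  # bits[word][bit]

    def read_word(self, w):
        """Word access: all bits of word w in parallel."""
        return list(self.bits[w])

    def read_bit_slice(self, b):
        """Bit-slice access: bit b of every word in parallel."""
        return [self.bits[w][b] for w in range(self.n)]

    def write_word(self, w, data, mask=None):
        """Write word w; mask (if given) selects which bit positions are written."""
        for b in range(self.n):
            if mask is None or mask[b]:
                self.bits[w][b] = data[b]

    def write_bit_slice(self, b, data, mask=None):
        """Write bit b of every word; mask (if given) selects which words are written."""
        for w in range(self.n):
            if mask is None or mask[w]:
                self.bits[w][b] = data[w]


# Small 4 x 4 instance so the effect is easy to inspect by hand.
mem = MDAMemory(4)
mem.write_word(1, [1, 0, 1, 1])
mem.write_bit_slice(2, [1, 1, 1, 1], mask=[1, 0, 0, 1])
```

The same stored bits are reachable along either axis, which is why one memory can serve both associative processing (bit-slices) and input/output (words).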

STARAN ARRAY MODULES

…Contd
1) Array module components communicate through a network called the flip network.
2) Selector: chooses a 256-bit source item from the MDA read bus.
3) Flip network: may shift and permute the bits in various ways.
a) It allows inter-PE communication. A PE can read data from another PE either directly (from registers) or indirectly (through the MDA memory).
b) It can permute the 256-bit data item as a whole or divide it into groups of 2, 4, 8, 16, 32, 64, or 128 bits.
4) Mirroring: reduces the number of passes.

…Contd
5) Three 256-bit registers (M, X, and Y) receive data through the flip network. Note: X and Y are the logic registers.
6) The general logic associated with the X register can perform any of the 16 Boolean functions of two variables. If x_i is the state of the i-th X-register bit and f_i is the state of the i-th flip network output, then:
x_i <- Ø(x_i, f_i) (i = 0, 1, ..., 255), where Ø is the Boolean function.
Y register:
y_i <- Ø(y_i, f_i) (i = 0, 1, ..., 255)

7) If X and Y are operated together, the same Boolean function Ø is applied to both registers:
x_i <- Ø(x_i, f_i)
y_i <- Ø(y_i, f_i)
8) The programmer can also choose to operate on X selectively, using Y as a mask:
x_i <- Ø(x_i, f_i) (where y_i = 1)
x_i <- x_i (where y_i = 0)
9) Another choice is to operate on X selectively while also operating on Y:
x_i <- Ø(x_i, f_i) (where y_i = 1)
x_i <- x_i (where y_i = 0)
y_i <- Ø(y_i, f_i)
In this case, the old state of Y (before modification by f) is used as the mask for the X operation.
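
The register operating modes listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: registers are modeled as lists of 0/1 values rather than 256-bit hardware registers, the Boolean function Ø is passed as an ordinary two-argument Python function `phi`, and all function names are hypothetical.

```python
from operator import xor


def op_x(phi, x, y, f):
    """x_i <- phi(x_i, f_i); Y is left unchanged."""
    return [phi(xi, fi) for xi, fi in zip(x, f)], list(y)


def op_xy(phi, x, y, f):
    """The same function applied to both registers."""
    new_x = [phi(xi, fi) for xi, fi in zip(x, f)]
    new_y = [phi(yi, fi) for yi, fi in zip(y, f)]
    return new_x, new_y


def op_x_masked(phi, x, y, f):
    """X is updated only where y_i = 1; Y is left unchanged."""
    new_x = [phi(xi, fi) if yi else xi for xi, yi, fi in zip(x, y, f)]
    return new_x, list(y)


def op_x_masked_and_y(phi, x, y, f):
    """X is updated under the OLD Y as mask, and Y is updated as well."""
    new_x = [phi(xi, fi) if yi else xi for xi, yi, fi in zip(x, y, f)]
    new_y = [phi(yi, fi) for yi, fi in zip(y, f)]
    return new_x, new_y


# Example with 4-bit registers and exclusive-or as the Boolean function.
x, y = op_x_masked_and_y(xor, [0, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 1])
```

Building both result lists from the incoming `x` and `y` before returning is what guarantees that the mask is the old state of Y, as the last mode requires.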

Programming example
This operation adds the contents of a Field A of all memory words to the contents of a Field B of the same words and stores the sum in a Field S of those words.
At the beginning of each loop execution, the carry (c) from the previous bits is stored in Y, and X contains zeroes:
x_i = 0
y_i = c_i
Note: the bits are processed from LSB to MSB.

Four steps:
Step 1: Read bit-slice a and exclusive-or (⊕) it to X selectively and also to Y:
x_i <- x_i ⊕ (y_i · a_i)
y_i <- y_i ⊕ a_i
The states of X and Y are now:
x_i = a_i · c_i
y_i = a_i ⊕ c_i
Step 2: Read bit-slice b and exclusive-or it to X selectively and also to Y:
x_i <- x_i ⊕ (y_i · b_i)
y_i <- y_i ⊕ b_i
Registers X and Y now contain the carry and sum bits:
x_i = a_i · c_i ⊕ a_i · b_i ⊕ b_i · c_i = c'_i
y_i = a_i ⊕ b_i ⊕ c_i = s_i

…Contd
Step 3: Write the sum bit from Y into bit-slice s and also complement X selectively:
s_i <- y_i
x_i <- x_i ⊕ y_i
The states of X and Y are now:
x_i = c'_i ⊕ s_i
y_i = s_i
Step 4: Read the X register and exclusive-or it into both X and Y:
x_i <- x_i ⊕ x_i
y_i <- y_i ⊕ x_i
This clears X and stores the carry bit in Y for the next execution of the loop:
x_i = 0
y_i = c'_i
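
The four-step loop can be traced with a short Python sketch. It is a simulation under simplifying assumptions, not STARAN code: Fields A and B are passed as two separate lists of integers (one entry per memory word) instead of fields within the same MDA word, the X and Y registers are lists of 0/1 values, and the names `read_slice` and `add_fields` are hypothetical.

```python
def read_slice(words, bit):
    """Return bit number `bit` of every word (a bit-slice)."""
    return [(w >> bit) & 1 for w in words]


def add_fields(a_words, b_words, width):
    """Bit-serial, word-parallel add using only the X/Y register steps."""
    n = len(a_words)
    x = [0] * n        # X starts cleared
    y = [0] * n        # Y holds the carry; no carry into the LSB
    s_words = [0] * n  # Field S, built up one bit-slice at a time

    for bit in range(width):  # LSB to MSB
        # Step 1: XOR bit-slice a into X (masked by old Y) and into Y.
        a = read_slice(a_words, bit)
        x = [xi ^ (yi & ai) for xi, yi, ai in zip(x, y, a)]
        y = [yi ^ ai for yi, ai in zip(y, a)]

        # Step 2: XOR bit-slice b into X (masked by old Y) and into Y.
        b = read_slice(b_words, bit)
        x = [xi ^ (yi & bi) for xi, yi, bi in zip(x, y, b)]
        y = [yi ^ bi for yi, bi in zip(y, b)]
        # Now x holds the carry-out c' and y holds the sum bit s.

        # Step 3: write Y into bit-slice s, then complement X where Y = 1.
        s_words = [sw | (yi << bit) for sw, yi in zip(s_words, y)]
        x = [xi ^ yi for xi, yi in zip(x, y)]

        # Step 4: XOR X into both registers: Y becomes c', X becomes 0.
        y = [yi ^ xi for yi, xi in zip(y, x)]
        x = [0] * n

    return s_words, y  # y holds the final carry-out of each word


sums, carries = add_fields([3, 5, 255, 0], [1, 10, 1, 0], width=8)
```

Each pass touches every word at once, so the running time depends on the field width (the number of bit-slices), not on the number of words being added.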

STARAN BLOCK DIAGRAM
– Assignment switch: connects its control inputs, data inputs, and outputs to the AP.
– AP (associative processor) control: contains registers and logic. It receives instructions from the control memory and transfers data to and from the control memory.

Registers in the AP:
1) Instruction register: holds the 32-bit instruction being executed.
2) Program status word: holds the CM (control memory) address of the next instruction to be executed and the program priority level.
3) Common register: holds a 32-bit search command.
4) Array select register: selects a subset of the assigned array modules.
5) Four field pointers: hold MDA addresses.
6) Three counters: keep track of the number of executions of loops.
7) Data pointer: allows stepping through a set of operands in CM.
8) Two access mode registers: hold the MDA access modes.

Parallel input/output module (PIO):
1) PIO flip network
a) Ports 0 to 3 connect to the 4 array modules.
b) Port 7 connects to the 32-bit data bus in PIO control through a fan-in and fan-out switch.
c) Ports 6, 5, and 4 are spare (high-bandwidth peripherals, radar).
2) PIO control unit (controls the array modules and the flip network)
3) Control memory (it has 5 banks of bipolar memory)
4) DEC PDP-11 (handles the peripherals and controls the system from console commands)
5) External function logic (controls the AP, sequential, and PIO control units)

STARAN Applications
– Fast Fourier Transform (used in real-time processing of radar and sonar signals)
– Sonar post-processing (signal processing and post-processing)
– String search (searching for a string is 100 times faster than a conventional computer search)
– File processing
– Air traffic control

Architectures for Applications
Fast Fourier Transform:
– Speed increases over sequential computers.
– STARAN lends itself to efficient manipulation of data in the FFT.
– Example: the Air Force supplied radar data to GAC.
512-point, 16-bit FFT: 2.7 ms (2 MDA)
1024-point transform: 3.0 ms (4 MDA)
Sonar post-processing:
– Sorting and editing of the signal processor output.