Husky Energy Chair in Oil and Gas Research
Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration
R. Phillip Bording
February 17, 2004
Memorial University of Newfoundland
M U N - February 17, 2005 - Phil Bording
Session 1: History of Design
  Tycho Brahe
  Napier
  Charles Babbage: mechanical design
  John Atanasoff: storage (spinning capacitor)
  Konrad Zuse: floating point
  Mauchly and Eckert
  von Neumann
  Harvard memory: separate code memory and data memory
  Princeton memory: code and data in one memory
Session 2: Current Design Issues
  Scaling laws
  Moore’s Law
  Transistors: VLSI
  Memory: technology
  Division of design
  The memory challenge
  The processor challenge
  The ILLIAC and PEPE
  IBM 7094, IBM 360/44, IBM 360/95
  Array processors
  The software of array processor calls
  Programming models: vectors, shared memory, distributed memory
Lambda Rules
Division of Design
  Diagram labels: One Company, Company A, Company B, ALU, Memory, Weak Link
Moore’s Laws
  Every 18 months the density of transistors on a VLSI chip doubles.
  The investment in dollars doubles with every new VLSI plant.
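As a rough check of what an 18-month doubling period implies, the sketch below compounds the doubling over a span of years. The function name `density_growth` and the sample horizons are illustrative choices, not taken from the slides.

```python
def density_growth(years, doubling_period_years=1.5):
    """Growth factor in transistor density after `years`,
    assuming one doubling every 18 months (1.5 years)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Illustrative horizons only; the slide states just the doubling period.
    for years in (1.5, 3.0, 10.0):
        print(f"{years:5.1f} years: {density_growth(years):7.1f}x")
```

Under this assumption a decade compounds to roughly a hundredfold increase in density.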
ILLIAC
  8 x 8 processors
  Nearest neighbor connections
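A nearest-neighbor layout on an 8 x 8 array can be sketched as index arithmetic. The version below assumes processors are numbered 0 to 63 in a single linear sequence and that connections wrap around modulo 64; the slide states only that the connections are nearest neighbor, so the numbering and the wraparound are assumptions of this sketch.

```python
def neighbors(pe, side=8):
    """Return the four nearest neighbors of processing element `pe`
    on a side x side array with wraparound (assumed) connections."""
    n = side * side
    if not 0 <= pe < n:
        raise ValueError("processing element index out of range")
    return {
        "west": (pe - 1) % n,      # previous element in the linear numbering
        "east": (pe + 1) % n,      # next element in the linear numbering
        "north": (pe - side) % n,  # one row up
        "south": (pe + side) % n,  # one row down
    }


if __name__ == "__main__":
    print(neighbors(0))
    print(neighbors(27))
```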
Parallel Element Processing Ensemble (PEPE)
  Radar processing computer
  Associative computing
  Diagram: data inputs feed processing elements P0 ... Pn-3, Pn-2, Pn-1, Pn, which produce data outputs
IBM Machines
  Early 1960s
    7094, 36-bit arithmetic
    1600 and 1400 processors, completely different
  Middle 1960s
    New machine: IBM 360
    32-bit words stored as 36 bits, because memory parity was added: 8-bit byte + 1 parity bit
    Uniform business machine architectures
    32- and 64-bit floating point
    No industry standard for the floating-point format
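The byte-plus-parity arithmetic can be sketched as follows: four bytes, each stored with one extra parity bit, occupy 36 bits of memory. Odd parity is an assumption of this sketch (the slide does not say which sense was used), and the function names are illustrative.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the total number of 1 bits odd.
    Odd parity is assumed here for illustration."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    ones = bin(byte).count("1")
    return 0 if ones % 2 == 1 else 1


def stored_bits(bytes_per_word=4, data_bits=8, parity_bits=1):
    """Bits of storage used by one word when every byte carries parity."""
    return bytes_per_word * (data_bits + parity_bits)


if __name__ == "__main__":
    print(odd_parity_bit(0b00000111), stored_bits())
```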
Array Processors
  IBM and CDC designed DMA (Direct Memory Access) processors.
  DMA frees the main processor to compute.
  It allows separate, simple processors to do the I/O.
  The idea translated into attached processors for arithmetic processing.
Array Processors
  Arrays of data are moved to a local, very high speed memory (fast registers).
  Arithmetic is performed by special instructions passed to the array processor.
  Diagram labels: CPU, Array Processor
Software Design Issues
  Vector programming
  Cache programming
  Message passing programming
  NUMA programming
  Grid programming
  ALL of these memory operations have a fixed cost.
  Code performance improvements are dominated by fixed costs.
Hardware Design Issues
  10 years equals a 100-fold speedup.
  Memory latency: the cost of getting the first word is a constant.
  Wires have failed to scale.
  Bigger cache memories are slower.
  Code performance improvements are dominated by fixed costs.
Linear Address Space
  Diagram: addresses run from 0 to Max Address, selected by an address pointer
  Latency is the time to access the first word.
  Bandwidth is the rate of accessing successive words.
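The two definitions combine into a simple cost model: the first word costs the latency, and each further word costs one over the bandwidth. The sketch below uses that model to show how the fixed cost dominates short transfers. The function names and the sample numbers (100 time units of latency, one word per time unit) are hypothetical.

```python
def transfer_time(n_words, latency, bandwidth):
    """Time to read n_words: latency for the first word, then
    (n_words - 1) successive words at `bandwidth` words per time unit."""
    if n_words <= 0:
        return 0.0
    return latency + (n_words - 1) / bandwidth


def latency_fraction(n_words, latency, bandwidth):
    """Share of the total transfer time spent on the fixed latency cost."""
    return latency / transfer_time(n_words, latency, bandwidth)


if __name__ == "__main__":
    # Hypothetical machine: latency = 100 units, bandwidth = 1 word per unit.
    for n in (1, 10, 100, 1000, 10000):
        print(n, transfer_time(n, 100.0, 1.0),
              round(latency_fraction(n, 100.0, 1.0), 3))
```

With these numbers a single-word access is all fixed cost, and the latency share only falls below half once a transfer exceeds about a hundred words.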
von Neumann Architecture (Princeton Memory)
  Diagram labels: Address Pointer, Arithmetic Logic Unit (ALU), Data/Instructions, Program Counter
  Pc = Pc + 1
  Featuring deterministic execution
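A toy fetch-and-execute loop makes the Princeton arrangement concrete: instructions and data sit in one memory, and the program counter advances by one after each fetch. The four-instruction set used here (LOAD, ADD, STORE, HALT) is hypothetical and exists only for this sketch.

```python
def run(memory):
    """Execute a toy program held in the same memory as its data.
    Instructions are (opcode, address) tuples; data cells are integers."""
    pc = 0    # program counter
    acc = 0   # single accumulator standing in for the ALU register
    while True:
        opcode, address = memory[pc]   # fetch from the shared memory
        pc = pc + 1                    # Pc = Pc + 1
        if opcode == "LOAD":
            acc = memory[address]
        elif opcode == "ADD":
            acc = acc + memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


if __name__ == "__main__":
    # Cells 0-3 hold code, cells 4-6 hold data: memory[6] = memory[4] + memory[5].
    program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
    print(run(program)[6])
```

Given the same memory contents, the loop always performs the same steps in the same order, which is the deterministic behavior the slide highlights.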
Cache Memory Architecture
  Main memory is large and slow.
  Cache is much smaller and much faster.
  Cache control logic keeps the main memory coherent.
  Diagram labels: Main Memory, Cache Memory, Control, Address Pointer
  Featuring non-deterministic execution
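One way to see why a cache makes timing depend on access history is to count hits and misses for different address patterns. The sketch below assumes a direct-mapped cache with four lines of four words each; the slide does not specify a mapping policy or sizes, so those are assumptions chosen for illustration.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_words=4):
    """Count (hits, misses) for a sequence of word addresses on an
    assumed direct-mapped cache."""
    resident = [None] * num_lines   # block number currently held by each line
    hits = 0
    misses = 0
    for address in addresses:
        block = address // line_words   # which memory block holds the word
        index = block % num_lines       # the one line that block may occupy
        if resident[index] == block:
            hits += 1
        else:
            misses += 1
            resident[index] = block     # replace whatever was there
    return hits, misses


if __name__ == "__main__":
    print(simulate_direct_mapped(range(16)))       # sequential walk
    print(simulate_direct_mapped([0, 16, 0, 16]))  # two blocks, same line
```

The sequential walk misses once per line and then hits, while the second pattern evicts its own data on every access, so the same load instruction can be fast or slow depending on what ran before it.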
Cache Memory: Three-Level Architecture
  2 gigahertz clock
  L1 cache memory: 32 kilobytes, 2X
  L2 cache memory: 128 kilobytes, 8X
  L3 cache memory: 16 megabytes, 16X
  Main memory: multi-gigabytes, large and slow, 160X
  Diagram labels: Cache Control Logic, Address Pointer
  Featuring really non-deterministic execution
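The factors on this slide (2X, 8X, 16X, 160X) can be combined into an average access cost once a share of accesses is assigned to each level. The sketch below reads the factors as relative access costs for L1, L2, L3 and main memory, which is an interpretation of the diagram, and the hit shares are entirely hypothetical.

```python
def expected_cost(shares, costs):
    """Average access cost given the share of accesses served by each
    level and the relative cost of that level."""
    if len(shares) != len(costs):
        raise ValueError("shares and costs must have the same length")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * cost for share, cost in zip(shares, costs))


if __name__ == "__main__":
    costs = [2, 8, 16, 160]            # slide factors, read as L1, L2, L3, main memory
    shares = [0.90, 0.06, 0.03, 0.01]  # hypothetical hit distribution
    print(round(expected_cost(shares, costs), 2))
```

Even with only one access in a hundred reaching main memory in this made-up distribution, that level contributes almost as much to the average as the L1 hits do.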
Programming Models for Parallel Computing
Distributed Computing: Message Passing Interface
  Program address spaces: four separate spaces, each running from 0 to Max
  Multiple address pointers
Distributed Computing with Message Passing
  Program address spaces
  Messages left and right
  Multiple address pointers
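The left and right messages can be illustrated with a boundary exchange between neighboring subdomains. The sketch below is a plain Python simulation, not real MPI: each list stands in for one process's private address space, with one ghost cell at each end, and the function `exchange_halos` is a hypothetical name.

```python
def exchange_halos(subdomains):
    """Fill each subdomain's ghost cells from its left and right neighbors.
    Layout of every subdomain: [left ghost, interior..., right ghost]."""
    last = len(subdomains) - 1
    for rank, local in enumerate(subdomains):
        if rank > 0:
            # "Message" from the left neighbor: its last interior value.
            local[0] = subdomains[rank - 1][-2]
        if rank < last:
            # "Message" from the right neighbor: its first interior value.
            local[-1] = subdomains[rank + 1][1]
    return subdomains


if __name__ == "__main__":
    spaces = [[0, 1, 2, 0], [0, 3, 4, 0], [0, 5, 6, 0]]
    print(exchange_halos(spaces))
```

Only ghost cells are written and only interior cells are read, so the order in which the simulated processes exchange data does not change the result.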
Multi-Threading: OpenMP Programming Model
  Global program address space
  Four local regions: 0 to n-1, n to 2n-1, 2n to 3n-1, 3n to 4n-1
  Address and cache bus with conflict resolution
  Multiple address pointers
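The index ranges on this slide describe a block partition of one shared array among four threads. The sketch below shows that partition using Python's `threading` module as a stand-in for OpenMP threads; it illustrates the addressing pattern only and says nothing about OpenMP performance. The function name and the requirement that the length divide evenly are assumptions of the sketch.

```python
import threading


def parallel_scale(data, factor, num_threads=4):
    """Scale a shared list in place; thread t owns indices t*n .. (t+1)*n - 1."""
    if len(data) % num_threads != 0:
        raise ValueError("length must divide evenly among the threads")
    n = len(data) // num_threads

    def worker(t):
        # Each thread writes only inside its own block of the global array.
        for i in range(t * n, (t + 1) * n):
            data[i] = data[i] * factor

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return data


if __name__ == "__main__":
    print(parallel_scale(list(range(8)), 2))
```

Because the blocks do not overlap, no two threads ever store to the same location.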
Uniqueness of Store
  Multi-threading program address space
  Multiple address pointers
  Duplicate pointers to the same location: conflict on storing a result
  So who is managing the multiple pointers? It is the programmer's responsibility.
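One common way for the programmer to take that responsibility is to serialize the conflicting store with a lock. The sketch below has several threads add partial sums into one shared location and guards the update with `threading.Lock`; the slide does not prescribe this technique, and the function name is hypothetical.

```python
import threading


def locked_sum(values, num_threads=4):
    """Sum `values` using several threads that all store into one shared cell."""
    total = [0]               # the single shared location every thread updates
    lock = threading.Lock()

    def worker(t):
        partial = sum(values[t::num_threads])  # private work, no conflict
        with lock:                             # one thread at a time may store
            total[0] = total[0] + partial

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total[0]


if __name__ == "__main__":
    print(locked_sum(list(range(100))))
```

Each thread does most of its work on private data and touches the shared location once, which keeps the time spent waiting on the lock small.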
Multiple Bank Memory Systems
  Memory banks: Bank 0, 1, 2, 3
  Bank selection: address mod 4
  Unit stride: starting address, +1, +2, +3
  Stride N: starting address, +N, +2N, +3N
  Vector programming model
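The mod 4 mapping determines which bank each element of a strided vector access lands in. The sketch below lists the bank sequence for a given starting address and stride; the function name `bank_sequence` is illustrative.

```python
def bank_sequence(start, stride, count, num_banks=4):
    """Banks touched by `count` accesses at start, start + stride, ...
    with the bank chosen as address mod num_banks."""
    return [(start + k * stride) % num_banks for k in range(count)]


if __name__ == "__main__":
    print(bank_sequence(0, 1, 4))  # unit stride
    print(bank_sequence(0, 4, 4))  # stride equal to the number of banks
    print(bank_sequence(1, 2, 4))  # stride 2
```

Unit stride rotates through all four banks, a stride of 4 sends every access to the same bank, and a stride of 2 uses only half of them.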