Husky Energy Chair in Oil and Gas Research

Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration. R. Phillip Bording, February 17, 2004. Husky Energy Chair in Oil and Gas Research, Memorial University of Newfoundland. M U N - February 17, Phil Bording

Session 1: History of Design
Tycho Brahe; Napier; Charles Babbage (mechanical design); John Atanasoff (storage on a spinning capacitor drum); Konrad Zuse (floating point); Mauchly and Eckert; von Neumann. Harvard memory: separate code memory and data memory. Princeton memory: code and data in the same memory.

Session 2: Current Design Issues
Scaling laws; Moore's Law; transistors and VLSI; memory technology; division of design; the memory challenge; the processor challenge; the ILLIAC and PEPE; IBM: the IBM 360/44 and IBM 360/95; array processors and the software of array-processor calls.

Application Specific Machines

M U N - February 17, 2005 - Phil Bording

Computing and Calculating Engines

Session 1: History. Vector memory; pipeline arithmetic and array processing; benchmark-driven dollars; the Fairhair Syndrome.

Processors: data memory, ALU, hardwired instructions. The processor bottleneck and the memory bottleneck. Memory technologies: vacuum tubes, core, plated wire, transistors, LSI (6-transistor static cells), VLSI (2-transistor dynamic cells).

Linear Address Space: memory is a linear range of words up to Max Address, reached through an address pointer. Latency is the time to access the first word; bandwidth is the rate of accessing successive words.

von Neumann Architecture (Princeton)
One memory holds both data and instructions and is reached through a single address pointer. The program counter steps through the instructions with PC = PC + 1 and feeds them to the arithmetic logic unit (ALU). Featuring deterministic execution.
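The fetch-and-increment cycle above can be sketched as a tiny interpreter. This is a minimal illustration, not any real machine: one Python list serves as the single memory for both instructions and data, and the opcodes LOAD, ADD, STORE, and HALT are hypothetical, invented for this example.

```python
# Minimal sketch of a von Neumann (Princeton) machine: one memory holds both
# instructions and data, and the program counter advances with PC = PC + 1.
# The instruction set (LOAD, ADD, STORE, HALT) is hypothetical.

def run(memory: list) -> list:
    pc = 0      # program counter
    acc = 0     # single accumulator register in the ALU
    while True:
        op, addr = memory[pc]   # fetch the instruction from the shared memory
        pc = pc + 1             # Pc = Pc + 1: strictly sequential execution
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Program at addresses 0-3, data at addresses 4-6 of the same memory.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
run(program)
print(program[6])  # 5
```

Because the program counter only ever steps by one here (there are no branches), the same program always produces the same sequence of memory accesses, which is the deterministic execution the slide highlights.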

Figure after Gustafson, 2004; Bednar, 2004.

Bank memory design: duplicate the memory system, with one design for each subsystem. Use a binary-tree design to spread out addresses and data. Fetch or store many words at once. This assumes a sequential addressing pattern.

Bank memory design: the wires created a big switch between modules. The slower memory access time was better matched to the faster processor cycle. Costly to build, with significant engineering effort.

Array memory design: an N x N bit array (N rows by N columns), with M bits on the bus.

Array memory design: streaming data flow in nibbles, bytes, and words. Sequential access: first-word access time = address + latency + data; each successive word costs only the data time. Random access: indirect addressing and non-uniform strides.
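The access-time rule above can be turned into a small cost model. This is a sketch under stated assumptions: times are in arbitrary clock units with hypothetical values, and for random access (indirect addressing, non-uniform strides) every word is assumed to pay the full first-word cost.

```python
# Sketch of the slide's access-time model; all timing values are hypothetical.

def sequential_time(n_words, t_addr, t_latency, t_data):
    """First word pays address + latency + data; each later word pays data only."""
    if n_words == 0:
        return 0
    return (t_addr + t_latency + t_data) + (n_words - 1) * t_data

def random_time(n_words, t_addr, t_latency, t_data):
    """Assumption: with indirect or non-uniform strides every word pays the full cost."""
    return n_words * (t_addr + t_latency + t_data)

seq = sequential_time(1000, t_addr=1, t_latency=20, t_data=1)
rnd = random_time(1000, t_addr=1, t_latency=20, t_data=1)
print(seq, rnd)  # 1021 22000
```

With these illustrative numbers the streaming pattern is about 20 times faster, because the latency is paid once instead of once per word.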

Benchmarks: scalar operations, array operations, and DO-loop domination of codes. Vendors look seriously at the instruction stream. Then comes Linpack (LU decomposition). If it does matrix multiply fast, nothing else matters. Or does it?

Fairhair Syndrome: a new world-class machine is designed at MIT, Stanford, or Caltech. Venture capital flows in. The federal government buys 10 new machines. The company goes public. Vulture capitalists sell out. The federal government buys new machines from somebody else, the next fairhair. The company has a stock scandal and goes bankrupt.

Session 2: Current Design Issues
Scaling laws; Moore's Law; transistors and VLSI; memory technology; division of design; the memory challenge; the processor challenge; the ILLIAC and PEPE; IBM: the IBM 360/44 and IBM 360/95; array processors and the software of array-processor calls. Programming models: vectors, shared memory, distributed memory.

Lambda Rules

Division of design: when Company A builds the ALU and Company B builds the memory, the link between them is the weak link. Compare one company designing both the ALU and the memory.

Moore's Laws: every 18 months the density of transistors on a VLSI chip doubles. The dollar investment doubles with every new VLSI plant.
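The doubling rule compounds quickly. A small sketch, taking the slide's 18-month doubling period as its only input, shows that ten years gives roughly a 100-fold increase, in line with the "10 years equals 100 fold speedup" figure quoted under Hardware Design Issues.

```python
# Density growth under Moore's Law as stated on the slide: doubling every 18 months.

def density_growth(years, doubling_months=18.0):
    """Factor by which transistor density grows after the given number of years."""
    return 2.0 ** (years * 12.0 / doubling_months)

print(round(density_growth(10), 1))  # 101.6
```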

ILLIAC: an 8 x 8 array of processors with nearest-neighbor connections.
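Nearest-neighbor connectivity on an 8 x 8 array can be sketched as an index calculation. The wraparound at the array edges (a torus) is an assumption made for this sketch; the slide states only that the connections are nearest-neighbor.

```python
# Nearest-neighbor links on an 8 x 8 processor array.
# Wraparound at the edges (a torus) is an assumption for this sketch.
N = 8

def neighbors(row, col, n=N):
    """Return the (north, south, west, east) neighbors of processor (row, col)."""
    return (
        ((row - 1) % n, col),
        ((row + 1) % n, col),
        (row, (col - 1) % n),
        (row, (col + 1) % n),
    )

print(neighbors(0, 0))  # ((7, 0), (1, 0), (0, 7), (0, 1))
```

Each processor talks only to its four neighbors, so data moves across the array one hop per transfer; that regular, local pattern is what makes stencil-type work a good fit.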

Parallel Element Processing Ensemble (PEPE)
A radar-processing computer based on associative computing. Data inputs feed the processing elements P0 through Pn, which produce the data outputs.

IBM machines: in the early 1960s the 7094 used 36-bit arithmetic, while the 1600 and 1400 processors were completely different. In the mid 1960s came a new machine, the IBM 360: 32-bit words of four 8-bit bytes, with memory parity added as 1 parity bit per byte, so each word occupies 36 bits in memory. Uniform business machine architectures. 32- and 64-bit floating point. There was no industry standard yet for the floating-point format.
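Byte parity can be sketched in a few lines: each 8-bit byte is stored as 9 bits, so a 4-byte word takes 36 bits of memory. Odd parity is an assumption here (the slide only says one parity bit per byte), and the byte values are arbitrary.

```python
# Sketch: store each 8-bit byte as 9 bits by appending one parity bit.
# Odd parity is assumed: the total number of 1 bits is made odd.

def with_parity(byte):
    ones = bin(byte & 0xFF).count("1")
    parity = 0 if ones % 2 == 1 else 1
    return ((byte & 0xFF) << 1) | parity   # 9-bit stored value

def check(stored):
    """True if the 9-bit value still has odd parity (no single-bit error)."""
    return bin(stored & 0x1FF).count("1") % 2 == 1

word = [with_parity(b) for b in (0x12, 0x34, 0x56, 0x78)]
print(len(word) * 9)                 # 36 stored bits for a 32-bit word
print(all(check(s) for s in word))   # True
print(check(word[0] ^ 0b100))        # False: a single flipped bit is detected
```

One parity bit detects any single-bit error in its byte but cannot correct it, and a double-bit error in the same byte goes unnoticed.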

Array processors: IBM and CDC designed DMA (Direct Memory Access) processors. DMA frees the main processor to compute and lets separate, simple processors do the I/O. The idea translated into attached processors for arithmetic processing.

Array processors: arrays of data are moved to a local, very high-speed memory (fast registers). Arithmetic is performed by special instructions that the CPU passes to the array processor.

Software Design Issues
Vector programming, cache programming, message-passing programming, NUMA programming, grid programming: ALL of these memory operations have a fixed cost. Code performance improvements are dominated by fixed costs.
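The fixed-cost argument can be made concrete with a simple model: each memory operation (a vector load, a cache-line fill, a message) pays a fixed startup cost plus a per-word cost. The cost values below are hypothetical.

```python
# Sketch of why fixed costs dominate: cost per word of one memory operation
# that pays a fixed startup cost plus a per-word transfer cost.
# The cost values are hypothetical.

def time_per_word(n_words, fixed_cost, per_word_cost):
    return (fixed_cost + n_words * per_word_cost) / n_words

for n in (1, 10, 100, 1000):
    print(n, time_per_word(n, fixed_cost=100.0, per_word_cost=1.0))
# 1 101.0
# 10 11.0
# 100 2.0
# 1000 1.1
```

Speeding up the per-word cost helps little for short transfers; only longer transfers, or a smaller fixed cost, move the short-transfer numbers.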

Hardware Design Issues
10 years equals a 100-fold speedup. Memory latency, the cost of getting the first word, is a constant. Wires have failed to scale. Bigger cache memories are slower. Code performance improvements are dominated by fixed costs.


Cache Memory Architecture
Main memory is large and slow. Cache is much smaller and much faster. Control logic keeps the main memory coherent, and the address pointer reaches memory through the cache. Featuring non-deterministic execution.
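The hit-or-miss behavior behind "non-deterministic execution" can be sketched with a direct-mapped cache model. The line size and number of lines are hypothetical parameters, and real caches add associativity, replacement policies, and write handling that this sketch omits.

```python
# Minimal direct-mapped cache model: a small, fast cache in front of a large,
# slow main memory. The sizes used here are hypothetical.

class DirectMappedCache:
    def __init__(self, n_lines=64, words_per_line=8):
        self.n_lines = n_lines
        self.words_per_line = words_per_line
        self.tags = [None] * n_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // self.words_per_line   # memory line holding this word
        index = line % self.n_lines             # the one cache slot it may occupy
        tag = line // self.n_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1                    # line is fetched from main memory
            self.tags[index] = tag

cache = DirectMappedCache()
for a in range(1024):          # stride-one stream through 1024 words
    cache.access(a)
print(cache.hits, cache.misses)  # 896 128
```

Whether an access is fast or slow depends on what earlier accesses left in the cache, so the time of an instruction is not fixed by the instruction itself.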

Cache Memory: Three-Level Architecture
Main memory: multi-gigabytes, large and slow (160X). L1 cache: 32 kilobytes; L2 cache: 128 kilobytes; L3 cache: 16 megabytes; relative access times 2X, 8X, and 16X respectively, with cache control logic and a 2 gigahertz clock. The address pointer passes through all three levels. Featuring really non-deterministic execution.
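A standard way to summarize such a hierarchy is the average memory access time. In this sketch the 2X, 8X, 16X, and 160X figures are read as clock cycles for L1, L2, L3, and main memory (an assumption), and the hit rates are hypothetical.

```python
def amat(latencies, hit_rates, memory_latency):
    """Average memory access time of a multi-level cache, in clock cycles.

    latencies[i] is the access time of cache level i (L1 first) and
    hit_rates[i] is the fraction of accesses reaching level i that hit there.
    A miss at the last level goes to main memory.
    """
    time = memory_latency
    # Work outward from main memory: each level adds its own latency plus
    # the cost of the next level weighted by its miss rate.
    for latency, hit in zip(reversed(latencies), reversed(hit_rates)):
        time = latency + (1.0 - hit) * time
    return time

# Latencies read from the slide as cycles (assumption); hit rates are hypothetical.
cycles = amat([2, 8, 16], [0.90, 0.80, 0.70], memory_latency=160)
ns = cycles * 0.5            # one cycle of a 2 GHz clock is 0.5 ns
print(round(cycles, 2), round(ns, 2))  # 4.08 2.04
```

The average looks good, but any single access can take anywhere from 2 to 186 cycles (2 + 8 + 16 + 160), which is why execution time becomes hard to predict.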

Programming Models for Parallel Computing

Distributed Computing: Message Passing Interface
Each program has its own address space, from 0 to Max, so there are multiple independent address pointers.

Distributed Computing with Message Passing
Separate program address spaces exchange messages with their left and right neighbors, each through its own address pointer.
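The left-and-right message pattern can be sketched without an MPI library. Here plain Python lists stand in for the separate address spaces, each with one ghost cell at each end, and copying boundary values plays the role of the messages; the periodic neighbor arrangement is an assumption.

```python
# Simulated message passing with left and right neighbors. Each "process" owns a
# private array (its own address space) with one ghost cell at each end; the
# exchange copies boundary values between neighbors as messages would.
# Plain lists stand in for MPI; no MPI library is used.

def make_subdomains(global_data, n_procs):
    chunk = len(global_data) // n_procs   # assumes an exact division
    # Local layout: [left ghost] + owned points + [right ghost]
    return [[0.0] + global_data[p * chunk:(p + 1) * chunk] + [0.0]
            for p in range(n_procs)]

def exchange_halos(subdomains):
    n = len(subdomains)
    for p, local in enumerate(subdomains):
        left = subdomains[(p - 1) % n]    # periodic neighbors (an assumption)
        right = subdomains[(p + 1) % n]
        local[0] = left[-2]               # "receive" left neighbor's last owned point
        local[-1] = right[1]              # "receive" right neighbor's first owned point

data = [float(i) for i in range(12)]
subs = make_subdomains(data, 3)
exchange_halos(subs)
print(subs[1])  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

Only owned points are read during the exchange, so the in-place update order does not matter; in a real MPI code each copy would be a send/receive pair and would carry the per-message fixed cost discussed earlier.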

Multi-Threading: OpenMP Programming Model
One global program address space divided into local regions, one per thread: 0 to n-1, n to 2n-1, 2n to 3n-1, and 3n to 4n-1. Multiple address pointers share an address and cache bus with conflict resolution.
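The per-thread regions can be sketched with Python's `threading` module standing in for OpenMP threads. Thread t owns global indices t*n through (t+1)*n - 1, similar in spirit to a static schedule; the point is the ownership pattern, not speed, since Python threads do not run CPU-bound work in parallel.

```python
import threading

def chunk_range(t, n):
    """Thread t owns global indices t*n .. (t+1)*n - 1."""
    return range(t * n, (t + 1) * n)

def parallel_square(values, n_threads):
    n = len(values) // n_threads   # assumes len(values) is a multiple of n_threads
    out = [0] * len(values)        # one shared (global) array

    def worker(t):
        for i in chunk_range(t, n):
            out[i] = values[i] * values[i]   # each index is written by exactly one thread

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out

print(parallel_square(list(range(8)), 4))  # [0, 1, 4, 9, 16, 25, 36, 49]
```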

Uniqueness of Store: Multi-Threading
One program address space with multiple address pointers. Duplicate pointers to the same location conflict when storing a result. So who manages the multiple pointers? It is the programmer's responsibility.
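When several threads must store to the same location, the programmer has to serialize those stores. A minimal sketch using Python's `threading.Lock`: each thread does its private work without conflicts, then takes the lock for the single shared update.

```python
import threading

total = 0
total_lock = threading.Lock()

def add_partial_sum(values):
    global total
    s = sum(values)          # private work: no shared stores
    with total_lock:         # only one thread may store to `total` at a time
        total += s

data = list(range(1, 101))
threads = [threading.Thread(target=add_partial_sum, args=(data[i::4],))
           for i in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(total)  # 5050
```

Without the lock, two threads could read the same old value of `total` and one update would be lost; the language does not prevent this, which is the slide's point.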

Multiple Bank Memory Systems
Four memory banks, each of N words with its own bank starting address; addresses are assigned to banks mod 4. Vector programming model.
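Interleaving with bank = address mod 4 can be sketched as below, assuming low-order interleaving. The stride of a vector access decides how many banks share the load.

```python
# Low-order interleaving across memory banks: bank = address mod n_banks.

def banks_touched(start, stride, count, n_banks=4):
    """Bank number hit by each element of a strided vector access."""
    return [(start + k * stride) % n_banks for k in range(count)]

print(banks_touched(0, 1, 8))  # [0, 1, 2, 3, 0, 1, 2, 3]
print(banks_touched(0, 2, 8))  # [0, 2, 0, 2, 0, 2, 0, 2]
print(banks_touched(0, 4, 8))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

A stride-one vector cycles through all four banks, so each bank has three accesses' worth of time to recover; a stride of 4 hits the same bank every time and runs at the speed of a single bank.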

Trends in Technology

A 256-Node SMP Linux Cluster (2001): 512 CPUs, 512 GB, 6 TB SCSI, TB local, gigabit Ethernet. Imagine 20 of these in one room. (Bednar, 2004)

SIZE, COST, and HEAT: the Earth Simulator, 3 megawatts, 500 million US$.
It doesn't simulate global warming, IT CAUSES IT! (Bednar, 2004)

SLOWER


GAP

Computational Earth Sciences

Atmospheric Modeling and Data Assimilation at the DAO
Robert Atlas and the DAO Team, Data Assimilation Office, NASA/GSFC. IWG, November 2001.

The f-v Dynamical Core: a terrain-following, Lagrangian control-volume vertical discretization of the basic conservation laws for mass, momentum, and total energy. A 2D horizontal flux-form semi-Lagrangian discretization. Genuinely conservative and free of Gibbs oscillations. Absolute vorticity is consistently transported with mass (dp) within the Lagrangian layers. Computationally efficient.

Computational Performance

Progression in model resolution
1990s: 2° x 2.5° (220 km)
2000: 1° x 1.25° (110 km)
2002: 0.5° x 0.625° (55 km)
2004: 0.25° x 0.36° (28 km)
2006: geodesic grid, 20 km
Up to 10 km: the hydrostatic assumption starts to break down; this is the transition period to non-hydrostatic dynamics.
A revolution in computing technology is to take place.
2025: a global non-hydrostatic cloud-resolving model with 1 km or finer resolution, capable of resolving individual thunderstorms.
Slides from Bob Atlas presentation.

Numerical Problem Solving

Problem Solving: 3D Example of Array Addressing
Finite differences on a 3D array, with a large memory requirement. Wave propagation with a finite-difference time-domain algorithm. Psi(i,j,k) holds the physical variables. How do we address memory? Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base
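The address formula can be checked directly: with 1-based indices and i varying fastest (Fortran column-major order), walking the array in loop order should visit consecutive addresses. A minimal sketch with a small hypothetical 4 x 3 x 2 array:

```python
def address(i, j, k, Lx, Ly, base=0):
    """Word address of Psi(i,j,k) for 1-based, column-major (Fortran) indexing."""
    return (k - 1) * Lx * Ly + (j - 1) * Lx + (i - 1) + base

# Walking i fastest, then j, then k must visit consecutive addresses.
Lx, Ly, Lz = 4, 3, 2
expected = 0
for k in range(1, Lz + 1):
    for j in range(1, Ly + 1):
        for i in range(1, Lx + 1):
            assert address(i, j, k, Lx, Ly) == expected
            expected += 1

print(address(Lx, Ly, Lz, Lx, Ly))  # 23, the last word of the 24-word array
```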

Problem Solving: 3D Example of Array Addressing
Grid points (i-1,j,k), (i,j,k), (i+1,j,k). Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base
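Applying the same formula to neighboring grid points shows why a 3D stencil is hard on memory: the i neighbors are one word away, the j neighbors Lx words away, and the k neighbors Lx*Ly words away. A sketch with a hypothetical 1000 x 1000 x 1000 grid:

```python
def address(i, j, k, Lx, Ly, base=0):
    """Word address of Psi(i,j,k) for 1-based, column-major indexing."""
    return (k - 1) * Lx * Ly + (j - 1) * Lx + (i - 1) + base

def stencil_offsets(Lx, Ly):
    """Address distance to the +1 neighbor along each axis."""
    return {"i": 1, "j": Lx, "k": Lx * Ly}

Lx, Ly = 1000, 1000
i, j, k = 10, 20, 30
centre = address(i, j, k, Lx, Ly)
offsets = stencil_offsets(Lx, Ly)
assert address(i + 1, j, k, Lx, Ly) - centre == offsets["i"]
assert address(i, j + 1, k, Lx, Ly) - centre == offsets["j"]
assert address(i, j, k + 1, Lx, Ly) - centre == offsets["k"]
print(offsets)  # {'i': 1, 'j': 1000, 'k': 1000000}
```

Each update therefore draws on streams a million words apart, which is the stride-one, stride-N, and stride-N*N pattern discussed in the following slides.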

Array Addressing by Dimension
1D array Psi(Lx): Address = (i-1) + base; stride-one data.
2D array Psi(Lx,Ly): Address = (j-1)*Lx + (i-1) + base; stride-N data.
3D array Psi(Lx,Ly,Lz): Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base; stride-N*N data.

Cache Memory Access Streams
1D streams: 100%; 1D +/-1; 2D +/-1; 2D +/-N; 2D +/-1 and +/-N.

Cache Memory Access Streams
3D +/-1; 3D +/-N; 3D +/-N*N; 3D all.

One Big One versus Many Little Ones

Futures of Micro-poor processors
Lots of arithmetic capability, but very hard to use. Market forces will make them good at painting bitmaps on screens.

Futures of Micro-poor processors
No relief in memory subsystem design; prefetch will help, but not nearly enough. A million of them will cost a billion dollars.

Futures of Micro-poor processors and the Big Switch
The big switch is the hot spot, and no relief is in sight. There is no telling what the switch will cost.

Seismic Modeling and the Inverse Problem

12 streamers, each 5.1 kilometers long. Data collected for 70 continuous days over 2300 square km.

3D Seismic Modeling: large-scale 3D, ~200+ wavelengths. Acoustic and elastic wave equations. The inhomogeneous earth has widely varying parameters, and the complexity limits the use of 3D elastic modeling. Problem scale: Nx = Ny = Nz ~ 1000; Ntime ~ 10,000; work per grid point ~ 100; number of seismic shots per survey ~ 100,000. A single survey simulation is 10^20 operations.
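The 10^20 figure follows from multiplying the problem-scale numbers on the slide; no other inputs are assumed.

```python
nx = ny = nz = 1000        # grid points per axis
n_time = 10_000            # time steps
work_per_point = 100       # operations per grid point per step
n_shots = 100_000          # shots per survey

ops_per_shot = nx * ny * nz * n_time * work_per_point
ops_per_survey = ops_per_shot * n_shots
print(f"{ops_per_shot:.0e} {ops_per_survey:.0e}")  # 1e+15 1e+20
```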

The Babbage Difference Engine, circa 1853

Wave Equation Difference Engine (WEDE) for Seismic Modeling
Four processors; acoustic wave equation. My PhD thesis project at the University of Tulsa.

Wave Equation Difference Engine
Finite differences; elastic or acoustic wave equations; regular grids; sponge or one-way wave equation boundary conditions; any source/receiver geometry; explicit, 4th order in time and 8th order in space.
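An 8th-order-in-space scheme uses a nine-point stencil for each second derivative. The sketch below uses the standard Taylor-series central-difference weights; these are an assumption, since the WEDE hardware may have used different (for example, optimized) coefficients.

```python
import math
from fractions import Fraction as F

# Standard 8th-order central-difference weights for d2f/dx2, offsets 0..4
# (the stencil is symmetric, so offsets -1..-4 reuse the same weights).
C = [F(-205, 72), F(8, 5), F(-1, 5), F(8, 315), F(-1, 560)]

def second_derivative(f, i, h):
    """8th-order approximation of f''(x_i) from samples f[i-4] .. f[i+4]."""
    acc = C[0] * f[i]
    for m in range(1, 5):
        acc += C[m] * (f[i - m] + f[i + m])
    return acc / (h * h)

samples = [F(x * x) for x in range(-4, 5)]      # f(x) = x^2 on a unit grid
print(second_derivative(samples, 4, 1))         # 2 (exact)

h = 0.1
vals = [math.sin(0.5 + m * h) for m in range(-4, 5)]
err = abs(second_derivative(vals, 4, h) + math.sin(0.5))
print(err < 1e-9)                               # True
```

Each output point needs 9 samples per axis, so a 3D Laplacian touches 25 grid values, which is where much of the roughly 100 operations per grid point goes.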

No cache memory; deterministic execution. Not MIMD, SIMD, or data flow: data movement and control match the algorithm. Each grid point has a control word. Three levels of parallelism (amount of parallelism): instruction trees; multiple instructions with selection, ~2-3; multiple grid points, ~hundreds of thousands.

Acoustic, Constant Density
Density is so constant it does not appear in the equation. C is the P-wave velocity, src holds the source energy, and Psi is the wave field.
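The equation itself is not shown here; assuming the usual constant-density form d2Psi/dt2 = C^2 (d2Psi/dx2 + d2Psi/dz2) + src, one explicit time step can be sketched in 2D. For brevity this uses second-order differences in time and space rather than the 4th/8th-order scheme described earlier, and holds the edges at zero instead of using a sponge; the grid size, velocity, and source strength are hypothetical.

```python
import math

def step(psi_prev, psi, c, dt, h, src):
    """Advance the 2D constant-density acoustic wave equation one time step.

    Assumed form: d2psi/dt2 = c^2 * (d2psi/dx2 + d2psi/dz2) + src, with
    second-order differences. Edge values stay zero (no sponge here).
    """
    nz, nx = len(psi), len(psi[0])
    new = [[0.0] * nx for _ in range(nz)]
    for iz in range(1, nz - 1):
        for ix in range(1, nx - 1):
            lap = (psi[iz][ix - 1] + psi[iz][ix + 1]
                   + psi[iz - 1][ix] + psi[iz + 1][ix]
                   - 4.0 * psi[iz][ix]) / (h * h)
            new[iz][ix] = (2.0 * psi[iz][ix] - psi_prev[iz][ix]
                           + dt * dt * (c[iz][ix] ** 2 * lap + src[iz][ix]))
    return new

n = 21
c = [[2000.0] * n for _ in range(n)]      # P-wave velocity, m/s (hypothetical)
zero = [[0.0] * n for _ in range(n)]
kick = [[0.0] * n for _ in range(n)]
kick[n // 2][n // 2] = 1.0e6              # impulsive source at the grid centre
prev, cur = zero, zero
for it in range(20):                      # c*dt/h = 0.2, inside the stability limit
    src = kick if it == 0 else zero
    prev, cur = cur, step(prev, cur, c, dt=0.001, h=10.0, src=src)

print(all(math.isfinite(v) for row in cur for v in row))   # True
```

Every grid point runs the same short, fixed sequence of operations on its neighbors' values each step, which is the regular structure a wave machine can map directly into hardware.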

Machine performance: 100 operations in the pipeline x 1,000,000 grid-point processors x a 100 megahertz clock = 10^16 operations per second.
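The peak rate is the product of the three factors on the slide. Combining it with the problem scale from the 3D seismic modeling slide (1000^3 grid, 10,000 steps, 100 operations per point) gives the time for one shot, which matches the 0.1 seconds per shot used in the economics slides.

```python
pipeline_ops = 100           # operations in flight per grid-point processor
processors = 1_000_000       # grid-point processors
clock_hz = 100_000_000       # 100 MHz

ops_per_second = pipeline_ops * processors * clock_hz
print(f"{ops_per_second:.0e}")          # 1e+16

ops_per_shot = 1000**3 * 10_000 * 100   # grid points x time steps x work per point
print(ops_per_shot / ops_per_second)    # 0.1 (seconds per shot)
```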

Application Specific Parallel Computing
Carefully choose an application that is BIG. Find a suitable algorithm: good data locality, regular structure in data movement, high memory data transfers. Map the algorithm into hardware.

Application Specific Parallel Computing What it is not!
Not suitable for just any algorithm. Not general purpose: we will have an efficient but specific memory subsystem. Does not match the alphabet soup of SIMD, MIMD, NUMA, etc.

What do ASP machines need?
A VLSI design team, fabless and good? A clever architect for the problem. A very good memory design!

What do ASP machines do away with?
Language compilers. Outdated junk in the processor design (x86!). Cache memories! Non-deterministic execution!

Multiple Bank Memory Systems
Memory banks, each of N words with its own bank starting address; addresses assigned to banks mod 4. As many banks as are needed!

Pipelined Instruction Trees
Each higher level offers parallel operations. The pipeline assumes all registers are loaded every cycle. Hardwired? Today the instruction trees could actually be reconfigurable, using reprogrammable cells! Example: r = a+b-x*y
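For r = a+b-x*y, the additions and multiplications at the bottom of the tree are independent and can issue in the same cycle, and the subtraction follows one level later. A sketch that groups a tree's operations by level and evaluates it; the nested-tuple tree representation is an assumption for illustration.

```python
# An instruction tree as nested tuples (op, left, right); leaves are register names.
TREE = ("-", ("+", "a", "b"), ("*", "x", "y"))   # r = a + b - x*y

OPS = {"+": lambda p, q: p + q, "-": lambda p, q: p - q, "*": lambda p, q: p * q}

def schedule(tree, stages):
    """Append each operation to stages[h - 1], where h is its height above the leaves.
    Operations in the same stage are independent and can run in parallel."""
    if isinstance(tree, str):
        return 0
    op, left, right = tree
    h = 1 + max(schedule(left, stages), schedule(right, stages))
    while len(stages) < h:
        stages.append([])
    stages[h - 1].append(op)
    return h

def evaluate(tree, regs):
    if isinstance(tree, str):
        return regs[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, regs), evaluate(right, regs))

stages = []
schedule(TREE, stages)
print(stages)                                            # [['+', '*'], ['-']]
print(evaluate(TREE, {"a": 1, "b": 2, "x": 3, "y": 4}))  # -9
```

In a pipelined tree, a new set of operands can enter the bottom stage every cycle while earlier sets move up, so one result leaves the top per cycle once the pipeline is full.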

Pipelined Instruction Trees
(Figure: several instruction trees over operands a, b, d, x, and y, with +, -, and * operation nodes.) Multiple trees offer the second level of parallelism.

Three Levels of Parallelism
Instruction trees (multiple levels); multiple results; multiple grid-point processors.

Wave Machine

Imaging Machine

Wave Equation
a) 8th or 10th order in space
b) 4th order in time, tricky but possible
c) Sponge boundary conditions, with slowly varying weights along the sides
d) Nominally flat topography; new schemes are building in topography
e) Any seismic source location, any geophone location
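A sponge boundary can be sketched as a set of damping weights that are 1.0 in the interior and fall off slowly toward each side; after each time step the wave field is multiplied point by point by the weights. The Gaussian taper shape and the `width` and `a` values are assumptions of the kind commonly used for sponge layers, not values from the slides.

```python
import math

def sponge_weights(n_points, width=20, a=0.015):
    """Per-point damping factors along one axis: 1.0 in the interior,
    tapering smoothly toward each edge. Taper shape, width, and a are assumed.
    Requires n_points >= 2 * width."""
    w = [1.0] * n_points
    for d in range(width):                       # d = distance from the edge
        g = math.exp(-(a * (width - d)) ** 2)
        w[d] = g
        w[n_points - 1 - d] = g
    return w

w = sponge_weights(100)
print(round(w[0], 4), w[50])                     # 0.9139 1.0

row = [1.0] * 100                                # one row of a wave field
damped = [v * wi for v, wi in zip(row, w)]       # applied once per time step
```

Because each weight is close to 1, a single step barely changes the wave; the absorption comes from applying the weights every step over many steps, which keeps reflections from the sponge itself small.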

Elastic Wave Equation
a) Grid point work is about 100 operations
b) About 20,000 time steps per shot
c) 200 wavelengths gives about 160,000 geophone locations
d) Traces have 4096 samples at 2 milliseconds; could be 1 ms

Elastic Wave Equation: shots are placed at twice the receiver spacing, so the number of shots equals 40,000. Model frequency is velocity dependent; assume something on the order of 60 hertz.
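The shot count follows from the geophone count if the receivers form a square grid (an assumption; the slides give only the totals): 160,000 receivers is 400 per side, and shots at twice the receiver spacing give 200 per side. The trace length comes from the sample count and interval on the previous slide.

```python
import math

geophones = 160_000
per_side = math.isqrt(geophones)      # 400 x 400 receiver grid, assumed square
shots_per_side = per_side // 2        # shots at twice the receiver spacing
print(per_side, shots_per_side ** 2)  # 400 40000

record_seconds = 4096 * 2 / 1000      # 4096 samples at 2 ms
print(record_seconds)                 # 8.192
```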

Economics: up-front fixed cost, $5 to $10 million. Each ASP chip is $5 to $10. A petaflop for $5 or $10 million.

Economics: a seismic shot takes 0.1 seconds. A 5-year life is 50,000 models. A realistic 3D elastic seismic model would cost $200.

Comparison: 10 clusters (~$10 million) give 10 models per year. One Waves in Linear Motion Analyzer (WILMA, ~$10 million) gives 10,000 models per year.

Comparison: Waves in Linear Motion Analyzer, 1000X faster
For the same money!

Summary: 1000 megawatts is a good-sized power station. Good memory design is worth the money! Removing the obstacles to efficient computing gives sustainable performance.

Summary: Slower is better. Less power is better. High efficiency is better.

Conclusions: Deterministic computing is important for performance. Application specific computing is a good fit for the wave equation, and very cost effective.

Thanks: SEG Continuing Education; Memorial University of Newfoundland.
