1
Husky Energy Chair in Oil and Gas Research
Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration. R. Phillip Bording, February 17, 2004. Husky Energy Chair in Oil and Gas Research, Memorial University of Newfoundland. M U N - February 17, Phil Bording
2
Session 1: History of Design
Tycho Brahe. Napier. Charles Babbage: mechanical design. John Atanasoff: storage, spinning capacitor. Konrad Zuse: floating point. Mauchly and Eckert. von Neumann. Harvard memory: code memory separate from data memory. Princeton memory: code and data together.
3
Session 2: Current Design Issues
Scaling laws. Moore’s Law. Transistors: VLSI. Memory: technology. Division of design. The memory challenge. The processor challenge. The ILLIAC. PEPE. IBM: IBM 360/44, IBM 360/95. Array processors: the software of array processor calls.
4
Application Specific Machines
5
M U N - February 17, 2005 - Phil Bording
10
Computing and Calculating Engines
11
Session 1 History
Vector memory. Pipeline arithmetic: array processing. Benchmark-driven dollars. Fairhair Syndrome.
12
Processors
Data memory. ALU. Hardwired instructions. Processor bottleneck. Memory bottleneck. Technologies: vacuum tubes, core, plated wire, transistors, LSI (6 T static), VLSI (2 T dynamic).
13
Linear Address Space
[Diagram labels: Max Address, Address Pointer.] Latency is the time to access the first word. Bandwidth is the rate of accessing successive words.
14
von Neumann Architecture Princeton
[Diagram labels: Memory, Address Pointer, Arithmetic Logic Unit (ALU), Data/Instructions, Program Counter.] Pc = Pc + 1. Featuring deterministic execution.
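The fetch-and-increment cycle on this slide can be sketched in a few lines of Python. This is only an illustration, not the speaker's design: the four opcodes (LOAD, ADD, STORE, HALT), the single accumulator, and the `run` function are hypothetical. It shows code and data sharing one memory, with the program counter advancing by one per step, so the same program always takes the same number of steps.

```python
# Hypothetical four-instruction machine: code and data live in ONE memory list.
def run(memory):
    """Execute until HALT; return the final memory and the number of steps."""
    pc = 0      # program counter: address pointer for the next instruction
    acc = 0     # single accumulator standing in for the ALU result
    steps = 0
    while True:
        op, arg = memory[pc]
        pc = pc + 1          # Pc = Pc + 1, as on the slide
        steps += 1
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc = acc + memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory, steps
        else:
            raise ValueError(f"unknown opcode {op!r}")


program = [
    ("LOAD", 4),   # address 0
    ("ADD", 5),    # address 1
    ("STORE", 6),  # address 2
    ("HALT", 0),   # address 3
    2,             # address 4: data
    3,             # address 5: data
    0,             # address 6: result goes here
]
final_memory, steps = run(list(program))
print(final_memory[6], steps)
```

Running it twice gives the same step count, which is the sense in which execution is deterministic here.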
15
After Gustafson, 2004. Bednar, 2004.
16
Bank memory design
Duplicate the memory system. One design for the subsystem. Use a binary tree design to spread out addresses and data. Fetch/store many words at once. Assume a sequential addressing pattern.
17
Bank memory design
The wires created a big switch between modules. The slower memory access time was better matched to the faster processor times. Costly to build: significant effort in engineering.
21
Array memory design
[Diagram: memory array with N rows and N columns, NxN bits, M bits on the bus.]
22
Array memory design
Streaming data flow: nibbles, bytes, and words. Sequential access: first word access time = address + latency + data; successive words = data. Random access: indirect addressing, non-uniform strides.
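A small cost model makes the difference between sequential and random access concrete. This is a sketch under stated assumptions: the function names and the timing values (10, 50, and 1 time units) are hypothetical, and the random-access case assumes every word pays the full first-word cost, which the slide implies but does not state.

```python
def stream_time(n_words, t_address, t_latency, t_data):
    """Sequential access: the first word pays address + latency + data,
    every successive word pays only the data transfer time."""
    if n_words <= 0:
        return 0
    first_word = t_address + t_latency + t_data
    return first_word + (n_words - 1) * t_data


def random_time(n_words, t_address, t_latency, t_data):
    """Assumed worst case: every random access pays the first-word cost."""
    if n_words <= 0:
        return 0
    return n_words * (t_address + t_latency + t_data)


# Hypothetical timings in arbitrary units.
sequential = stream_time(1000, t_address=10, t_latency=50, t_data=1)
scattered = random_time(1000, t_address=10, t_latency=50, t_data=1)
print(sequential, scattered)
```

With these made-up numbers the streamed read is dominated by the data term, while the scattered read is dominated by the fixed per-access cost.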
23
Benchmark
Scalar operations. Array operations. Do-loop domination of codes. Vendors look seriously at the instruction stream. Then comes Linpack: LU decomposition. If it does matrix multiply fast, nothing else matters. Or does it?
24
Fairhair Syndrome
A new world-class machine is designed at MIT, Stanford, or Caltech. Venture capital flows in. Federal Government buys 10 new machines. Company goes public. Vulture capitalists sell out. Federal Government buys new machines from somebody else: the next fairhair. Company has a stock scandal and goes bankrupt.
25
Session 2: Current Design Issues
Scaling laws. Moore’s Law. Transistors: VLSI. Memory: technology. Division of design. The memory challenge. The processor challenge. The ILLIAC. PEPE. IBM: IBM 360/44, IBM 360/95. Array processors: the software of array processor calls. Programming models: vectors, shared memory, distributed memory.
26
Lambda Rules
27
Division of design
[Diagram labels: Company A, Company B, One Company, ALU, Memory, Weak Link.]
28
Moore’s Laws
Every 18 months the density of transistors on a VLSI chip doubles. The investment in dollars doubles with every new VLSI plant.
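The 18-month doubling can be turned into a growth factor over any span of years. The short calculation below is only a worked example of that arithmetic (the function name is hypothetical); it shows that doubling every 18 months compounds to roughly a hundredfold in ten years, the same order as the "10 years equals 100 fold" figure quoted later in the talk.

```python
def density_growth(years, doubling_months=18):
    """Growth factor after `years` if density doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


growth_10y = density_growth(10)
print(round(growth_10y, 1))
```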
29
Illiac
8 x 8 processors. Nearest-neighbor connections.
30
Parallel Ensemble Processing Elements - PEPE
Radar processing computer. Associative computing. [Diagram labels: Data Inputs; processing elements P0, Pn-3, Pn-2, Pn-1, Pn; Data Outputs.]
31
IBM Machines
Early 1960’s: 7094, 36 bit arithmetic; 1600 and 1400 processors completely different. Middle 1960’s: new machine, the IBM 360; 36 bit words, but memory parity was added (8 bit byte + 1 bit parity); uniform business machine architectures; 32 and 64 bit floating point. No industry standard for the format of floating point.
32
Array Processors
IBM and CDC designed DMA (Direct Memory Access) processors. DMA frees the main processor to compute and allows separate simple processors to do the I/O. The idea translated into attached processors for arithmetic processing.
33
Array Processors
Arrays of data are moved to a local very high speed memory (fast registers). Arithmetic is performed by special instructions passed to the array processor. [Diagram labels: CPU, Array Processor.]
34
Software Design Issues
Vector programming. Cache programming. Message-passing programming. NUMA programming. Grid programming. ALL of these memory operations have a fixed cost. Code performance improvements are dominated by fixed costs.
35
Hardware Design Issues
10 years equals 100-fold speedup. Memory latency: the cost of getting the first word is a constant. Wires have failed to scale. Bigger cache memories are slower. Code performance improvements are dominated by fixed costs.
36
Linear Address Space
[Diagram labels: Max Address, Address Pointer.] Latency is the time to access the first word. Bandwidth is the rate of accessing successive words.
37
von Neumann Architecture Princeton
[Diagram labels: Memory, Address Pointer, Arithmetic Logic Unit (ALU), Data/Instructions, Program Counter.] Pc = Pc + 1. Featuring deterministic execution.
38
Cache Memory Architecture
[Diagram labels: Main Memory, Control, Cache Memory, Address Pointer.] Main memory is large and slow. Cache is much smaller and much faster. Control logic keeps the main memory coherent. Featuring non-deterministic execution.
39
Cache Memory - Three Levels Architecture
[Diagram labels: main memory, multi-gigabytes, large and slow; cache control logic; 2 gigahertz clock; L1, L2, and L3 cache memories; sizes 32 kilobytes, 128 kilobytes, 16 megabytes; relative costs 2X, 8X, 16X, 160X; Address Pointer.] Featuring really non-deterministic execution.
40
Programming Models for Parallel Computing
41
Distributed Computing Message Passing Interface
[Diagram: four program address spaces, each labeled Max.] Multiple address pointers.
42
Distributed Computing with Message Passing
[Diagram: program address spaces exchanging messages left and right.] Multiple address pointers.
44
Multi-Threading OpenMP Programming Model
[Diagram: a global program address space split into four local regions, with boundary labels n-1, n, 2n, 3n.] Address and cache bus with conflict resolution. Multiple address pointers.
45
Uniqueness of Store Multi-Threading
One program address space, multiple address pointers. Duplicate pointers to the same location cause a conflict on storing a result. So who is managing the multiple pointers? It is the programmer’s responsibility.
46
Multiple Bank Memory Systems
[Diagram labels: Memory Banks; Bank Starting Address; N mod 4.] Vector programming model.
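Reading the "Mod 4" label as "the bank is the address modulo 4" (my interpretation of the diagram, not something the slide spells out), the mapping can be sketched as follows. The function names are hypothetical. The sketch shows why a stride-one vector spreads evenly across banks while a stride equal to the bank count lands on one bank every time.

```python
NUM_BANKS = 4  # from the "Mod 4" label on the slide


def bank_of(address, num_banks=NUM_BANKS):
    """Bank that holds this word under low-order interleaving (assumed)."""
    return address % num_banks


def row_in_bank(address, num_banks=NUM_BANKS):
    """Position of the word inside its bank."""
    return address // num_banks


def banks_touched(start, stride, count, num_banks=NUM_BANKS):
    """Sequence of banks visited by a strided vector access."""
    return [bank_of(start + i * stride, num_banks) for i in range(count)]


unit_stride = banks_touched(start=0, stride=1, count=8)
bank_stride = banks_touched(start=0, stride=4, count=8)
print(unit_stride)
print(bank_stride)
```

Under this assumed mapping, the stride-4 access cannot overlap its fetches, because every request waits on the same bank.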
47
Trends in Technology
48
A 256 Node SMP Linux Cluster (2001)
512 CPU, 512GB, 6TB SCSI, TB Local, GB Ethernet. Imagine 20 of these in one room. (Bednar, 2004)
49
SIZE, COST, and HEAT The EARTH Simulator 3 Megawatts 500 Million US $
It doesn’t simulate global warming, IT CAUSES IT! (Bednar, 2004)
52
S L O W E R
54
After Gustafson, 2004. Bednar, 2004.
55
GAP
56
Computational Earth Sciences
57
Atmospheric Modeling and Data Assimilation at the DAO
Robert Atlas and the DAO Team Data Assimilation Office, NASA/GSFC IWG, November 2001
58
The f-v Dynamical Core
Terrain-following Lagrangian control-volume vertical discretization of the basic conservation laws: mass, momentum, total energy. 2D horizontal flux-form semi-Lagrangian discretization: genuinely conservative; Gibbs oscillation free. Absolute vorticity consistently transported with mass dp within the Lagrangian layers. Computationally efficient.
59
Computational Performance
60
Progression in model resolution
1990s: 2° x 2.5° (220 km)
2000: 1° x 1.25° (110 km)
2002: 0.5° x 0.625° (55 km)
2004: [value missing]° x 0.36° (28 km)
2006: geodesic grid, 20 km
[Year missing]: up to 10 km; the hydrostatic assumption starts to break down; this is the transition period to non-hydrostatic dynamics
[Year missing]: revolution in computing technology is to take place
2025: global non-hydrostatic cloud-resolving model with 1 km or finer resolution, capable of resolving individual thunderstorms
Slides from Bob Atlas presentation
61
Numerical Problem Solving
62
Problem Solving – 3D Example of Array Addressing
Finite differences on a 3D array. Large memory requirement. Wave propagation: FD time-domain algorithm. Psi(i,j,k) = physical variables. How do we address memory? Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base
63
Problem Solving – 3D Example of Array Addressing
[Diagram: grid points (i-1,j,k), (i,j,k), (i+1,j,k).] Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base
64
Array Addressing by Dimension
1D array Psi(Lx): Address = (i-1) + base. Stride-one data.
2D array Psi(Lx,Ly): Address = (j-1)*Lx + (i-1) + base. Stride-N data.
3D array Psi(Lx,Ly,Lz): Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base. Stride-N*N data.
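The addressing formula can be checked directly. The sketch below implements the slide's 3D formula with 1-based indices and confirms the strides: neighbors in i are one word apart, neighbors in j are Lx words apart, and neighbors in k are Lx*Ly words apart. The function name and the small grid sizes are hypothetical, chosen only to keep the numbers readable.

```python
def address_3d(i, j, k, lx, ly, base=0):
    """Linear address of Psi(i, j, k) with 1-based indices, i varying fastest."""
    return (k - 1) * lx * ly + (j - 1) * lx + (i - 1) + base


LX, LY, LZ = 4, 3, 2  # hypothetical small grid

center = address_3d(2, 2, 1, LX, LY)
stride_i = address_3d(3, 2, 1, LX, LY) - center  # neighbor in i
stride_j = address_3d(2, 3, 1, LX, LY) - center  # neighbor in j
stride_k = address_3d(2, 2, 2, LX, LY) - center  # neighbor in k
print(stride_i, stride_j, stride_k)

# The last grid point maps to the last word of the array.
last = address_3d(LX, LY, LZ, LX, LY)
print(last)
```

When Lx = Ly = N, these are the stride-one, stride-N, and stride-N*N streams named on the slide.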
65
Cache Memory Access Streams
[Table: cache behavior by access stream. 1D streams: 100%. Remaining rows (1D +/-, 2D +/-, 2D +/-N, 2D +/-1 +/-N): percentages not recoverable.]
66
Cache Memory Access Streams
[Table, continued. Rows 3D +/-, 3D +/-N, 3D +/-N*N, 3D ALL: percentages not recoverable.]
67
One Big One versus Many Little Ones
68
Futures of Micro-poor processors
Lots of arithmetic capability, very hard to use. Market forces will make them good at painting bit maps on screens.
69
Futures of Micro-poor processors
No relief in memory subsystem design; prefetch will help, but not nearly enough. A million will cost a billion dollars.
70
Futures of Micro-poor processors and the Big Switch
The Big Switch is the hot spot and no relief is in sight. No telling what the switch will cost.
71
Seismic Modeling and the Inverse Problem
73
12 streamers x 5.1 kilometers long. Data collected for 70 continuous days. Over 2300 square km.
74
3D Seismic Modeling
Large scale 3D: ~200+ wavelengths. Acoustic and elastic wave equations. Inhomogeneous Earth has widely varying parameters. Complexity limits use of 3D elastic modeling.
Problem scale: Nx = Ny = Nz ~ 1000; Ntime ~ 10,000; work per grid point ~ 100; number of seismic shots per survey ~ 100,000. A single survey simulation is 10^20 operations.
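The 10^20 figure follows from multiplying the problem-scale numbers on this slide. The short check below uses only those values; the variable names are mine.

```python
# Problem-scale values quoted on the slide (all approximate, "~").
nx = ny = nz = 1000
n_time = 10_000
work_per_grid_point = 100
shots_per_survey = 100_000

grid_points = nx * ny * nz                                     # 10^9
ops_per_shot = grid_points * n_time * work_per_grid_point     # 10^15
ops_per_survey = ops_per_shot * shots_per_survey               # 10^20

print(f"{ops_per_shot:.0e} operations per shot")
print(f"{ops_per_survey:.0e} operations per survey")
```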
75
The Babbage Difference Engine, circa 1853
76
Wave Equation Difference Engine (WEDE) for Seismic Modeling
Four processors. Acoustic wave equation. My PhD thesis project at the University of Tulsa.
77
Wave Equation Difference Engine
Finite differences. Elastic or acoustic wave equations. Regular grids. Sponge/one-way wave equation boundary conditions. Any source/receiver geometry. Explicit, 4th order in time and 8th order in space?
78
Wave Equation Difference Engine
No cache memory. Deterministic execution. Not a MIMD or SIMD or data flow machine. Data movement and control matches the algorithm. Each grid point has a control word. Three levels of parallelism (amount of parallelism): instruction trees, ~[value missing]; multiple instructions with selection, ~2-3; multiple grid points, ~hundreds of thousands.
79
Acoustic, Constant Density
Density is so constant it does not appear in the equation. C is the P-wave velocity. The source energy is in src. Psi is the wave field.
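The equation itself is not reproduced in this transcript. Assuming it is the standard constant-density acoustic form, Psi_tt = C^2 * Laplacian(Psi) + src, one explicit time step can be sketched as below. This is a deliberately simplified 1D, second-order scheme with fixed zero ends, not the higher-order 3D scheme the talk describes; the function name and all numeric values are hypothetical.

```python
def step(psi_prev, psi_now, c, dt, dx, src):
    """One explicit step of Psi_tt = c^2 * Psi_xx + src in 1D.

    Second order in time and space; end points are held at zero.
    """
    n = len(psi_now)
    psi_next = [0.0] * n
    for i in range(1, n - 1):
        laplacian = (psi_now[i - 1] - 2.0 * psi_now[i] + psi_now[i + 1]) / (dx * dx)
        psi_next[i] = (2.0 * psi_now[i] - psi_prev[i]
                       + dt * dt * (c[i] * c[i] * laplacian + src[i]))
    return psi_next


# Hypothetical setup: 11 grid points, 1500 m/s, one source sample at the center.
n = 11
c = [1500.0] * n
dt, dx = 0.001, 10.0           # c*dt/dx = 0.15, within the stability limit
zero = [0.0] * n
src = [0.0] * n
src[5] = 1.0

psi1 = step(zero, zero, c, dt, dx, src)    # source injects energy
psi2 = step(zero, psi1, c, dt, dx, zero)   # energy spreads to neighbors
print(psi1[5], psi2[4], psi2[5], psi2[6])
```

Each grid point needs only its own history and its immediate neighbors, which is the regular, local data movement the machine design relies on.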
80
Wave Equation Difference Engine
Machine performance: 100 operations in pipeline x 1,000,000 grid point processors x 100 megahertz clock = 10^16 operations per second.
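The performance figure is the product of the three numbers on this slide. The check below also divides the per-shot work implied by the earlier problem-scale slide (1000^3 grid points x 10,000 time steps x 100 operations = 10^15) by that rate, which reproduces the 0.1 seconds per shot quoted in the Economics slides. Variable names are mine.

```python
ops_in_pipeline = 100
grid_point_processors = 1_000_000
clock_hz = 100_000_000  # 100 megahertz

ops_per_second = ops_in_pipeline * grid_point_processors * clock_hz

# Per-shot work from the problem-scale slide: 1000^3 points,
# 10,000 time steps, about 100 operations per grid point.
ops_per_shot = 1000**3 * 10_000 * 100
seconds_per_shot = ops_per_shot / ops_per_second

print(f"{ops_per_second:.0e} operations per second")
print(seconds_per_shot, "seconds per shot")
```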
83
Application Specific Parallel Computing
Choose carefully an application which is BIG. Find an algorithm which is suitable: good data locality, regular structure in data movement, high memory data transfers. Map the algorithm into hardware.
84
Application Specific Parallel Computing What it is not!
Not suitable for just any algorithm. Not general purpose: we will have an efficient but specific memory subsystem. Does not match the alphabet soup: SIMD, MIMD, NUMA, etc.
85
What do ASP machines need??
VLSI design team, fabless and good? Clever architect for the problem. A very good memory design!
86
What do ASP machines do away with??
Language compilers. Outdated junk in the processor design, x86! Cache memories! Non-deterministic execution!
87
Multiple Bank Memory Systems
[Diagram labels: Memory Banks; Bank Starting Address; N mod 4.] As many as are needed!
88
Pipelined Instruction Trees
Each higher level offers parallel operations. Pipeline assumes all registers are loaded every cycle. Hardwired? Actually, today the instruction trees could be re-configurable using re-programmable cells! Example: r = a+b-x*y
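The example r = a+b-x*y has two tree levels: the add and the multiply are independent and can run in the same cycle, and the subtract combines them one cycle later. The simulation below is a sketch of that idea, not the actual hardware; the function name and the register model (one tuple between the levels) are hypothetical. It assumes a new operand set is loaded every cycle, as the slide states.

```python
def pipelined_tree(operand_sets):
    """Simulate a two-level pipelined tree for r = a + b - x*y.

    Level 1 computes (a + b) and (x * y) side by side.
    Level 2 subtracts the two level-1 results from the previous cycle.
    Returns the results in order plus the number of cycles used.
    """
    stage1 = None   # pipeline register between level 1 and level 2
    results = []
    cycles = 0
    # One extra cycle with no input drains the pipeline.
    for operands in list(operand_sets) + [None]:
        cycles += 1
        # Level 2 consumes what level 1 produced in the previous cycle.
        if stage1 is not None:
            total, product = stage1
            results.append(total - product)
        # Level 1: the two operations are independent of each other.
        if operands is not None:
            a, b, x, y = operands
            stage1 = (a + b, x * y)
        else:
            stage1 = None
    return results, cycles


results, cycles = pipelined_tree([(1, 2, 3, 4), (5, 6, 1, 1), (0, 0, 2, 2)])
print(results, cycles)
```

Three results take four cycles here: one cycle to fill the pipeline, then one result per cycle.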
89
Pipelined Instruction Trees
[Diagram: instruction trees over operands a, b, x, y, d with +, -, and * nodes.] Multiple trees offer the second level of parallelism.
90
Three Levels of Parallelism
Instruction trees, multiple levels. Multiple results. Multiple grid point processors.
91
Wave Machine
92
Imaging Machine
93
Wave Equation
a) 8th or 10th order in space
b) 4th order in time, tricky but possible
c) Sponge boundary conditions, slowly varying weights along sides
d) Nominal flat topography; new schemes are building in topography
e) Any seismic source location, any geophone location
94
Elastic Wave Equation
a) Grid point work is about 100 operations
b) About 20,000 time steps per shot
c) 200 wavelengths gives about 160,000 geophone locations
d) Traces have 4096 samples at 2 milliseconds; could be 1 ms
95
Elastic Wave Equation
Shots are placed at twice the receiver spacing. Number of shots equals 40,000. Model frequency is velocity dependent; assume something on the order of 60 hertz.
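The receiver and shot counts are consistent with a square survey 200 wavelengths on a side. The check below assumes two receivers per wavelength in each direction; that spacing is my inference from the 160,000 figure, not something the slides state. Shots at twice the receiver spacing then give the 40,000 quoted here.

```python
wavelengths_per_side = 200
receivers_per_wavelength = 2     # assumption inferred from the 160,000 figure
shot_spacing_in_receivers = 2    # "twice the receiver spacing"

receivers_per_side = wavelengths_per_side * receivers_per_wavelength
geophone_locations = receivers_per_side ** 2

shots_per_side = receivers_per_side // shot_spacing_in_receivers
number_of_shots = shots_per_side ** 2

print(geophone_locations, number_of_shots)
```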
96
Economics
Up-front fixed cost: $5 to $10 million. Each ASP chip is $5 to $10. A petaflop for $5 or $10 million.
97
Economics
A seismic shot takes 0.1 seconds. 5-year life is 50,000 models. A realistic 3D elastic seismic model would cost $200.
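The $200 figure is the machine cost spread over its lifetime output. The check below takes the upper figure of $10 million from the preceding Economics slide as the total cost; treating that as the whole cost, with no operating expenses, is my simplifying assumption.

```python
machine_cost_dollars = 10_000_000   # upper end of the "$5 to $10 million" range
lifetime_years = 5
lifetime_models = 50_000

models_per_year = lifetime_models // lifetime_years
cost_per_model = machine_cost_dollars / lifetime_models

print(models_per_year, cost_per_model)
```

At the lower figure of $5 million the same arithmetic gives $100 per model.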
98
Comparison
10 clusters, ~$10 million: 10 models per year. One Waves in Linear Motion Analyzer (WILMA), ~$10 million: 10,000 models per year.
99
Comparison Waves in Linear Motion Analyzer 1000X faster
For the same money!
100
Summary
1000 megawatts is a good-sized power station. Good memory design is worth the money! Removing the obstacles to efficient computing gives sustainable performance.
101
Summary
Slower is better. Less power is better. High efficiency is better.
102
Conclusions
Deterministic computing is important for performance. Application-specific computing is a good fit for the wave equation, and very cost effective.
103
Thanks
SEG Continuing Education. Memorial University of Newfoundland.