Husky Energy Chair in Oil and Gas Research


Presentation on theme: "Husky Energy Chair in Oil and Gas Research"— Presentation transcript:

1 Husky Energy Chair in Oil and Gas Research
Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration. R. Phillip Bording, February 17, 2004. Husky Energy Chair in Oil and Gas Research, Memorial University of Newfoundland. M U N - February 17, Phil Bording

2 Session 1 History of Design. Tycho Brahe. Napier
Session 1 History of Design: Tycho Brahe; Napier; Charles Babbage (mechanical design); John Atanasoff (storage, spinning capacitor); Konrad Zuse (floating point); Mauchly and Eckert; von Neumann. Harvard architecture: one memory for code, one memory for data. Princeton architecture: one memory for code and data.

3 Session 2 Current Design Issues. Scaling laws. Moore’s Law
Session 2 Current Design Issues Scaling laws Moore’s Law Transistors – VLSI Memory – Technology Division of Design The memory Challenge The processor Challenge The ILLIAC – PEPE IBM IBM 360/44 IBM 360/95 Array Processors the software of array processor calls

4 Application Specific Machines

5 M U N - February 17, 2005 - Phil Bording

10 Computing and Calculating Engines

11 Session 1 History
Vector memory; pipeline arithmetic and array processing; benchmark-driven dollars; Fairhair Syndrome

12 Processors
Data memory; ALU; hardwired instructions. Processor bottleneck; memory bottleneck. Memory technologies: vacuum tubes; core; plated wire; transistors; LSI (6 T static); VLSI (2 T dynamic)

13 Linear Address Space
Max Address; Address Pointer. Latency is the time to access the first word. Bandwidth is the rate of accessing successive words.

14 von Neumann Architecture Princeton
Memory; Address Pointer; Arithmetic Logic Unit (ALU); Data/Instructions; Program Counter: Pc = Pc + 1. Featuring deterministic execution.

15 After Gustafson, 2004; Bednar, 2004

16 Bank memory design
Duplicate memory system; one design for the subsystem; use a binary tree design to spread out addresses and data; fetch/store many words at once; assume a sequential addressing pattern

17 Bank memory design
The wires created a big switch between modules. The slower memory access time was better matched to the faster processor times. Costly to build: significant engineering effort.

21 Array memory design
N rows; N columns; N x N bits; M bits on bus

22 Array memory design
Streaming data flow: nibbles, bytes, and words. Sequential access: first word access time = address + latency + data; successive words = data. Random access: indirect addressing; non-uniform strides.
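The sequential and random access costs on this slide can be compared with a small model. This is a minimal sketch: the function names and the timing values (in nanoseconds) are illustrative assumptions, not figures from the talk.

```python
def sequential_access_time(n_words, t_addr, t_latency, t_data):
    """Time to stream n_words from consecutive addresses.

    The first word pays address + latency + data; each later word pays data only.
    """
    if n_words <= 0:
        return 0.0
    first_word = t_addr + t_latency + t_data
    return first_word + (n_words - 1) * t_data


def random_access_time(n_words, t_addr, t_latency, t_data):
    """Time when every word pays the full first-word cost (no streaming)."""
    return n_words * (t_addr + t_latency + t_data)


# Hypothetical timings in nanoseconds, chosen only for illustration.
T_ADDR, T_LATENCY, T_DATA = 5.0, 50.0, 1.0

seq = sequential_access_time(1024, T_ADDR, T_LATENCY, T_DATA)
rnd = random_access_time(1024, T_ADDR, T_LATENCY, T_DATA)
print(f"sequential: {seq} ns, random: {rnd} ns, ratio: {rnd / seq:.1f}x")
```

With these assumed timings the fixed first-word cost dominates random access, which is the point the slide makes about indirect addressing and non-uniform strides.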

23 Benchmark
Scalar operations; array operations; do-loop domination of codes. Vendors look seriously at the instruction stream. Then comes Linpack: LU decomposition. If it does matrix multiply fast, nothing else matters, or does it?

24 Fairhair Syndrome
A new world-class machine is designed at MIT, Stanford, or Caltech. Venture capital flows in. The federal government buys 10 new machines. The company goes public. Vulture capitalists sell out. The federal government buys new machines from somebody else, the next fairhair. The company has a stock scandal and goes bankrupt.

25 Session 2 Current Design Issues. Scaling laws. Moore’s Law
Session 2 Current Design Issues Scaling laws Moore’s Law Transistors – VLSI Memory – Technology Division of Design The memory Challenge The processor Challenge The ILLIAC – PEPE IBM IBM 360/44 IBM 360/95 Array Processors the software of array processor calls Programming Models vectors shared memory distributed memory

26 Lambda Rules

27 Division of design
Diagram labels: Company A; ALU; Memory; Memory; Weak Link; ALU; One Company; Company B

28 Moore’s Laws
Every 18 months the density of transistors on a VLSI chip doubles. The investment in dollars doubles with every new VLSI plant.

29 Illiac
8 x 8 processors; nearest-neighbor connections

30 Parallel Ensemble Processing Elements - PEPE
Radar processing computer; associative computing. Diagram labels: Data Inputs; P0, Pn-3, Pn-2, Pn-1, Pn; Data Outputs

31 IBM Machines
Early 1960s: 7094 with 36-bit arithmetic; 1600 and 1400 processors completely different. Middle 1960s: new machine, the IBM 360; 36-bit words, but memory parity was added (8-bit byte + 1 parity bit); uniform business machine architectures; 32- and 64-bit floating point. No industry standard for the floating-point format.

32 Array Processors
IBM and CDC designed DMA (Direct Memory Access) processors. DMA frees the main processor to compute and allows separate simple processors to do the I/O. The idea translated into attached processors for arithmetic processing.

33 Array Processors
Arrays of data are moved to a local, very high speed memory (fast registers). Arithmetic is performed by special instructions passed to the array processor. Diagram labels: CPU; Array Processor

34 Software Design Issues
Vector programming; cache programming; message passing programming; NUMA programming; grid programming. ALL of these memory operations have a fixed cost. Code performance improvements are dominated by fixed costs.

35 Hardware Design Issues
10 years equals a 100-fold speedup. Memory latency: the cost of getting the first word is a constant. Wires have failed to scale. Bigger cache memories are slower. Code performance improvements are dominated by fixed costs.
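The 100-fold figure is consistent with a doubling every 18 months, the rate quoted on the Moore's Laws slide. The arithmetic below is a sketch under the assumption that speed tracks that doubling rate; the function name is illustrative.

```python
def growth_factor(years, doubling_months=18.0):
    """Growth after `years` if the quantity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


ten_year = growth_factor(10)
print(f"10 years at one doubling per 18 months: about {ten_year:.0f}x")
```

Ten years is 120 months, or about 6.7 doublings, which gives a factor of roughly 100.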

36 Linear Address Space
Max Address; Address Pointer. Latency is the time to access the first word. Bandwidth is the rate of accessing successive words.

37 von Neumann Architecture Princeton
Memory; Address Pointer; Arithmetic Logic Unit (ALU); Data/Instructions; Program Counter: Pc = Pc + 1. Featuring deterministic execution.

38 Cache Memory Architecture
Main memory is large and slow. Cache is much smaller and much faster. Control logic keeps the main memory coherent. Diagram labels: Memory; Control; Cache Memory; Address Pointer. Featuring non-deterministic execution.

39 Cache Memory - Three Levels Architecture
2 gigahertz clock. L1 cache memory: 32 kilobytes, 2X. L2 cache memory: 128 kilobytes, 8X. L3 cache memory: 16 megabytes, 16X. Main memory: multi-gigabytes, large and slow, 160X. Cache control logic; Address Pointer. Featuring really non-deterministic execution.
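To see why execution time becomes non-deterministic, the access cost can be modeled as a function of where each access hits. The sketch below reads the slide's 2X, 8X, 16X, and 160X figures as access costs in clock cycles for L1, L2, L3, and main memory; that reading, the hit rates, and the function name are all assumptions made for illustration.

```python
def average_access_cycles(levels, memory_cycles):
    """Average cost of one access through a cache hierarchy.

    levels: list of (latency_cycles, hit_rate) ordered from L1 outward.
    Every access that reaches a level pays that level's latency.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_cycles


# Hypothetical hit rates; only the latencies come from the slide.
typical = average_access_cycles([(2, 0.9), (8, 0.8), (16, 0.5)], 160)
best = average_access_cycles([(2, 1.0), (8, 1.0), (16, 1.0)], 160)
worst = average_access_cycles([(2, 0.0), (8, 0.0), (16, 0.0)], 160)
print(f"best {best}, typical {typical:.2f}, worst {worst} cycles per access")
```

The spread between the best and worst case is what makes the run time depend on the access pattern rather than on the instruction count alone.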

40 Programming Models for Parallel Computing

41 Distributed Computing Message Passing Interface
Program Address Spaces, each with its own Max address; Multiple Address Pointers

42 Distributed Computing with Message Passing
Program Address Spaces; Messages Left and Right; Multiple Address Pointers

44 Multi-Threading OpenMP Programming Model
Global Program Address Space split into four Local regions, with boundaries marked at n, 2n, and 3n. Address and Cache Bus with Conflict Resolution; Multiple Address Pointers

45 Uniqueness of Store Multi-Threading
Program Address Space; Multiple Address Pointers. Duplicate pointers to the same location: conflict on storing a result. So who is managing the multiple pointers? It is the programmer's responsibility.
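The store conflict can be illustrated with several threads updating one shared location. The slide concerns OpenMP; Python's standard `threading` module stands in here only to show the programmer's responsibility, and the function and variable names are illustrative.

```python
import threading


def run_workers(n_threads, n_updates):
    """Have n_threads each add 1 to a shared total n_updates times."""
    total = [0]              # one shared location, reachable from every thread
    lock = threading.Lock()  # the programmer's conflict resolution

    def worker():
        for _ in range(n_updates):
            # Without the lock, the read-modify-write below could interleave
            # with another thread's and some updates could be lost.
            with lock:
                total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


result = run_workers(4, 10_000)
print(result)
```

Nothing in the memory system enforces the lock; leaving it out is legal code whose result is no longer guaranteed.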

46 Multiple Bank Memory Systems
Memory Banks; Bank Starting Address; bank selected by N mod 4. Vector Programming Model.
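A short sketch of how "mod 4" interleaving spreads addresses over banks, assuming the bank is chosen as the word address modulo the number of banks. The function names and the example strides are illustrative.

```python
NUM_BANKS = 4


def bank_of(address, num_banks=NUM_BANKS):
    """Bank that holds a word address under simple modulo interleaving."""
    return address % num_banks


def banks_touched(start, stride, count, num_banks=NUM_BANKS):
    """Sequence of banks hit by `count` accesses starting at `start`."""
    return [bank_of(start + i * stride, num_banks) for i in range(count)]


unit_stride = banks_touched(0, 1, 8)  # consecutive words rotate through banks
bad_stride = banks_touched(0, 4, 8)   # a stride of 4 hits one bank every time
print(unit_stride)
print(bad_stride)
```

Unit stride keeps all four banks busy in turn, which suits the vector programming model; a stride equal to the bank count serializes on a single bank.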

47 Trends in Technology

48 A 256 Node SMP Linux Cluster (2001)
512 CPU, 512GB, 6TB SCSI, TB Local, GB Ethernet. Imagine 20 of these in one room. (Bednar, 2004)

49 SIZE, COST, and HEAT The EARTH Simulator 3 Megawatts 500 Million US $
It doesn’t simulate global warming, IT CAUSES IT! (Bednar, 2004)

52 SLOWER

54 After Gustafson, 2004; Bednar, 2004

55 GAP

56 Computational Earth Sciences

57 Atmospheric Modeling and Data Assimilation at the DAO
Robert Atlas and the DAO Team Data Assimilation Office, NASA/GSFC IWG, November 2001

58 The f-v Dynamical Core
Terrain-following Lagrangian control-volume vertical discretization of the basic conservation laws: mass, momentum, total energy. 2D horizontal flux-form semi-Lagrangian discretization: genuinely conservative; Gibbs oscillation free. Absolute vorticity consistently transported with mass dp within the Lagrangian layers. Computationally efficient.

59 Computational Performance

60 Progression in model resolution
1990s: 2° x 2.5° (220 km). 2000: 1° x 1.25° (110 km). 2002: 0.5° x 0.625° (55 km). 2004: ° x 0.36° (28 km). 2006: geodesic grid, 20 km. Then up to 10 km: the hydrostatic assumption starts to break down; this is the transition period to non-hydrostatic dynamics. A revolution in computing technology is to take place. 2025: global non-hydrostatic cloud-resolving model with 1 km or finer resolution, capable of resolving individual thunderstorms. Slides from Bob Atlas presentation.

61 Numerical Problem Solving

62 Problem Solving – 3D Example of Array Addressing
Finite differences on a 3D array; large memory requirement. Wave propagation: FD time-domain algorithm. Psi(i,j,k) = physical variables. How do we address memory? Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base

63 Problem Solving – 3D Example of Array Addressing
Grid points: (i-1,j,k), (i,j,k), (i+1,j,k). Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base

64 Array Addressing by Dimension
1D array Psi(Lx): Address = (i-1) + base; stride-one data. 2D array Psi(Lx,Ly): Address = (j-1)*Lx + (i-1) + base; stride-N data. 3D array Psi(Lx,Ly,Lz): Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base; stride-N*N data.
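The address formula and the three strides can be checked directly. This is a minimal sketch using the slide's 1-based indices; the function name and the small grid dimensions are illustrative.

```python
def address_3d(i, j, k, lx, ly, base=0):
    """Linear address of Psi(i, j, k) with 1-based indices, i varying fastest."""
    return (k - 1) * lx * ly + (j - 1) * lx + (i - 1) + base


LX, LY, LZ = 4, 3, 2  # small grid, for illustration only

origin = address_3d(1, 1, 1, LX, LY)
stride_i = address_3d(2, 1, 1, LX, LY) - origin  # neighbors in i: stride one
stride_j = address_3d(1, 2, 1, LX, LY) - origin  # neighbors in j: stride Lx
stride_k = address_3d(1, 1, 2, LX, LY) - origin  # neighbors in k: stride Lx*Ly
last = address_3d(LX, LY, LZ, LX, LY)            # final element of the array

print(stride_i, stride_j, stride_k, last)
```

For a cubic grid with Lx = Ly = N these are the stride-one, stride-N, and stride-N*N data streams named on the slide.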

65 Cache Memory Access Streams
1D Streams – 100% 1D +/ % 2D +/ % 2D +/-N % 2D +/-1 +/-N %

66 Cache Memory Access Streams
3D +/ % 3D +/-N % 3D +/-N*N % 3D ALL %

67 One Big One versus Many Little Ones

68 Futures of Micro-poor processors
Lots of arithmetic capability, very hard to use. Market forces will make them good at painting bit maps on screens.

69 Futures of Micro-poor processors
No relief in memory subsystem design; prefetch will help but not nearly enough. A million will cost a billion, $$$.

70 Futures of Micro-poor processors and the Big Switch
The Big Switch is the hot spot and no relief is in sight. No telling what the switch will cost.

71 Seismic Modeling and the Inverse Problem

73 Twelve streamers, each 5.1 kilometers long
Data collected for 70 continuous days. Over 2300 square km.

74 3D Seismic Modeling
Large-scale 3D: ~200+ wavelengths. Acoustic and elastic wave equations. The inhomogeneous Earth has widely varying parameters. Complexity limits the use of 3D elastic modeling. Problem scale: Nx = Ny = Nz ~ 1000; Ntime ~ 10,000; work per grid point ~ 100; number of seismic shots per survey ~ 100,000. A single survey simulation is 10^20 operations.
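The 10^20 figure follows from multiplying the problem-scale numbers on this slide. The variable names below are illustrative.

```python
nx = ny = nz = 1000           # grid points per axis
n_time = 10_000               # time steps per shot
work_per_point = 100          # operations per grid point per time step
shots_per_survey = 100_000

ops_per_shot = nx * ny * nz * n_time * work_per_point
ops_per_survey = ops_per_shot * shots_per_survey

print(f"per shot:   {ops_per_shot:.0e} operations")
print(f"per survey: {ops_per_survey:.0e} operations")
```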

75 The Babbage Difference Engine, circa 1853

76 Wave Equation Difference Engine (WEDE) for Seismic Modeling
Four processors; acoustic wave equation. My PhD thesis project at the University of Tulsa.

77 Wave Equation Difference Engine
Finite differences; elastic or acoustic wave equations; regular grids; sponge/one-way wave equation boundary conditions; any source/receiver geometry; explicit, 4th order in time and 8th order in space?

78 Wave Equation Difference Engine
No cache memory. Deterministic execution. Not MIMD, SIMD, or data flow. Data movement and control match the algorithm. Each grid point has a control word. Three levels of parallelism (amount of parallelism): instruction trees, ~; multiple instructions with selection, ~2-3; multiple grid points, ~hundreds of thousands.

79 Acoustic, Constant Density
Density is so constant it does not appear in the equation. C is the P-wave velocity. The source energy is in src. Psi is the wave field.
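The equation itself did not survive in the transcript. The standard constant-density acoustic form, written with the slide's symbols, is Psi_tt = C^2 * Laplacian(Psi) + src. The sketch below advances that equation one time step in 1D using second-order differences in time and space; this is a deliberate simplification and an assumption, not the 4th-order-in-time, 8th-order-in-space scheme mentioned for the machine, and all grid values are illustrative.

```python
def step(psi_prev, psi_curr, c, dt, dx, src):
    """One explicit finite-difference time step of the 1D acoustic wave equation."""
    n = len(psi_curr)
    psi_next = [0.0] * n          # end points stay at zero (fixed boundaries)
    r2 = (c * dt / dx) ** 2       # squared Courant number; must be <= 1 in 1D
    for i in range(1, n - 1):
        lap = psi_curr[i - 1] - 2.0 * psi_curr[i] + psi_curr[i + 1]
        psi_next[i] = (2.0 * psi_curr[i] - psi_prev[i]
                       + r2 * lap + dt * dt * src[i])
    return psi_next


# Hypothetical setup: 101 points, 10 m spacing, 1500 m/s velocity, 4 ms step.
N, C, DX, DT = 101, 1500.0, 10.0, 0.004
psi_prev = [0.0] * N
psi_curr = [0.0] * N
for it in range(20):
    src = [0.0] * N
    if it == 0:
        src[N // 2] = 1.0         # single impulse at the center grid point
    psi_next = step(psi_prev, psi_curr, C, DT, DX, src)
    psi_prev, psi_curr = psi_curr, psi_next

print(max(abs(v) for v in psi_curr))
```

Each grid point needs only its own history and its nearest neighbors, which is the data locality the machine design relies on.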

80 Wave Equation Difference Engine
Machine performance: 100 operations in pipeline; 1,000,000 grid point processors; 100 megahertz clock; 10^16 operations per second.
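The performance figure is the product of the three numbers on this slide. Combining it with the problem scale from the 3D Seismic Modeling slide gives the time per shot; the variable names are illustrative.

```python
ops_in_pipeline = 100
grid_point_processors = 1_000_000
clock_hz = 100_000_000        # 100 megahertz

ops_per_second = ops_in_pipeline * grid_point_processors * clock_hz

# Problem scale from the 3D Seismic Modeling slide.
ops_per_shot = 1000**3 * 10_000 * 100
shots_per_survey = 100_000

seconds_per_shot = ops_per_shot / ops_per_second
seconds_per_survey = seconds_per_shot * shots_per_survey

print(f"{ops_per_second:.0e} operations per second")
print(f"{seconds_per_shot} s per shot, {seconds_per_survey:.0f} s per survey")
```

The result of 0.1 seconds per shot matches the figure quoted later on the Economics slide.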

83 Application Specific Parallel Computing
Choose carefully an application which is BIG. Find an algorithm which is suitable: good data locality; regular structure in data movement; high memory data transfers. Map the algorithm into hardware.

84 Application Specific Parallel Computing What it is not!
Not suitable for just any algorithm. Not general purpose; we will have an efficient but specific memory subsystem. Does not match the alphabet soup: SIMD, MIMD, NUMA, etc.

85 What do ASP machines need??
VLSI design team, fabless and good? A clever architect for the problem. A very good memory design!

86 What do ASP machines do away with??
Language compilers. Outdated junk in the processor design, x86! Cache memories! Non-deterministic execution!

87 Multiple Bank Memory Systems
Memory Banks; Bank Starting Address; bank selected by N mod 4. As many as are needed!

88 Pipelined Instruction Trees
Each higher level offers parallel operations. The pipeline assumes all registers are loaded every cycle. Hardwired? Actually, today the instruction trees could be reconfigurable using reprogrammable cells! Example: r = a+b-x*y

89 Pipelined Instruction Trees
Diagram: operand leaves (a, b, x, y, d) feeding +, -, and * nodes. Multiple trees offer the second level of parallelism.
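The expression r = a+b-x*y forms a two-level tree: the add and the multiply are independent and sit on the first level, and the subtract sits on the second. The sketch below simulates that tree as a two-stage pipeline that accepts new operands every cycle. It is a software illustration under that assumption, not the machine's hardware design, and the function name is hypothetical.

```python
def pipelined_tree(operand_stream):
    """Evaluate r = a + b - x * y for a stream of (a, b, x, y) tuples.

    Stage 1 computes a + b and x * y (independent, so parallel in hardware).
    Stage 2 subtracts the stage-1 results latched on the previous cycle.
    """
    results = []
    stage1_regs = None                         # pipeline registers between stages
    for operands in list(operand_stream) + [None]:  # one extra cycle to drain
        # Stage 2 works on what stage 1 latched during the previous cycle.
        if stage1_regs is not None:
            total, product = stage1_regs
            results.append(total - product)
        # Stage 1 loads new operands on every cycle that has input.
        if operands is not None:
            a, b, x, y = operands
            stage1_regs = (a + b, x * y)
        else:
            stage1_regs = None
    return results


out = pipelined_tree([(1, 2, 3, 4), (5, 6, 7, 8)])
print(out)
```

Once the pipeline is full, one result emerges per cycle regardless of the depth of the tree.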

90 Three Levels of Parallelism
Instruction trees, multiple levels; multiple results; multiple grid point processors

91 Wave Machine

92 Imaging Machine

93 Wave Equation
a) 8th or 10th order in space
b) 4th order in time, tricky but possible
c) Sponge boundary conditions, slowly varying weights along sides
d) Nominal flat topography; new schemes are building in topography
e) Any seismic source location, any geophone location

94 Elastic Wave Equation
a) Grid point work is about 100 operations
b) About 20,000 time steps per shot
c) 200 wavelengths gives about 160,000 geophone locations
d) Traces have 4096 samples at 2 milliseconds; could be 1 ms

95 Elastic Wave Equation
Shots are placed at twice the receiver spacing. The number of shots equals 40,000. Model frequency is velocity dependent; assume something on the order of 60 hertz.
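The receiver and shot counts are consistent with a square surface grid. The sketch below assumes two geophone locations per wavelength along each side, which is an inference chosen because it reproduces the 160,000 figure; the slides do not state the sampling.

```python
wavelengths_per_side = 200
receivers_per_wavelength = 2   # assumption, not stated on the slides

receivers_per_side = wavelengths_per_side * receivers_per_wavelength
receiver_locations = receivers_per_side ** 2

# Shots at twice the receiver spacing: half as many per side.
shots_per_side = receivers_per_side // 2
shot_locations = shots_per_side ** 2

samples_per_trace = 4096
sample_interval_ms = 2
trace_length_ms = samples_per_trace * sample_interval_ms

print(receiver_locations, shot_locations, trace_length_ms)
```

Under that assumption the shot count comes out at the 40,000 quoted above, and each trace covers a little over 8 seconds.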

96 Economics
Up-front fixed cost: $5 to $10 million. Each ASP chip is $5 to $10. A petaflop for $5 or $10 million.

97 Economics
A seismic shot takes 0.1 seconds. A 5-year life is 50,000 models. A realistic 3D elastic seismic model would cost $200.

98 Comparison
10 clusters: ~$10 million, 10 models per year. One Waves in Linear Motion Analyzer (WILMA): ~$10 million, 10,000 models per year.
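The cost per model and the speedup follow from the figures on the Economics and Comparison slides. The sketch assumes the same 5-year life for the clusters, which the slides do not state; the function name is illustrative.

```python
def cost_per_model(machine_cost, models_per_year, life_years):
    """Capital cost divided by the models produced over the machine's life."""
    return machine_cost / (models_per_year * life_years)


MACHINE_COST = 10_000_000   # dollars, for either option
LIFE_YEARS = 5              # stated for WILMA, assumed here for the clusters

wilma_cost = cost_per_model(MACHINE_COST, 10_000, LIFE_YEARS)
cluster_cost = cost_per_model(MACHINE_COST, 10, LIFE_YEARS)
speedup = 10_000 / 10

print(f"WILMA:    ${wilma_cost:,.0f} per model")
print(f"Clusters: ${cluster_cost:,.0f} per model")
print(f"Throughput ratio: {speedup:.0f}x")
```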

99 Comparison: Waves in Linear Motion Analyzer 1000X faster
For the same money!

100 Summary
1000 megawatts is a good-sized power station. Good memory design is worth the money! Removing the obstacles to efficient computing gives sustainable performance.

101 Summary
Slower is better. Less power is better. High efficiency is better.

102 Conclusions
Deterministic computing is important for performance. Application-specific computing is a good fit for the wave equation, and very cost effective.

103 Thanks
SEG Continuing Education; Memorial University of Newfoundland

