MUN - February 17, 2005 - Phil Bording. Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration. R. Phillip Bording, March 8,

R. Phillip Bording, Husky Energy Chair in Oil and Gas Research, Memorial University of Newfoundland

Session 1: History of Design. Tycho Brahe; Napier; Charles Babbage (mechanical design); John Atanasoff (storage on a spinning capacitor); Konrad Zuse (floating point); Mauchly and Eckert; von Neumann. Harvard memory: separate code memory and data memory. Princeton memory: code and data in one memory.

Session 2: Current Design Issues. Scaling laws; Moore's Law; transistors (VLSI); memory technology; division of design; the memory challenge; the processor challenge; the ILLIAC; PEPE; IBM 7094; IBM 360/44; IBM 360/95; array processors and the software of array-processor calls.

Processors: data memory, ALU, hardwired instructions. The processor bottleneck and the memory bottleneck. Memory technologies: vacuum tubes, core, plated wire, transistors, LSI (6-transistor static), VLSI (2-transistor dynamic).

[Figure after Gustafson, 2004; Bednar, 2004]

Lambda Rules

Moore's Laws: every 18 months the density of transistors on a VLSI chip doubles; the investment in dollars doubles with every new VLSI plant.
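The 18-month doubling rule compounds quickly. A minimal sketch of the arithmetic, taking the doubling period as the slide states it (the function name is ours, for illustration only):

```python
def density_growth(years: float, doubling_months: float = 18.0) -> float:
    """Factor by which transistor density grows after `years` years,
    assuming one doubling every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Six years is 72 months, i.e. four 18-month doublings.
print(density_growth(6))  # 16.0
```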

Parallel Element Processing Ensemble (PEPE). [Figure: processing elements P0 … Pn with data inputs and data outputs; radar processing computer; associative computing.]

Multiple Bank Memory Systems. [Figure: four memory banks (mod 4) with starting address, +N, +2N, +3N.] Vector programming model.
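A small sketch of why banked memory suits vector access, assuming low-order (address mod 4) interleaving, which is our reading of the "Mod 4" label. Consecutive words rotate through all banks, while a stride equal to the bank count sends every access to the same bank. The helper names are hypothetical.

```python
def bank_of(address: int, n_banks: int = 4) -> int:
    """Low-order interleaving: word `address` lives in bank address mod n_banks."""
    return address % n_banks


def banks_touched(start: int, stride: int, count: int, n_banks: int = 4) -> list:
    """Bank sequence for a vector access start, start+stride, start+2*stride, ..."""
    return [bank_of(start + k * stride, n_banks) for k in range(count)]


print(banks_touched(0, 1, 8))  # [0, 1, 2, 3, 0, 1, 2, 3]  banks used in rotation
print(banks_touched(0, 4, 4))  # [0, 0, 0, 0]  every access hits bank 0
print(banks_touched(0, 3, 4))  # [0, 3, 2, 1]  odd stride still spreads out
```

With 4 banks, any stride that shares no factor with 4 (any odd stride) keeps all banks busy; a stride of 4 serializes on one bank.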

Trends in Technology

A 256-Node SMP Linux Cluster (2001): 512 CPUs, 512 GB, 6 TB SCSI, TB local, gigabit Ethernet. Imagine 20 of these in one room. (Bednar, 2004)

SIZE, COST, and HEAT: the Earth Simulator, 3 megawatts, 500 million US $. It doesn't simulate global warming, IT CAUSES IT! (Bednar, 2004)

[Figure: SLOWER, SLOWER]

[Figure after Gustafson, 2004; Bednar, 2004]

GAP

Computational Earth Sciences

Atmospheric Modeling and Data Assimilation at the DAO. Robert Atlas and the DAO Team, Data Assimilation Office, NASA/GSFC. IWG, November 2001.

The f-v (finite-volume) Dynamical Core: terrain-following Lagrangian control-volume vertical discretization of the basic conservation laws (mass, momentum, total energy); 2D horizontal flux-form semi-Lagrangian discretization; genuinely conservative; free of Gibbs oscillations; absolute vorticity consistently transported with mass (dp) within the Lagrangian layers; computationally efficient.

Computational Performance

Progression in model resolution
• 1990s: 2° × 2.5° (220 km)
• 2000: 1° × 1.25° (110 km)
• 2002: 0.5° × ?° (55 km)
• 2004: 0.25° × 0.36° (28 km)
• 2006: geodesic grid, 20 km
• ?: up to 10 km; the hydrostatic assumption starts to break down, making this the transition period to non-hydrostatic dynamics
• ?: a revolution in computing technology is to take place
• 2025: global non-hydrostatic cloud-resolving model with 1 km or finer resolution, capable of resolving individual thunderstorms
(Slides from Bob Atlas presentation)
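The kilometre figures follow from the latitude spacing: one degree along a meridian is roughly 111 km. A quick check, assuming a spherical Earth with mean radius 6371 km (our assumption; the slides do not state the conversion used):

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def degrees_to_km(deg: float) -> float:
    """North-south arc length spanned by `deg` degrees of latitude."""
    return math.radians(deg) * EARTH_RADIUS_KM


for deg in (2.0, 1.0, 0.5, 0.25):
    print(f"{deg:5.2f} deg -> {degrees_to_km(deg):6.1f} km")
```

This gives about 222.4, 111.2, 55.6 and 27.8 km, in line with the 220, 110, 55 and 28 km quoted above.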

Numerical Problem Solving

Problem Solving: 3D Example of Array Addressing. Finite differences on a 3D array have a large memory requirement. Wave propagation uses the FD time-domain algorithm, with Psi(i,j,k) holding the physical variables. How do we address memory? Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base

Problem Solving: 3D Example of Array Addressing. Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base. [Figure: grid points (i,j,k), (i-1,j,k), (i+1,j,k).]

Array Addressing by Dimension
3D array Psi(Lx,Ly,Lz): Address = (k-1)*Lx*Ly + (j-1)*Lx + (i-1) + base
2D array Psi(Lx,Ly): Address = (j-1)*Lx + (i-1) + base
1D array Psi(Lx): Address = (i-1) + base
Stepping i gives stride-one data, stepping j gives stride-N data, and stepping k gives stride-N*N data (with Lx = Ly = N).
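The 3D addressing formula can be checked directly. This sketch implements it as written (column-major, 1-based indices) and measures how far apart in memory the neighbours of a finite-difference stencil lie; the grid size of 100 per axis is purely illustrative.

```python
def address(i, j, k, lx, ly, base=0):
    """Word address of Psi(i, j, k) for an array declared Psi(lx, ly, lz).

    Column-major (Fortran) layout with 1-based indices, following the
    formula on the slide: i varies fastest, k slowest.
    """
    return (k - 1) * lx * ly + (j - 1) * lx + (i - 1) + base


lx = ly = 100                      # N = 100 points per axis (illustrative)
centre = address(50, 50, 50, lx, ly)

# Word distance from Psi(i,j,k) to its +1 neighbour along each axis.
offsets = {
    "i": address(51, 50, 50, lx, ly) - centre,  # stride one
    "j": address(50, 51, 50, lx, ly) - centre,  # stride N (= Lx)
    "k": address(50, 50, 51, lx, ly) - centre,  # stride N*N (= Lx*Ly)
}
print(offsets)  # {'i': 1, 'j': 100, 'k': 10000}
```

A 7-point stencil therefore reads three streams that are 1, N and N*N words apart. Once N exceeds a cache line, the ±N and ±N*N neighbours land in different lines from the centre point, unlike the ±1 neighbours.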

Cache Memory Access Streams: 1D streams 100%; 1D ±1 100%; 2D ±1 100%; 2D ±N 80%; 2D ±1 ±N 26%

Cache Memory Access Streams: 3D ±1 100%; 3D ±N 80%; 3D ±N*N 28%; 3D all 7%

One Big One versus Many Little Ones

Futures of Micro-poor Processors: lots of arithmetic capability, very hard to use. Market forces will make them good at painting bit maps on screens.

Futures of Micro-poor Processors: no relief in memory subsystem design; prefetch will help, but not nearly enough. A million will cost a billion $$$.

Futures of Micro-poor Processors and the Big Switch: the Big Switch is the hot spot, and no relief is in sight. There is no telling what the switch will cost.

Seismic Modeling and the Inverse Problem

12 streamers × 5.1 kilometers long. Data collected for 70 continuous days. Over 2300 square km.

3D Seismic Modeling
1. Large scale 3D, ~200+ wavelengths.
2. Acoustic and elastic wave equations.
3. An inhomogeneous earth has widely varying parameters.
4. Complexity limits the use of 3D elastic modeling.
5. Problem scale: Nx = Ny = Nz ~ 1000; Ntime ~ 10,000; work per grid point ~ 100; number of seismic shots per survey ~ 100,000. A single survey simulation is 10^20 operations.
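The 10^20 figure is the product of the problem-scale numbers listed above. Spelled out in a short sketch, using only the slide's values:

```python
nx = ny = nz = 1_000       # grid points per axis
n_time = 10_000            # time steps per shot
work_per_point = 100       # operations per grid point per time step
n_shots = 100_000          # seismic shots per survey

ops_per_shot = nx * ny * nz * n_time * work_per_point
ops_per_survey = ops_per_shot * n_shots
print(f"{ops_per_shot:.0e} operations per shot")      # 1e+15
print(f"{ops_per_survey:.0e} operations per survey")  # 1e+20
```

The 10^9 grid points dominate: each shot alone is 10^15 operations before the 100,000 shots are counted.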

The Babbage Difference Engine, circa 1853

Wave Equation Difference Engine (WEDE) for Seismic Modeling: four processors, acoustic wave equation. My PhD thesis project at the University of Tulsa.

Wave Equation Difference Engine: finite differences; elastic or acoustic wave equations; regular grids; sponge/one-way wave equation boundary conditions; any source/receiver geometry; explicit, 4th order in time and 8th order in space?
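"8th order in space" means each second derivative is built from 9 neighbouring samples along an axis. A minimal sketch using the standard 8th-order central-difference weights (a textbook stencil, assumed here; the slides do not list the coefficients the machine uses):

```python
# Standard 8th-order central-difference weights for d2f/dx2 at offsets 0..4.
# The stencil is symmetric: offset -m uses the same weight as +m.
C8 = (-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560)


def d2_8th(f, i, h):
    """Approximate f''(x_i) from samples f[i-4] .. f[i+4] spaced h apart."""
    acc = C8[0] * f[i]
    for m in range(1, 5):
        acc += C8[m] * (f[i - m] + f[i + m])
    return acc / (h * h)


h = 0.1
xs = [k * h for k in range(21)]
f = [x ** 3 for x in xs]          # exact second derivative is 6x
print(d2_8th(f, 10, h), 6 * xs[10])
```

Because the weights sum to zero and the scheme is exact for polynomials up to degree 9, the cubic test returns 6.0 up to rounding. In 3D the same stencil is applied along i, j and k, which is where much of the ~100 operations per grid point goes.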

Wave Equation Difference Engine: no cache memory; deterministic execution; not MIMD, SIMD, or data flow; data movement and control match the algorithm; each grid point has a control word. Three levels of parallelism (amount of parallelism): instruction trees (~?); multiple instructions with selection (~2-3); multiple grid points (~hundreds of thousands).

Acoustic, Constant Density: the density is so constant that it does not appear in the equation. c is the P-wave velocity, the source energy is in src, and Psi is the wave field.
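The slide's equation itself did not survive extraction. The constant-density acoustic wave equation is commonly written as below, in the slide's symbols (Psi, c, src); the exact scaling of the source term is our assumption. The second line is a simple second-order-in-time leapfrog update, shown only for illustration (the machine targets 4th order in time), with a discrete Laplacian such as the 8th-order stencil on each axis.

```latex
\frac{\partial^2 \psi}{\partial t^2} = c^2 \, \nabla^2 \psi + \mathrm{src}

\psi^{n+1}_{i,j,k} = 2\,\psi^{n}_{i,j,k} - \psi^{n-1}_{i,j,k}
  + \Delta t^2 \left( c^2_{i,j,k} \, \big(\nabla^2_h \psi^{n}\big)_{i,j,k} + \mathrm{src}^{n}_{i,j,k} \right)
```

Each grid point needs only its own two previous values plus its stencil neighbours, which is what makes a fixed grid-point pipeline possible.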

Wave Equation Difference Engine Machine Performance: 100 operations in the pipeline; 1,000,000 grid point processors; 100 megahertz clock; 10^16 operations per second.
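The peak rate is the product of the three numbers above. Combining it with the 10^20 operations per survey from the 3D seismic modeling slide gives the run time, assuming every pipeline slot does useful work on every cycle (an idealization):

```python
ops_in_pipeline = 100          # operations in flight per grid point processor
n_processors = 1_000_000       # grid point processors
clock_hz = 100_000_000         # 100 MHz clock

peak_ops_per_s = ops_in_pipeline * n_processors * clock_hz
ops_per_survey = 10 ** 20                  # from the 3D seismic modeling slide
ops_per_shot = ops_per_survey // 100_000   # 100,000 shots per survey

seconds_per_shot = ops_per_shot / peak_ops_per_s
seconds_per_survey = ops_per_survey / peak_ops_per_s
print(peak_ops_per_s, seconds_per_shot, seconds_per_survey)
```

That is 10^16 operations per second, 0.1 s per shot (the figure used on the economics slide), and 10^4 s, under three hours, per full survey.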

Application Specific Parallel Computing: choose carefully an application which is BIG. Find an algorithm which is suitable: good data locality, regular structure in data movement, high memory data transfers. Map the algorithm into hardware.

Application Specific Parallel Computing, what it is not! Not suitable for just any algorithm. Not general purpose: we will have an efficient but specific memory subsystem. Does not match the alphabet soup of SIMD, MIMD, NUMA, etc.

What do ASP machines need? A VLSI design team, fabless and good? A clever architect for the problem. A very good memory design!

What do ASP machines do away with? Language compilers. Outdated junk in the processor design (x86!). Cache memories! Non-deterministic execution!

Multiple Bank Memory Systems. [Figure: memory banks (mod 4) with starting address, +N, +2N, +3N.] As many banks as are needed!

Pipelined Instruction Trees: each higher level offers parallel operations. The pipeline assumes all registers are loaded every cycle. Hardwired? Actually, today the instruction trees could be reconfigurable using reprogrammable cells! Example: r = a + b - x*y

Pipelined Instruction Trees. [Figure: instruction trees for r = a + b - x*y, with leaves a, b, x, y and operators +, *, -.] Multiple trees offer the second level of parallelism.
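The instruction-tree idea for r = a + b - x*y can be modelled as an expression tree whose depth is the number of pipeline stages. This sketch (the classes and function names are hypothetical) assigns each operator to the earliest stage its inputs allow, with registers at stage 0 since they are loaded every cycle, and counts how many operators fire in parallel per stage.

```python
import operator
from dataclasses import dataclass
from typing import Union

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Node:
    op: str          # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Node, str]  # a leaf is the name of a register


def evaluate(e, regs):
    """Compute the tree's value from register contents."""
    if isinstance(e, str):
        return regs[e]
    return OPS[e.op](evaluate(e.left, regs), evaluate(e.right, regs))


def stage(e):
    """Registers are stage 0; an operator runs one stage after its latest input."""
    if isinstance(e, str):
        return 0
    return 1 + max(stage(e.left), stage(e.right))


def ops_per_stage(e, counts=None):
    """Count how many operators can fire in parallel at each stage."""
    if counts is None:
        counts = {}
    if isinstance(e, Node):
        s = stage(e)
        counts[s] = counts.get(s, 0) + 1
        ops_per_stage(e.left, counts)
        ops_per_stage(e.right, counts)
    return counts


# r = a + b - x*y, parsed as (a + b) - (x * y)
tree = Node("-", Node("+", "a", "b"), Node("*", "x", "y"))
print(ops_per_stage(tree))                               # {2: 1, 1: 2}
print(evaluate(tree, {"a": 1, "b": 2, "x": 3, "y": 4}))  # -9
```

The add and the multiply share stage 1 and the subtract follows in stage 2, so one result emerges per clock once the two-stage pipeline is full. Replicating the tree gives the second level of parallelism.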

Three Levels of Parallelism
1. Instruction trees, multiple levels
2. Multiple results
3. Multiple grid point processors

Wave Machine

Imaging Machine

Wave Equation
a) 8th or 10th order in space
b) 4th order in time, tricky but possible
c) Sponge boundary conditions, slowly varying weights along the sides
d) Nominal flat topography; new schemes are building in topography
e) Any seismic source location, any geophone location
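The slides do not give the sponge weight function. One widely used choice, swapped in here as an assumption, is a Cerjan-style Gaussian taper: the wave field is multiplied every time step by weights that are 1 in the interior and decay slowly across a strip along each side. The function name, the width of 20 points and alpha = 0.015 are illustrative values, not the author's.

```python
import math


def sponge_weights(n: int, width: int, alpha: float = 0.015) -> list:
    """Damping factor for each of n grid points along one axis.

    Interior points get 1.0. Within `width` points of either edge the
    factor decays smoothly as exp(-(alpha * d)**2), where d counts how
    deep a point sits in the sponge (d = width at the outermost point).
    """
    w = []
    for i in range(n):
        d = max(width - i, i - (n - 1 - width), 0)
        w.append(math.exp(-(alpha * d) ** 2))
    return w


w = sponge_weights(100, 20)
print(round(w[0], 3), round(w[19], 6), w[50])
```

Because each weight is close to 1, a single step barely reflects the wave; the attenuation accumulates over the many time steps a wave spends crossing the strip. In 3D the per-axis weights are multiplied together.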

Elastic Wave Equation
a) Grid point work is about 100 operations
b) About 20,000 time steps per shot
c) 200 wavelengths gives about 160,000 geophone locations
d) Traces have 4096 samples at 2 milliseconds (could be 1 ms)

Elastic Wave Equation: shots are placed at twice the receiver spacing. The number of shots equals 40,000. The model frequency is velocity dependent; assume something on the order of 60 hertz.
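The geophone and shot counts are consistent with a square survey sampled at two geophones per wavelength. That sampling density is our assumption; the slides give only the totals. A sketch of the arithmetic:

```python
wavelengths = 200              # model size in wavelengths per horizontal axis
receivers_per_wavelength = 2   # assumption: two geophones per wavelength

receivers_per_side = wavelengths * receivers_per_wavelength  # 400
n_geophones = receivers_per_side ** 2                        # 160,000
shots_per_side = receivers_per_side // 2  # shots at twice the receiver spacing
n_shots = shots_per_side ** 2                                # 40,000

record_seconds = 4096 * 0.002  # 4096 samples at 2 ms
print(n_geophones, n_shots, round(record_seconds, 3))
```

The record is about 8.2 s long. If the 20,000 time steps span that record, the simulation time step is roughly 0.41 ms, about five steps per output sample.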

Economics: up-front fixed cost, $5 to $10 million. Each ASP chip is $5 to $10. A petaflop for $5 or $10 million.

Economics: a seismic shot takes 0.1 seconds. A 5-year life is 50,000 models. A realistic 3D elastic seismic model would cost $200.

Comparison: 10 clusters, ~$10 million: 10 models per year. One Waves in Linear Motion Analyzer (WILMA), ~$10 million: 10,000 models per year.

Comparison: the Waves in Linear Motion Analyzer is 1000X faster for the same money!
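The economics figures hang together if the $10 million machine produces 10,000 models per year over its 5-year life. That reading is our inference from the comparison slide. A sketch:

```python
machine_cost = 10_000_000        # dollars, upper end of the fixed-cost estimate
life_years = 5
wilma_models_per_year = 10_000
cluster_models_per_year = 10     # 10 clusters for roughly the same money

wilma_models = wilma_models_per_year * life_years          # 50,000 models
cost_per_model = machine_cost / wilma_models               # dollars per model
speedup = wilma_models_per_year / cluster_models_per_year  # throughput ratio
print(wilma_models, cost_per_model, speedup)
```

This reproduces the 50,000 models, the $200 per model and the 1000X ratio quoted on the slides.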

Summary: 1000 megawatts is a good-sized power station. Good memory design is worth the money! Removing the obstacles to efficient computing gives sustainable performance.

Summary: Slower is better. Less power is better. High efficiency is better.

Conclusions: Deterministic computing is important for performance. Application specific computing is a good fit for the wave equation, and very cost effective.

Thanks: SEG Continuing Education; Memorial University of Newfoundland