High Performance Computing and CyberGIS Keith T. Weber, GISP GIS Director, ISU.

Goal of this presentation Introduce you to another world of computing, analysis, and opportunity. Encourage you to learn more!

Some Terminology Up-Front: Supercomputing, HPC (High Performance Computing), HTC (High Throughput Computing), and CI (Cyberinfrastructure).

Acknowledgements Much of the material presented here was originally designed by Henry Neeman at the University of Oklahoma and OSCER (the OU Supercomputing Center for Education & Research).

What is Supercomputing? Supercomputing is the biggest, fastest computing right this minute. Likewise, a supercomputer is one of the biggest, fastest computers right this minute. So, the definition of supercomputing is constantly changing. Rule of Thumb: a supercomputer is typically 100 times as powerful as a PC.

Fastest Supercomputer and Moore’s Law

What is Supercomputing About? Size and speed.

Size… Many problems that are interesting to scientists and engineers can't fit on a PC, usually because they need more than a few GB of RAM, or more than a few hundred GB of disk.

Speed… Many problems that are interesting to scientists and engineers would take a long time to run on a PC: months or even years. But a problem that would take one month on a PC might take only a few hours on a supercomputer.

What can Supercomputing be used for? Data mining, modeling, simulation, and visualization [1]

What is a Supercomputer? A cluster of small computers, each called a node, hooked together by an interconnection network (interconnect for short). A cluster needs software that allows the nodes to communicate across the interconnect. But what a cluster is … is all of these components working together as if they’re one big computer... a super computer.

For example: the Dell Intel Xeon Linux Cluster, sooner.oscer.ou.edu: 1,076 Intel Xeon CPU chips / 4,288 cores; 8,800 GB RAM; ~130 TB globally accessible disk; QLogic InfiniBand; Force10 Networks Gigabit Ethernet; Red Hat Enterprise Linux 5; peak speed: 34.5 TFLOPs* (*TFLOPs: trillion floating point operations, i.e. calculations, per second).
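As a rough sanity check on that peak-speed figure (the slide does not give the clock rate; the ~2 GHz and 4 floating point operations per core per cycle used below are assumptions typical of Xeon processors of that generation, not numbers from the slide):

    peak FLOPs  ≈  cores × clock rate × FLOPs per core per cycle
                ≈  4,288 × 2.0 GHz × 4
                ≈  34.3 trillion FLOPs per second

which is consistent with the quoted 34.5 TFLOPs.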

Quantifying a Supercomputer Two common measures: number of cores (your workstation: 4?; the ISU cluster: 800; Blue Waters: 300,000) and TeraFLOPs.

PARALLELISM: How a cluster works together

Parallelism Less fish … more fish! Parallelism means doing multiple things at the same time.

THE JIGSAW PUZZLE ANALOGY Understanding Parallel Processing

Serial Computing We are very accustomed to serial processing. It can be compared to building a jigsaw puzzle by yourself. In other words, suppose you want to complete a jigsaw puzzle that has 1,000 pieces. We can agree this will take a certain amount of time… let's just say, one hour.

Shared Memory Parallelism If Scott sits across the table from you, then he can work on his half of the puzzle and you can work on yours. Once in a while, you'll both reach into the pile of pieces at the same time (you'll contend for the same resource), which will cause you to slow down. And from time to time you'll have to work together (communicate) at the interface between his half and yours. The speedup will be nearly 2-to-1: together the two of you will take about 35 minutes instead of the ideal 30.

The More the Merrier? Now let’s put Paul and Charlie on the other two sides of the table. Each of you can work on a part of the puzzle, but there’ll be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will achieve noticeably less than a 4-to-1 speedup. But you’ll still have an improvement, maybe something like 20 minutes instead of an hour.

Diminishing Returns If we now put Dave, Tom, Horst, and Brandon at the corners of the table, there's going to be much more contention for the shared resource, and a lot of communication at the many interfaces. The speedup will be much less than we'd like; you'll be lucky to get 5-to-1. We can see that adding more and more workers onto a shared resource eventually yields diminishing returns.

Amdahl’s Law [figure showing speedup versus CPU utilization; source link omitted]
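The slide shows Amdahl's Law only as a figure; for reference, the usual textbook statement (not taken from the slides) is speedup(N) = 1 / ((1 - P) + P/N), where P is the fraction of the work that can run in parallel and N is the number of workers. The serial fraction (1 - P) caps the speedup at 1/(1 - P) no matter how many workers you add, which is exactly the diminishing-returns effect in the jigsaw analogy. A minimal C sketch, assuming P = 0.9 purely because it roughly matches the puzzle timings above:

    #include <stdio.h>

    /* Amdahl's Law: ideal speedup with n workers when a fraction p of the work parallelizes. */
    double amdahl(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

    int main(void) {
        double p = 0.9;   /* assumed parallel fraction; illustrative only */
        for (int n = 1; n <= 8; n *= 2)
            printf("%d worker(s) -> %.2fx speedup\n", n, amdahl(p, n));
        return 0;
    }

With p = 0.9 this prints speedups of about 1.0, 1.8, 3.1, and 4.7 for 1, 2, 4, and 8 workers: close to the roughly 2-to-1, 3-to-1, and "lucky to get 5-to-1" figures in the puzzle story.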

Distributed Parallelism Let's try something a little different. Let's set up two tables. You will sit at one table and Scott at the other. We will put half of the puzzle pieces on your table and the other half on Scott's. Now you can work completely independently, without any contention for a shared resource. BUT, the cost per communication is MUCH higher, and you need the ability to split up (decompose) the puzzle correctly, which can be tricky.

More Distributed Processors It's easy to add more processors in distributed parallelism, but you must be aware of the need to decompose the problem and communicate among the processors. Also, as you add more processors, it may be harder to load balance the amount of work that each processor gets.
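The slides stop at the analogy here; as a purely illustrative sketch of the decompose-and-communicate pattern on a distributed-memory cluster, here is a minimal MPI program in C. MPI is the message-passing library commonly used on clusters, but it is not named on this slide, and the 1,000-piece "puzzle" workload is hypothetical:

    #include <mpi.h>
    #include <stdio.h>

    /* Each process (one "table" in the analogy) works on its own share of the
       problem in its own private memory; results are combined by explicit
       communication at the end. Illustrative only; not from the slides. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int total_pieces = 1000;                /* the whole jigsaw puzzle */
        int my_pieces = total_pieces / size;          /* decompose: my share     */
        if (rank == size - 1)
            my_pieces += total_pieces % size;         /* load balance the remainder */

        double my_work = 0.0;
        for (int i = 0; i < my_pieces; i++)           /* work independently      */
            my_work += 1.0;

        double total_work = 0.0;                      /* communicate: combine results */
        MPI_Reduce(&my_work, &total_work, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("assembled %.0f of %d pieces\n", total_work, total_pieces);

        MPI_Finalize();
        return 0;
    }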

FYI… Kinds of Parallelism: Instruction Level Parallelism, Shared Memory Multithreading, Distributed Memory Multiprocessing, GPU Parallelism, and Hybrid Parallelism (Shared + Distributed + GPU).

Why Parallelism Is Good The Trees: We like parallelism because, as the number of processing units working on a problem grows, we can solve the same problem in less time. The Forest: We like parallelism because, as the number of processing units working on a problem grows, we can solve bigger problems.

Jargon Threads are execution sequences that share a single memory area. Processes are execution sequences with their own independent, private memory areas. Multithreading: parallelism via multiple threads. Multiprocessing: parallelism via multiple processes. Shared Memory Parallelism is concerned with threads; Distributed Parallelism is concerned with processes.
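A minimal C illustration of the difference (using OpenMP for threads and POSIX fork() for processes is my choice for this sketch; the slide itself names no particular tools):

    #include <stdio.h>
    #include <unistd.h>     /* fork(): creates a process with its own private memory (POSIX) */
    #include <sys/wait.h>   /* wait(): lets the parent wait for the child process            */
    #include <omp.h>        /* OpenMP: multiple threads sharing one memory area              */

    int counter = 0;        /* one variable, visible to every thread in the process */

    int main(void) {
        /* Multithreading: all threads share the same memory, so they all update
           the same counter (the atomic keeps the updates from colliding). */
        #pragma omp parallel
        {
            #pragma omp atomic
            counter++;
        }
        printf("after the threads, counter = %d (one increment per thread)\n", counter);

        /* Multiprocessing: after fork() the child has its own private copy of
           counter, so the child's change never reaches the parent. */
        if (fork() == 0) {      /* child process */
            counter = 999;      /* modifies only the child's private copy */
            return 0;
        }
        wait(NULL);             /* parent waits for the child to finish */
        printf("after the child wrote 999, the parent's counter is still %d\n", counter);
        return 0;
    }

On a UNIX system this would be compiled with something like cc -fopenmp.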

Basic Strategies Data Parallelism: Each processor does exactly the same tasks on its unique subset of the data –jigsaw puzzles or big datasets that need to be processed now! Task Parallelism: Each processor does different tasks on exactly the same set of data –which algorithm is best?
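A short C sketch of the two strategies (OpenMP worksharing is used here only as an example; the slide does not prescribe a tool, and the two "analyses" are hypothetical):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000
    static double data[N];

    /* Two hypothetical analyses of the same dataset, for illustration only. */
    double mean(const double *x, long n)   { double s = 0; for (long i = 0; i < n; i++) s += x[i];        return s / n; }
    double sum_sq(const double *x, long n) { double s = 0; for (long i = 0; i < n; i++) s += x[i] * x[i]; return s;     }

    int main(void) {
        /* Data parallelism: every thread runs the SAME operation on its own
           chunk of the data (OpenMP splits the loop iterations among threads). */
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            data[i] = i * 0.001;

        /* Task parallelism: different threads run DIFFERENT operations on the
           same data, one operation per section. */
        double m = 0, ss = 0;
        #pragma omp parallel sections
        {
            #pragma omp section
            m = mean(data, N);
            #pragma omp section
            ss = sum_sq(data, N);
        }
        printf("mean = %f, sum of squares = %f\n", m, ss);
        return 0;
    }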

An Example: Embarrassingly Parallel An application is known as embarrassingly parallel if its parallel implementation: –Can straightforwardly be broken up into equal amounts of work per processor, AND –Has minimal parallel overhead (i.e., communication among processors) FYI…Embarrassingly parallel applications are also known as loosely coupled.

Monte Carlo Methods Monte Carlo methods are ways of simulating or calculating actual phenomena based on randomness, within known error limits. In GIS, we use Monte Carlo simulations to calculate error propagation effects. How? Monte Carlo simulations are typically embarrassingly parallel applications.

Monte Carlo Methods In a Monte Carlo method, you randomly generate a large number of example cases (realizations), and then compare the results of these realizations. When the average of the realizations converges (that is, your answer doesn't change substantially when new realizations are generated), the Monte Carlo simulation can stop.

Embarrassingly Parallel Monte Carlo simulations are embarrassingly parallel, because each realization is independent of all other realizations.
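The slides keep this at the conceptual level; as a concrete illustration that is not from the slides, here is the classic Monte Carlo estimate of pi in C with OpenMP: each realization is one random point, all realizations are independent, and they are only combined at the very end.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Monte Carlo estimate of pi: each realization is one random point in the
       unit square; the fraction landing inside the quarter circle -> pi/4.
       Realizations are independent, so the loop is embarrassingly parallel. */
    int main(void) {
        const long realizations = 10000000;
        long hits = 0;

        #pragma omp parallel reduction(+:hits)
        {
            unsigned int seed = 12345u + omp_get_thread_num();   /* private seed per thread */
            #pragma omp for
            for (long i = 0; i < realizations; i++) {
                double x = rand_r(&seed) / (double)RAND_MAX;
                double y = rand_r(&seed) / (double)RAND_MAX;
                if (x * x + y * y <= 1.0) hits++;
            }
        }
        printf("pi is approximately %f\n", 4.0 * hits / realizations);
        return 0;
    }

Each thread's realizations touch no one else's data, which is what makes this embarrassingly parallel; in the GIS error-propagation case, each realization would instead perturb the same input dataset with random error and recompute the result.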

A Quiz… Q: Is this an example of Data Parallelism or Task Parallelism? A: Task Parallelism: each processor runs a different realization (a different task) on exactly the same set of data.

Questions so far?

WHAT IS A GPGPU? OR THANK YOU GAMING INDUSTRY A bit more to know…

It’s an Accelerator No, not this....

Accelerators In HPC, an accelerator is hardware whose role is to speed up some aspect of the computing workload. –In the olden days (1980s), PCs sometimes had floating point accelerators (aka the math coprocessor).

Why Accelerators are Good They make your code run faster.

Why Accelerators are Bad Because: they're expensive (or they were); they're harder to program (e.g., NVIDIA CUDA); and your code may not be portable to other accelerators, so the labor you invest in programming may have a very short life.

The King of the Accelerators The undisputed king of accelerators is the graphics processing unit (GPU).

Why GPU? Graphics Processing Units (GPUs) were originally designed to accelerate graphics tasks like image rendering for gaming. They became very popular with gamers because they produced better and better images at lightning-fast refresh speeds. As a result, prices have become extremely reasonable, ranging from three figures at the low end to four figures at the high end.

GPUs Do Arithmetic GPUs render images, and this is done through floating point arithmetic. As it turns out, this is the same stuff people use supercomputing for!

Interested? Curious? To learn more, or to get involved with supercomputing, there is a host of opportunities awaiting you: –Get to know your Campus Champions –Visit –Visit –Ask about internships (BWUPEP) –Learn C (not C++, but C) or Fortran –Learn UNIX

Questions?