Copyright 2008, University of Alberta Introduction to High Performance Computing Jon Johansson Academic ICT University of Alberta
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta High Performance Computing HPC is the field that concentrates on developing supercomputers and software to run on supercomputers; a main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors
Copyright 2007, University of Alberta High Performance Computing HPC is about “big problems”, i.e. need: lots of memory many cpu cycles big hard drives no matter what field you work in, perhaps your research would benefit by making problems “larger” 2d → 3d finer mesh increase number of elements in the simulation
Copyright 2007, University of Alberta Grand Challenges weather forecasting economic modeling computer-aided design drug design exploring the origins of the universe searching for extra-terrestrial life computer vision nuclear power and weapons simulations
Copyright 2007, University of Alberta Grand Challenges – Protein To simulate the folding of a 300 amino acid protein in water: # of atoms: ~ 32,000 folding time: 1 millisecond # of FLOPs: ~3 x 10^22 Machine Speed: 1 PetaFLOP/s Simulation Time: 1 year (Source: IBM Blue Gene Project) IBM’s answer: The Blue Gene Project US$ 100 M of funding to build a 1 PetaFLOP/s computer Ken Dill and Kit Lau’s protein folding model. Charles L Brooks III, Scripps Research Institute
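The FLOP count above follows directly from the slide’s own numbers (a 1 PetaFLOP/s machine running for one year):

    10^15 FLOP/s x 3.15 x 10^7 s/year ≈ 3 x 10^22 FLOPs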
Copyright 2007, University of Alberta Grand Challenges - Nuclear the National Nuclear Security Administration uses supercomputers to run three-dimensional codes to simulate instead of test address critical problems of materials aging simulate the environment of the weapon and try to gauge whether the device continues to be usable stockpile science, molecular dynamics and turbulence calculations
Copyright 2007, University of Alberta Grand Challenges - Nuclear March 7, 2002: first full-system three-dimensional simulations of a nuclear weapon explosion the simulation used more than 480 million cells (a 780x780x780 grid, if the grid is a cube) 1,920 processors on IBM ASCI White at the Lawrence Livermore National Laboratory 2,931 wall-clock hours, or about 122 days 6.6 million CPU hours ASCI White Test shot “Badger”, Nevada Test Site – Apr Yield: 23 kilotons
Copyright 2007, University of Alberta Grand Challenges - Nuclear Advanced Simulation and Computing Program (ASC)
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta What is a “Mainframe”? large and reasonably fast machines the speed isn't the most important characteristic high-quality internal engineering and resulting proven reliability expensive but high-quality technical support top-notch security strict backward compatibility for older software
Copyright 2007, University of Alberta What is a “Mainframe”? these machines can, and do, run successfully for years without interruption (long uptimes) repairs can take place while the mainframe continues to run the machines are robust and dependable IBM coined a term to advertise the robustness of their mainframe computers: Reliability, Availability and Serviceability (RAS)
Copyright 2007, University of Alberta What is a “Mainframe”? Introducing IBM System z9 109 Designed for the On Demand Business IBM is delivering a holistic approach to systems design Designed and optimized with a total systems approach Helps keep your applications running with enhanced protection against planned and unplanned outages Extended security capabilities for even greater protection capabilities Increased capacity with more available engines per server
Copyright 2007, University of Alberta What is a Supercomputer?? at any point in time the term “Supercomputer” refers to the fastest machines currently available a supercomputer this year might be a mainframe in a couple of years a supercomputer is typically used for scientific and engineering applications that must do a great amount of computation
Copyright 2007, University of Alberta What is a Supercomputer?? the most significant difference between a supercomputer and a mainframe: a supercomputer channels all its power into executing a few programs as fast as possible if the system crashes, restart the job(s) – no great harm done a mainframe uses its power to execute many programs simultaneously e.g. – a banking system must run reliably for extended periods
Copyright 2007, University of Alberta What is a Supercomputer?? to see the world’s “fastest” computers look at the Top 500 list they measure performance with the Linpack benchmark solve a dense system of linear equations the performance numbers give a good indication of peak performance
Terminology combining a number of processors to run a program is called variously: multiprocessing parallel processing coprocessing
Terminology parallel computing – harnessing a bunch of processors on the same machine to run your computer program note that this is one machine generally a homogeneous architecture same processors, memory, operating system all the machines in the Top 500 are in this category
Terminology distributed computing - harnessing a bunch of processors on different machines to run your computer program heterogeneous architecture different operating systems, cpus, memory the terms “parallel” and “distributed” computing are often used interchangeably the work is divided into sections so each processor does a unique piece
Terminology some distributed computing projects are built on BOINC (Berkeley Open Infrastructure for Network Computing): – Search for Extraterrestrial Intelligence – deduces DNA sequence, given a protein – enhance clean energy technology by improving hydrogen production and storage (this is beta now)
Copyright 2007, University of Alberta Quantify Computer Speed we want a way to compare computer speeds count the number of “floating point operations” required to solve the problem + - x / results of the benchmark are so many Floating point Operations Per Second (FLOPS) a supercomputer is a machine that can provide a very large number of FLOPS
Copyright 2007, University of Alberta Floating Point Operations multiply two 1000 x 1000 matrices for each resulting array element 1000 multiplies 999 adds do this 1,000,000 times about 2 x 10^9 operations needed increasing array size has the number of operations increasing as O(N^3)
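As a small sketch (not part of the original slides), the operation count can be checked with a plain C program; N is reduced here so it runs in a moment, while the slide’s example uses N = 1000:

#include <stdio.h>

/* Naive N x N matrix multiply with an explicit operation counter.
   Each of the N*N result elements needs N multiplies and N-1 adds,
   so the total work grows as O(N^3). */
#define N 200

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    long long flops = 0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++) {
                sum += A[i][k] * B[k][j];  /* one multiply and one add */
                flops += 2;
            }
            C[i][j] = sum;
        }

    printf("N = %d: %lld floating point operations (~2*N^3)\n", N, flops);
    return 0;
}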
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta High Performance Computing supercomputers use many CPUs to do the work note that all supercomputing architectures have processors and some combination of cache, some form of memory and I/O the processors are separated from the other processors by some distance there are major differences in the way that the parts are connected some problems fit into different architectures better than others
Copyright 2007, University of Alberta High Performance Computing increasing computing power available to researchers allows increasing problem dimensions adding more particles to a system increasing the accuracy of the result improving experiment turnaround time
Copyright 2007, University of Alberta Flynn’s Taxonomy Michael J. Flynn (1972) classified computer architectures based on the number of concurrent instructions and data streams available single instruction, single data (SISD) – basic old PC multiple instruction, single data (MISD) – redundant systems single instruction, multiple data (SIMD) – vector (or array) processor multiple instruction, multiple data (MIMD) – shared or distributed memory systems: symmetric multiprocessors and clusters common extension: single program (or process), multiple data (SPMD)
Copyright 2007, University of Alberta Architectures we can also classify supercomputers according to how the processors and memory are connected couple processors to a single large memory address space couple computers, each with its own memory address space
Copyright 2007, University of Alberta Architectures Symmetric Multiprocessing (SMP) Uniform Memory Access (UMA) multiple CPUs, residing in one cabinet, share the same memory processors and memory are tightly coupled the processors share memory and the I/O bus or data path
Copyright 2007, University of Alberta Architectures SMP a single copy of the operating system is in charge of all the processors SMP systems range from two to as many as 32 or more processors
Copyright 2007, University of Alberta Architectures SMP "capability computing" one CPU can use all the memory all the CPUs can work on a little memory whatever you need
Copyright 2007, University of Alberta Architectures UMA-SMP negatives as the number of CPUs gets large the buses become saturated long wires cause latency problems
Copyright 2007, University of Alberta Architectures Non-Uniform Memory Access (NUMA) NUMA is similar to SMP - multiple CPUs share a single memory space hardware support for shared memory memory is separated into close and distant banks basically a cluster of SMPs memory on the same processor board as the CPU (local memory) is accessed faster than memory on other processor boards (shared memory) hence "non-uniform" NUMA architecture scales much better to higher numbers of CPUs than SMP
Copyright 2007, University of Alberta Architectures
Copyright 2007, University of Alberta Architectures University of Alberta SGI Origin; SGI NUMA cables
Copyright 2007, University of Alberta Architectures Cache Coherent NUMA (ccNUMA) each CPU has an associated cache ccNUMA machines use special-purpose hardware to maintain cache coherence typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession
Copyright 2007, University of Alberta Architectures Distributed Memory Multiprocessor (DMMP) each computer has its own memory address space looks like NUMA but there is no hardware support for remote memory access the special purpose switched network is replaced by a general purpose network such as Ethernet or more specialized interconnects: Infiniband Myrinet Lattice: Calgary’s HP ES40 and ES45 cluster – each node has 4 processors
Copyright 2007, University of Alberta Architectures Massively Parallel Processing (MPP) Cluster of commodity PCs processors and memory are loosely coupled "capacity computing" each CPU contains its own memory and copy of the operating system and application. each subsystem communicates with the others via a high-speed interconnect. in order to use MPP effectively, a problem must be breakable into pieces that can all be solved simultaneously
Copyright 2007, University of Alberta Architectures
Copyright 2007, University of Alberta Architectures lots of “how to build a cluster” tutorials on the web – just Google for them
Copyright 2007, University of Alberta Architectures Vector Processor or Array Processor a CPU design that is able to run mathematical operations on multiple data elements simultaneously a scalar processor operates on data elements one at a time vector processors formed the basis of most supercomputers through the 1980s and into the 1990s “pipeline” the data
Copyright 2007, University of Alberta Architectures Vector Processor or Array Processor operate on many pieces of data simultaneously consider the following add instruction: C = A + B on both scalar and vector machines this means: add the contents of A to the contents of B and put the sum in C on a scalar machine the operands are numbers on a vector machine the operands are vectors and the instruction directs the machine to compute the pair-wise sum of each pair of vector elements
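A minimal C sketch (not from the slides) of the pair-wise sum described above; the vector length N is chosen purely for illustration. On a scalar machine the loop issues N separate adds, whereas a vector processor would perform all N pair-wise additions as a single vector instruction:

#include <stdio.h>

#define N 8   /* vector length, chosen just for illustration */

int main(void) {
    double A[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double B[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    double C[N];

    /* scalar view: one addition per iteration; a vector unit would
       compute the whole of C = A + B in one instruction */
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];

    for (int i = 0; i < N; i++)
        printf("%g ", C[i]);
    printf("\n");
    return 0;
}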
Copyright 2007, University of Alberta Architectures University of Victoria has 4 NEC SX-6/8A vector processors in the School of Earth and Ocean Sciences each has 32 GB of RAM 8 vector processors in the box peak performance is 72 GFLOPS
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta BlueGene/L The fastest on the Nov top 500 list: installed at the Lawrence Livermore National Laboratory (LLNL) (US Department of Energy) Livermore California
Copyright 2007, University of Alberta
Copyright 2007, University of Alberta BlueGene/L processors: 212,992 memory: 72 TB 104 racks – each has 2,048 processors the first 64 racks had 512 GB of RAM each (256 MB/processor) the 40 new racks have 1 TB of RAM each (512 MB/processor) a Linpack performance of … TFlop/s in Nov 2005 it was the only system ever to exceed the 100 TFlop/s mark there are now 10 machines over 100 TFlop/s
The Fastest Six – Site, Computer, Processors, Year, Rmax (Gflops), Rpeak (Gflops):
DOE/NNSA/LLNL, United States – BlueGene/L, eServer Blue Gene Solution – IBM
Forschungszentrum Juelich (FZJ), Germany – JUGENE, Blue Gene/P Solution – IBM
SGI/New Mexico Computing Applications Center (NMCAC), United States – SGI Altix ICE 8200, Xeon quad core 3.0 GHz – SGI
Computational Research Laboratories, TATA SONS, India – EKA, Cluster Platform 3000 BL460c, Xeon 53xx 3GHz, Infiniband – Hewlett-Packard
Government Agency, Sweden – Cluster Platform 3000 BL460c, Xeon 53xx 2.66GHz, Infiniband – Hewlett-Packard
NNSA/Sandia National Laboratories, United States – Red Storm, Sandia/Cray Red Storm, Opteron 2.4 GHz dual core – Cray Inc
Copyright 2007, University of Alberta
# of Processors with Time Copyright 2007, University of Alberta The number of processors in the fastest machines has increased by about a factor of 200 in the last 15 years
# of Gflops Increase with Time Copyright 2007, University of Alberta Machine speed has increased by more than a factor of 5000 in the last 15 years.
Copyright 2007, University of Alberta Future BlueGene
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta Speedup how can we measure how much faster our program runs when using more than one processor? define Speedup S as: the ratio of 2 program execution times constant problem size T 1 is the execution time for the problem on a single processor (use the “best” serial time) T P is the execution time for the problem on P processors
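Written as a formula (the standard definition implied by the two execution times above):

    S(P) = T_1 / T_P

where T_1 is the best single-processor time and T_P is the time on P processors.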
Copyright 2007, University of Alberta Speedup Linear speedup the time to execute the problem decreases by the number of processors if a job requires 1 week with 1 processor it will take less than 10 minutes with 1024 processors
Copyright 2007, University of Alberta Speedup Sublinear speedup the usual case there are generally some limitations to the amount of speedup that you get communication
Copyright 2007, University of Alberta Speedup Superlinear speedup very rare memory access patterns may allow this for some algorithms (e.g. when the divided problem fits entirely in cache)
Copyright 2007, University of Alberta Speedup why do a speedup test? it’s hard to tell how a program will behave e.g. “Strange” is actually fairly common behaviour for un-tuned code in this case: linear speedup to ~10 cpus after 24 cpus speedup is starting to decrease
Copyright 2007, University of Alberta Speedup to use more processors efficiently change this behaviour change loop structure adjust algorithms ?? run jobs with a number of processors chosen so the machines are used efficiently
Copyright 2007, University of Alberta Speedup one class of jobs that have linear speed up are called “embarrassingly parallel” a better name might be “perfectly” parallel doesn’t take much effort to turn the problem into a bunch of parts that can be run in parallel: parameter searches rendering the frames in a computer animation brute force searches in cryptography
Copyright 2007, University of Alberta Speedup we have been discussing Strong Scaling the problem size is fixed and we increase the number of processors to decrease computational time (Amdahl Scaling) the amount of work available to each processor decreases as the number of processors increases eventually, the processors are doing more communication than number crunching and the speedup curve flattens difficult to have high efficiency for large numbers of processors
Copyright 2007, University of Alberta Speedup we are often interested in Weak Scaling double the problem size when we double the number of processors constant computational time (Gustafson scaling) the amount of work for each processor stays roughly constant parallel overhead is (hopefully) small compared to the real work the processor does e.g. Weather prediction
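For reference, Gustafson scaling is usually written as the scaled speedup below; this formula is not on the slide, and s is the serial fraction of the run time measured on the P-processor system:

    S(P) = s + P (1 - s)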
Copyright 2007, University of Alberta Amdahl’s Law Gene Amdahl: 1967 parallelize some of the program – some must remain serial f is the fraction of the calculation that is serial 1-f is the fraction of the calculation that is parallel the maximum speedup that can be obtained by using P processors is: S(P) = 1 / (f + (1-f)/P) (figure: the program split into a serial fraction f and a parallel fraction 1-f)
Copyright 2007, University of Alberta Amdahl’s Law if 25% of the calculation must remain serial the best speedup you can obtain is 4 need to parallelize as much of the program as possible to get the best advantage from multiple processors
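A small C sketch (not from the slides) that evaluates Amdahl’s law for the 25% example above; however many processors are used, the speedup saturates at 1/f = 4:

#include <stdio.h>

/* Evaluate Amdahl's law S(P) = 1 / (f + (1-f)/P) for a 25% serial
   fraction: the speedup can never exceed 1/f = 4. */
int main(void) {
    const double f = 0.25;                      /* serial fraction */
    const int procs[] = {1, 2, 4, 16, 64, 1024};
    const int n = sizeof procs / sizeof procs[0];

    for (int i = 0; i < n; i++) {
        double S = 1.0 / (f + (1.0 - f) / procs[i]);
        printf("P = %4d   speedup = %.2f\n", procs[i], S);
    }
    return 0;
}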
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta Parallel Programming need to do something to your program to use multiple processors need to incorporate commands into your program which allow multiple threads to run one thread per processor each thread gets a piece of the work several ways (APIs) to do this …
Copyright 2007, University of Alberta Parallel Programming OpenMP introduce statements into your code in C: #pragma in FORTRAN: C$OMP or !$OMP can compile serial and parallel executables from the same source code restricted to shared memory machines not clusters!
Copyright 2007, University of Alberta Parallel Programming OpenMP demo: MatCrunch mathematical operations on the elements of an array introduce 2 OMP directives before a loop #pragma omp parallel // define a parallel section #pragma omp for // loop is to be parallel serial section: 4.03 sec parallel section – 1 cpu: … secs parallel section – 2 cpu: … secs speedup = 1.99 // not bad for adding 2 lines
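A minimal sketch of the kind of change the demo describes; the MatCrunch source is not shown here, so the loop body is a made-up stand-in for its array operations, and the program must be built with an OpenMP-enabled compiler (e.g. gcc -fopenmp -O2 matcrunch.c -lm):

#include <stdio.h>
#include <math.h>
#include <omp.h>

#define N 10000000          /* size of the array being "crunched" */

static double a[N];

int main(void) {
    double t0 = omp_get_wtime();

    #pragma omp parallel     /* define a parallel section */
    {
        #pragma omp for      /* split the loop iterations among the threads */
        for (int i = 0; i < N; i++)
            a[i] = sin((double)i) * cos((double)i);   /* stand-in for the real work */
    }

    printf("loop time: %f s using %d thread(s)\n",
           omp_get_wtime() - t0, omp_get_max_threads());
    return 0;
}

Setting the environment variable OMP_NUM_THREADS controls how many threads (processors) the parallel section uses, which is how a speedup test like the one above is run.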
Copyright 2007, University of Alberta Parallel Programming for a larger number of processors the speedup for MatCrunch is not linear need to do the speedup test to see how your program will behave
Copyright 2007, University of Alberta Parallel Programming MPI (Message Passing Interface) a standard set of communication subroutine libraries works for SMPs and clusters programs written with MPI are highly portable information and downloads MPICH: LAM/MPI: Open MPI:
Copyright 2007, University of Alberta Parallel Programming MPI (Message Passing Interface) supports the SPMD, single program multiple data model all processors use the same program each processor has its own data think of a cluster – each node is getting a copy of the program but running a specific portion of it with its own data
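A minimal SPMD sketch in C using standard MPI calls; the 1000-item decomposition is illustrative, not from the slides. Every process runs the same program, but because each learns a different rank it works on a different slice of the data:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which copy am I?        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many copies in all? */

    /* a made-up problem of 1000 items divided among the processes */
    int n = 1000;
    int chunk = n / size;
    int start = rank * chunk;
    int end   = (rank == size - 1) ? n : start + chunk;

    printf("rank %d of %d works on items %d..%d\n", rank, size, start, end - 1);

    MPI_Finalize();
    return 0;
}

Launched with, for example, mpirun -np 4 ./a.out, four copies of the program start, one per processor, each printing its own range.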
Copyright 2007, University of Alberta Parallel Programming it’s possible to combine OpenMP and MPI for running on clusters of SMP machines the trick in parallel programming is to keep all the processors working (“load balancing”) working on data that no other processor needs to touch (there aren’t any cache conflicts)
Copyright 2007, University of Alberta Agenda What is High Performance Computing? What is a “supercomputer”? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID??
Copyright 2007, University of Alberta Grid Computing A computational grid: is a large-scale distributed computing infrastructure composed of geographically distributed, autonomous resource providers lots of computers joined together requires excellent networking that supports resource sharing and distribution offers access to all the resources that are part of the grid compute cycles storage capacity visualization/collaboration is intended for integrated and collaborative use by multiple organizations
Copyright 2007, University of Alberta Grids Ian Foster (the “Father of the Grid”) says that to be a Grid three points must be met computing resources are not administered centrally many sites connected open standards are used not a proprietary system non-trivial quality of service is achieved it is available most of the time CERN says a Grid is “a service for sharing computer power and data storage capacity over the Internet”
Copyright 2007, University of Alberta Canadian Academic Computing Sites in 2000
Copyright 2007, University of Alberta Canadian Grids Some sites in Canada have tied their resources together to form 7 Canadian Grid Consortia: ACENET Atlantic Computational Excellence Network CLUMEQ Consortium Laval UQAM McGill and Eastern Quebec for High Performance Computing SCINET University of Toronto HPCVL High Performance Computing Virtual Laboratory RQCHP Reseau Quebecois de calcul de haute performance SHARCNET Shared Hierarchical Academic Research Computing Network WESTGRID Alberta, British Columbia
Copyright 2007, University of Alberta WestGrid Edmonton Calgary UBC Campus SFU Campus
Copyright 2007, University of Alberta Grids the ultimate goal of the Grid idea is to have a system that you can submit a job to, so that: your job uses resources that fit requirements that you specify 128 nodes on an SMP 200 GB of RAM or 256 nodes on a PC cluster 1 GB/processor when done the results come back to you you don’t care where the job runs Vancouver or St. John’s or in between
Copyright 2007, University of Alberta Sharing Resources HPC resources are not available quite as readily as your desktop computer the resources must be shared fairly the idea is that each person gets as much of the resource as necessary to run their job for a “reasonable” time if the job can’t finish in the allotted time the job needs to “checkpoint” save enough information to begin running again from where it left off
Copyright 2007, University of Alberta Sharing Resources Portable Batch System (Torque) submit a job to PBS job is placed in a queue with other users’ jobs jobs in the queue are prioritized by a scheduler your job executes at some time in the future An HPC Site
Copyright 2007, University of Alberta Sharing Resources When connecting to a Grid we need a layer of “middleware” tools to securely access the resources Globus is one example A Grid of HPC Sites
Copyright 2007, University of Alberta Questions? Many details in other sessions of this workshop!