Lecture 1: Introduction to High Performance Computing.

Grand Challenge Problem
A grand challenge problem is one that cannot be solved in a reasonable amount of time with today's computers.

Weather Forecasting
Cells of size 1 mile x 1 mile x 1 mile => the whole global atmosphere is about 5 x 10^8 cells.
If each calculation requires 200 Flops => about 10^11 Flops per time step.
To forecast the weather over 7 days using 1-minute intervals, a computer operating at 100 Mflops (10^8 Flops/s) would take 10^7 seconds, or over 100 days.
To perform the calculation in 10 minutes would require a computer operating at 1.7 Tflops (1.7 x 10^12 Flops/s).
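As a quick sanity check, here is a small C program (not part of the lecture; the constants are simply the slide's assumptions) that redoes the back-of-the-envelope arithmetic:

/* Back-of-the-envelope estimate from the weather-forecasting slide.
   A minimal sketch; the numbers are the slide's assumptions, not measurements. */
#include <stdio.h>

int main(void) {
    double cells          = 5e8;            /* ~1-mile cells over the whole atmosphere */
    double flops_per_cell = 200.0;          /* work per cell per time step (assumed) */
    double steps          = 7.0 * 24 * 60;  /* 7 days at 1-minute intervals */

    double flops_per_step = cells * flops_per_cell;     /* ~1e11 */
    double total_flops    = flops_per_step * steps;     /* ~1e15 */

    double serial_rate    = 1e8;            /* a 100 Mflop/s machine */
    double serial_seconds = total_flops / serial_rate;  /* ~1e7 s */

    double deadline       = 600.0;          /* want the answer in 10 minutes */
    double needed_rate    = total_flops / deadline;     /* ~1.7e12 flop/s */

    printf("flops per step : %.2e\n", flops_per_step);
    printf("total flops    : %.2e\n", total_flops);
    printf("serial time    : %.2e s (%.0f days)\n", serial_seconds, serial_seconds / 86400);
    printf("needed rate    : %.2e flop/s (%.2f Tflop/s)\n", needed_rate, needed_rate / 1e12);
    return 0;
}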

Some Grand Challenge Applications
Science: global climate modeling; astrophysical modeling; biology (genomics, protein folding, drug design); computational chemistry; computational materials science and nanoscience.
Engineering: crash simulation; semiconductor design; earthquake and structural modeling; computational fluid dynamics (airplane design); combustion (engine design).
Business: financial and economic modeling; transaction processing, web services, and search engines.
Defense: nuclear weapons (testing by simulation); cryptography.

Units of High Performance Computing
Speed:
  1 Mflop/s = 1 Megaflop/s = 10^6 Flop/second
  1 Gflop/s = 1 Gigaflop/s = 10^9 Flop/second
  1 Tflop/s = 1 Teraflop/s = 10^12 Flop/second
  1 Pflop/s = 1 Petaflop/s = 10^15 Flop/second
Capacity:
  1 MB = 1 Megabyte = 10^6 Bytes
  1 GB = 1 Gigabyte = 10^9 Bytes
  1 TB = 1 Terabyte = 10^12 Bytes
  1 PB = 1 Petabyte = 10^15 Bytes

Moore's Law
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.

Moore's Law holds also for performance and capacity.
[Table: ENIAC vs. a modern laptop, compared by number of vacuum tubes / transistors, weight (kg), size (m^3), power (watts), cost ($), memory (bytes), and performance (Flop/s).]

Peak Performance
A contemporary RISC processor delivers only about 10% of its peak performance.
Two primary reasons behind this low efficiency:
  IPC inefficiency
  Memory inefficiency

Instructions per Cycle (IPC) Inefficiency
Today the theoretical IPC is 4-6.
Detailed analysis for a spectrum of applications indicates that the average achieved IPC is 1.2–1.4.
=> roughly 75% of the potential performance is not used.

Reasons for IPC Inefficiency
Latency: waiting for access to memory or other parts of the system.
Overhead: extra work that has to be done to manage program concurrency and parallel resources, beyond the real work you want to perform.
Starvation: not enough work to do, due to insufficient parallelism or poor load balancing among distributed resources.
Contention: delays due to fighting over which task gets to use a shared resource next; network bandwidth is a major constraint.

Memory Hierarchy

Processor-Memory Problem
Processors issue instructions roughly every nanosecond.
DRAM can be accessed roughly every 100 nanoseconds.
The gap is growing:
  processors are getting faster by about 60% per year
  DRAM is getting faster by about 7% per year
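The following short C sketch (an illustration, not part of the lecture) simply compounds the slide's growth rates to show how quickly the processor-DRAM gap widens:

/* How fast does the processor-DRAM gap widen at the slide's growth rates?
   A rough sketch, not a measurement. */
#include <stdio.h>

int main(void) {
    double cpu = 1.0, dram = 1.0;
    for (int year = 0; year <= 10; year++) {
        printf("year %2d: gap = %6.1fx\n", year, cpu / dram);
        cpu  *= 1.60;   /* processors: ~60% faster per year (slide's figure) */
        dram *= 1.07;   /* DRAM:       ~7% faster per year  (slide's figure) */
    }
    return 0;
}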

Processor-Memory Problem

How fast can a serial computer be?
Consider a 1 Tflop/s sequential machine:
  data must travel some distance, r, to get from memory to the CPU;
  to get 1 data element per cycle, this must happen 10^12 times per second;
  at the speed of light, c = 3 x 10^8 m/s, so r < c / 10^12 = 0.3 mm.
For 1 TB of storage in a 0.3 mm x 0.3 mm area, each word occupies only a few square Angstroms, the size of a small atom.
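A small C calculation (illustrative, using the slide's figures) that reproduces this speed-of-light limit:

/* The speed-of-light argument from the slide, as a quick calculation.
   Illustrative only; the slide's figures are rounded. */
#include <stdio.h>

int main(void) {
    double c    = 3e8;    /* speed of light, m/s */
    double rate = 1e12;   /* 1 Tflop/s => fetch one operand every 1e-12 s */

    double r = c / rate;                       /* max memory-to-CPU distance, meters */
    printf("r = %.1e m (%.1f mm)\n", r, r * 1e3);

    double area_m2 = r * r;                    /* memory squeezed into an r x r square */
    double area_A2 = area_m2 * 1e20;           /* 1 m^2 = 1e20 square Angstroms */
    double bytes   = 1e12;                     /* 1 TB of storage */
    printf("area per byte ~ %.0f square Angstroms\n", area_A2 / bytes);
    return 0;
}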

So, we need Parallel Computing!

High Performance Computers
In the 1980s: 1 x 10^6 floating point ops/sec (Mflop/s); scalar based.
In the 1990s: 1 x 10^9 floating point ops/sec (Gflop/s); vector & shared-memory computing.
Today: 1 x 10^12 floating point ops/sec (Tflop/s); highly parallel, distributed processing, message passing.

What is a Supercomputer?
A supercomputer is a hardware and software system that provides close to the maximum performance that can currently be achieved.

Top500 Computers
Over the last 10 years the performance range of the Top500 has increased faster than Moore's law would predict:
  1993: #1 = 59.7 GFlop/s, #500 = 422 MFlop/s
  2004: #1 = 70 TFlop/s, #500 = 850 GFlop/s

Top500 List at June 2005 (top five systems)
Rank  Manufacturer  Computer     Installation Site                Country
1     IBM           BlueGene/L   LLNL                             USA
2     IBM           BlueGene/L   IBM Watson Research Center       USA
3     SGI           Altix        NASA                             USA
4     NEC           Vector       Earth Simulator Center           Japan
5     IBM           Cluster      Barcelona Supercomputing Center  Spain

Performance Development

Increasing CPU Performance: Manycore
A manycore chip is composed of hybrid cores:
  some general purpose
  some graphics
  some floating point

What is Next?
A board composed of multiple manycore chips sharing memory.
A rack composed of multiple boards.
A room full of these racks.
=> Millions of cores
=> Exascale systems (10^18 Flop/s)

Moore's Law Reinterpreted
The number of cores per chip doubles every 2 years, while clock speed decreases (not increases).
We need to deal with systems with millions of concurrent threads.
The number of threads of execution doubles every 2 years.

Performance Projection

Directions
Move toward shared-memory SMPs and distributed shared memory:
  shared address space with a deep memory hierarchy.
Clustering of shared-memory machines for scalability.
Efficiency of message passing and data-parallel programming:
  MPI and HPF.
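The slide names MPI as the message-passing standard; as a minimal sketch (not from the lecture) of what message-passing code looks like, here is a small MPI program in C using only standard calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Reduce, MPI_Finalize):

/* A minimal MPI sketch: each process reports its rank,
   and rank 0 collects the sum of all ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("hello from rank %d of %d\n", rank, size);

    int local = rank, total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks = %d\n", total);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with, for example, mpirun -np 4 ./a.out, each of the four processes prints its rank and process 0 prints the sum.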

Future of HPC
Yesterday's HPC is today's mainframe is tomorrow's workstation.