Parallel Processing 1: Parallel Processing (CS 676) Overview. Jeremy R. Johnson

Parallel Processing 2: Goals

Parallelism: to run large and difficult programs fast.
Course: to become effective parallel programmers.
– "How to Write Parallel Programs"
– "Parallelism will become, in the not too distant future, an essential part of every programmer's repertoire"
– "Coordination – a general phenomenon of which parallelism is one example – will become a basic and widespread phenomenon in CS"
Why?
– Some problems require extensive computing power to solve.
– The most powerful computer is, by definition, a parallel machine.
– Parallel computing is becoming ubiquitous.
– Distributed and networked computers with simultaneous users require coordination.

Parallel Processing 3: Top 500

Parallel Processing 4: LINPACK Benchmark

Solve a dense N × N system of linear equations, Ax = y, using Gaussian elimination with partial pivoting.
– Approximately (2/3)N³ + 2N² floating-point operations.
High Performance LINPACK (introduced by Jack Dongarra) is used to measure performance for the TOP500.
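To make the operation count concrete, here is a minimal sketch (not part of HPL itself; the function name, seed, and problem size are made up for illustration) that times a dense solve in NumPy and converts the elapsed time into GFLOP/s using the (2/3)N³ + 2N² count:

```python
import time
import numpy as np

def linpack_gflops(n: int) -> float:
    """Solve a random dense n x n system and return the implied GFLOP/s,
    using the LINPACK operation count (2/3)n^3 + 2n^2."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    y = rng.standard_normal(n)

    start = time.perf_counter()
    x = np.linalg.solve(A, y)  # LU with partial pivoting under the hood
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / elapsed / 1e9

print(f"{linpack_gflops(2000):.1f} GFLOP/s")
```

The same idea, scaled to a machine-sized N and a heavily tuned parallel solver, is what produces a TOP500 score.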

Parallel Processing 5: Example: LU Decomposition

Solve a linear system by finding the LU decomposition with partial pivoting, A = PLU.
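As an illustration, a small sketch with a made-up 3 × 3 system (the matrix and right-hand side are illustrative, not the slide's original example), computing A = PLU with SciPy and using the factorization to solve Ax = y:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

# Illustrative system (not the slide's original example)
A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
y = np.array([5.0, -2.0, 9.0])

# A = P @ L @ U, with P a permutation matrix from partial (row) pivoting
P, L, U = lu(A)

# Solve A x = y via the factors: first L z = P^T y, then U x = z
z = solve_triangular(L, P.T @ y, lower=True)
x = solve_triangular(U, z)

assert np.allclose(A @ x, y)
print(x)
```

The two triangular solves cost only O(N²), which is why the O(N³) factorization dominates the LINPACK operation count.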

Parallel Processing 6: Big Machines

Cray-2, DoE Lawrence Livermore National Laboratory (1985): 3.9 gigaflops, 8-processor vector machine.
Cray X-MP/4, DoE, LANL, … (1983): 941 megaflops, 4-processor vector machine.

Parallel Processing 7: Big Machines

Cray Jaguar, ORNL (2009): 1.75 petaflops, 224,256 AMD Opteron cores.
Tianhe-1A, NSC Tianjin, China (2010): 2.566 petaflops, 14,336 Xeon X5670 processors, 7,168 Nvidia Tesla M2050 GPUs.

Parallel Processing 8: Need for Parallelism

Parallel Processing 9: Multicore (Intel Core i7)

Parallel Processing 10: Multicore

IBM Blue Gene/L: 280.6 teraflops, 65,536 "compute nodes".
Cyclops64: 80 gigaflops, 500 megahertz, multiply-accumulate units.

Parallel Processing 11: Multicore

Parallel Processing 12: Multicore

Parallel Processing 13: GPU

Nvidia GTX 480: 1.35 teraflops, 480 SP (700 MHz), Fermi chip, 3 billion transistors.

Parallel Processing 14: Google Servers

2003: 15,000 servers, ranging from 533 MHz Intel Celerons to dual 1.4 GHz Intel Pentium IIIs.
2005: 200,000 servers.
2006: upwards of 450,000 servers (contemporary estimates).

Parallel Processing 15: Drexel Machines

Tux: 5 nodes
– 4 quad-core AMD Opteron 8378 processors (2.4 GHz)
– 32 GB RAM
Draco:
– 20 nodes: dual Xeon X5650 processors (2.66 GHz), 6 GTX 480 GPUs, 72 GB RAM
– 4 nodes: 6 C2070 GPUs

Parallel Processing 16: Programming Challenge

"But the primary challenge for an 80-core chip will be figuring out how to write software that can take advantage of all that horsepower."

Parallel Processing 17: Basic Idea

One way to solve a problem fast is to break it into pieces and arrange for all of the pieces to be solved simultaneously. The more pieces, the faster the job goes, up to the point where the pieces become too small to make the effort of breaking up and distributing them worth the bother. A "parallel program" is a program that uses this breaking-up and handing-out approach to solve large or difficult problems.
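A minimal sketch of the breaking-up and handing-out idea (the chunk count and worker count are arbitrary choices for illustration): the work of summing a large array is split into pieces, a pool of processes solves the pieces simultaneously, and the partial results are combined.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Solve one piece of the problem: sum one slice of the data."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(10_000_000))

    # Break the problem into pieces...
    n_chunks = 8
    size = len(data) // n_chunks
    chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks)]
    chunks[-1].extend(data[n_chunks * size:])  # fold leftovers into the last piece

    # ...hand them out to be solved simultaneously...
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)

    # ...and combine the partial results.
    total = sum(partials)
    assert total == sum(data)
    print(total)
```

Making the pieces too small would illustrate the caveat above: the cost of shipping each tiny chunk to a worker would swamp the time saved by computing in parallel.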