1 Constructing a system with multiple computers or processors
ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson. Jan 13, 2016

2 Conventional Computer
A conventional computer consists of a processor executing a program stored in a (main) memory. Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.
[Figure: processor connected to main memory; instructions flow to the processor, data flows to and from it.]
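For example, with b = 32 address bits the addresses run from 0 to 2^32 - 1 = 4,294,967,295, i.e. 4,294,967,296 distinct locations (a 4 GiB address space if each location holds one byte).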

3 Types of Parallel Computers
Two principal approaches:
- Shared memory multiprocessor
- Distributed memory multicomputer

4 1. Shared Memory Multiprocessor System
A natural way to extend the single processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module. Together the modules form one address space.
[Figure: processors connected through processor-memory interconnections to memory modules, all within a single address space.]

5 Using a processor-memory bus as the interconnection network
Example: dual and quad processor shared memory multiprocessors.
[Figure: each processor has an L1 cache, an L2 cache, and a bus interface; all share a processor/memory bus (a set of 100+ lines) connecting through a memory controller to the shared memory.]
coit-grid01 to coit-grid04 are of this form (each a dual processor server with 8 GB shared memory). Nowadays individual point-to-point connections are typically used instead of a bus.

6 Dual-core and multi-core processors
Two or more independent processors ("cores") in one package. A "recent" innovation (since 2005): actually an old idea, but not put into wide practice until recently because of the limits on making single processors faster, principally:
- Power dissipation (the "power wall") and clock frequency limitations
- Limits in parallelism within a single instruction stream
- Memory speed limitations (the "memory wall")

7 Single quad-core shared memory multiprocessor
[Figure: one chip holding four processor "cores", each with its own L1 cache, sharing an L2 cache (and possibly an L3 cache); a memory controller connects the chip to the shared main memory.]
Note: recently (after 2011) the memory controller has been integrated inside the processor chip.

8 Multiple quad-core multiprocessors
[Figure: several multi-core chips, each core with its own cache, all sharing main memory.]
Examples:
cci-grid05.uncc.edu: four processors, each quad core. All 16 cores have access to 64 GB of shared main memory (through multilevel caches).
cci-grid09.uncc.edu: two processors, each with 16 cores and three levels of caches.
Note: recently (after 2011), with the move of the memory controller inside the chip, the shared memory is physically distributed among the processors.

9 Programming Shared Memory Multiprocessors
Several possible ways; the usual approach is to use threads: individual parallel instruction sequences, each thread having its own local variables but being able to access shared variables declared outside the threads. Two levels of tooling (a minimal sketch follows this list):
1. Low-level thread libraries: the programmer calls thread routines to create and control the threads. Examples: Pthreads, Java threads.
2. Higher-level library functions and preprocessor compiler directives. Example: OpenMP, an industry standard.
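To make this concrete, here is a minimal sketch of approach 2 in C with OpenMP (assuming a compiler with OpenMP support, e.g. built with gcc -fopenmp). Each thread keeps a private copy of id but updates the single shared variable shared_total:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int shared_total = 0;                 /* shared: declared outside the threads */

    #pragma omp parallel                  /* create a team of threads */
    {
        int id = omp_get_thread_num();    /* local (private) to each thread */

        #pragma omp critical              /* serialize updates to the shared variable */
        shared_total += id;

        printf("Thread %d running\n", id);
    }

    printf("Sum of thread IDs: %d\n", shared_total);
    return 0;
}

The same effect could be obtained with Pthreads (approach 1), but the programmer would then create, join, and synchronize the threads explicitly.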

10 Other programming alternatives
Parallelizing compilers that compile regular sequential programs into parallel programs.
Special parallel languages.
(Neither is now common.)

11 2. Distributed Memory Multicomputer
Complete computers connected through an interconnection network, each with its own processor and local memory, communicating by sending messages.
[Figure: computers, each with a processor and local memory, exchanging messages over an interconnection network.]
Many interconnection networks were explored in the 1970s and 1980s, including 2- and 3-dimensional meshes, hypercubes, and multistage interconnection networks.

12 Networked Computers as a Computing Platform
Became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s. Several early projects; notably, the NASA Beowulf project.
A "Beowulf" cluster is a group of interconnected commodity computers achieving high performance at low cost, typically using commodity interconnects (high-speed Ethernet) and the Linux OS.

13 Key advantages of using commodity networked computers:
- Very high performance workstations and PCs are readily available at low cost.
- The latest processors can easily be incorporated into the system as they become available.
- Existing software can be used or modified.

14 Cluster Interconnects
Originally Fast Ethernet on low-cost clusters. Gigabit Ethernet offers an easy upgrade path. More specialized, higher-performance interconnects are available, including Myrinet and InfiniBand.

15 Dedicated cluster with a master node and compute nodes
[Figure: a user connects over an external network to the master node, whose Ethernet interface leads to a switch on the local network connecting the dedicated compute nodes.]

16 Software Tools for Clusters
Based upon the message-passing programming model. User-level libraries are provided for explicitly specifying the messages to be sent between executing processes on each computer, used with regular programming languages (C, C++, ...). This can be quite difficult to program correctly, as we shall see. A minimal sketch follows.
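As an illustration of the model, here is a minimal sketch in C using MPI (MPI is the usual standard for such user-level message-passing libraries, though the slide names no specific one; assumes an MPI installation, built with mpicc and run with mpirun). Process 0 explicitly sends a message that process 1 explicitly receives:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* how many processes in total? */

    if (rank == 0) {
        int msg = 42;
        /* explicitly send one int to process 1, message tag 0 */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        /* explicitly receive one int from process 0 */
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d from process 0\n", msg);
    }

    MPI_Finalize();
    return 0;
}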

17 Using GPUs for high-performance computing
GPUs (graphics processing units) were originally designed to speed up and support graphics operations, but are now also used for high-performance computing. GPUs now have hundreds or thousands of processing cores and can provide orders-of-magnitude increases in execution speed. We will look at GPU devices and how to program them in the last few weeks of the course.
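As a preview, here is a minimal GPU vector-addition sketch in CUDA C (CUDA is NVIDIA's toolkit for GPUs such as the C2050 and K20 named on the next slide; its use here is illustrative, since this slide names no programming system). One GPU thread is launched per array element:

#include <stdio.h>

/* Kernel: each GPU thread computes one element of the result */
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float ha[1024], hb[1024], hc[1024];
    for (int i = 0; i < n; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    /* allocate device (GPU) memory and copy the inputs over */
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    /* launch 4 blocks of 256 threads: one thread per element */
    vecAdd<<<4, 256>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[10] = %f (expect 30.0)\n", hc[10]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}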

18 GPU clusters
A recent trend for clusters: incorporating GPUs for high performance. Several of the fastest computers in the world are GPU clusters.
The UNC-C cluster used in the course has three GPU servers:
coit-grid06.uncc.edu (448-core C2050 GPU)
coit-grid07.uncc.edu (448-core C2050 GPU)
coit-grid08.uncc.edu (2496-core K20 GPU)
[Figure: excerpt from the July 2014 list of the world's fastest computers, showing machines with K20x GPUs and C2050 GPUs; one was #1 in a November list.]

19 Next step
Learn how to program multiprocessor systems. We will start with shared memory programming and OpenMP.