Multi-core systems
System Architecture COMP25212
Daniel Goodman, Advanced Processor Technologies Group


Processor Designs
- Power
- Single-threaded performance
- Throughput
- Reliability
- ...

Classifying Processors
- SISD: Single Instruction, Single Data - uniprocessors
- SIMD: Single Instruction, Multiple Data - vector processors and vector operations (MMX & SSE)
- MIMD: Multiple Instruction, Multiple Data - multi-cores (multiprocessors)
- MISD: Multiple Instruction, Single Data - no well-known examples
- SPMD: Single Program, Multiple Data (MIMD running only one program) - clusters; see the sketch below
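To make the SPMD/SISD distinction concrete, here is a minimal sketch in CUDA (purely illustrative; the kernel, array names and sizes are invented, not taken from the lecture). The GPU kernel is SPMD: every thread executes the same program, but each picks out a different data element using its thread index. The host function below it is the SISD equivalent: one instruction stream walking the data sequentially.

    #include <cstdio>
    #include <cuda_runtime.h>

    // SPMD: one program, many data elements. Every GPU thread runs the same
    // kernel code, but on the element selected by its own thread index.
    __global__ void scale(float *x, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = a * x[i];
    }

    // SISD equivalent: a single instruction stream stepping through the data.
    void scale_sisd(float *x, float a, int n) {
        for (int i = 0; i < n; ++i) x[i] = a * x[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x;
        cudaMallocManaged(&x, n * sizeof(float));    // unified memory, for brevity
        for (int i = 0; i < n; ++i) x[i] = 1.0f;

        // Launch enough 256-thread blocks to cover all n elements.
        scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
        cudaDeviceSynchronize();

        printf("x[0] = %f\n", x[0]);                 // expect 2.0
        cudaFree(x);
        return 0;
    }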

Classifying Processors
- RISC: Reduced Instruction Set
  - Small number of very fast, simple instructions
  - Complex operations are constructed from many smaller instructions
- CISC: Complex Instruction Set
  - Lots of instructions
  - Individual instructions can be slow, but each does a lot of work

Graphics Processing Units (GPUs)
- HD video and games are computationally very demanding (beyond even the best CPUs)
- Extremely parallel: each pixel is independent
- Quite different emphasis and evolution from CPUs:
  - Fine to perform non-graphics tasks poorly or not at all
  - Large number of cores, each highly multithreaded (many concurrent threads per Nvidia core); see the launch sketch below
  - Additional threads are queued until the earlier threads complete
  - Shared register file
  - Each core is SIMD
  - No coherency between cores
  - No communication between groups of threads
  - Very fast memory access
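A minimal sketch of what "a large number of highly multithreaded cores" means for software (illustrative CUDA; the kernel name, data size and block size are assumptions, not from the lecture). The launch below creates far more threads than the GPU has multiprocessors; the hardware runs as many as fit and queues the remaining blocks until earlier ones complete, which is how memory latency gets hidden.

    #include <cstdio>
    #include <cuda_runtime.h>

    // A trivially parallel kernel: each thread increments one element.
    __global__ void inc(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        const int n = 1 << 24;                       // 16M elements
        int *data;
        cudaMallocManaged(&data, n * sizeof(int));
        for (int i = 0; i < n; ++i) data[i] = 0;

        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

        // Far more threads than the chip can run at once; surplus blocks
        // simply wait in a queue until a multiprocessor becomes free.
        printf("Launching %d blocks of %d threads on %d multiprocessors\n",
               blocks, threadsPerBlock, prop.multiProcessorCount);
        inc<<<blocks, threadsPerBlock>>>(data, n);
        cudaDeviceSynchronize();

        printf("data[0] = %d\n", data[0]);           // expect 1
        cudaFree(data);
        return 0;
    }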

Graphics Processing Units (GPU)

Coalesced Memory Access

Un-Coalesced Memory Access
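The two slides above contrast memory access patterns. As a hedged sketch of the difference (illustrative CUDA; the kernel names, sizes and stride are invented): in the first kernel, consecutive threads of a warp read consecutive addresses, so the hardware can merge a warp's 32 loads into a few wide memory transactions (coalesced). In the second, each thread strides through memory, so the same warp's accesses are scattered and each needs its own transaction (un-coalesced), wasting most of the available bandwidth.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Coalesced: thread i touches element i, so a warp's 32 accesses are
    // adjacent in memory and can be serviced by a few wide transactions.
    __global__ void copy_coalesced(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Un-coalesced: thread i touches element i * stride, so a warp's 32
    // accesses are scattered and each needs its own memory transaction.
    __global__ void copy_strided(const float *in, float *out, int n, int stride) {
        long j = (long)(blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (j < n) out[j] = in[j];
    }

    int main() {
        const int n = 1 << 22;
        float *in, *out;
        cudaMalloc((void **)&in,  n * sizeof(float));  // contents don't matter here,
        cudaMalloc((void **)&out, n * sizeof(float));  // only the access pattern does

        int block = 256, grid = (n + block - 1) / block;
        copy_coalesced<<<grid, block>>>(in, out, n);     // fast pattern
        copy_strided<<<grid, block>>>(in, out, n, 32);   // slow pattern
        cudaDeviceSynchronize();
        printf("done\n");

        cudaFree(in);
        cudaFree(out);
        return 0;
    }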

SpiNNaker Massively Parallel System

Fabricated SpiNNaker CMP
- Fabricated in UMC 130nm L130E
- CMP die area: … sq. mm
- Over 100 million transistors
- Power consumption of 1 W at 1.2 V when all the processor cores are operating
- Peak performance: 4 GIPS
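Taking the figures above at face value, and assuming the 4 GIPS peak is sustained at the quoted 1 W, the energy cost works out to roughly 1 W / (4 × 10^9 instructions/s) = 0.25 nJ per instruction.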

Constructing Clusters, Data Centres and Supercomputers

Composing Multi-cores
(Diagram: several multi-core chips on one motherboard, each with its own DRAM, connected to each other and to Input/Output Hubs by QPI or HT links)

Composing Multiple Computers
(Diagram: many complete machines joined by an Interconnection Network)

Clusters/Supercomputers/Data Centres
- All of these terms are overloaded and misused
- All have lots of CPUs on lots of motherboards
- Clusters/supercomputers are used to run one large task very quickly, e.g. a simulation
- Clusters/farms/data centres run thousands of independent tasks in parallel, e.g. Google Mail
- The distinction becomes blurred with services such as Google
- The main difference is the network between the CPUs

Building a Cluster/SC/DC
- Large numbers of self-contained computers in a small form factor, optimised for cooling and power efficiency
- Racks house tens to hundreds of CPUs
- Racks normally also contain separate units for networking and power distribution
- Racks are themselves self-contained

Building a Cluster/SC/DC
- Sometimes a rack is not big enough
- How many new computers a day go into a data centre?
- What does this mean for reliability?

Building a Cluster/SC/DC
- Join lots of racks
- Add power distribution, network and cooling
- For supercomputers, add racks dedicated to storage

K Supercomputer
- Water cooled
- 6D network for fault tolerance
- RISC processors (SPARC64 VIIIfx)
- Around 90,000 processors

Questions?