Multi-core and tera-scale computing: A short overview of benefits and challenges. CSC 2007, Andrzej Nowak, CERN, 28.08.2007.



The "free" bonus
> Silicon technology advances more quickly than design capabilities
> Single CPU complexity is rising slowly
> Moving from 90nm and 65nm processes to 45nm and 32nm processes
> Free transistors available
  - Take all you want… eat all you take

The multi-core revolution
> What do we do with the extra silicon?
  - Copy what we already have
> First shot at the PC consumer market – Intel's Hyper-Threading in the Xeons and Pentium 4 (SMT)
  - Idea: do work while the core would otherwise sit idle
  - Some resources in the CPU core were shared
  - The relation to extra space on the die was not direct
> First popular dual-core CPU for Joe Average – the Intel Core Duo
  - Idea: copy a big part of the processor
  - Fewer resources are shared
> Next generations of x86-like CPUs are coming
  - 6, 8, 16 cores

Multi-core designs
> Many other multi-core CPUs are on the market
  - AMD X2 (and X4 coming soon)
  - ARM specifications for multi-core CPUs (your iPod is dual-core!)
  - Sun's Niagara processor (8 cores)
  - The Cell processor in PlayStation 3 units
> Programmers need to take advantage of the new features (see the threading sketch below)
  - CERN openlab and Intel are organizing a multi-threading and parallelism workshop at the beginning of October!
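To make "taking advantage of the new features" concrete, here is a minimal sketch, assuming an OpenMP-capable C++ compiler, of how independent loop iterations can be spread across the cores of a multi-core CPU. It is illustrative only and not taken from the presentation; the problem size and the computation per element are arbitrary choices.

// Minimal OpenMP sketch: spread independent loop iterations across cores.
// Compile with, e.g., g++ -fopenmp example.cpp (the file name is arbitrary).
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const long n = 1 << 20;                    // illustrative problem size
    std::vector<double> x(n, 1.0), y(n, 2.0);
    const double a = 3.0;

    // Each core executes a chunk of the iterations; the iterations are
    // independent, so no synchronization is needed inside the loop.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }

    std::printf("ran with up to %d threads, y[0] = %f\n",
                omp_get_max_threads(), y[0]);
    return 0;
}

The key property exploited here is that the iterations are independent; anything shared and mutable inside the loop would need explicit synchronization.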

Tera-scale computing
> Computer performance is traditionally expressed in FLOPS (floating point operations per second); see the peak-FLOPS sketch below
  - CDC 6600 (1966) – 10 MFLOPS, 64 kB memory
  - Your iPod – 100 MFLOPS
  - Your iMac – 3-4 GFLOPS
  - Your graphics card – GFLOPS
> Not so far from the magical limit of 1 TFLOPS…? Hence the name, tera-scale
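As a back-of-the-envelope companion to the FLOPS figures above, the sketch below estimates a theoretical peak as cores × clock × FLOPs per cycle. The specific values are illustrative assumptions, not measurements from the talk.

// Back-of-the-envelope peak-FLOPS estimate: cores x clock x FLOPs per cycle.
// All three inputs below are illustrative assumptions, not measurements.
#include <cstdio>

int main() {
    const double cores           = 4.0;     // e.g. one quad-core CPU
    const double clock_hz        = 2.5e9;   // 2.5 GHz
    const double flops_per_cycle = 4.0;     // e.g. a 2-wide SIMD unit retiring 2 ops per cycle

    const double peak = cores * clock_hz * flops_per_cycle;
    std::printf("theoretical peak: %.1f GFLOPS\n", peak / 1e9);
    return 0;
}

With figures like these, a single multi-core chip of the time sits in the tens of GFLOPS, which is why 1 TFLOPS on one chip is described as the "magical limit".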

Processors in GPUs (a digression)
> Newest trend – heavily multi-core (up to 128 cores)
> Blazing fast
> Toolkits are available (e.g. NVIDIA CUDA)
> But…
  - Floating point operations are not precise enough or are non-standard
  - Data types are limited
  - Memory handling is not optimized for general-purpose computing
  - Tiny cache, if any at all
  - ~150 W… for the chip only

Tera-scale computing (continued)
> Intel's Polaris
  - 80-core prototype
  - ~1 TFLOPS
> Intel's Larrabee design
  - Many-core x86-GPU hybrid
  - ~3 TFLOPS
> Research directions (see the granularity sketch below)
  - How do you feed 80 hungry cores?
  - Parallelism – fine-grained or coarse-grained?
  - Effective virtualization
  - Memory access and bus optimization
  - Resource sharing
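To illustrate the fine-grained vs. coarse-grained question, here is a hedged sketch (not from the slides) contrasting the two styles with OpenMP in C++; the work() function and the problem size are invented placeholders.

// Hedged illustration of parallelism granularity; work() and the sizes
// are placeholders invented for this example.
#include <omp.h>
#include <cstdio>
#include <vector>

static double work(double v) { return v * v + 1.0; }   // stand-in for real work

int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> data(n, 1.5);

    // Fine-grained: the runtime distributes individual iterations across cores.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i) {
        data[i] = work(data[i]);
    }

    // Coarse-grained: each thread processes one large, independent block.
    #pragma omp parallel
    {
        const int t  = omp_get_thread_num();
        const int nt = omp_get_num_threads();
        const std::size_t chunk = n / nt;
        const std::size_t begin = t * chunk;
        const std::size_t end   = (t == nt - 1) ? n : begin + chunk;
        for (std::size_t i = begin; i < end; ++i) {
            data[i] = work(data[i]);
        }
    }

    std::printf("data[0] = %f\n", data[0]);
    return 0;
}

Coarser tasks mean less scheduling overhead but risk load imbalance; finer tasks balance better but put more pressure on shared resources such as memory buses and caches, which is the kind of trade-off the research directions above point at.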

Questions for the future
> How many cores does your mother need?
> How many cores do you, a scientist, need?
> How do you effectively use what you have?
> What is the best level at which to introduce parallelism? Do you need to redesign your software?
> Grid computing or tera-scale homogeneous computers? Will virtualization be effective enough?

Q&A (1 Swiss minute)
This research project has been supported by a Marie Curie Early Stage Research Training Fellowship of the European Community's Sixth Framework Programme under contract number (MEST-CT ).