IBM Cell Processor Ryan Carlson, Yannick Lanner-Cusin, & Cyrus Stoller CS87: Parallel and Distributed Computing.


Outline
Architectural overview
Definition of Cell Processor
Shared vs Private Memory
System Design
Scalability

Architectural Features
Power Processor Element (PPE)
– 64-bit
Synergistic Processing Elements (SPEs)
– 32-bit
– 256 KB of on-chip RAM each (the Local Store)
Element Interconnect Bus (EIB)
I/O Interface
RAM

Synergistic Processing Elements
Vector Instructions
PPE delegates any parallelizable work to the SPEs
Local Store
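
The delegation idea on this slide can be sketched in a few lines. This is only a rough model, not Cell code: a thread pool stands in for the SPEs, and `NUM_SPES`, `spe_kernel`, and `ppe_dispatch` are hypothetical names chosen for illustration. The worker count of 8 is an assumption; the slides do not give a number.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 8  # assumption: the slides do not state how many SPEs there are


def spe_kernel(chunk):
    """Work done by one worker: double every element (a data-parallel task)."""
    return [2 * x for x in chunk]


def ppe_dispatch(data, num_workers=NUM_SPES):
    """Split data into contiguous chunks, farm them out, and reassemble."""
    size = max(1, -(-len(data) // num_workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the same order as the input chunks
        results = list(pool.map(spe_kernel, chunks))
    return [y for part in results for y in part]


doubled = ppe_dispatch(list(range(20)))
```

The controlling function does no arithmetic itself; it only partitions, dispatches, and collects, which is roughly the role the slide gives the PPE.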

Stream Processing

Vector Instructions

Advantages of Vector Instructions
Fewer instructions
Fewer branch instructions, so fewer mispredictions
Access memory a block at a time
Fewer memory accesses = faster processing
Example: convert an image to grayscale
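
A minimal sketch of the grayscale example, to make the "fewer instructions" point concrete. It does not use real vector hardware: it counts loop steps as a stand-in for instructions issued. The lane width `LANES = 4` and the integer luma weights (77, 150, 29, a common approximation of 0.299, 0.587, 0.114) are assumptions, since the slide gives no formula.

```python
LANES = 4  # assumption: one vector step handles 4 pixels


def luma(r, g, b):
    # Integer approximation of the weights 0.299, 0.587, 0.114 (sum = 256).
    return (77 * r + 150 * g + 29 * b) >> 8


def grayscale_scalar(pixels):
    """One pixel per step."""
    out, steps = [], 0
    for r, g, b in pixels:
        out.append(luma(r, g, b))
        steps += 1
    return out, steps


def grayscale_vector(pixels):
    """Up to LANES pixels per step; a short last block models a partial vector."""
    out, steps = [], 0
    for i in range(0, len(pixels), LANES):
        block = pixels[i:i + LANES]
        out.extend(luma(r, g, b) for r, g, b in block)
        steps += 1
    return out, steps


pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0),
          (0, 0, 255), (128, 128, 128)]
gray_s, steps_s = grayscale_scalar(pixels)
gray_v, steps_v = grayscale_vector(pixels)
```

Both versions produce the same image; the blocked version takes 2 steps for these 6 pixels where the scalar version takes 6, and it has correspondingly fewer loop branches.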

Disadvantages of Vector Processors
More expensive to produce
Increased code complexity
May be difficult to port between systems
Increased power consumption
Wasted resources if using scalar instructions

Definition of Cell Processor
Microprocessor designed to optimize cooperation between ordinary desktop processors and more specialized high-performance processors (like a GPU)
Performance and hardware simplicity prioritized over programming convenience

Shared Memory
– RAM
Private Memory
– Local Stores on the SPEs
– Cache on the PPE
Pretty simple
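
The shared/private split can be modeled as explicit copy-in, compute, copy-out. The sketch below is an illustration under stated assumptions, not the Cell programming interface: `LocalStore`, `get`, `put`, and `process_in_blocks` are hypothetical names, list copies stand in for data transfers, and `ITEM_BYTES = 4` is an assumed element size. Only the 256 KB capacity comes from the slides.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE capacity given in the slides
ITEM_BYTES = 4                  # assumption: 4-byte elements


class LocalStore:
    """Private buffer with a hard capacity, filled only by explicit copies."""

    def __init__(self, capacity_items):
        self.capacity = capacity_items
        self.buf = []

    def get(self, shared, start, count):
        """Copy shared[start:start+count] into the private buffer."""
        if count > self.capacity:
            raise ValueError("block does not fit in the local store")
        self.buf = shared[start:start + count]  # slicing makes a copy

    def put(self, shared, start):
        """Copy the private buffer back into shared memory."""
        shared[start:start + len(self.buf)] = self.buf


def process_in_blocks(shared, block_items):
    # Simplification: a real local store also has to hold the program code,
    # so less than the full capacity would be available for data.
    store = LocalStore(LOCAL_STORE_BYTES // ITEM_BYTES)
    for start in range(0, len(shared), block_items):
        store.get(shared, start, block_items)
        store.buf = [x + 1 for x in store.buf]  # compute touches local data only
        store.put(shared, start)


ram = list(range(10))
process_in_blocks(ram, block_items=4)
```

The compute step never reads `ram` directly, which is the point of the model: shared RAM is reached only through explicit transfers, and the block size has to be chosen to fit the private store.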

System Design
Vector instruction optimization takes planning (i.e., like writing a shopping list)
Gaming (PS3)
Cryptography, graphics transform and lighting, physics, fast Fourier transforms (FFT), matrix operations

Scalability
Harder to optimize programs
Easier to optimize hardware

Why use a Cell processor?
General purpose processor
Designed not to have any slow components
Even though you cannot vectorize every instruction, the SPEs are still useful
Worst case: just as fast as an ordinary desktop processor