Parallel Processing with PlayStation3 Lawrence Kalisz.

Topics
Cell Processor
1. History
2. Architecture
Parallel Programming
1. Install Linux
2. Examples
PS3 Cluster
1. Applications
2. Examples

PS3 Cell Processor: History
Created by Sony, Toshiba, and IBM (STI)
400 engineers
½ billion dollars

PS3 Cell Processor: Architecture

Power Processor Element (PPE)
Synergistic Processing Element (SPE)
Element Interconnect Bus (EIB)
Memory System
Network Card & Graphics Card

Power Processor Element
PPE handles operating system and control tasks
64-bit Power Architecture with VMX
In-order, 2-way hardware simultaneous multithreading (SMT)
32 KB L1 caches (instruction and data) and 512 KB L2 cache

Synergistic Processing Element
Specialized high-performance core
Three main components:
1. SPU: Synergistic Processor Unit
2. LS: local store memory
3. MFC: Memory Flow Controller, which manages data moving in and out of the SPE
The SPU can only access (load and store) data in the SPE local store
7 SPEs used for rendering, 1 SPE reserved for image compression

SPE: Data IN and OUT Steps
When the SPU needs data:
1. SPU initiates an MFC request for data
2. MFC requests the data from memory
3. Data is copied to the local store
4. SPU can access the data from the local store
The SPU operates on the data, then copies it from the local store back to memory in a similar process
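
The data-in and data-out steps above can be modeled in a few lines of Python. This is only a sketch of the copy-in, compute, copy-out pattern. It does not use the real MFC or DMA interface, and the names `process_through_local_store`, `chunk_len`, and `kernel` are hypothetical, chosen for illustration.

```python
def process_through_local_store(main_memory, chunk_len, kernel):
    """Apply kernel to main_memory in place, one chunk at a time.

    Each chunk stands in for the amount of data that fits in a local store.
    """
    for start in range(0, len(main_memory), chunk_len):
        stop = min(start + chunk_len, len(main_memory))
        # Steps 1-3: request the data and copy it into the local store.
        local_store = main_memory[start:stop]
        # Step 4: the compute unit only ever touches the local copy.
        local_store = [kernel(x) for x in local_store]
        # Write-back: copy the results from the local store to memory.
        main_memory[start:stop] = local_store
    return main_memory


data = list(range(10))
process_through_local_store(data, chunk_len=4, kernel=lambda x: x * 2)
print(data)
```

The point of the pattern is that the kernel never reads main memory directly, so the chunk size has to be chosen to fit the local store.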

Element Interconnect Bus
Physically overlaps all processor elements
Central arbiter supports up to 3 concurrent transfers per ring
2-stage, dual round-robin arbiter
Each port supports concurrent 16 B in and 16 B out data paths
Ring topology is transparent to the element data interface
Each EIB data port supports 25.6 GB/s each way
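
The 25.6 GB/s per-port figure follows from the 16 B data path if the bus clock is known. The short calculation below assumes the EIB runs at 1.6 GHz, half of a 3.2 GHz core clock. That clock rate is not stated on the slide, so treat it as an assumption.

```python
BYTES_PER_CYCLE = 16           # per direction, from the slide
EIB_CLOCK_HZ = 1_600_000_000   # assumption: half of a 3.2 GHz core clock


def port_bandwidth_gb_per_s(bytes_per_cycle, clock_hz):
    """Peak one-way bandwidth of a single port in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_cycle * clock_hz / 1e9


print(port_bandwidth_gb_per_s(BYTES_PER_CYCLE, EIB_CLOCK_HZ))
```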

PS3 Cell: Parallel Programming

Current working Linux distros:
1. Fedora Core 5
2. Yellow Dog
3. Gentoo PowerPC 64 edition
4. Debian
OpenMPI (for use with a cluster)
IBM's Cell SDK

PS3 Cell: Parallel Programming
Cell performance is ~10x better than a general-purpose processor (GPP) for media and other applications that can take advantage of its SIMD capability
PPE performance is comparable to that of a traditional GPP
SPE performance is mostly the same as, or better than, a GPP with SIMD
Performance scales with the number of SPEs

PS3 Cell: Parallel Programming
Programming becomes an exercise in partitioning, mapping (layout), routing (communication), and scheduling

PS3 Cell: Parallel Programming
AI Backgammon player
1M board evaluations in ~3 seconds (6 SPEs)
Data-parallel implementation, linear speedup
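
A data-parallel split of this kind can be sketched as follows. The slide does not say how the backgammon player divides its work, so the contiguous, near-equal split below is an assumption, and `partition` is a hypothetical helper name.

```python
def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal (start, stop) slices."""
    base, extra = divmod(n_items, n_workers)
    slices = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        stop = start + base + (1 if worker < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


slices = partition(1_000_000, 6)
print(slices)

# Rough per-SPE throughput implied by 1M evaluations in ~3 s on 6 SPEs.
evals_per_second_per_spe = 1_000_000 / 3 / 6
print(round(evals_per_second_per_spe))
```

Because every board evaluation is independent, each worker can process its slice without talking to the others, which is what makes a linear speedup plausible.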

PS3 Cell: Parallel Programming
SPU programs are designed and written to work together but are compiled independently
Separate compilers and toolchains (ppu-gcc and spu-gcc)
Each SPU program is built into a small ELF image that can be embedded in the PPU program

PS3 Cell: Parallel Programming
BLUE-STEEL
Full ray tracer running on each SPE
Data-parallel implementation

PS3 Cell: Parallel Programming
BLUE-STEEL
A solution to the rendering equation
Triangle Rasterization
– Fast: possible in real time on a single core
– Inaccurate or tedious for global effects such as shadows, reflection, refraction, or global illumination
Ray Tracing
– Slow, unless done on multiple cores
– Accurate and natural shadows, reflection, and refraction

PS3 Cell: Parallel Programming
BLUE-STEEL
Build a fast ray tracer from the ground up to take advantage of multiple cores
– 6 accessible cores for rendering

PS3 Cell: Parallel Programming
Ray Tracing
Shoot a ray through each pixel on the screen
Check for intersections with each object in the scene
Keep the closest intersection
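
The three steps above map directly onto a small ray caster. The sketch below is not BLUE-STEEL's code. It assumes a scene made only of spheres, a camera at the origin looking down the negative z axis, and an image plane at z = -1. All function names are illustrative.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t at which the ray hits the sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    sqrt_disc = math.sqrt(disc)
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


def closest_hit(origin, direction, spheres):
    """Check the ray against every sphere and keep the closest intersection."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best


def render(width, height, spheres):
    """Shoot one ray through each pixel; store the index of the sphere hit, or -1."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel center to [-1, 1] on the image plane.
            px = (x + 0.5) / width * 2.0 - 1.0
            py = 1.0 - (y + 0.5) / height * 2.0
            hit = closest_hit((0.0, 0.0, 0.0), (px, py, -1.0), spheres)
            row.append(-1 if hit is None else hit[1])
        image.append(row)
    return image


# Sphere 0 is large and far away, sphere 1 is small and in front of it.
scene = [((0.0, 0.0, -6.0), 3.0), ((0.0, 0.0, -3.0), 1.0)]
for row in render(8, 4, scene):
    print("".join("." if i < 0 else str(i) for i in row))
```

Each pixel is computed independently of every other pixel, which is why the image can be divided among cores with no communication between them.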

PS3 Cell: Parallel Programming
Ray Tracing
Shade each point according to the material of the object, as well as the lights in the scene
Cast rays for shadows, reflection, and refraction
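
As a minimal illustration of shading with a shadow ray, the sketch below computes a Lambertian (diffuse) term for one point light and returns zero when a sphere blocks the path to the light. The material model, the single point light, and the names `shade` and `sphere_blocks` are assumptions made for this example. Reflection and refraction rays, which the slide also mentions, are left out.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def sphere_blocks(origin, direction, center, radius, max_t):
    """True if the sphere intersects the ray between origin and distance max_t.

    direction must be unit length.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    return any(1e-6 < t < max_t for t in ((-b - root) / 2.0, (-b + root) / 2.0))


def shade(point, normal, albedo, light_pos, occluders):
    """Diffuse brightness at point, or 0.0 if the point light is occluded."""
    to_light = tuple(l - p for l, p in zip(light_pos, point))
    distance = math.sqrt(sum(x * x for x in to_light))
    light_dir = normalize(to_light)
    # Shadow ray: any sphere between the point and the light blocks it.
    for center, radius in occluders:
        if sphere_blocks(point, light_dir, center, radius, distance):
            return 0.0
    cos_theta = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return albedo * cos_theta


lit = shade((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.8, (0.0, 10.0, 0.0), [])
shadowed = shade((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.8, (0.0, 10.0, 0.0),
                 [((0.0, 5.0, 0.0), 1.0)])
print(lit, shadowed)
```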

PS3: Cluster Applications
Air Force
PS3 Gravity Grid
LACAL Student Cluster

Any questions?