Gedae Portability: From Simulation to DSPs to the Cell Broadband Engine James Steed, William Lundgren, Kerry Barnes Gedae, Inc. www.gedae.com 856 - 231-

Slides:



Advertisements
Similar presentations
Parallel Processing with PlayStation3 Lawrence Kalisz.
Advertisements

4. Shared Memory Parallel Architectures 4.4. Multicore Architectures
Yaron Doweck Yael Einziger Supervisor: Mike Sumszyk Spring 2011 Semester Project.
Implementation of 2-D FFT on the Cell Broadband Engine Architecture William Lundgren Gedae), Kerry Barnes (Gedae), James Steed (Gedae)
Cell Broadband Engine. INF5062, Carsten Griwodz & Pål Halvorsen University of Oslo Cell Broadband Engine Structure SPE PPE MIC EIB.
Chapter 3 Computer Hardware and Peripherals: Your Digital Toolbox
Basic Computer Hardware and Software.
Real-Time Video Analysis on an Embedded Smart Camera for Traffic Surveillance Presenter: Yu-Wei Fan.
Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation Author: C˘at˘alin Radu, C˘at˘alin Leordeanu, Valentin.
Michael A. Baker, Pravin Dalale, Karam S. Chatha, Sarma B. K. Vrudhula
CS 7810 Lecture 24 The Cell Processor H. Peter Hofstee Proceedings of HPCA-11 February 2005.
Embedded Computer Architecture 5KK73 MPSoC Platforms Part2: Cell Bart Mesman and Henk Corporaal.
0 HPEC 2010 Automated Software Cache Management.
Cell Broadband Processor Daniel Bagley Meng Tan. Agenda  General Intro  History of development  Technical overview of architecture  Detailed technical.
PlayStation 2 Architecture Irin Jose Farid Momin Quy Ngo Olivia Wong.
V Material obtained from summer workshop in Guildford County.
Programming the Cell Multiprocessor Işıl ÖZ. Outline Cell processor – Objectives – Design and architecture Programming the cell – Programming models CellSs.
Introduction to the Cell multiprocessor J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy (IBM Systems and Technology Group)
Evaluation of Multi-core Architectures for Image Processing Algorithms Masters Thesis Presentation by Trupti Patil July 22, 2009.
Using the PSoC USB March 17, 2012 Lloyd Moore, President/Owner.
1 Copyright © 2011, Elsevier Inc. All rights Reserved. Appendix E Authors: John Hennessy & David Patterson.
Christian Steinle, University of Mannheim, Institute of Computer Engineering1 L1 Tracking – Status CBMROOT And Realisation Christian Steinle, Andreas Kugel,
Cell Broadband Engine Architecture Bardia Mahjour ENCM 515 March 2007 Bardia Mahjour ENCM 515 March 2007.
© 2005 Mercury Computer Systems, Inc. Yael Steinsaltz, Scott Geaghan, Myra Jean Prelle, Brian Bouzas,
Gedae, Inc. Implementing Modal Software in Data Flow for Heterogeneous Architectures James Steed, Kerry Barnes, William Lundgren Gedae, Inc.
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Cell processor implementation of a MILC lattice QCD application.
Real-Time HD Harmonic Inc. Real Time, Single Chip High Definition Video Encoder! December 22, 2004.
March 12, 2007 Introduction to PS3 Cell BE Programming Narate Taerat.
Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.
Programming Examples that Expose Efficiency Issues for the Cell Broadband Engine Architecture William Lundgren Gedae), Rick Pancoast.
1 The IBM Cell Processor – Architecture and On-Chip Communication Interconnect.
© CCI Learning Solutions Inc. 1 Lesson 2: Elements of a Personal Computer System unit Microprocessor chip How memory is measured What ROM is What RAM is.
Simple, Efficient, Portable Decomposition of Large Data Sets William Lundgren Gedae), David Erb (IBM), Max Aguilar (IBM), Kerry Barnes.
Kevin Eady Ben Plunkett Prateeksha Satyamoorthy.
MULTIMEDIA INPUT / OUTPUT TECHNOLOGIES
HPEC SMHS 9/24/2008 MIT Lincoln Laboratory Large Multicore FFTs: Approaches to Optimization Sharon Sacco and James Geraci 24 September 2008 This.
Improved air combat awareness - with AESA and next-generation signal processing Main beam jamming rejection Wide transmit beam Communication Side lobe.
Group May Bryan McCoy Kinit Patel Tyson Williams.
An FX software correlator for VLBI Adam Deller Swinburne University Australia Telescope National Facility (ATNF)
Sam Sandbote CSE 8383 Advanced Computer Architecture The IBM Cell Architecture Sam Sandbote CSE 8383 Advanced Computer Architecture April 18, 2006.
High Performance Computing Group Feasibility Study of MPI Implementation on the Heterogeneous Multi-Core Cell BE TM Architecture Feasibility Study of MPI.
LYU0703 Parallel Distributed Programming on PS3 1 Huang Hiu Fung Wong Chung Hoi Supervised by Prof. Michael R. Lyu Department of Computer.
Gedae, Inc. Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004.
Sep. 17, 2002BESIII Review Meeting BESIII DAQ System BESIII Review Meeting IHEP · Beijing · China Sep , 2002.
Motherboard A motherboard allows all the parts of your computer to receive power and communicate with one another.
Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Chapter 5A Transforming Data Into Information.
Optimizing Ray Tracing on the Cell Microprocessor David Oguns.
Comparison of Cell and POWER5 Architectures for a Flocking Algorithm A Performance and Usability Study CS267 Final Project Jonathan Ellithorpe Mark Howison.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
5 th October 2004Hardware – KS41 Hardware Objectives: Computer systems  What do they do?  Identify the hardware that makes up a computer system (PC)
Basic Computer Hardware and Software. Guilford County SciVis V
Presented by Jeremy S. Meredith Sadaf R. Alam Jeffrey S. Vetter Future Technologies Group Computer Science and Mathematics Division Research supported.
Computer Hardware & Processing Inside the Box CSC September 16, 2010.
Basic Computer Hardware and Software.
KERRY BARNES WILLIAM LUNDGREN JAMES STEED
Gedae Portability: From Simulation to DSPs to the Cell Broadband Engine James Steed, William Lundgren, Kerry Barnes Gedae, Inc
High performance computing architecture examples Unit 2.
XRD data analysis software development. Outline  Background  Reasons for change  Conversion challenges  Status 2.
SEPTEMBER 8, 2015 Computer Hardware 1-1. HARDWARE TERMS CPU — Central Processing Unit RAM — Random-Access Memory  “random-access” means the CPU can read.
IBM Cell Processor Ryan Carlson, Yannick Lanner-Cusin, & Cyrus Stoller CS87: Parallel and Distributed Computing.
1/21 Cell Processor Systems Seminar Diana Palsetia (11/21/2006)
HPEC-1 SMHS 7/7/2016 MIT Lincoln Laboratory Focus 3: Cell Sharon Sacco / MIT Lincoln Laboratory HPEC Workshop 19 September 2007 This work is sponsored.
● Cell Broadband Engine Architecture Processor ● Ryan Layer ● Ben Kreuter ● Michelle McDaniel ● Carrie Ruppar.
Gedae, Inc. Implementing Modal Software in Data Flow for Heterogeneous Architectures James Steed, Kerry Barnes, William Lundgren Gedae, Inc.
Operating System Overview
Basic Computer Hardware and Software.
High Performance Computing on an IBM Cell Processor --- Bioinformatics
Basic Computer Hardware and Software.
2PAD’s Beamforming Software
Chapter 1 Introduction.
Presentation transcript:

Gedae Portability: From Simulation to DSPs to the Cell Broadband Engine James Steed, William Lundgren, Kerry Barnes Gedae, Inc HPEC 2007: Multicore Processors and Their Impact on DoD HPEC Systems

The Software Architecture Makes Hardware Refreshes Difficult HPEC 2007 Gedae, Inc. 1 PE-0 Code 0 PE-1 Code 1 PE-2 Code 2 PE-3 Code 3 PE-A Code A PE-B Code B PE-C Code C PE-D Code D PE-E Code E PE-F Code F Old System New System

Problems that Reduce Software Portability Languages and compilers are based on serial processors Software architecture is buried in the code Differences in multiprocessor/multicore hardware necessitate changes to the software architecture –Number of processors –Interconnection –Bandwidth –Processor speed –Memory size –Memory structure Gedae mitigates the risk of porting software by automating the incorporation of the software architecture HPEC 2007 Gedae, Inc. 2

Application Environment Search and track using four audio channels Display using camera directed by pan-tilt unit HPEC 2007 Gedae, Inc. 3 Playstation 3 PPE USB Audio USB Gimbal Sys Mem Sys Mem SPE BUSBUS BUSBUS

Stages in Development Developed as simulation with file input and rendered output Deployed on quad PowerPC board, processing in real time at limited frame rate Hardware refresh to Cell Broadband Engine processor, processing more frames per second HPEC 2007 Gedae, Inc. 4

System Specifications SimulationMercury AdapDevPlaystation 3 Processors14 PowerPC AltiVec (500 MHz), 1 Pentium 1 PPE, 6 SPEs SensorsDatafile of 4 recorded channels ICS 610 ADC PCI Board, 4 microphones M-Audio Quattro USB Device, 4 microphones OutputConstellation display Directed Perception D46-17 Pan-Tilt Unit UIRendered scene Matrix Vision BlueFOX USB Camera displayed using Video for Gedae HPEC 2007 Gedae, Inc. 5

Algorithm Specified in Gedae HPEC 2007 Gedae, Inc Channel Audio Input Low Pass Filtering Correlate the 4 Channels Detection Algorithm Pan and Tilt Camera Same flow graph used for simulation, quad DSP board, and Cell/B.E. processor

Simulation Using Gedae-Sim Audio data captured from actual model train, recorded to file Simulated 3-d environment created with train, track, camera, and light Scene rendered from vantage- point of camera Gedae-Sim used to verify algorithm and run on multiple virtual processors HPEC 2007 Gedae, Inc. 7 Spot Formation Output Rendered Scene

Mercury AdapDev Pentium III development host –1.26 GHz –1 GB SDRAM Quad PowerPC 500MHz (MCP7410) –AltiVec instruction set –2 MB L2 cache –256 MB SDRAM –DMA engines RACE++ switched-fabric architecture HPEC 2007 Gedae, Inc. 8

Mercury AdapDev Implementation HPEC 2007 Gedae, Inc. 9 Put Preprocessing of 4 channels in 4 partitions Add Correlation to 0-th partition DMA between processors Map partitions to 4 PowerPCs Nonblocking transfer of audio data from host to PowerPCs Strip mine for cache performance

Cell/B.E. Architecture Power Processing Element (PPE) 8 Synergistic Processing Elements (SPE) –VMX SIMD instruction set –DMA engines –256 kB Local Storage (LS) System Memory Element Interconnect Bus (EIB) –Over 200 GB/s HPEC 2007 Gedae, Inc. 10

Cell/B.E. Implementation Alter implementation to use 6 SPEs Alter implementation to fit in the SPEs’ 256KB Local Storage Maximize use of SPEs HPEC 2007 Gedae, Inc Put Preprocessing of 4 channels in 4 partitions Strip mine to reduce memory footprint Map partitions to 6 SPEs Use 2 SPEs to perform 1 st stage of correlation

Gedae was used to easily move the application to new hardware Changes to the implementation were handled by automation and simple GUIs, not changes to code High performance gains were realized with minimal effort Results TargetProgrammer HoursPerformance Simulation4 weeks- Mercury AdapDev6 hours3 Hz Cell/B.E.2 hours15 Hz HPEC 2007 Gedae, Inc. 12