Ben Gaudette, Michael Pfeister. CSE 520, Spring 2010.



Project Description
Background
- Stream processing is a paradigm that exploits parallel processing via data parallelism.
- Stream processors include GPUs, PPUs, the Cell processor (with software support), and the Imagine/Storm-1.
Project Goal
- Create a simulator for a stream processor based on the Imagine.
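
As a rough illustration of the data-parallel idea above, the sketch below applies one kernel to every element of a stream, a fixed number of elements per step, as if each element went to its own cluster. The names `run_kernel`, `scale_and_offset`, and `NUM_CLUSTERS` are hypothetical, and the cluster count of 8 is an assumption that does not come from these slides.

```python
from typing import Callable, List, Tuple

# Assumption: the slides do not give a cluster count; 8 is a placeholder.
NUM_CLUSTERS = 8


def run_kernel(kernel: Callable[[float], float],
               stream: List[float],
               num_clusters: int = NUM_CLUSTERS) -> Tuple[List[float], int]:
    """Apply `kernel` to each stream element, `num_clusters` elements per step.

    Returns the output stream and the number of lock-step batches used.
    """
    output: List[float] = []
    steps = 0
    for base in range(0, len(stream), num_clusters):
        batch = stream[base:base + num_clusters]
        # Every cluster runs the same kernel on its own element.
        output.extend(kernel(x) for x in batch)
        steps += 1
    return output, steps


def scale_and_offset(x: float) -> float:
    """Hypothetical example kernel: one multiply and one add per element."""
    return 2.0 * x + 1.0
```

For example, `run_kernel(scale_and_offset, [0.0, 1.0, 2.0], num_clusters=2)` processes the three elements in two steps.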

Imagine Processor - History
- Originally an academic project, led by William Dally of Stanford.
- The project created Isim.
- Students took all of the deliverables and created a start-up company: SPI.

Imagine Processor - Resources
- The VLSI Implementation and Evaluation of Area- and Energy-Efficient Streaming Media Processors, by Brucek Khailany (_onesided.pdf)
- Imagine Programming System User’s Guide, by Peter Mattson
- Imagine Home Page

Imagine Processor - Architecture

Imagine Processor – ALU Cluster
- 3 ADD units, 2 MUL, 1 DSQ, 1 SP, 1 COMM
- Each input has a 16-word Local Register File (LRF).
- An intracluster switch connects all functional unit (FU) outputs to all LRFs.

Imagine Processor – “our” ALU Cluster
- 3 ADD units, 2 MUL, 1 DSQ
- Each FU has a 32-word Local Register File.
- A perfect intracluster switch connects all FU outputs to all LRFs.
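
One way the simplified cluster above could be represented in a simulator is sketched below. This is not the project's actual code: the class names, the `route` method, and the unit naming scheme (`ADD0`, `MUL1`, ...) are hypothetical. "Perfect switch" is read here as "any result can be written to any LRF entry, with no port or bandwidth conflicts modeled".

```python
from dataclasses import dataclass, field
from typing import Dict, List

LRF_WORDS = 32  # from the slide: each FU has a 32-word LRF


@dataclass
class FunctionalUnit:
    name: str
    kind: str  # "ADD", "MUL", or "DSQ"
    lrf: List[int] = field(default_factory=lambda: [0] * LRF_WORDS)


@dataclass
class Cluster:
    units: Dict[str, FunctionalUnit]

    def route(self, value: int, dest_unit: str, dest_reg: int) -> None:
        """Perfect switch (assumption): deliver a result to any LRF entry."""
        if not 0 <= dest_reg < LRF_WORDS:
            raise IndexError(f"LRF index out of range: {dest_reg}")
        self.units[dest_unit].lrf[dest_reg] = value


def make_cluster() -> Cluster:
    """Build the simplified cluster: 3 ADD, 2 MUL, 1 DSQ."""
    counts = {"ADD": 3, "MUL": 2, "DSQ": 1}
    units: Dict[str, FunctionalUnit] = {}
    for kind, count in counts.items():
        for index in range(count):
            name = f"{kind}{index}"
            units[name] = FunctionalUnit(name, kind)
    return Cluster(units)
```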

Imagine Processor – ADD FU
ADD unit is fully pipelined to 4 stages
Instructions:
- FADD/FSUB: 4 cycles
- ADD/SUB: 2 cycles
- ILT/ILE/FLT/FLE: 2 cycles
- IEQ/NEQ: 1 cycle
- AND/OR/XOR/NOT: 1 cycle
- FTOI: 3 cycles
- ITOF: 4 cycles

Imagine Processor – MUL FU
MUL unit is fully pipelined to 4 stages
Instructions:
- FMUL: 4 cycles
- IMUL: 4 cycles
- UMUL: 4 cycles
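
The latencies listed for the ADD and MUL units can be turned into a small timing model. The sketch below assumes that "fully pipelined" means a unit accepts one new instruction every cycle, and that a result is available `latency` cycles after issue. It ignores data dependences and result-bus conflicts between instructions of different latency. The function name `schedule_pipelined` is hypothetical.

```python
from typing import Dict, List, Tuple

# Latencies in cycles, taken from the ADD and MUL slides.
# Assumption: the comparison entry is read as four opcodes, ILT/ILE/FLT/FLE.
LATENCY: Dict[str, int] = {
    "FADD": 4, "FSUB": 4,
    "ADD": 2, "SUB": 2,
    "ILT": 2, "ILE": 2, "FLT": 2, "FLE": 2,
    "IEQ": 1, "NEQ": 1,
    "AND": 1, "OR": 1, "XOR": 1, "NOT": 1,
    "FTOI": 3, "ITOF": 4,
    "FMUL": 4, "IMUL": 4, "UMUL": 4,
}


def schedule_pipelined(ops: List[str], start_cycle: int = 0) -> List[Tuple[int, int]]:
    """Return (issue_cycle, done_cycle) for independent ops on one pipelined FU.

    One instruction issues per cycle; each completes `latency` cycles later.
    """
    timeline: List[Tuple[int, int]] = []
    for offset, op in enumerate(ops):
        issue = start_cycle + offset
        timeline.append((issue, issue + LATENCY[op]))
    return timeline
```

Under these assumptions, three back-to-back multiplies issue on cycles 0, 1, and 2 and finish on cycles 4, 5, and 6, so the unit sustains one result per cycle once the pipeline is full.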

Imagine Processor – DSQ FU
DSQ unit is not pipelined
Instructions:
- FDIV: 17 cycles
- FSQRT: 16 cycles
- IDIV/UDIV: 22 cycles
- IDIVR/UDIVR: 23 cycles
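
Because the DSQ unit is not pipelined, a simple model is that it stays busy for the full latency of each instruction, and the next one cannot issue until the previous one finishes. The sketch below makes that assumption explicit. The function name `schedule_unpipelined` is hypothetical, and the cycle counts are those on the slide.

```python
from typing import Dict, List, Tuple

# Latencies in cycles, taken from the DSQ slide.
DSQ_LATENCY: Dict[str, int] = {
    "FDIV": 17, "FSQRT": 16,
    "IDIV": 22, "UDIV": 22,
    "IDIVR": 23, "UDIVR": 23,
}


def schedule_unpipelined(ops: List[str], start_cycle: int = 0) -> List[Tuple[int, int]]:
    """Return (issue_cycle, done_cycle) for ops on a non-pipelined unit.

    Assumption: the unit is occupied for the whole latency of each op,
    so ops execute strictly one after another.
    """
    timeline: List[Tuple[int, int]] = []
    free_at = start_cycle
    for op in ops:
        issue = free_at
        done = issue + DSQ_LATENCY[op]
        timeline.append((issue, done))
        free_at = done  # unit becomes free only when the op completes
    return timeline
```

With this model, an FDIV followed by an FSQRT occupies the unit for 17 + 16 = 33 cycles in total.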

Imagine Processor – SP & COMM FU
SP unit
- 256-word scratchpad
- One read port
- One write port
COMM unit
- Exchanges data between clusters when a stream is not completely data parallel.
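
The scratchpad's port limits could be enforced in a simulator roughly as follows. This is only a sketch: the class and method names are hypothetical, and it assumes that "one read port, one write port" means at most one read and one write per cycle, with accesses taking effect immediately in call order.

```python
from typing import List


class Scratchpad:
    """256-word scratchpad with one read port and one write port per cycle."""

    SIZE = 256  # from the slide

    def __init__(self) -> None:
        self.words: List[int] = [0] * self.SIZE
        self._read_used = False
        self._write_used = False

    def next_cycle(self) -> None:
        """Advance to the next cycle, freeing both ports."""
        self._read_used = False
        self._write_used = False

    def _check(self, addr: int) -> None:
        if not 0 <= addr < self.SIZE:
            raise IndexError(f"scratchpad address out of range: {addr}")

    def read(self, addr: int) -> int:
        if self._read_used:
            raise RuntimeError("read port already used this cycle")
        self._check(addr)
        self._read_used = True
        return self.words[addr]

    def write(self, addr: int, value: int) -> None:
        if self._write_used:
            raise RuntimeError("write port already used this cycle")
        self._check(addr)
        self._write_used = True
        self.words[addr] = value
```

A second `read` (or second `write`) in the same cycle raises an error until `next_cycle()` is called, which makes port conflicts in a simulated schedule visible early.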

Imagine Processor – SRF

Imagine Processor – Microcontroller