Stream Architecture: Rethinking Media Processor Design


1 Stream Architecture: Rethinking Media Processor Design
Scott Rixner, April 9, 2001, Rice University Computer Systems Laboratory

2 Media Processing
Video/image compression & decompression: MPEG, JPEG, ...
Signal processing: DSL modems, cellular base stations, ...
Image synthesis: polygon rendering, image-based rendering, ...
Image understanding: face recognition, depth extraction, ...

3 Stereo Depth Extraction
[Figure: left camera image and right camera image combined into a depth map]
Requirements: 30 fps, 11 GOPS
Imagine stream processor: 12.1 GOPS, 4.6 GOPS/W

4 Outline
Stream Processing
VLSI Constraints
Register Organization
Imagine
Conclusions

5 Media Processing Characteristics
Low-precision data: 24% 8-bit integer operations, 29% 16-bit integer operations
Abundant data parallelism
Little global data reuse: average of 1.5 references per global data word
Numerous computations per global reference (operations per global data reference)

6 Stream Processing
[Figure: input data streams Image 0 and Image 1 pass through convolve kernels and a SAD kernel; the output data stream is the depth map]
Little data reuse (pixels never revisited)
Highly data parallel (output pixels not dependent on other output pixels)
Compute intensive (>60 operations per memory reference)

7 Locality and Concurrency
Operations within a kernel operate on local data
Kernels can be partitioned across chips to exploit control parallelism
Streams expose data parallelism
[Figure: Image 0 and Image 1 each pass through two convolve kernels; a SAD kernel then produces the depth map]
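
The kernel-and-stream structure on this slide can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the Imagine implementation: streams are modeled as Python generators of image rows, `convolve` is a 1-D horizontal filter, `sad` picks the disparity with the smallest sum of absolute differences over a small horizontal window, and all function names, the window size, and the treatment of Image 0 as the left view are hypothetical choices made for illustration.

```python
from typing import Iterable, Iterator, List

Row = List[int]


def _clamp(i: int, n: int) -> int:
    """Clamp an index into [0, n - 1] so border pixels stay defined."""
    return min(max(i, 0), n - 1)


def convolve(rows: Iterable[Row], taps: Row) -> Iterator[Row]:
    """Kernel: filter each row of the input stream with a 1-D tap vector."""
    half = len(taps) // 2
    for row in rows:
        n = len(row)
        # Each output pixel depends only on nearby input pixels (local data).
        yield [
            sum(t * row[_clamp(x + k - half, n)] for k, t in enumerate(taps))
            for x in range(n)
        ]


def sad(left: Iterable[Row], right: Iterable[Row],
        max_disp: int, half_window: int = 1) -> Iterator[Row]:
    """Kernel: per pixel, choose the disparity with the lowest SAD cost."""
    for lrow, rrow in zip(left, right):
        n = len(lrow)
        out = []
        for x in range(n):
            costs = [
                sum(abs(lrow[_clamp(x + i, n)] - rrow[_clamp(x + i - d, n)])
                    for i in range(-half_window, half_window + 1))
                for d in range(max_disp + 1)
            ]
            # Output pixels are independent of one another (data parallel).
            out.append(costs.index(min(costs)))
        yield out


def depth_map(image0: Iterable[Row], image1: Iterable[Row],
              taps: Row, max_disp: int) -> List[Row]:
    """Chain the kernels: two convolves per image, then one SAD kernel."""
    s0 = convolve(convolve(image0, taps), taps)
    s1 = convolve(convolve(image1, taps), taps)
    return list(sad(s0, s1, max_disp))
```

Because each stage consumes and produces rows one at a time, no stage revisits a pixel once it has passed downstream, which mirrors the "little data reuse" property described on the previous slide.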

8 Sony PlayStation2
[Figure: block diagram with Emotion Engine, MIPS core, FPU, VPU0, VPU1, IPU, Graphics Synthesizer, display, and RDRAM, I/O, DMAC, etc.]

9 Special vs. General Purpose
Special purpose: fixed function, high performance
General purpose: programmable, insufficient performance
[Figure labels: instruction cache, IR, IP, registers]

10 Register Files Dwarf ALUs

11 Register File Area
Each cell requires: 1 word line per port, 1 bit line per port
Each cell grows as $p^2$
$R$ registers in the file
Area: $p^2 R \propto N^3$
[Figure: register bit cell]

12 Register File Access Delay
Signal must traverse: word line to access cell, bit line to transfer data
Wire capacitance dominates
Delay: $pR^{1/2} \propto N^{3/2}$
[Figure: register file]

13 Register File Power Dissipation
100% utilization requires driving all $pR^{1/2}$ bit lines
Wire capacitance dominates
Power: $p^2 R \propto N^3$
[Figure: register file]

14 Centralized Register Organization
Area, Power $\propto N^3$, Delay $\propto N^{3/2}$
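
The cubic and three-halves-power results on the register file slides follow from a short chain of proportionalities. The sketch below assumes, as those results imply, that both the port count $p$ and the register count $R$ grow linearly with the number of ALUs $N$; the cell dimensions $w_{\text{cell}}$ and $h_{\text{cell}}$ are symbols introduced here for illustration only.

```latex
\begin{align*}
p &\propto N, \qquad R \propto N
  && \text{(assumed: ports and registers scale with the ALU count)} \\
w_{\text{cell}} &\propto p, \qquad h_{\text{cell}} \propto p
  && \text{(one bit line and one word line per port)} \\
A_{\text{cell}} &= w_{\text{cell}} \, h_{\text{cell}} \propto p^2 \\
\text{Area} &\propto p^2 R \propto N^2 \cdot N = N^3 \\
\text{Delay} &\propto p R^{1/2} \propto N \cdot N^{1/2} = N^{3/2}
  && \text{(wire length across an array of } R \text{ cells)} \\
\text{Power} &\propto p^2 R \propto N^3
\end{align*}
```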

15 Partitioned Organizations
SIMD: data-parallel axis
Distributed register files (DRF): instruction-level parallel axis
Hierarchical: memory hierarchy axis
Stream: optimizing for streams

16 SIMD Register Organization
Area, Power $\propto N^3/C^2$, Delay $\propto (N/C)^{3/2}$

17 Distributed Register Organization
Area, Power $\propto N^2$, Delay $\propto N$

18 Combining SIMD and DRF
[Figure labels: Scalar, SIMD, Central, DRF]

19 Hierarchical Register Organization
[Figure label: Hierarchical, T=40]
Area, Power $\propto N^3$, Delay $\propto N^{3/2}$

20 Hierarchical Organizations
[Figure labels: Scalar, SIMD, Central, DRF]

21 Stream Register Organization
Area, Power $\propto N^2/C$, Delay $\propto N/C$

22 Stream Organizations
[Figure labels: Scalar, SIMD, Central, DRF]

23 Comparison of Organizations
48 ALUs (32-bit), 500 MHz
Stream organization improves on the central organization by: Area 195x, Delay 20x, Power 430x
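
The scaling laws quoted on the register organization slides can be compared numerically. This is a rough sketch, not the model behind the 195x, 20x, and 430x figures: it evaluates only the asymptotic expressions with every constant factor dropped, so its ratios are not expected to match the numbers on this slide. The function names are hypothetical, N = 48 comes from this slide, and C = 8 is an assumption based on the eight ALU clusters shown for Imagine.

```python
import math


def scaling(n_alus: int, n_clusters: int) -> dict:
    """Asymptotic cost of each register organization (constants dropped)."""
    n, c = float(n_alus), float(n_clusters)
    return {
        "central": {"area_power": n ** 3, "delay": n ** 1.5},
        "simd": {"area_power": n ** 3 / c ** 2, "delay": (n / c) ** 1.5},
        "drf": {"area_power": n ** 2, "delay": n},
        "stream": {"area_power": n ** 2 / c, "delay": n / c},
    }


def improvement_over_central(n_alus: int, n_clusters: int) -> dict:
    """Ratio of the central organization's cost to each organization's cost."""
    costs = scaling(n_alus, n_clusters)
    base = costs["central"]
    return {
        name: {metric: base[metric] / value for metric, value in cost.items()}
        for name, cost in costs.items()
    }


if __name__ == "__main__":
    # N = 48 ALUs from the slide; C = 8 clusters is an assumption.
    for name, ratios in improvement_over_central(48, 8).items():
        print(f"{name:8s} area/power {ratios['area_power']:8.1f}x  "
              f"delay {ratios['delay']:6.1f}x")
```

In this simplified model the stream organization's advantage over the central one is N·C for area and power and √N·C for delay, which shows the trend but says nothing about the constant factors that a real layout contributes.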

24 Performance
16% performance drop (8% with latency constraints)
180x improvement

25 Stream Architecture
Stream processing: matched to media processing; exposes locality and concurrency
Stream register organization: efficiency of special-purpose hardware; optimized for streaming applications
Data bandwidth: bandwidth hierarchy; memory access scheduling; conditional streams

26 The Imagine Stream Processor
[Figure: Imagine stream processor block diagram with host processor, stream controller, network interface, microcontroller, stream register file, ALU clusters 0 through 7, and a streaming memory system attached to SDRAM]

27 Arithmetic Clusters
[Figure labels: local register file, +, +, +, *, *, /, scratch-pad, communication unit (CU), cross point, intercluster network, to SRF, from SRF]

28 Bandwidth Hierarchy
[Figure: four SDRAMs, the stream register file, and the ALU clusters, with bandwidths of 2GB/s, 32GB/s, and 544GB/s]
bit operations per word of memory bandwidth
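
The three bandwidth figures on this slide are easier to compare as ratios. The short calculation below is a sketch under one assumption: that the figures map, in order, to SDRAM, the stream register file, and the register files inside the ALU clusters. The level names and the function name are illustrative and not taken from the slide.

```python
# Peak bandwidths from the slide, in GB/s. The level names are an assumed
# reading of the figure labels, not text taken from the slide.
BANDWIDTH_GB_PER_S = {
    "SDRAM": 2.0,
    "stream register file": 32.0,
    "ALU cluster register files": 544.0,
}


def relative_bandwidth(levels: dict) -> dict:
    """Return each level's bandwidth as a multiple of the slowest level."""
    slowest = min(levels.values())
    return {name: bw / slowest for name, bw in levels.items()}


if __name__ == "__main__":
    for name, ratio in relative_bandwidth(BANDWIDTH_GB_PER_S).items():
        print(f"{name}: {ratio:g}x")
```

Read this way, the hierarchy is roughly 1 : 16 : 272, so keeping the clusters busy at their peak rate would require each word fetched from memory to be referenced a few hundred times closer to the ALUs.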

29 Stream Recirculation

30 Bandwidth Demands of FIR Filter
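
As a rough illustration of how an FIR filter's work splits between stream traffic and arithmetic, the sketch below counts both for a simple streaming filter. It is not the accounting used in the talk: the counting rules (one stream word per input sample and per output sample, two arithmetic operations per tap) and all names are assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def fir(samples: Sequence[float],
        coeffs: Sequence[float]) -> Tuple[List[float], Dict[str, int]]:
    """Streaming FIR filter that also counts stream words and arithmetic ops."""
    taps = len(coeffs)
    # Sliding window of recent samples, newest first. It plays the role of
    # kernel-local state, so old samples are never re-read from the stream.
    window = deque([0.0] * taps, maxlen=taps)
    stats = {"stream_words": 0, "arith_ops": 0}
    out = []
    for s in samples:
        window.appendleft(s)        # oldest sample drops off the right end
        stats["stream_words"] += 1  # one input word read from the stream
        acc = 0.0
        for c, w in zip(coeffs, window):
            acc += c * w
            stats["arith_ops"] += 2  # one multiply and one add per tap
        out.append(acc)
        stats["stream_words"] += 1  # one output word written to the stream
    return out, stats
```

Under these counting rules the ratio of arithmetic operations to stream words equals the number of taps, so longer filters shift more of the bandwidth demand onto local storage next to the ALUs.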

31 Bandwidth Utilization of FIR Filter

32 Performance
[Figure: performance chart with groups labeled 16-bit kernels, floating-point kernel, 16-bit applications, and floating-point application]

33 Power
GOPS/W: 4.6, 6.9, 4.1, 10.2, 9.6, 2.4, 6.3

34 Relative Performance and Power Efficiency
[Figure: FFT, with panels labeled Performance and Power Efficiency]

35 Imagine Floorplan
Tapeout ~Q2 ’01
21 million transistors: 6M SRF SRAM, 6M UC SRAM, 6M clusters, 3M other
Target: 32 FO4 (300 MHz at SSSS, 500 MHz at TTSS)
TI GS30KA: 0.15 µm $L_{\text{drawn}}$
457 signal pins

36 Imagine Team
William J. Dally, Ujval Kapasi, Brucek Khailany, Peter Mattson, Jinyung Namkoong, John Owens, Ben Serebrin, Brian Towles, Scott Rixner, Don Alpert (Intel), Ghazi Ben Amor, Chris Buehler (MIT), JP Grossman (MIT), Brad Johanson, Abelardo Lopez-Lagunas, Ben Mowery, Manman Ren

37 Conclusions
Media processing: little data reuse, highly data parallel, compute intensive
VLSI: stream register organization, bandwidth hierarchy
Imagine: stream architecture, 10 GOPS sustained application performance, 5 GOPS/W application power efficiency

