What Choices Make A Killer Video Processor Architecture?

Jonah Probell, Ultra Data Corp
jonah@ultradatacorp.com, www.ultradatacorp.com

© Copyright 2004 Jonah Probell

Outline
- Overview of the Ultra Data UD3000
- Software programmability
- Parallelism: VLIW, SIMD, multiprocessing
- Appropriate use of on- and off-chip memory
- Optimal organization of data structures in DRAM
- Deterministic performance: 5-port register file, 2-port on-chip memory, DMA controller instead of caches

Nobody's Video Decoder Chip (block diagram)
Blocks shown: SDRAM; SATA and I2C buses; optics sled; host/audio processor; SDRAM controller; I2C, SATA, and timers; DVD optical interface; peripheral bus bridge; peripheral bus; high-speed interconnect; video decode processor; video post-processing; audio output (I2S / SPDIF / raw); video output (S-video / raw 24-bit RGB or 8/16-bit YCrCb); audio/video DACs.

The Ultra Data UD3000 (block diagram)
Processors: Outer Loop Processor 0, Outer Loop Processor 1, Inner Loop Processor 0, Inner Loop Processor 1, Inner Loop Processor 2, …
Memory and communication: 2-port DMEMs, FIFOs, Test & Set, crossbar switch fabric, system bus bridge, smart 2-D DMA controller.
Also shown: instruction extensions.
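The diagram places a Test & Set block next to the FIFOs; a test-and-set primitive lets several processors claim a shared resource, such as a DMEM buffer, one at a time. The slide does not describe its interface, so the following is only a software analog: a minimal C11 sketch built on `atomic_flag`. The `tas_lock` type and the `tas_*` function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical software analog of a hardware test-and-set unit:
 * one flag per shared resource (for example, one shared DMEM buffer). */
typedef struct {
    atomic_flag flag;
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to claim the resource.
 * Returns true if the flag was previously clear (we now own it). */
bool tas_try_acquire(tas_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}

/* Spin until the resource is claimed. */
void tas_acquire(tas_lock *l)
{
    while (!tas_try_acquire(l)) {
        /* busy-wait; real hardware might stall or yield instead */
    }
}

/* Release the resource so another processor can claim it. */
void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

Spinning is acceptable only when the protected work is short, which is typically the case for handing a buffer from one processor to another.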

H.264 Main Profile Decode (task mapping diagram)
Tasks shown: CAVLC / CABAC entropy decoding, load prediction source, interpolation, inverse transform, apply deltas, store block, deblocking thresholds, deblocking filter.
Units shown: OLP 0, OLP 1, ILP 0, ILP 1, ILP 2, and the DMA controller.
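To make two of these tasks concrete, here is a minimal C sketch of the H.264 4x4 inverse integer transform ("inverse transform") followed by adding the residuals to the prediction with clipping ("apply deltas"). It follows the transform equations of the H.264 standard but omits dequantization; the function names are hypothetical, and this is scalar reference code, not the UD3000 implementation.

```c
#include <stdint.h>

/* Clip a value to the 8-bit sample range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* H.264 4x4 inverse integer transform.
 * coef: dequantized coefficients, row-major (dequantization not shown).
 * res:  output residuals ("deltas").
 * Assumes >> on negative ints is an arithmetic shift, as on common compilers. */
void idct4x4(const int coef[16], int res[16])
{
    int tmp[16];

    /* Horizontal pass over each row. */
    for (int i = 0; i < 4; i++) {
        const int *d = &coef[4 * i];
        int e0 = d[0] + d[2];
        int e1 = d[0] - d[2];
        int e2 = (d[1] >> 1) - d[3];
        int e3 = d[1] + (d[3] >> 1);
        tmp[4 * i + 0] = e0 + e3;
        tmp[4 * i + 1] = e1 + e2;
        tmp[4 * i + 2] = e1 - e2;
        tmp[4 * i + 3] = e0 - e3;
    }

    /* Vertical pass over each column, then rounding: (x + 32) >> 6. */
    for (int j = 0; j < 4; j++) {
        int f0 = tmp[j], f1 = tmp[4 + j], f2 = tmp[8 + j], f3 = tmp[12 + j];
        int g0 = f0 + f2;
        int g1 = f0 - f2;
        int g2 = (f1 >> 1) - f3;
        int g3 = f1 + (f3 >> 1);
        res[j]      = (g0 + g3 + 32) >> 6;
        res[4 + j]  = (g1 + g2 + 32) >> 6;
        res[8 + j]  = (g1 - g2 + 32) >> 6;
        res[12 + j] = (g0 - g3 + 32) >> 6;
    }
}

/* "Apply deltas": add residuals to the prediction and clip to 8 bits. */
void add_residual4x4(const uint8_t pred[16], const int res[16], uint8_t out[16])
{
    for (int k = 0; k < 16; k++)
        out[k] = clip_u8(pred[k] + res[k]);
}
```

For example, a block whose only nonzero coefficient is a DC value of 64 yields a residual of 1 at every position.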

The Inner Loop Processor (block diagram)
Blocks shown: switch fabric; IMEM (instruction memory).
Control unit: 32-bit RISC with program counter, loads and stores, data aligner, and a 3-port register file (32 × 32-bit).
Vector unit: 64-bit SIMD data path with multiply-accumulate, data packing, and a 5-port register file (32 × 64-bit).
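Sub-pixel interpolation is a typical multiply-accumulate workload for a vector unit like this one. The scalar C sketch below computes one H.264 luma half-sample value with the standard's 6-tap filter (1, -5, 20, 20, -5, 1); a 64-bit SIMD unit would evaluate several neighboring outputs per instruction. The function name is hypothetical and the code is not taken from the UD3000.

```c
#include <stdint.h>

/* H.264 luma half-sample 6-tap filter, taps (1, -5, 20, 20, -5, 1).
 * p holds six consecutive integer samples E, F, G, H, I, J;
 * the result is the half-sample value between G and H. */
uint8_t halfpel_6tap(const uint8_t p[6])
{
    static const int taps[6] = { 1, -5, 20, 20, -5, 1 };

    /* Multiply-accumulate over the six taps. */
    int acc = 0;
    for (int k = 0; k < 6; k++)
        acc += taps[k] * p[k];

    /* Round, divide by 32, and clip to [0, 255].
     * Negative sums clip to 0, so the shift only sees v >= 0. */
    int v = acc + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}
```

Because the taps sum to 32, a flat region passes through unchanged, while sharp edges can overshoot and are clipped.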

Video Codec Standards (timeline, 1984 to 2004)
- ITU-T standards: H.261, H.263
- ITU-T / MPEG joint standards: H.262 / MPEG-2, H.264 / MPEG-4 Part 10 AVC
- MPEG standards: MPEG-1, MPEG-4
- On2 Technologies standards: VP3, VP4, VP5, VP6
- DivX Networks standard: DivX
- Microsoft standard: Windows Media Video

VLIW Parallelism (diagram)
Blocks shown: program sequencer, register file, ALU (+, -, x, &, |, !, >>, <<), data memory.
A sequential DSP program issues one operation at a time (load, multiply, shift, store, add, branch); a VLIW DSP program groups several of these operations into each instruction word so that they execute in the same cycle.
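Grouping independent operations into VLIW instruction words is done ahead of time by the compiler or assembly programmer. As a rough illustration of that static scheduling, here is a greedy, cycle-by-cycle list scheduler that uses program order as priority. It assumes, unlike real hardware, that every operation takes one cycle, that any slot can execute any operation, and that the `deps` arrays already capture all register and memory hazards. The `op_t` type and `vliw_schedule` name are hypothetical.

```c
#define MAX_DEPS 2

/* One operation in a straight-line DSP code sequence. */
typedef struct {
    const char *name;     /* e.g. "load", "multiply", "shift" */
    int deps[MAX_DEPS];   /* indices of earlier ops whose results it needs; -1 = none */
} op_t;

/* Pack ops into VLIW bundles of at most `width` (>= 1) operations.
 * deps[] must refer only to earlier ops (index < i).
 * cycle_of[i] receives the bundle (cycle) in which op i issues.
 * Returns the number of cycles the schedule takes. */
int vliw_schedule(const op_t *ops, int n, int width, int *cycle_of)
{
    int done = 0, cycles = 0;
    for (int i = 0; i < n; i++)
        cycle_of[i] = -1;                         /* -1 = not yet scheduled */

    for (int cycle = 0; done < n; cycle++) {
        int slots = width;
        for (int i = 0; i < n && slots > 0; i++) {
            if (cycle_of[i] >= 0)
                continue;                         /* already in an earlier bundle */
            int ready = 1;
            for (int d = 0; d < MAX_DEPS; d++) {
                int p = ops[i].deps[d];
                /* A producer must have issued in an earlier cycle. */
                if (p >= 0 && (cycle_of[p] < 0 || cycle_of[p] >= cycle))
                    ready = 0;
            }
            if (ready) {
                cycle_of[i] = cycle;
                slots--;
                done++;
                cycles = cycle + 1;
            }
        }
    }
    return cycles;
}
```

For instance, a six-operation sequence (two loads, a multiply of the loaded values, an independent loop-counter add, a shift of the product, and a store of the shifted value) takes 6 cycles at width 1 but 4 cycles at width 3, because both loads and the add share the first bundle.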

SIMD Parallelism (diagram)
Levels shown: a frame of macroblocks, a macroblock of pixels, an 8x8 block of pixels, a 4x4 block of pixels.
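One way to see pixel-level SIMD parallelism in portable code is SIMD within a register (SWAR): treat a 64-bit word as eight 8-bit pixels and operate on all of them at once. The sketch below computes the rounded average (a + b + 1) >> 1 for eight pixel pairs, the averaging H.264 uses for quarter-sample luma positions. This is a general software technique, not a documented UD3000 instruction, and the function name is hypothetical.

```c
#include <stdint.h>

/* Rounded average (a + b + 1) >> 1 of eight packed unsigned bytes.
 * Per lane, (a | b) - ((a ^ b) >> 1) equals ceil((a + b) / 2).
 * The 0xFE mask clears each lane's low bit before the shift, so no bit
 * crosses into the neighboring lane; the subtraction never borrows
 * across lanes because (a | b) >= ((a ^ b) >> 1) in every lane. */
uint64_t avg8_u8(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & UINT64_C(0xFEFEFEFEFEFEFEFE)) >> 1);
}
```

One 64-bit operation here replaces eight scalar add-and-shift sequences, which is the kind of gain the 64-bit SIMD data path targets.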

Multiprocessor Parallelism (diagram)
Two ways to divide a video codec system among CPUs: symmetric parallel multiprocessing and pipelined multiprocessing.
Stages shown: motion estimation and prediction (CPU 0), transform and compression (CPU 1), deblocking (CPU 2). Labels: software, hardware.
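A back-of-the-envelope model helps compare the two organizations. In the pipelined case, throughput is set by the slowest stage; in the symmetric case, the ideal figure is the total work divided by the number of CPUs. The sketch below is an idealized model, not a result from the slide: it ignores communication costs and the dependencies between neighboring macroblocks (intra prediction and deblocking both use neighbors), so the symmetric estimate is optimistic. Function names and the cycle counts in the example are hypothetical.

```c
/* stage_cycles[i]: cycles one macroblock spends in stage i on one CPU. */

/* Pipelined: one stage per CPU, so one macroblock completes
 * every max(stage_cycles) cycles. */
unsigned pipelined_cycles_per_mb(const unsigned *stage_cycles, int nstages)
{
    unsigned worst = 0;
    for (int i = 0; i < nstages; i++)
        if (stage_cycles[i] > worst)
            worst = stage_cycles[i];
    return worst;
}

/* Symmetric: every CPU runs all stages on its own macroblocks.
 * Ideal case: total work divided evenly over ncpus (> 0), rounded up. */
unsigned symmetric_cycles_per_mb(const unsigned *stage_cycles, int nstages,
                                 unsigned ncpus)
{
    unsigned total = 0;
    for (int i = 0; i < nstages; i++)
        total += stage_cycles[i];
    return (total + ncpus - 1) / ncpus;
}
```

With hypothetical stage costs of 300, 200, and 100 cycles on three CPUs, the pipeline delivers one macroblock every 300 cycles against an idealized 200 for the symmetric split; with balanced stages of 200 cycles each, both estimates are 200.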

Data Bandwidths (diagram)
Blocks shown: bitstream source, video chip, SDRAM (temporary data storage), display device.
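The compressed bitstream is small next to the uncompressed frames the video chip exchanges with SDRAM and sends to the display. To put rough numbers on one such stream, here is a minimal sketch of the arithmetic. It assumes 8-bit samples with 4:2:0 chroma subsampling (1.5 bytes per pixel); the slide itself gives no formats or rates, and the function name is hypothetical.

```c
#include <stdint.h>

/* Bytes per second for one pass over uncompressed frames.
 * Assumes 8-bit samples and 4:2:0 chroma, i.e. 1.5 bytes per pixel. */
uint64_t frame_bytes_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    uint64_t luma = (uint64_t)width * height;  /* one Y byte per pixel */
    uint64_t chroma = luma / 2;                /* Cb + Cr, each at quarter resolution */
    return (luma + chroma) * fps;
}
```

For 1920×1080 at 30 frames per second this gives 93,312,000 bytes/s (about 93 MB/s) for a single pass. A decoder's SDRAM traffic is a multiple of that, since it writes each decoded frame, reads reference frames for motion compensation, and reads frames again for display.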

DRAM Optimal Data Ordering (1 KB rows)
- Frame mapped to DRAM rows as a C-style two-dimensional array
- Frame mapped to DRAM rows as square groups
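The benefit of square groups shows up when counting how many DRAM rows a block fetch touches, since every new row costs an activate/precharge. The sketch below compares the two mappings, assuming 8-bit samples and that each 1 KB row holds one 32×32 tile (32 × 32 = 1024 bytes); the tile size, the frame width, and all function names are assumptions for illustration.

```c
#include <stdint.h>

#define DRAM_ROW_BYTES 1024u   /* 1 KB rows, as on the slide */
#define TILE_DIM       32u     /* assumed: one 32x32 8-bit tile per row */

/* Row holding pixel (x, y) for a frame stored as a C-style 2-D array. */
uint32_t row_linear(uint32_t x, uint32_t y, uint32_t width)
{
    return (y * width + x) / DRAM_ROW_BYTES;
}

/* Row holding pixel (x, y) for a frame stored as square 32x32 tiles,
 * one tile per DRAM row. Assumes width is a multiple of 32. */
uint32_t row_tiled(uint32_t x, uint32_t y, uint32_t width)
{
    uint32_t tiles_per_line = width / TILE_DIM;
    return (y / TILE_DIM) * tiles_per_line + (x / TILE_DIM);
}

/* Count distinct DRAM rows touched when fetching a bw x bh block at
 * (x0, y0). Intended for blocks up to 16x16 (at most 256 distinct rows). */
int rows_touched(uint32_t (*row_of)(uint32_t, uint32_t, uint32_t),
                 uint32_t x0, uint32_t y0, uint32_t bw, uint32_t bh,
                 uint32_t width)
{
    uint32_t seen[256];
    int nseen = 0;
    for (uint32_t y = y0; y < y0 + bh; y++) {
        for (uint32_t x = x0; x < x0 + bw; x++) {
            uint32_t r = row_of(x, y, width);
            int found = 0;
            for (int k = 0; k < nseen; k++)
                if (seen[k] == r) { found = 1; break; }
            if (!found && nseen < 256)
                seen[nseen++] = r;
        }
    }
    return nseen;
}
```

With a 1920-pixel-wide frame, a 16×16 block fetched at (0, 0) touches 16 rows under the array mapping but only 1 under the tiled mapping; even an unaligned block at (24, 24) touches just 4 tiled rows.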

Deterministic Performance
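The outline names a DMA controller instead of caches as one route to deterministic performance. A common way to use such a controller is ping-pong (double) buffering: the DMA engine fills one on-chip buffer while the processor works on the other, so the processor's accesses go to on-chip memory with fixed latency instead of depending on cache hits and misses. The sketch below is a single-threaded C model under stated assumptions: `dma_start()` and `dma_wait()` are hypothetical stand-ins (the "transfer" is just `memcpy`), the block size is arbitrary, and the per-block work is a placeholder sum.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64   /* arbitrary block size for this model */

/* Hypothetical stand-in for starting a DMA transfer of one block
 * from external SDRAM into on-chip memory. Here it copies immediately. */
static void dma_start(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, BLOCK_BYTES);
}

/* Hypothetical stand-in for waiting on the outstanding transfer.
 * A real DMA engine would be polled or would raise an interrupt. */
static void dma_wait(void)
{
}

/* Placeholder per-block work: sum of the bytes. */
static uint32_t process_block(const uint8_t *blk)
{
    uint32_t s = 0;
    for (int i = 0; i < BLOCK_BYTES; i++)
        s += blk[i];
    return s;
}

/* Ping-pong buffering: block b is processed from onchip[b & 1]
 * while block b + 1 is transferred into the other buffer. */
uint32_t sum_blocks_double_buffered(const uint8_t *sdram, int nblocks)
{
    static uint8_t onchip[2][BLOCK_BYTES];
    uint32_t total = 0;
    if (nblocks <= 0)
        return 0;

    dma_start(onchip[0], sdram);                  /* prefetch the first block */
    for (int b = 0; b < nblocks; b++) {
        dma_wait();                               /* block b is now on chip */
        if (b + 1 < nblocks)                      /* start fetching the next one */
            dma_start(onchip[(b + 1) & 1], sdram + (size_t)(b + 1) * BLOCK_BYTES);
        total += process_block(onchip[b & 1]);
    }
    return total;
}
```

When each transfer finishes within the time it takes to process one block, the loop's timing depends only on the processing code, which is what makes the performance predictable.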



A Killer Video Processor Architecture
- Software programmability
- Parallelism: VLIW, SIMD, multiprocessing
- Appropriate use of on- and off-chip memory
- Optimal organization of data structures in DRAM
- Deterministic performance: 5-port register file, 2-port on-chip memory, DMA controller instead of caches

Acknowledgements
This presentation is © Copyright 2004 Jonah Probell. ALL RIGHTS RESERVED. Certain information for this document was derived from publicly available documents of Ultra Data Corp., UB Video Inc., On2 Technologies Inc., and Wikipedia. All trademarks mentioned in this document are the property of their respective owners and are hereby acknowledged.
Jonah Probell, jonah@ultradatacorp.com, (781) 209-0886