Real-Time HD Harmonic Inc. Real Time, Single Chip High Definition Video Encoder! December 22, 2004.


Real-Time HD CONFIDENTIAL
Slide 2: Agenda
- TVP2000 processor overview
- Architecture highlights
- Performance benchmarks
- Software tools
- Status
- Roadmap

Slide 3: Telairity's Market
Leading supplier of video chips for the broadcast, professional, and digital imaging markets.

Slide 4: EagleEye - Video Encoder Module
[Block diagram of the DIMM module]
- Components: TVP2000 video processor, voltage regulator, 512 MB DDR2 DRAM
- Signal labels, in diagram order: SPI; Clk – 67.5 MHz – 135 Mbps serial; 20-bit YCbCr; Interrupt; +5 Volts; Video In; Compressed Video Out; Reset; 20-bit YCbCr; Reconstructed Video Out

Slide 5: TVP2000 - Video Processor
[Block diagram]
- Video Controller
- Five TVP400 processors: P0, P1, P2, P3, P4
- DMA & SDRAM Controller
- Bit Packing Unit
- 512 MB DRAM (8-DDR2)
- Bus label: 128 bit

Slide 6: TVP400 – Vector DSP Core
[Block diagram]
- Blocks: I/O Interface, Vector Registers, 128 KB Vector SRAM, Scalar Registers, Scalar Unit, 8 KB Scratch, 32 KB I Cache, 4 KB D Cache, DMA Controller, PIO Controller, Vector Units, DRAM
- Width and bandwidth labels, in diagram order: bit, 8 GB/s, 8 GB/s, 48 GB/s, 12 GB/s, 24 GB/s, 8 GB/s

Slide 7: H.264 Partitioning & Performance Budget

Stage                               Budget
Sub-sample                              2%
Motion estimation                      40%
Transform & quantization                5%
Transform size & rate control           6%
Reorder                                 2%
Entropy coding                         20%
Inverse quantization & transform        5%
De-blocking filter                      4%
Up-sample                               2%
System control                          4%
Total                                  90%
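
The budget arithmetic on this slide can be checked with a short script. This is only a sketch: the stage names and percentages are copied from the slide, while reading them as shares of the available real-time cycles (so that the unallocated remainder is headroom) is an assumption, and `summarize` is a hypothetical helper name.

```python
# Percentages copied from the slide; treating them as shares of the
# available real-time cycle budget is an assumption.
BUDGET = {
    "Sub-sample": 2,
    "Motion estimation": 40,
    "Transform & quantization": 5,
    "Transform size & rate control": 6,
    "Reorder": 2,
    "Entropy coding": 20,
    "Inverse quantization & transform": 5,
    "De-blocking filter": 4,
    "Up-sample": 2,
    "System control": 4,
}


def summarize(budget):
    """Return (total percent, unallocated percent, largest stage name)."""
    total = sum(budget.values())
    headroom = 100 - total
    largest = max(budget, key=budget.get)
    return total, headroom, largest


total, headroom, largest = summarize(BUDGET)
print(f"allocated: {total}%, unallocated: {headroom}%, largest stage: {largest}")
```

The stages sum to the 90% total shown on the slide, and motion estimation is the largest single entry.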

Slide 8: TVP Performance Benchmark
- Motion estimation is ~50% of the problem
  - Typically implemented in a programmable machine
  - Hardwired approaches are not necessarily applicable
- The N-step search algorithm was chosen:
  - Exposes the need for a "Sum of Absolute Differences" compound instruction
  - Exposes the cache memory line splitting problem
  - Exposes the cache memory line replacement efficiency
  - Exposes the inherent parallelism available in the algorithm
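
Two of the points above can be illustrated in a few lines. This is a minimal sketch, not the TVP instruction set or memory system: `sad` shows the subtract, absolute-value, accumulate sequence that a compound instruction would fuse, and `cache_lines_touched` shows why a block row at an arbitrary pixel offset can straddle two cache lines. Both function names and the 64-byte line size are assumptions chosen for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks.

    Each pixel costs a subtract, an absolute value, and an accumulate;
    a compound SAD instruction would fuse these three steps.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def cache_lines_touched(start_byte, length, line_size=64):
    """Number of cache lines covered by `length` bytes starting at `start_byte`.

    line_size=64 is an assumed value, not a figure from the slides.
    """
    first = start_byte // line_size
    last = (start_byte + length - 1) // line_size
    return last - first + 1


# A 16-byte row that starts 8 bytes before a line boundary is split across two lines.
print(sad([[1, 2], [3, 4]], [[4, 2], [1, 0]]))
print(cache_lines_touched(48, 16), cache_lines_touched(56, 16))
```

Because a motion search reads candidate blocks at every pixel offset, split rows like the second case are common rather than exceptional.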

Slide 9: TVP Entropy Coding (CABAC)
- Benchmark: cycle count for the binarization step of arithmetic encoding
- Input: eight 4x4 transform blocks, 9 nonzero coefficients
- Platforms:
  - TVP2000 at 1 GHz, Apogee C compiler, C only and with vector intrinsics
  - AMD Opteron at 2.4 GHz, GCC -O2
- Results: TVP GHz C only 201 GHz w/ Vector intrinsics AMD 2.4GHz
- The TVP2000 chip is ~49 times more powerful than the AMD Opteron chip for binarization of CABAC encoding
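
For readers unfamiliar with the binarization step, the sketch below shows one of the schemes H.264 CABAC uses: the concatenated truncated-unary / Exp-Golomb code (often written UEG0) that, to the best of my understanding of the standard, is applied to coefficient magnitudes minus one with a cutoff of 14. The slides do not say which binarizations the benchmark exercised, so this is an illustration of the kind of work involved, not the benchmarked code; `binarize_ueg0` is a hypothetical name.

```python
def binarize_ueg0(value, cutoff=14):
    """Return the bin string for a non-negative value using a UEG0-style code.

    Prefix: truncated unary with maximum `cutoff` (ones, then a terminating
    zero unless the maximum is reached).
    Suffix (only when value >= cutoff): order-0 Exp-Golomb code of the excess.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    prefix_len = min(value, cutoff)
    bins = ["1"] * prefix_len
    if prefix_len < cutoff:
        bins.append("0")  # terminating zero of the truncated-unary prefix
        return "".join(bins)

    # Exp-Golomb suffix: unary-coded exponent, then the remaining low bits.
    excess = value - cutoff
    k = 0
    while excess >= (1 << k):
        bins.append("1")
        excess -= 1 << k
        k += 1
    bins.append("0")
    while k > 0:
        k -= 1
        bins.append(str((excess >> k) & 1))
    return "".join(bins)


for v in (0, 3, 14, 15):
    print(v, binarize_ueg0(v))
```

The bit-serial, data-dependent loops here are the reason binarization is awkward on a conventional scalar pipeline.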

Slide 10: Scalable Encoders
[Figure]
- Labels: Broadcast Applications; Video Quality; 4:2:2, 10b; 4:2:2, 8b; 4:2:0, 8b; TVP2000

Slide 11: N-Step-Search Algorithm
This algorithm is most widely known in its three-step form, the three-step search (TSS).
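
A minimal sketch of the three-step form follows, assuming the textbook formulation: evaluate the centre and its eight neighbours at a step of 4 pixels, move the centre to the best match, halve the step, and repeat at 2 and 1. The function names, the plain list-of-lists frame layout, and the SAD cost are illustrative assumptions and are not taken from the TVP implementation.

```python
def block_sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block of `cur` at (bx, by) and of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        cur_row = cur[by + j]
        ref_row = ref[ry + j]
        for i in range(n):
            total += abs(cur_row[bx + i] - ref_row[rx + i])
    return total


def three_step_search(cur, ref, bx, by, n=16, step=4):
    """Return (dx, dy, cost) of the best match found for the block at (bx, by).

    Starting from zero motion, each pass tests the 3 x 3 grid of candidates
    spaced `step` pixels around the current best, then halves the step.
    With step=4 this gives the passes 4, 2, 1 (three steps, range +/-7).
    """
    height, width = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = block_sad(cur, ref, bx, by, bx, by, n)
    while step >= 1:
        centre_dx, centre_dy = best_dx, best_dy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = centre_dx + dx, centre_dy + dy
                rx, ry = bx + mx, by + my
                # Skip candidates whose block would fall outside the reference frame.
                if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                    continue
                cost = block_sad(cur, ref, bx, by, rx, ry, n)
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, mx, my
        step //= 2
    return best_dx, best_dy, best_cost


def frame_with_square(size, x0, y0, side=8, value=255):
    """Hypothetical test helper: black frame with one bright square."""
    frame = [[0] * size for _ in range(size)]
    for y in range(y0, y0 + side):
        for x in range(x0, x0 + side):
            frame[y][x] = value
    return frame


ref = frame_with_square(48, 20, 20)
cur = frame_with_square(48, 17, 21)  # same square, displaced by (3, -1) relative to ref
print(three_step_search(cur, ref, 16, 16))
```

The search evaluates at most 27 candidates per block instead of the 225 an exhaustive +/-7 search would need, and the nine candidates within a pass are independent of each other. The price is that it can settle on a local minimum when the cost surface is not smooth.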