Published by Aubrey Caldwell. Modified over 9 years ago.
1
Progress on media processor design
Xiaolang Yan (yan@vlsi.zju.edu.cn)
Xing Qin (qinx@vlsi.zju.edu.cn)
Jian Yang (yangj@vlsi.zju.edu.cn)
Xiaohua Luo (luoxh@vlsi.zju.edu.cn)
Peiyong Zhang (zhangpy@vlsi.zju.edu.cn)
Dake Liu (dake@isy.liu.se)
Embedded DSP Research & Develop Group
Presented by Chunyue Liu (liucy@vlsi.zju.edu.cn)
2
Outline
Overview of media processor
Progress on Spock
Progress on Schubert
  - Overview
  - Key features
  - Performance
Conclusions & Problems
3
Background and Challenges
Media applications have very high computational complexity
  - H.264 encoding of 720 x 576 pixels @ 30 frames/s needs up to 30 GOPS
Media processors are in demand
  - Some state-of-the-art media processors (e.g. Nomadik, DaVinci)
Multiple standards coexist
  - Flexible & programmable
Our current IC design level constraint (200 MHz @ 0.18 um)
ASIP is the best choice
Our proposal at IC-DFN’05
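To make the gap between the workload and the clock constraint concrete, the figures on this slide can be turned into a small calculation. This is only a sketch: the function names are hypothetical, and the 30 GOPS and 200 MHz values are taken directly from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels that must be processed per second for a given frame size and rate. */
static uint64_t pixels_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Operations that must complete in every clock cycle to sustain
 * ops_per_second at clock_hz (rounded up). */
static uint64_t ops_per_cycle(uint64_t ops_per_second, uint64_t clock_hz)
{
    assert(clock_hz > 0);
    return (ops_per_second + clock_hz - 1) / clock_hz;
}
```

With the slide's numbers, 720 x 576 @ 30 frames/s is 12,441,600 pixels/s, so 30 GOPS corresponds to roughly 2,400 operations per pixel, and a 200 MHz clock would have to retire about 150 operations per cycle. That is the motivation for an application-specific, highly parallel datapath.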
4
Overview of media processor
Programmable and heterogeneous processors on a SoC platform
  - General MCU (CK510, a 32-bit RISC core): interface (GUI), OS (Linux)
  - Enhanced DSP (Spock): audio processing, bitstream parsing, data transfer
  - Vector processor (Schubert): video processing
6
Progress on Spock
Developed tool chain
  - Assembler, simulator and debugger
FPGA prototype: real-time decoding
  - 128 kb/s OGG @ 40 MHz
To test Spock, a dual-core SoC platform was developed
  - Integrated with CK510
  - Inter-processor communication uses mailbox and shared memory
  - 0.18 um, less than 500 mW, 166 MHz
  - CK510 core area: 2 x 2 mm^2
  - Spock core area: 1.5 x 1.5 mm^2
7
Overview of Spock
Optimization for control
  - Branch optimization: conditional execution; 2-level hardware loop; repeat
Optimization for signal processing
  - Multiple addressing modes: post-address ++/--; reverse/modulo addressing
  - MAC with parallel load
  - VLX instruction set extension: putbits, showbits, getbits, etc.
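The slide names the VLX bitstream instructions but not their exact semantics. The sketch below is a software model under the assumption that they behave like the usual MSB-first helpers in MPEG-style decoders: showbits peeks at the next n bits, getbits peeks and then advances. The bitreader type and its field names are hypothetical, and putbits (the write-side counterpart) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a bitstream read pointer. */
typedef struct {
    const uint8_t *buf;  /* bitstream bytes */
    size_t size;         /* buffer length in bytes */
    size_t bitpos;       /* index of the next unread bit */
} bitreader;

/* Return the next n bits (1..32), MSB first, without consuming them.
 * Bits beyond the end of the buffer read as zero. */
static uint32_t showbits(const bitreader *br, unsigned n)
{
    uint32_t v = 0;
    assert(n >= 1 && n <= 32);
    for (unsigned i = 0; i < n; i++) {
        size_t pos = br->bitpos + i;
        uint32_t bit = 0;
        if (pos / 8 < br->size)
            bit = (br->buf[pos / 8] >> (7 - pos % 8)) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

/* Return the next n bits and advance the read pointer by n. */
static uint32_t getbits(bitreader *br, unsigned n)
{
    uint32_t v = showbits(br, n);
    br->bitpos += n;
    return v;
}
```

In software each call costs a loop of shifts and masks, which is the kind of overhead a dedicated instruction for bitstream parsing is meant to remove.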
9
Progress on Schubert
Design Methodology
  - Application coverage to function coverage
  - SW-HW partition: 10%-90% locality
  - Assembly instruction set specification
  - Design of assembler and simulator
  - Build golden model
  - Benchmark instruction set
  - Behavior function verification
  - Micro-architecture design
  - RTL coding
  - Backend design
  - Design for test
  - RTL code verification
  - Test chip fabrication & test board prototype
  - Good performance?
Released 316 novel instructions
  - SIMD and RISC
Developed tool chain
  - Assembler
  - Cycle-accurate simulator
Mapped kernels
  - H.264/AVC: IT/IIT, intra/inter-prediction, de-blocking, motion estimation
  - MPEG2: DCT, motion compensation
Micro-architecture is designed
  - Estimated area: 3.5 x 3.5 mm^2 @ 0.18 um with a 70 KB SRAM
10
Key features of Schubert
Dual clusters and dual coupling pipelines
  - SIMD combined with VLIW architecture
Explicit Data Organization SIMD (EDO-SIMD)
2-dimensional, byte-aligned addressing and storage
Cycle-accurate instruction set simulator
11
Dual clusters and dual coupling pipelines
Two clusters:
  - Cluster0: computation (+/-, *, &, >/<, etc.)
  - Cluster1: data conversion & LD/ST
  - Based on Decoupled Access & Execution (DAE)
Two pipelines:
  - Each cluster has its own execution-stage pipeline
  - Both share the IF & ID pipeline stages
Advantages
  - Computation operations run in parallel with non-computation operations
  - Good cycle-count performance
12
Dual clusters and dual coupling pipelines
13
Explicit Data Organization SIMD ISA
Bottleneck of conventional SIMD ISA
  - SIMD is inefficient if sub-word data are not aligned with each other
  - SIMD is less flexible than VLIW
Cycle percentage of conventional SIMD ISA
  SIMD class     VIS       MMX/SSE   AltiVec
  Ld/St          11.70%    21.00%    17.90%
  Organize        9.70%    12.60%    17%
  Integer ALU    13.60%    18.80%    11.80%
  Float ALU      --         9.30%     6.90%
This overhead is reduced by Dual-Cluster
How to reduce this overhead?
Related works
  - Complex streamed instruction, Delft TU
  - Stream buffer, Stream processor, Stanford University
  - Indirect register addressing, Elite project, IBM
14
Explicit Data Organization SIMD ISA
Proposed EDO-SIMD ISA
  - Explicit data organization information (e.g. 3x8|3:4:7:0:1:2:6:5) indicates operand relations (align, merge, extract, broadcast, cross)
  - Append a permutation network onto the RF pipeline of Cluster0
  - Add a permutation pipeline in Cluster1 in parallel with AD0
Advantages
  - Merges organization with computation to reduce overhead
  - As flexible as VLIW
  - Simplified implementation
Figure labels: interpolate, DCT, intra predict, IIT; example instruction: vOADD vR2, vR1, vR0
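The idea of merging data organization with computation can be illustrated with a scalar model. This is a sketch under loud assumptions: the slide does not define the encoding of 3x8|3:4:7:0:1:2:6:5, so the code only assumes that the trailing index list selects, for each of 8 lanes, which source element feeds that lane. The function names and the 8-lane byte vector are hypothetical and are not the Schubert instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* assumed number of sub-word lanes in this sketch */

/* Reorganize a vector so that dst[i] = src[perm[i]]. */
static void permute(const uint8_t src[LANES], const uint8_t perm[LANES],
                    uint8_t dst[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(perm[i] < LANES);
        dst[i] = src[perm[i]];
    }
}

/* Lane-wise add in which the second operand is reorganized on the fly,
 * modelling one instruction that carries its own organization pattern. */
static void add_organized(const uint8_t a[LANES], const uint8_t b[LANES],
                          const uint8_t perm[LANES], uint8_t out[LANES])
{
    uint8_t tmp[LANES];
    permute(b, perm, tmp);
    for (int i = 0; i < LANES; i++)
        out[i] = (uint8_t)(a[i] + tmp[i]);  /* wraps modulo 256 */
}
```

On a conventional SIMD ISA the permute and the add would be separate instructions, and the permute is the "Organize" overhead. Folding the pattern into the arithmetic instruction is what removes it.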
15
2-D stream storage and addressing
Multimedia temporal data behavior
  - 2-D, block by block
  - Row and column access
  - Byte alignment
  - Flexible block jumping
Conventional 1-D addressing burdens the computation elements with address generation and address alignment tasks
Related works
  - Linear addressing with circular buffer, Blackfin
  - Special transpose unit, TriMedia
16
2-D stream storage and addressing
Proposed storage and addressing mode
  - 2-D stream storage (base, 2-D stride, 2-D offset)
  - Row and interleaved data arrangement (row access & column access)
  - Base update for block jump (UPDATE B0, OX0, OY0, B0)
  - C-like programming model is friendly to the programmer
    asm: vLDOBR B0, 4, 2, vR0;
    C: for(i=0; i<8; i++) r[i] = b[2][4+i];
Advantages
  - Reduces addressing and aligning overhead (avoids transpose)
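The asm/C pair on this slide can be expanded into a small software model of 2-D addressing. This is a sketch, not the hardware definition: the stream2d type, the function names and the 8-byte vector length are assumptions. Only the row-access pattern r[i] = b[2][4+i] comes from the slide; the column variant is an assumed counterpart for the "column access" bullet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN 8  /* bytes per vector register in this sketch (assumed) */

/* Hypothetical descriptor of a 2-D block in byte-addressed memory. */
typedef struct {
    const uint8_t *base;  /* start of the block, playing the role of B0 */
    size_t stride;        /* bytes between consecutive rows */
    size_t rows;          /* number of rows in the block */
} stream2d;

/* Row access: r[i] = b[oy][ox + i], the pattern shown on the slide. */
static void load_row(const stream2d *s, size_t ox, size_t oy, uint8_t r[VLEN])
{
    assert(oy < s->rows && ox + VLEN <= s->stride);
    for (size_t i = 0; i < VLEN; i++)
        r[i] = s->base[oy * s->stride + ox + i];
}

/* Column access: r[i] = b[oy + i][ox], read without a transpose step. */
static void load_col(const stream2d *s, size_t ox, size_t oy, uint8_t r[VLEN])
{
    assert(oy + VLEN <= s->rows && ox < s->stride);
    for (size_t i = 0; i < VLEN; i++)
        r[i] = s->base[(oy + i) * s->stride + ox];
}
```

In this model, load_row(&s, 4, 2, r) plays the role of vLDOBR B0, 4, 2, vR0. The multiply-and-add address arithmetic inside the loops is the work that 1-D addressing leaves to the computation elements and that a 2-D address generator would absorb.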
17
Cycle-accurate instruction set simulator
Useful for benchmarking and ISA design space exploration during the early stage
  - Input is an assembly text program, not binary code
  - Focuses on function, not micro-architecture
Consists of
  - Resource modeling
  - ISA function modeling at each pipeline
  - Behavior and timing modeling
  - Debug and profiling support
3 people for 2 months of work, about 60,000 lines of C++ code
18
Benchmarking and performance
Mapped benchmarks:
  - Full H.264 baseline decoder kernels such as integer transform, intra prediction, interpolation and de-blocking
  - H.264 fast motion estimation
  - MPEG2 motion compensation and DCT/IDCT
The cycle-accurate and functionally correct programs help:
  - Make the assembler and simulator more robust
  - Demonstrate the performance of the ISA
  - Explore and refine the ISA (more than 900 instructions were refined to 316 in the end)
Performance
  - 4CIF (704x576) H.264 baseline real-time decoder @ 200 MHz
  - 16 kB code size for H.264 baseline decoder
Figure: cycles for 8x8 IDCT with IEEE-compliant precision (axis 0 to 600), comparing RISC-Media [10], MMX, TMS320C6x, NEC V830, VIRAM and the proposed design
20
Conclusions
Integrating a general MCU with heterogeneous ASIPs on a SoC platform is a good choice for media processing in China
  - A good trade-off between performance and flexibility
  - Overcomes our IC design level constraint (200 MHz @ 0.18 um)
Progress on our media processor
  - CK510 and Spock are finished
  - A dual-core SoC of CK510 and Spock has been taped out
  - Novel features of Schubert are verified and the RTL implementation is ongoing
21
Problems
Figure: the design methodology flow from the "Progress on Schubert" slide, with the label "Behavior Synthesis tool"
The behavior synthesis stage in our ASIP design depends on human experience rather than tools, which takes too much effort.
It would be very valuable to research and develop CAD tools for design space exploration of the ASIP ISA and ASIP SoC communication during the early stage.
22
Thank you!!!