MPEG2 Video Encoding on Imagine November 16, 2000 Scott Rixner.


1 MPEG2 Video Encoding on Imagine November 16, 2000 Scott Rixner

2 Imagine Architecture: Programming Imagine
• Architecture features
  – Data bandwidth management
  – Data-parallel clusters
  – Parallel-subword operations
• Stream programming model
  – Natural data streams of application
  – Computation kernels perform “functions”
• Challenge is to think in terms of streams instead of traditional C-style sequential code

3 Application Development (1)
• Compose stream and kernel diagram
  – Identify natural streams in the application
  – Understand data-parallelism and how to map it to the clusters
  – Stream-oriented algorithmic choices
• Write kernel code
  – C-like syntax
  – idebug enables quick non-performance, functional debugging
  – iscd/schedviz enables C-level performance tuning

4 Application Development (2)
• Write stream code
  – First cut: simple mapping of stream/kernel diagram
  – idebug enables quick functional testing
  – Second cut: convert to macrocode (soon to be obsolete)
  – isim yields cycle-accurate simulation
• Performance tuning
  – schedviz allows quick kernel tuning
  – appviz shows where application run-time is going

5 MPEG2 Encoding
• Color Conversion (RGB → YCbCr)
• Motion Estimation
• Discrete Cosine Transform
• Quantization
• Run-level Encoding
• Variable-length Coding
• IDCTQ/Correlation for Reference Frame
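To make the run-level encoding stage concrete, here is a minimal sketch in plain Python (not Imagine KernelC). It assumes the common convention that each nonzero quantized coefficient is emitted as a (run, level) pair, where run is the number of zeros that precede it. The function name and the handling of trailing zeros are illustrative assumptions, not taken from the slides.

```python
def run_level_encode(coeffs):
    """Return (run, level) pairs for the nonzero entries of a coefficient list.

    run   = number of zero coefficients preceding the nonzero one
    level = the nonzero coefficient value
    Trailing zeros produce no pair (a real encoder would signal end-of-block).
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs


# Hypothetical quantized coefficients, already in scan order
block = [12, 0, 0, -3, 1, 0, 0, 0, 2, 0, 0]
print(run_level_encode(block))
```

Long runs of zeros collapse into a single pair, which is why this stage sits between quantization and variable-length coding.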

6 Streams and Kernels

7 Imagine Programming Environment

StereoDepthExtraction(…) {
  // Load Input Images
  ...
  // Run Kernels
  convolve7x7(RawImage, ConvImage);
  convolve3x3(ConvImage, Conv2Image);
  ...
  // Store Output
}

Convolve7x7(…) {
  ...
  while (!In.empty()) {
    ...
    p0 = k0 * in10;
    p12 = k21 * in32;
    p34 = k43 * in54;
    p56 = k65 * in76;
    sum = (p0 + p12) + (p34 + p56);
    ...
  }
}

8 Imagine Programming Tools

9 KernelC

loop_stream(datain) pipeline(1) {
  datain >> color1 >> color2 >> color3 >> color4;

  // c = 0.299R || 0.114B
  c1 = hi(mulrnd(RB_SCALE, shift(a1, 1)));
  c2 = hi(mulrnd(RB_SCALE, shift(a2, 1)));
  c3 = hi(mulrnd(RB_SCALE, shift(a3, 1)));
  c4 = hi(mulrnd(RB_SCALE, shift(a4, 1)));
  …
  Yout << hi(mulrnd(Ymadj, shift(temp0, 1))) + Yaadj;
  Yout << hi(mulrnd(Ymadj, shift(temp1, 1))) + Yaadj;

  first = hi(mulrnd((a1a3 - (z1 + z3)), C_SCALE)) + one_two_eight;
  second = hi(mulrnd((a2a4 - (z2 + z4)), C_SCALE)) + one_two_eight;
  first = commucperm(perm_a, first);
  second = commucperm(perm_b, second);
  CrCbout << select(low, first, second);
}
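For readers who do not know KernelC, the arithmetic behind this color-conversion kernel can be sketched in plain Python. The 0.299 and 0.114 weights appear in the kernel comment. The 0.587 green weight, the full-range chroma scale factors, and the 128 offset are the usual BT.601-style constants and are assumed here, not read from the slide. The real kernel works in packed fixed-point with extra scale and offset adjustments (Ymadj, Yaadj) that this sketch leaves out.

```python
def clamp8(v):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (assumed BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y) + 128   # scaled blue difference, centered on 128
    cr = 0.713 * (r - y) + 128   # scaled red difference, centered on 128
    return clamp8(y), clamp8(cb), clamp8(cr)


for pixel in [(255, 255, 255), (0, 0, 0), (255, 0, 0)]:
    print(pixel, "->", rgb_to_ycbcr(*pixel))
```

Gray pixels map to Cb = Cr = 128, which matches the role of the one_two_eight offset added to the chroma values in the kernel.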

10 7x7 Convolution Kernel
[Figure: kernel schedule. Labels: ALUs, Comm/SP, Streams; Pipeline Stage 0, Pipeline Stage 1, Pipeline Stage 2]

11 StreamC

for (row = 0; row < NROWS; row++) {
  // update quantization factor for rate control
  quantizerScale = newQuantizerScale;

  // setup streams for this row
  ...

  // Perform I-Frame encoding
  convert(InputRow, &YRow, &CbCrRow);
  dct(YRow, dctIconstants, quantizerScale, &DCTYRow);
  dct(CbCrRow, dctIconstants, quantizerScale, &DCTCbCrRow);
  rle(DCTYRow, DCTCbCrRow, rleConstants, &RunLevelsRow);
  vlc(RunLevelsRow, &bitStream, &newQuantizerScale);

  // Store generated bit stream
  ...

  // Generate reference image for subsequent P or B frames
  idct(DCTYRow, idctIconstants, quantizerScale, &RefYRow);
  idct(DCTCbCrRow, idctIconstants, quantizerScale, &RefCbCrRow);

  // Store reference rows
  ...
}
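The dct calls above combine a transform with quantization driven by quantizerScale. As a rough illustration of those two steps only, here is a plain Python sketch of an orthonormal 1-D DCT-II followed by uniform scalar quantization. It is a simplification under stated assumptions: MPEG-2 applies a 2-D 8x8 DCT and per-coefficient quantization matrices, and nothing here reflects how the Imagine kernel is actually written. The function names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer."""
    return [round(c / step) for c in coeffs]


samples = [10] * 8                 # a flat row of pixels
print(quantize(dct_1d(samples), 4))
```

A flat input puts all of its energy into the first (DC) coefficient, so every other quantized value is zero. That is the sparsity the later run-level stage exploits.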

12 Macrocode

for (int row = 0; row < mb_height; row++) {
  for (int col = 0; col < mb_width; col += iNumBlocks) {
    // Generate indices and load the input blocks
    rts.write_ucr(1, image_size_param);
    rts.write_ucr(2, idxparams);
    rts.vect_op(idxgen, 0, 1, iframe.colorIndices);
    rts.vect_load(false, iframe.imageBuffer[even], iframe.colorIndices, memInputFrame, msg);

    // Color conversion
    rts.vect_op(icolor, 1, 2, "icolor conversion", iframe.imageBuffer[odd], iframe.blkY1dct, iframe.blkCrCb1dct);

    // DCT on Y and CrCb blocks
    rts.write_ucr(1, quantizer_scale);
    rts.vect_op(dct, 2, 1, "Y dct", iframe.blkY1dct, dctIntraConsts, iframe.blkY2rle);
    rts.write_ucr(1, quantizer_scale);
    rts.vect_op(dct, 2, 1, "CrCb dct", iframe.blkCrCb1dct, dctIntraConsts, iframe.blkCrCb2rle);

    // Run-level encoding and store
    rts.write_ucr(1, 0);
    rts.write_ucr(2, quant_scale);
    rts.vect_op(rle, 4, 1, "RLE", iframe.blkY2rle, iframe.blkCrCb2rle, rle_consts, zeroLength, UP(iframe.blkRunLevels[odd]));
    rts.vect_store(false, iframe.blkRunLevels[odd], memOutputFrame, msg);

    // Inverse DCT and correlation to build the reference frame
    rts.write_ucr(1, iquantizer_scale);
    rts.vect_op(idct, 2, 1, "Y idct", iframe.blkY2rle, idctIntraConsts, iframe.blkY3);
    rts.write_ucr(1, iquantizer_scale);
    rts.vect_op(idct, 2, 1, "CrCb idct", iframe.blkCrCb2rle, idctIntraConsts, iframe.blkCrCb3);
    rts.write_ucr(1, 0);
    rts.vect_op(correlate, 4, 2, "correlate", iframe.blkY3, iframe.blkCrCb3, iframe.dummy_blkYMVref, iframe.dummy_blkCrCbMVref, iframe.blkYref[odd], iframe.blkCrCbref[odd]);
    rts.vect_store(false, iframe.blkYref[odd], memNewRefY, msg);
    rts.vect_store(false, iframe.blkCrCbref[odd], memNewRefCrCb, msg);
  }
}

13 Stereo Depth Extractor
• Convolutions: Load original packed row → Unpack (8-bit → 16-bit) → 7x7 Convolve → 3x3 Convolve → Store convolved row
• Disparity Search: Load convolved rows → Calculate block SADs at different disparities → Store best disparity values

14 Tools
• idebug (functional simulator)
  – Built on top of Visual Studio (any C++ compiler)
• iscd (kernel scheduler)
  – Generates optimized VLIW assembly from C-like code
• isim (cycle-accurate simulator)
  – Simulates current Imagine architecture (configurable)
• schedviz (schedule/application visualizer)
  – Interactive visualization of resource utilization
• stream scheduler (run-time stream manager)

15 idebug
• Macros and libraries
• Enable Imagine StreamC/KernelC to be directly compiled by a C++ compiler
• Enables the use of any C++ debugger to debug Imagine code
• Can add arbitrary C++ code into the StreamC/KernelC for debugging
  – Function stubs
  – printf’s, etc.

16 Imagine Debugging

17 IDebug

18 iscd
• Optimizing VLIW scheduler
• Compiles KernelC
• Currently supports
  – copy propagation & dead code elimination
  – software pipelining
  – loop unrolling
  – schedule randomization
  – inline functions (no function calls)
• Configurable target architecture

19 isim
• Similar application performance to RTL
• ~4M cycles per hour (>1000 cycles per second)
• Configurable
  – Machine description file (same file as for iscd)
  – # clusters, ALU mix/connection, memory system, etc.
• Interactive command prompt
  – Debugging
  – Performance monitoring/reporting
  – Memory/file comparison
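The two simulation-speed figures on this slide are consistent with each other, as a quick check in Python shows. The helper for estimating wall-clock time and the 10-million-cycle workload are hypothetical, added only to show how the rate could be used.

```python
# Simulation rate quoted on the slide
CYCLES_PER_HOUR = 4_000_000
SECONDS_PER_HOUR = 3600

cycles_per_second = CYCLES_PER_HOUR / SECONDS_PER_HOUR
print(f"{cycles_per_second:.0f} simulated cycles per second")


def hours_to_simulate(cycles):
    """Estimated wall-clock hours to simulate a given number of cycles."""
    return cycles / CYCLES_PER_HOUR


# Hypothetical workload size, not a figure from the presentation
print(hours_to_simulate(10_000_000), "hours for 10M cycles")
```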

20 schedviz
• Interactive schedule visualizer
• Visual Basic
• Shows resource utilization
  – Operation scheduling
  – Communication scheduling
• Enables source-level performance optimization
  – Never look at assembly code!
• Also view application execution
  – Cluster, memory, network utilization

21 Stream Scheduler (1)
• Converts StreamC functions into Imagine operations
• Allocates:
  – operation issue slots
  – stream-level registers
  – stream register file (SRF) memory
• Determines dependencies between operations

22 Stream Scheduler (2)
• SRF allocation is critical
  – requires usage information
  – requires foreknowledge
  – too costly to perform at run time
• Stream scheduler is profile based
  – run once with simple allocation
  – collect usage information
  – perform good allocation
  – run repeatedly with good allocation

23 Handling Large Streams
• Strip mining
• Double buffering
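The two techniques named on this slide can be sketched together in plain Python. Strip mining cuts a long stream into strips that fit a fixed-size buffer. Double buffering alternates between two buffers so that the next strip can be loaded while the current one is processed. This sketch runs sequentially, so it shows only the buffer hand-off, not real overlap of load and compute. All function names and the strip length are illustrative assumptions.

```python
def strip_mine(stream, strip_len):
    """Yield consecutive strips of at most strip_len elements."""
    for start in range(0, len(stream), strip_len):
        yield stream[start:start + strip_len]


def process_double_buffered(stream, strip_len, kernel):
    """Run kernel over a long stream, one strip at a time, using two buffers."""
    strips = strip_mine(stream, strip_len)
    buffers = [None, None]
    cur = 0
    buffers[cur] = next(strips, None)           # prime the first buffer
    out = []
    while buffers[cur] is not None:
        buffers[1 - cur] = next(strips, None)   # "load" the next strip
        out.extend(kernel(buffers[cur]))        # "compute" on the current strip
        cur = 1 - cur                           # swap buffer roles
    return out


doubled = process_double_buffered(list(range(10)), 4, lambda s: [2 * x for x in s])
print(doubled)
```

On hardware with independent memory and compute units, the load and compute steps inside the loop would proceed concurrently, which is the point of keeping two buffers.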

24 Stream Algorithms: Blocksearch
[Figure: the blocksearch kernel takes a row from the current image and reference rows 0, 1, and 2 of the reference image (the current row's search region) and produces motion vectors]
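A block search of this kind is commonly built on the sum of absolute differences (SAD) between a block of the current image and candidate blocks in the reference image. The following plain Python sketch does an exhaustive search over a small window. It is an assumption-laden illustration of the general technique, not the Imagine blocksearch kernel. The function names, block size, and search range are all hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def block_search(cur, ref, cx, cy, n, search):
    """Return (best_sad, dx, dy) for the block at (cx, cy) in cur.

    Every offset within +/- search pixels is tried; candidates that fall
    outside the reference image are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 8x8 reference image and a current image whose 2x2 block at (2, 2)
# is a copy of the reference block at (3, 1)
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        cur[2 + j][2 + i] = ref[1 + j][3 + i]

print(block_search(cur, ref, 2, 2, 2, 2))
```

The inner loop is almost entirely subtractions and additions on small pixel values, the kind of work that maps well onto parallel-subword ALUs.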

25 MPEG2 Characteristics
• Operations
  – 56% 8-bit ADD/SUB
• Little locality
  – 1.47 accesses per word of global data
• Computationally intense
  – 155 operations per global data reference

26 Performance & Power
• Raw Performance
  – 360x288, 24-bit: 350 fps
  – 720x486, 24-bit: 104 fps
• Clusters provide high arithmetic bandwidth
  – 27.6 GOPS on blocksearch kernel
  – 17.9 GOPS overall
• SRF provides necessary data locality, bandwidth
  – Only temporary data in off-chip memory are reference frames
  – 2.4 GB/s required, 32 GB/s available
• Power Efficiency: 10.7 GOPS/W
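A few derived quantities help put these numbers in context. The Python sketch below uses only figures quoted on this slide. The implied power number assumes that the 10.7 GOPS/W efficiency refers to the 17.9 GOPS overall rate, which the slide does not state.

```python
def pixel_rate(width, height, fps):
    """Pixels encoded per second at a given resolution and frame rate."""
    return width * height * fps


small = pixel_rate(360, 288, 350)
large = pixel_rate(720, 486, 104)
print(f"360x288 at 350 fps: {small / 1e6:.1f} Mpixels/s")
print(f"720x486 at 104 fps: {large / 1e6:.1f} Mpixels/s")

# Fraction of the available 32 GB/s that the 2.4 GB/s requirement uses
srf_fraction = 2.4 / 32
print(f"Bandwidth used: {srf_fraction:.1%} of available")

# Assumption: the efficiency figure corresponds to the overall GOPS rate
implied_power_watts = 17.9 / 10.7
print(f"Implied power: {implied_power_watts:.2f} W")
```

Both resolutions work out to roughly 36 million pixels per second, which suggests throughput scales with pixel count rather than frame count.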

27 Bandwidth Hierarchy
• SDRAM: 2 GB/s
• Stream Register File: 32 GB/s
• ALU Cluster: 544 GB/s
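The step-up between levels of this hierarchy is easy to quantify. The short Python sketch below computes the ratios from the three bandwidth figures on the slide. The dictionary keys simply reuse the slide labels.

```python
# Peak bandwidth at each level of the hierarchy, in GB/s (from the slide)
bandwidth_gb_s = {
    "SDRAM": 2,
    "Stream Register File": 32,
    "ALU Cluster": 544,
}

srf_vs_sdram = bandwidth_gb_s["Stream Register File"] / bandwidth_gb_s["SDRAM"]
cluster_vs_srf = bandwidth_gb_s["ALU Cluster"] / bandwidth_gb_s["Stream Register File"]
cluster_vs_sdram = bandwidth_gb_s["ALU Cluster"] / bandwidth_gb_s["SDRAM"]

print(f"SRF / SDRAM:     {srf_vs_sdram:.0f}x")
print(f"Cluster / SRF:   {cluster_vs_srf:.0f}x")
print(f"Cluster / SDRAM: {cluster_vs_sdram:.0f}x")
```

Each level supplies more than an order of magnitude more bandwidth than the one below it, so an application only runs at full rate if most of its data references are satisfied close to the ALUs.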

28 Stream Recirculation

29 MPEG Bandwidth

30 MPEG Execution

31 Challenges
• VLC (Huffman Coding)
  – Difficult and inefficient to implement on clusters (SIMD on 32-bit data)
  – Instead, send RLE data over network to FPGA
  – Could add special-purpose Huffman coding stream unit
• Rate Control
  – Difficult because multiple macroblocks encoded in parallel
  – Must perform on a coarser granularity (impact on picture quality?)
  – For smaller image sizes, can simply re-encode a group of macroblocks at a higher quantization level if necessary in real time

32 Imagine Programming
• Think in terms of streams
• Range of software tools
  – Compilers
  – Visualizers
  – Simulators
• Achieve new levels of performance
  – Less programming effort
  – Greater power efficiency

33 If-Statement Example

if (case) { f(x); } else { g(x); }

if (case) { strA << x; } else { strB << x; }

[Figure panels: (1) PE0 to PE3 under shared control, case values 0 1 0 1: should PEs execute f( ) or g( )? (2) PE0 to PE3 with SRF0 to SRF3 under shared control, case values 0 1 0 1]

34 Conditional Streams
  – Data streams that are accessed conditionally based on a local case value
  – Results in an arbitrary expansion or compression of stream in space and time
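The effect of a conditional output stream can be modeled in a few lines of plain Python. Each processing element holds one value and one local case bit, and appends its value to one of two output streams accordingly. This models only the resulting data movement, not how the hardware implements it under shared control. The function name and sample values are hypothetical. The stream names strA and strB and the case pattern 0 1 0 1 follow the if-statement example slide.

```python
def conditional_output(values, cases):
    """Route each value to str_a when its case bit is set, else to str_b.

    Relative order is preserved within each output stream, so each stream
    is a compacted subsequence of the input.
    """
    str_a, str_b = [], []
    for x, case in zip(values, cases):
        if case:
            str_a.append(x)
        else:
            str_b.append(x)
    return str_a, str_b


# One value per PE (PE0..PE3) with case values 0 1 0 1
str_a, str_b = conditional_output([10, 11, 12, 13], [0, 1, 0, 1])
print("strA:", str_a)
print("strB:", str_b)
```

Each output stream ends up shorter than the input, which is the compression the slide refers to. The lengths depend entirely on the run-time case values.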

