Dongkeun Oh Sanghamitra Roy


1 Acceleration of motion estimation by edge detection algorithm using PLX sub-word parallel ISA
Dongkeun Oh, Sanghamitra Roy

2 Low bit rate video coding (1)
Block-based algorithms: H.263, MPEG-1, MPEG-2
Good: easy to implement, good image quality at low bit rates
Bad: image quality degrades at very low bit rates

3 Low bit rate video coding (2)
Object- or segmentation-based algorithms: subdivide an image into moving objects and background
Good: efficient compression rate
Bad: hard to implement
Necessary condition: accurate representation of the shape of the objects

4 Edge detection for object recognition
A block contains visually continuous and discontinuous regions
Lines along the discontinuous interfaces: edges
Coded edges: the structure of an image
Edge detectors: Sobel, Laplace, Canny

5 Canny's edge detection
Stages:
1. Gaussian smoothing
2. First derivative in x and y for all pixels
3. Magnitude of the gradient
4. Non-maximal suppression
5. Hysteresis to mark the edge pixels
We simulate the 2nd stage using PLX code

6 Derivative mask
Pixel neighborhood:
z1 z2 z3
z4 z5 z6
z7 z8 z9
x mask (horizontal): -1 0 1, so Gx(z5) = z6 - z4
y mask (vertical): -1 0 1, so Gy(z5) = z8 - z2
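The two masks amount to a central difference at each interior pixel. A minimal C sketch, assuming a row-major image of 16-bit samples; the function names `grad_x` and `grad_y` and the parameter layout are illustrative and do not come from the slides:

```c
#include <assert.h>

/* Hypothetical helpers: central differences at interior pixel (r, c)
 * of a row-major image s with `cols` samples per row.
 * In the slide's labeling, z5 = s[r * cols + c]. */
static short grad_x(const short *s, int cols, int r, int c)
{
    assert(c > 0 && c < cols - 1);      /* interior columns only */
    int pos = r * cols + c;
    return (short)(s[pos + 1] - s[pos - 1]);        /* z6 - z4 */
}

static short grad_y(const short *s, int cols, int r, int c)
{
    assert(r > 0);                      /* caller must also keep r above the last row */
    int pos = r * cols + c;
    return (short)(s[pos + cols] - s[pos - cols]);  /* z8 - z2 */
}
```

For a 3x3 image holding 1 through 9 in row order, `grad_x` at the center gives 6 - 4 = 2 and `grad_y` gives 8 - 2 = 6.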

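7 C code for x-derivative calculation
for (r = 0; r < rows; r++) {
    pos = r * cols;
    /* left border: forward difference */
    del_x[pos] = s[pos + 1] - s[pos];
    pos++;
    /* interior: central difference */
    for (c = 1; c < (cols - 1); c++, pos++) {
        del_x[pos] = s[pos + 1] - s[pos - 1];
    }
    /* right border: backward difference */
    del_x[pos] = s[pos] - s[pos - 1];
}
Next step: unfold the inner loop.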

8 Loop-unfolded C code for sub-word parallel implementation
for (r = 0; r < 100; r++) {
    pos = r * cols;
    /* left border: forward difference */
    del_x[pos] = s[pos + 1] - s[pos];
    pos++;
    /* interior: four central differences per iteration */
    for (c = 1; c < 24; c++, pos += 4) {
        del_x[pos]     = s[pos + 1] - s[pos - 1];
        del_x[pos + 1] = s[pos + 2] - s[pos];
        del_x[pos + 2] = s[pos + 3] - s[pos + 1];
        del_x[pos + 3] = s[pos + 4] - s[pos + 2];
    }
    ….
    /* right border: backward difference */
    del_x[pos] = s[pos] - s[pos - 1];
}
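The four statements in the unfolded loop body are independent of each other, which is what lets them collapse into one packed subtraction. The sketch below emulates that in portable C by treating a 64-bit integer as four 16-bit lanes. It is not PLX code: the names `sub4x16` and `diff4` are hypothetical, and a sub-word parallel ISA would presumably do the lane-wise subtract in a single instruction rather than with the masking trick used here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lane-wise 16-bit subtraction on four sub-words packed in 64 bits.
 * The top bit of every lane is handled separately so that a borrow
 * never crosses a lane boundary. */
static uint64_t sub4x16(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8000800080008000ULL;  /* top bit of each lane */
    return ((x | H) - (y & ~H)) ^ ((x ^ ~y) & H);
}

/* Four central differences at once:
 * del_x[pos + k] = s[pos + k + 1] - s[pos + k - 1] for k = 0..3. */
static void diff4(const int16_t *s, int16_t *del_x, int pos)
{
    uint64_t right, left, d;
    assert(pos >= 1);                           /* needs a left neighbor */
    memcpy(&right, &s[pos + 1], sizeof right);  /* s[pos+1] .. s[pos+4] */
    memcpy(&left,  &s[pos - 1], sizeof left);   /* s[pos-1] .. s[pos+2] */
    d = sub4x16(right, left);
    memcpy(&del_x[pos], &d, sizeof d);
}
```

`memcpy` is used for the loads and stores so the sketch works regardless of alignment and byte order; the lane-wise operation itself does not depend on where each lane sits in the word.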

9 PLX sub-word parallel ISA
1-, 2-, 4-, or 8-byte sub-words
32 general-purpose registers
Memory addresses aligned to 4 or 8 bytes
SIMD instructions operate on sub-words in parallel for faster performance

10 Issues in PLX implementation
Interfacing with C code
    short int = 2 bytes
    use fwrite/fread to write/read binary data from C
Memory-aligned loads
    load address must be a multiple of 4 bytes to avoid a trap
    load from an aligned address and shift/add to get the required sub-words
Loops using predicated jump instructions
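The aligned-load issue shows up in the derivative because the left neighbor s[pos - 1] starts one sub-word before an aligned boundary. A rough C sketch of the "load aligned, then shift and combine" idea follows, again treating a 64-bit integer as four 16-bit lanes. Here `pack4` merely stands in for an aligned load, and all three function names are hypothetical; the actual PLX load and shift instructions are not shown in the transcript.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 16-bit sub-words into one 64-bit word, lane 0 in the low bits.
 * Stands in for one aligned 8-byte load. */
static uint64_t pack4(const int16_t *p)
{
    return  (uint64_t)(uint16_t)p[0]
         | ((uint64_t)(uint16_t)p[1] << 16)
         | ((uint64_t)(uint16_t)p[2] << 32)
         | ((uint64_t)(uint16_t)p[3] << 48);
}

/* Given two consecutive aligned words, return the four sub-words that
 * start k lanes into the first word (k = 0..3). */
static uint64_t window4(uint64_t w0, uint64_t w1, unsigned k)
{
    assert(k < 4);
    if (k == 0)
        return w0;                       /* avoid an undefined 64-bit shift */
    return (w0 >> (16 * k)) | (w1 << (64 - 16 * k));
}

/* Read lane i (0..3) back as a signed 16-bit value. */
static int16_t lane(uint64_t w, unsigned i)
{
    return (int16_t)(uint16_t)(w >> (16 * i));
}
```

With two aligned words covering s[0..3] and s[4..7], `window4(w0, w1, 3)` yields the sub-words s[3..6], which is the unaligned operand needed for the left neighbors when pos = 4.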

11 Results
PLX: FFCF, FFB5, FFB5, 0002
C: [values not preserved in the transcript]
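Assuming the PLX values are 16-bit two's-complement words (consistent with the 2-byte short int mentioned earlier, but not stated on this slide), they can be read as signed pixel differences. A small sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Interpret a 4-digit hex string as a 16-bit two's-complement value.
 * Assumption: each result word is one signed 16-bit sub-word. */
static int hex16_to_signed(const char *hex)
{
    unsigned long bits = strtoul(hex, NULL, 16);
    assert(bits <= 0xFFFFul);            /* must fit in one 16-bit sub-word */
    return bits >= 0x8000ul ? (int)bits - 0x10000 : (int)bits;
}
```

Under that assumption, FFCF reads as -49, FFB5 as -75, and 0002 as 2.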

12 Snapshot of PLX code

13 Thanks!

