1
Acceleration of motion estimation by edge detection algorithm using PLX sub-word parallel ISA
Dongkeun Oh, Sanghamitra Roy
2
Low bit rate video coding (1)
Block-based algorithms: H.263, MPEG-1, MPEG-2
Good: easy to implement; good image quality at low bit rates
Bad: image quality degrades at very low bit rates
3
Low bit rate video coding (2)
Object- or segmentation-based algorithms: subdivide an image into moving objects and background
Good: efficient compression
Bad: hard to implement
Necessary condition: an accurate representation of the shape of each object
4
Edge detection for object recognition
An image block contains visually continuous and discontinuous regions
Edge: a line along a discontinuous interface
Coded edges capture the structure of an image
Edge detectors: Sobel, Laplacian, Canny
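For comparison with the simple difference mask used later, here is a minimal sketch of the Sobel operator at one interior pixel. The function name `sobel_at` and the row-major `short` image layout are assumptions for illustration. y is taken to increase downward, matching Gy(z5) = z8 - z2 on the derivative-mask slide.

```c
/* Sobel gradient at interior pixel (r, c) of a row-major image s with
   `cols` columns. Hypothetical helper: borders are not handled, so
   1 <= r < rows-1 and 1 <= c < cols-1 is required. */
void sobel_at(const short *s, int cols, int r, int c, int *gx, int *gy)
{
    const short *up  = s + (r - 1) * cols + c;  /* row above, centred on c */
    const short *mid = s + r * cols + c;        /* current row */
    const short *dn  = s + (r + 1) * cols + c;  /* row below */

    /* x mask: [-1 0 1; -2 0 2; -1 0 1] */
    *gx = (up[1] + 2 * mid[1] + dn[1]) - (up[-1] + 2 * mid[-1] + dn[-1]);
    /* y mask: [-1 -2 -1; 0 0 0; 1 2 1] */
    *gy = (dn[-1] + 2 * dn[0] + dn[1]) - (up[-1] + 2 * up[0] + up[1]);
}
```

Sobel weights the centre row and column by 2, which smooths across the derivative direction; the two-tap difference mask of stage 2 skips that smoothing because Canny applies a Gaussian first.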
5
Canny’s Edge detection
Stages:
1. Gaussian smoothing
2. First derivatives in x and y at every pixel
3. Gradient magnitude
4. Non-maximal suppression
5. Hysteresis thresholding to mark the edge pixels
We simulate stage 2 using PLX code.
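Stage 3 combines the two derivative images into one magnitude per pixel. A minimal sketch, assuming `short` derivative arrays like those produced in stage 2; `gradient_magnitude` and `isqrt_u32` are hypothetical names, and the integer square root (rounded down) is a choice made here to avoid floating point, not necessarily what the authors used.

```c
#include <stdint.h>

/* Integer square root rounded down (digit-by-digit method); avoids libm. */
static uint32_t isqrt_u32(uint32_t n)
{
    uint32_t x = 0, bit = 1u << 30;   /* highest power of 4 in 32 bits */
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= x + bit) {
            n -= x + bit;
            x = (x >> 1) + bit;
        } else {
            x >>= 1;
        }
        bit >>= 2;
    }
    return x;
}

/* Stage 3 sketch: Euclidean gradient magnitude for n pixels, rounded down. */
void gradient_magnitude(const short *dx, const short *dy, uint16_t *mag, int n)
{
    for (int i = 0; i < n; i++) {
        /* Each square is at most 2^30, so the sum fits in uint32_t. */
        uint32_t sq = (uint32_t)((int32_t)dx[i] * dx[i])
                    + (uint32_t)((int32_t)dy[i] * dy[i]);
        mag[i] = (uint16_t)isqrt_u32(sq);   /* at most 46340, fits in 16 bits */
    }
}
```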
6
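Derivative Mask
Neighborhood of pixel z5:
$$\begin{bmatrix} z_1 & z_2 & z_3 \\ z_4 & z_5 & z_6 \\ z_7 & z_8 & z_9 \end{bmatrix}$$
Masks for x (row) and y (column):
$$\begin{bmatrix} -1 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
$$G_x(z_5) = z_6 - z_4, \qquad G_y(z_5) = z_8 - z_2$$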
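As a worked instance of the two masks, the sketch below evaluates Gx and Gy at one interior pixel of a row-major image. `gradient_at` is a hypothetical helper introduced for illustration; border pixels are not handled.

```c
/* Central-difference gradient at interior pixel (r, c) of a row-major image
   s with `cols` columns. z4/z6 are the left/right neighbours of z5 and
   z2/z8 the neighbours above/below. */
void gradient_at(const short *s, int cols, int r, int c, int *gx, int *gy)
{
    int z5 = r * cols + c;                 /* index of the centre pixel */
    *gx = s[z5 + 1] - s[z5 - 1];           /* Gx(z5) = z6 - z4 */
    *gy = s[z5 + cols] - s[z5 - cols];     /* Gy(z5) = z8 - z2 */
}
```

For the 3×3 image holding 1 to 9 in row order, the centre pixel gives Gx = 6 - 4 = 2 and Gy = 8 - 2 = 6.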
7
C code for x-derivative calculation
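/* x-derivative of a rows x cols row-major image s */
void derivative_x(const short *s, short *del_x, int rows, int cols)
{
    int r, c, pos;
    for (r = 0; r < rows; r++) {
        pos = r * cols;
        del_x[pos] = s[pos + 1] - s[pos];     /* left edge: forward difference */
        pos++;                                /* step to the first interior pixel */
        for (c = 1; c < (cols - 1); c++, pos++) {
            del_x[pos] = s[pos + 1] - s[pos - 1];
        }
        del_x[pos] = s[pos] - s[pos - 1];     /* right edge: backward difference */
    }
}
Next: unfold the inner loop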
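Stage 2 also needs the y-derivative, which the slides do not show. A sketch that mirrors the x-derivative loop by walking down each column in steps of `cols`; the function name and signature are assumptions, and the image must have at least two rows.

```c
/* y-derivative of a rows x cols row-major image s (requires rows >= 2):
   forward difference on the top row, central differences inside,
   backward difference on the bottom row. */
void derivative_y(const short *s, short *del_y, int rows, int cols)
{
    int r, c, pos;
    for (c = 0; c < cols; c++) {
        pos = c;
        del_y[pos] = s[pos + cols] - s[pos];          /* top edge */
        pos += cols;
        for (r = 1; r < (rows - 1); r++, pos += cols) {
            del_y[pos] = s[pos + cols] - s[pos - cols];
        }
        del_y[pos] = s[pos] - s[pos - cols];          /* bottom edge */
    }
}
```

Because consecutive outputs here are `cols` elements apart, this loop does not pack into adjacent sub-words the way the x-derivative does unless whole rows are processed side by side.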
8
Loop unfolded C code for sub-word parallel implementation
for (r = 0; r < 100; r++) {
    pos = r * cols;
    del_x[pos] = s[pos + 1] - s[pos];
    pos++;                                    /* first interior pixel */
    /* four results per iteration, one per 2-byte sub-word */
    for (c = 1; c < 24; c++, pos += 4) {
        del_x[pos]     = s[pos + 1] - s[pos - 1];
        del_x[pos + 1] = s[pos + 2] - s[pos];
        del_x[pos + 2] = s[pos + 3] - s[pos + 1];
        del_x[pos + 3] = s[pos + 4] - s[pos + 2];
    }
    …
    del_x[pos] = s[pos] - s[pos - 1];
}
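The unrolled loop above fixes 100 rows and 23 iterations and leaves part of the row elided. As a hedged, hypothetical generalisation (not the authors' code), one row can be unrolled by four with a scalar tail loop so that any `cols >= 2` works; `derivative_x_row_unrolled` is an illustrative name.

```c
/* Sketch: x-derivative of one row, interior loop unrolled by 4.
   Hypothetical generalisation with a scalar tail; requires cols >= 2. */
void derivative_x_row_unrolled(const short *s, short *del_x, int cols)
{
    int pos = 1;
    del_x[0] = s[1] - s[0];                      /* left edge */
    /* Four interior outputs per iteration: writes del_x[pos..pos+3],
       reads s[pos-1..pos+4], all within the row. */
    for (; pos + 3 < cols - 1; pos += 4) {
        del_x[pos]     = s[pos + 1] - s[pos - 1];
        del_x[pos + 1] = s[pos + 2] - s[pos];
        del_x[pos + 2] = s[pos + 3] - s[pos + 1];
        del_x[pos + 3] = s[pos + 4] - s[pos + 2];
    }
    for (; pos < cols - 1; pos++)                /* leftover interior pixels */
        del_x[pos] = s[pos + 1] - s[pos - 1];
    del_x[cols - 1] = s[cols - 1] - s[cols - 2]; /* right edge */
}
```

Each unrolled iteration reads the six samples s[pos - 1] through s[pos + 4] and writes four outputs, which is the access pattern a four-lane sub-word version would map onto registers.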
9
PLX sub-word parallel ISA
Sub-words of 1, 2, 4, or 8 bytes
32 general-purpose registers
Memory addresses aligned to 4 or 8 bytes
SIMD instructions operate on all sub-words of a register at once, so one instruction does the work of several scalar operations
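To make "parallel operations on sub-words" concrete, here is a portable-C emulation of a packed subtract on four 2-byte sub-words in one 64-bit word. It assumes a 64-bit register and lane 0 in the least significant bits; `psub16x4` is a hypothetical name, and this illustrates the semantics only, not PLX code or PLX mnemonics.

```c
#include <stdint.h>

/* Packed subtract on four 16-bit lanes: each lane gets (a - b) mod 2^16
   and no borrow crosses a lane boundary. Illustration only, not PLX code. */
uint64_t psub16x4(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8000800080008000ULL;   /* top bit of each lane */
    /* Setting a's top bits and clearing b's makes every lane difference
       at least 1, so no borrow leaks into the next lane. */
    uint64_t low = (a | H) - (b & ~H);
    /* Fix each lane's top bit to its true value: a15 ^ b15 ^ borrow-in. */
    return low ^ ((a ^ ~b) & H);
}
```

With one such operation, the four differences in an unrolled iteration could be formed from a word holding s[pos + 1..pos + 4] and a word holding s[pos - 1..pos + 2].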
10
Issues in PLX implementation
Interfacing with C code: short int = 2 bytes; fwrite/fread write and read the data as binary from C
Aligned memory loads: the load address must be a multiple of 4 bytes to avoid a trap, so load from an aligned address and shift/add to get the required sub-words
Loops are implemented with predicated jump instructions
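The shift/add step can be sketched in portable C. Assuming 2-byte samples packed four to a 64-bit word with lane 0 in the least significant bits (actual PLX lane order may differ), a window starting at any sample offset is rebuilt from two consecutive aligned words. `load_aligned4` and `extract_window` are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical stand-in for one aligned load: packs p[0..3] into one word,
   lane i (bits 16*i .. 16*i+15) holding p[i]. */
uint64_t load_aligned4(const uint16_t *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint64_t)p[i] << (16 * i);
    return w;
}

/* lo holds samples base..base+3, hi holds base+4..base+7; returns samples
   base+k..base+k+3 for 0 <= k <= 3. The shifted halves share no bits,
   so OR and ADD give the same result. */
uint64_t extract_window(uint64_t lo, uint64_t hi, unsigned k)
{
    if (k == 0)
        return lo;                      /* already aligned; avoids a 64-bit shift */
    return (lo >> (16 * k)) | (hi << (64 - 16 * k));
}
```

Two aligned loads plus two shifts and an OR per window replace an unaligned load, which is the trade the slide describes.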
11
Results
PLX: FFCF, FFB5, FFB5, 0002
C: (output shown as an image in the original slide)
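Assuming the PLX values are 16-bit two's-complement outputs stored as short int (the slide does not state this), they can be read as signed numbers with a small helper. `hex16_to_signed` is a hypothetical name.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as a two's-complement signed value
   without relying on implementation-defined narrowing conversions. */
int hex16_to_signed(uint16_t bits)
{
    return bits >= 0x8000u ? (int)bits - 0x10000 : (int)bits;
}
```

Under that assumption, FFCF, FFB5, FFB5, 0002 read as -49, -75, -75, 2.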
12
Snapshot of PLX code
13
Thanks!