Dongkeun Oh Sanghamitra Roy

Dongkeun Oh Sanghamitra Roy
Acceleration of motion estimation by edge detection algorithm using PLX sub-word parallel ISA Dongkeun Oh Sanghamitra Roy

Low bit rate Video coding(1)
Block based algorithms H.263, MPEG-1,2 Good easy to implement, good image quality at low bit rates Bad Image quality degraded at very low bit rates

Low bit rate video coding (2)
Object or Segmentation based algorithm Subdividing an image into moving objects and background Good : Efficient compression rate Bad : Hard to implement Necessary condition Accurate representation of the shape of Objects

Edge detection for object recognition
Block is visually continuous and discontinuous Lines of discontinuous interface: edge Coded edges : structure of an image Edge detection Sobel Laplace Canny’s

Canny’s Edge detection
Stages 1. Gaussian Smoothing 2. First derivative for x,y of all pixels 3. Magnitude of the gradient 4. Non-maximal suppression 5. Use hysteresis to mark the edge pixels We simulate 2nd stages using PLX code

Derivative Mask Gx(z5)=(z6-z4) Gy(z5)=(z8-z2 ) z1 z2 z3 z4 z5 z6 z7 z8
-1 1 -1 1 Gx(z5)=(z6-z4) Gy(z5)=(z8-z2 )

C code for x-derivative calculation
for(r=0; r < rows; r++) { pos = r * cols; del_x[pos] = s[pos + 1] – s[pos]; for(c = 1; c < (cols – 1); c++, pos++) { del_x[pos] = s[pos + 1] – s[pos – 1]; } del_x[pos] = s[pos] – s[pos – 1]; Unfold

Loop unfolded C code for sub-word parallel implementation
for(r=0; r < 100; r++) { pos = r * cols; del_x[pos] = s[pos + 1] – s[pos]; for(c = 1; c < 24; c++, pos+= 4) { del_x[pos] = s[pos + 1] – s[pos – 1]; del_x[pos + 1] = s[pos + 2] – s[pos]; del_x[pos + 2] = s[pos + 3] – s[pos + 1]; del_x[pos + 3] = s[pos + 4] – s[pos + 2]; } …. del_x[pos] = s[pos] – s[pos – 1];

PLX sub-word parallel ISA
1, 2, 4, or 8 bytes sub-words 32 general purpose registers Aligned memory address 4/8 bytes SIMD instructions allow parallel operations with faster performance

Issues in PLX implementation
Interfacing with C code short int = 2 bytes use fwrite/fread to write/read binary data from C Memory aligned load load address: multiple of 4 bytes to avoid trap Load from aligned address and shift/add to get required sub-words Loops using predicated jump instruction

Results PLX FFCF, FFB5, FFB5, 0002 C

Snapshot of PLX code

Thanks !

Dongkeun Oh Sanghamitra Roy

Similar presentations

Presentation on theme: "Dongkeun Oh Sanghamitra Roy"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Dongkeun Oh Sanghamitra Roy

Similar presentations

Presentation on theme: "Dongkeun Oh Sanghamitra Roy"— Presentation transcript:

Similar presentations

About project

Feedback