Frame-Level Pipelined Motion Estimation Array Processor. Surin Kittitornkun and Yu Hen Hu, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 11, No. 2, Feb. 2001
OUTLINE: Methodology for VLSI Array Processor Design; An Example on Frame-Level Block Matching Algorithm
Design Levels: Sequential Algorithm -> 1. DG Design -> 2. SFG Design -> 3. VLSI Array Design
Dependence Graph (DG)
DG: 1. Shift-Invariant vs. Shift-Variant DG
DG for Sorting Algorithm:
For i from 1 to N
  For j from 1 to i
    m(i+1, j) <- max[x(i, j), m(i, j)]
    x(i, j+1) <- min[x(i, j), m(i, j)]
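The sorting recursion can be traced in software as a single-assignment computation, one table entry per DG node. The sketch below is illustrative only. The boundary conditions (x(i, 1) is the i-th input, m(i, i) is minus infinity, the sorted result is read from m(N+1, j)) are assumptions that the slide does not state, and the name `dg_sort` is hypothetical.

```python
import math

def dg_sort(values):
    """Evaluate the sorting recursion node by node (1-based indices as on the slide)."""
    n = len(values)
    # Assumed boundary: every m(i, j) that is never written reads as -infinity,
    # which covers the diagonal entries m(i, i).
    m = [[-math.inf] * (n + 2) for _ in range(n + 2)]
    x = [[None] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        x[i][1] = values[i - 1]          # assumed input: x(i, 1) = i-th value
        for j in range(1, i + 1):
            m[i + 1][j] = max(x[i][j], m[i][j])
            x[i][j + 1] = min(x[i][j], m[i][j])
    # Assumed output: row N+1 of m, which holds the values in descending order.
    return [m[n + 1][j] for j in range(1, n + 1)]

print(dg_sort([3, 1, 2]))
```

Each pass of the outer loop inserts one new value into the already sorted row of m, so this particular evaluation order behaves like an insertion sort.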
DG: 2. Localization: Broadcast vs. Transmittent Data
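A small illustration of the difference, using matrix-vector multiplication as a stand-in example (the slide itself names no algorithm, so this choice is an assumption). In the broadcast form every row reads the same x[j]. In the localized form each node receives its copy of x[j] from its neighbour in the previous row, which is the transmittent-data style. The function names are hypothetical.

```python
def matvec_broadcast(a, x):
    # Broadcast: the same x[j] is read by every row i at once.
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def matvec_localized(a, x):
    rows, cols = len(a), len(x)
    # x_loc[i][j] is the copy of x[j] held at node (i, j).
    x_loc = [list(x)] + [[None] * cols for _ in range(rows - 1)]
    y = []
    for i in range(rows):
        acc = 0
        for j in range(cols):
            if i > 0:
                # Transmittent data: node (i, j) takes x[j] only from node (i-1, j).
                x_loc[i][j] = x_loc[i - 1][j]
            acc += a[i][j] * x_loc[i][j]
        y.append(acc)
    return y

a = [[1, 2], [3, 4]]
x = [5, 6]
print(matvec_broadcast(a, x), matvec_localized(a, x))
```

Both versions compute the same result; only the localized one restricts every data transfer to nearest neighbours in the index space.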
DG: 3. Reversible Arcs for Associative Operations: If the operation used in the recursion is associative, the directions of the arcs may be reversed.
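A minimal sketch of why associativity is what matters: accumulating along a chain from first to last, or from last to first with the operand order preserved, gives the same result for an associative operation and may differ otherwise. The helper names `fold_forward` and `fold_backward` are hypothetical.

```python
from functools import reduce

def fold_forward(op, items):
    # Arcs point from the first element to the last: ((a1 op a2) op a3) ...
    return reduce(op, items)

def fold_backward(op, items):
    # Arcs reversed: a1 op (a2 op (a3 ...)), operand order unchanged.
    return reduce(lambda acc, a: op(a, acc), reversed(items))

concat = lambda a, b: a + b      # associative, not commutative
subtract = lambda a, b: a - b    # not associative

print(fold_forward(concat, ["a", "b", "c"]), fold_backward(concat, ["a", "b", "c"]))
print(fold_forward(subtract, [10, 3, 2]), fold_backward(subtract, [10, 3, 2]))
```

String concatenation gives the same value in both directions, while subtraction does not, so the arcs of a subtraction chain could not be reversed this way.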
DG: 4. Localization with Intermediate Variables Involved: AR Filtering Algorithm
- Spiral Communication Approach
- Local Communication Approach
Signal Flow Graph (SFG) [Figure: SFG node with Input(1), Input(2), Output(1), Output(2); a delay edge D turns x(n) into x(n-1)]
SFG Projection Procedure
1. Choose a projection direction; the processor space is orthogonal to the projection direction.
2. Replace the arcs in the DG with zero-delay or nonzero-delay edges between their corresponding processors.
3. Attach the input and output data to their corresponding processors.
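The projection can be sketched numerically for a 2-D DG that maps onto a linear array. The sketch assumes a linear schedule vector s, which the slide does not mention: each DG arc e becomes an SFG edge with processor displacement p·e and delay s·e, where p spans the processor space. The function name `project_dg` and the chosen vectors are illustrative assumptions, applied here to the triangular sorting DG.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def project_dg(nodes, arcs, d, s, p):
    """Map DG nodes to (PE, time) and DG arcs to (PE displacement, delay)."""
    if dot(p, d) != 0:
        raise ValueError("processor space must be orthogonal to the projection direction")
    if dot(s, d) <= 0:
        raise ValueError("nodes sharing a PE must be scheduled at distinct times")
    placement = {node: (dot(p, node), dot(s, node)) for node in nodes}
    edges = [(dot(p, arc), dot(s, arc)) for arc in arcs]
    return placement, edges

N = 3
nodes = [(i, j) for i in range(1, N + 1) for j in range(1, i + 1)]
arcs = [(1, 0), (0, 1)]   # m travels along i, x travels along j
placement, edges = project_dg(nodes, arcs, d=(1, 0), s=(1, 1), p=(0, 1))
print(edges)
```

With these assumed vectors the m arc becomes a one-delay self-loop inside each PE and the x arc becomes a one-delay edge to the neighbouring PE.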
Projection Example [Figure: projections of the sorting DG; panels labeled Insertion Sorting, Selection Sorting, Bubble Sorting]
SFG to Systolic Array: Replace each operation node with a PE. Place delay units on the data paths and at the input/output pins.
Six-level nested Do-loop FSBM
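For reference, a plain software form of full-search block matching (FSBM) with six nested loops might look like the sketch below. It is not taken from the paper: the loop order (block row v, block column h, displacements m and n, pixel i and j), the sum-of-absolute-differences cost, and the skipping of candidates that fall outside the frame are all assumptions, and the function name `fsbm` is hypothetical.

```python
def fsbm(cur, ref, N, p):
    """Return {(v, h): (best SAD, m, n)} for every N x N block of cur."""
    H, W = len(cur), len(cur[0])
    Nv, Nh = H // N, W // N
    result = {}
    for v in range(Nv):                          # loop 1: block row
        for h in range(Nh):                      # loop 2: block column
            best = None
            for m in range(-p, p + 1):           # loop 3: vertical displacement
                for n in range(-p, p + 1):       # loop 4: horizontal displacement
                    top, left = v * N + m, h * N + n
                    # Assumed boundary policy: ignore candidates outside the frame.
                    if top < 0 or left < 0 or top + N > H or left + N > W:
                        continue
                    sad = 0
                    for i in range(N):           # loop 5: pixel row
                        for j in range(N):       # loop 6: pixel column
                            sad += abs(cur[v * N + i][h * N + j] - ref[top + i][left + j])
                    if best is None or sad < best[0]:
                        best = (sad, m, n)
            result[(v, h)] = best
    return result

ref = [[r * r * 10 + c * c for c in range(6)] for r in range(6)]
# Current frame: the reference shifted one pixel to the left (last column zeroed).
cur = [[ref[r][c + 1] if c < 5 else 0 for c in range(6)] for r in range(6)]
print(fsbm(cur, ref, N=2, p=1)[(1, 1)])
```

The innermost two loops accumulate one candidate's cost, the middle two scan the (2p + 1) x (2p + 1) search window, and the outer two walk over the blocks of the frame.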
Two-level nested Do-loop FSBM
$k$-th clock cycle: $k = (v-1) N_h N^2 + (h-1) N^2 + (i-1) N + j$
2-D localized DG of row 1, $v = 1$. Search area and current frame coordinates for $N_v = 3$, $N_h = 2$, $p = N/2 = 1$. [Figure label: $2p + 1$]
Linear SFG of $(2p + 1)^2$ PEs, $p = N/2 = 1$, after systolic mapping of the 2-D DG.
Systolic array with spiral interconnections
Microarchitecture of PE
Scheduled search area data
Performance