1
M2: Team Paradigm :: Milestone 2
2-D Discrete Cosine Transform
Group M2: Tommy Taylor, Brandon Hsiung, Changshi Xiao, Bongkwan Kim
Project Manager: Yaping Zhan
2
Project status
Design Proposal (Complete)
Architecture Proposal (Almost Complete)
  :Algorithm description (Done)
  :High level simulation (Done)
  :Mapping algorithm into hardware (Done)
  :Behavioral Verilog and test bench (Debugging)
Size estimates/floor plan (To be completed)
  :Structural Verilog
  :More accurate transistor count
  :Floor plan
3
Design decisions
Do not include motion prediction
Go with 2-D DCT
Use SRAM
No pipelining
Will not run in real-time
4
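Distributed algorithm of 1D DCT:

$$A = \cos(\pi/4),\quad B = \cos(\pi/8),\quad C = \sin(\pi/8),\quad D = \cos(\pi/16),\quad E = \cos(3\pi/16),\quad F = \sin(3\pi/16),\quad G = \sin(\pi/16)$$

$$\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix} \begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix}$$

$$\begin{bmatrix} X_1 \\ X_3 \\ X_5 \\ X_7 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} D & E & F & G \\ E & -G & -D & -F \\ F & -D & G & E \\ G & -F & E & -D \end{bmatrix} \begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}$$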
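The even/odd split above can be checked numerically. The following is a minimal sketch, not the team's simulation code: the function names are hypothetical, and the reference definition assumes the usual 8-point DCT scaling with $c_0 = 1/\sqrt{2}$ and $c_k = 1$ otherwise, which the slide does not state explicitly.

```python
import math

# Coefficients as defined on the slide.
A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.sin(3 * math.pi / 16)
G = math.sin(math.pi / 16)

EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
ODD = [[D, E, F, G], [E, -G, -D, -F], [F, -D, G, E], [G, -F, E, -D]]


def dct8_matrix(x):
    """8-point DCT via the two 4x4 products (even and odd outputs)."""
    s = [x[i] + x[7 - i] for i in range(4)]  # x0+x7, x1+x6, x2+x5, x3+x4
    d = [x[i] - x[7 - i] for i in range(4)]  # x0-x7, x1-x6, x2-x5, x3-x4
    X = [0.0] * 8
    for r in range(4):
        X[2 * r] = 0.5 * sum(EVEN[r][i] * s[i] for i in range(4))
        X[2 * r + 1] = 0.5 * sum(ODD[r][i] * d[i] for i in range(4))
    return X


def dct8_definition(x):
    """Reference (assumed scaling): X_k = (c_k / 2) * sum_n x_n cos((2n+1) k pi / 16)."""
    X = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        X.append(0.5 * ck * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                                for n in range(8)))
    return X
```

Under that assumed scaling, both functions agree to floating-point precision, while the matrix form needs 32 multiplications instead of 64.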
5
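Distributed algorithm of 1D DCT (continued):

In two's complement representation, each $B$-bit input word $u_i$ is

$$u_i = -b_i^{B-1} + \sum_{j=1}^{B-1} 2^{-j}\, b_i^{B-1-j}$$

where $b_i^{k}$ is bit $k$ of $u_i$ and $b_i^{B-1}$ is the MSB, i.e. the sign bit. Omitting the common factor 1/2, each output becomes

$$X_n = \sum_{j=1}^{B-1} 2^{-j}\, D_n(b^{B-1-j}) - D_n(b^{B-1}), \qquad \text{where } D_n(b^{k}) = \sum_{i=0}^{3} C_{i,n}\, b_i^{k}$$

With 16-bit inputs, the even outputs are

$$\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix} = \begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix} \begin{bmatrix} b_0^{15}\, b_0^{14} \ldots b_0^{0} \\ b_1^{15}\, b_1^{14} \ldots b_1^{0} \\ b_2^{15}\, b_2^{14} \ldots b_2^{0} \\ b_3^{15}\, b_3^{14} \ldots b_3^{0} \end{bmatrix}$$

For example, $D_0(b^{14}) = A b_0^{14} + A b_1^{14} + A b_2^{14} + A b_3^{14}$.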
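The bit-serial sum above (the technique is commonly called distributed arithmetic) can be sketched as a lookup table plus an accumulator. This is an illustrative model only: `build_rom` and `da_dot` are hypothetical names, the inputs are treated as 16-bit integers (bit $k$ weighted by $2^k$) rather than fractions, which only changes the result by a constant scale, and the table holds floating-point values rather than quantized ROM words.

```python
import math

A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
NBITS = 16


def build_rom(row):
    """16-entry table: entry addr is D_n for the four bits encoded in addr."""
    rom = []
    for addr in range(16):
        # Address bit 3 belongs to u0, address bit 0 to u3.
        rom.append(sum(row[i] for i in range(4) if (addr >> (3 - i)) & 1))
    return rom


def da_dot(row, u):
    """Dot product of row and u, computed one bit position at a time."""
    rom = build_rom(row)
    words = [v & 0xFFFF for v in u]  # 16-bit two's complement bit patterns
    acc = 0.0
    for k in range(NBITS):
        addr = 0
        for i in range(4):
            addr = (addr << 1) | ((words[i] >> k) & 1)
        term = rom[addr] * (1 << k)
        # The sign bit (k = 15) carries a negative weight.
        acc += -term if k == NBITS - 1 else term
    return acc
```

For any four values in the 16-bit signed range, `da_dot(EVEN[n], u)` matches the ordinary dot product, using only table lookups, additions and shifts.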
6
1D DCT architecture
[Block diagram: Register file (8x16), Selector, adders (+, -), two Bit address generators, two ROMs, accumulators (+ with register R), Parallel to serial, Control logic]
Signals: in_data(16), in_valid, out_data(16), out_valid, out_ready, out_done, clk, reset, vdd, vss
7
2D DCT:
Two 1D DCT modules can operate as a pipeline to boost throughput. This requires a RAM that can be read and written at the same time, with each 1D DCT module accessing the RAM alternately in row order and in column order.
[Block diagram: Data in -> 1D DCT (on rows) -> Transpose RAM -> 1D DCT (on columns) -> Data out, with Control logic]
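The row/column structure can be modeled in a few lines. This is a sketch under assumptions: `dct8`, `transpose` and `dct2d` are hypothetical names, the 1D transform uses the usual scaling ($c_0 = 1/\sqrt{2}$), and the transpose RAM is modeled as a plain list transpose with no timing or simultaneous read/write behavior.

```python
import math


def dct8(x):
    """8-point 1D DCT from the definition (assumed scaling c_0 = 1/sqrt(2))."""
    out = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(0.5 * ck * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                                  for n in range(8)))
    return out


def transpose(m):
    """Models the transpose RAM: written in row order, read in column order."""
    return [list(col) for col in zip(*m)]


def dct2d(block):
    """8x8 2D DCT as two 1D passes with a transpose in between."""
    rows_done = [dct8(row) for row in block]  # first 1D DCT, on rows
    ram = transpose(rows_done)                # columns become rows
    cols_done = [dct8(col) for col in ram]    # second 1D DCT, on columns
    return transpose(cols_done)               # restore row-major order
```

A constant 8x8 block is a quick sanity check: all of its energy should land in the DC coefficient at position (0, 0).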
8
Transistor count and performance estimation:

1DDCT module:
adder      register    ROM       Control logic   total   pins
4x16x30    18x16x20    8x16x2    1000            ~9k     40

2DDCT = 2x1DDCT + SRAM ~ 24k

throughput              latency
8 samples/64 cycles     528 cycles
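The totals can be reproduced by multiplying out the factors in the table. The slide does not say what each factor means (a plausible reading is units x bits x transistors per bit, but that is an assumption), and the SRAM figure below is only what the ~24k total implies, not a number given on the slide.

```python
# Factors copied from the estimate table for one 1D DCT module.
adder = 4 * 16 * 30       # 1920
register = 18 * 16 * 20   # 5760
rom = 8 * 16 * 2          # 256
control = 1000

module_total = adder + register + rom + control  # matches the "~9k" entry

# 2DDCT = 2 x 1DDCT + SRAM ~ 24k, so the SRAM share implied by the total is:
sram_implied = 24000 - 2 * module_total

# Throughput from the table: 8 samples every 64 cycles.
samples_per_cycle = 8 / 64

print(module_total, sram_implied, samples_per_cycle)
```

The module total comes to 8936 transistors, which rounds to the ~9k shown, leaving roughly 6k transistors for the SRAM.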
9
High level simulation (in C/C++):
Three implementations of 1DDCT:
1. Based on definition
2. Based on fast algorithm
3. Based on distributed algorithm
[Diagram: input -> Function 1, Function 2, Function 3 -> compare with Matlab -> pass/fail]
10
Step 1: Selector
We begin by loading eight 16-bit values into individual registers (R0 to R7). It takes 8 clock cycles to get all the data.
We use a selector to choose the registers that will be added and subtracted.
The R0 and R7 values are added and subtracted in parallel, and likewise for R1 and R6, R2 and R5, and R3 and R4.
11
Step 1 (Verilog)

    // Storage array is named buffer because buf is a reserved Verilog keyword.

    // Write operation: load one 16-bit sample per clock until the buffer is full
    always @(posedge clk or negedge rst) begin
      if (rst == 0) begin
        count <= 0;
      end else begin
        if (in_clr == 1) begin
          count <= 0;
        end else begin
          if (in_valid && ~out_full) begin
            buffer[count] <= in_data;
            count <= count + 1;
          end
        end
      end
    end // always @ (posedge clk or negedge rst)

    // Read operation: output the mirrored pair R[i] and R[7-i]
    always @(posedge clk) begin
      if (in_read) begin
        out_data1 <= buffer[in_addr];
        out_data2 <= buffer[7 - in_addr];
      end
    end
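A behavioral model helps when checking the Verilog against expected values. The sketch below mirrors the write and read operations in Python. The class and function names are hypothetical, the full-buffer check stands in for `out_full`, and no clocking or reset behavior is modeled.

```python
class RegisterFile:
    """Behavioral model of the 8 x 16-bit input buffer (illustrative only)."""

    def __init__(self):
        self.regs = [0] * 8
        self.count = 0

    def write(self, in_data):
        # Accept a sample only while the buffer is not full (role of ~out_full).
        if self.count < 8:
            self.regs[self.count] = in_data
            self.count += 1

    def read_pair(self, in_addr):
        # Same addressing as the Verilog read: entry in_addr and entry 7 - in_addr.
        return self.regs[in_addr], self.regs[7 - in_addr]


def sums_and_differences(samples):
    """Load eight samples, then form x_i + x_(7-i) and x_i - x_(7-i) for i = 0..3."""
    rf = RegisterFile()
    for s in samples:
        rf.write(s)
    sums, diffs = [], []
    for addr in range(4):
        a, b = rf.read_pair(addr)
        sums.append(a + b)
        diffs.append(a - b)
    return sums, diffs
```

The four sums feed the even-output matrix and the four differences feed the odd-output matrix.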
12
Step 2: Bit Address Generator
Store the results of the additions and subtractions in eight 16-bit registers.
Taking the bit at the same position from each of four registers (the addition results or the subtraction results), the bit address generator forms a 4-bit address, for example 1011, that selects the matching entry in the ROM.
[Diagram: R0..R7 -> bit 1 -> 1011 -> Rom0..Rom7]
13
Step 2 (Verilog)

    // Storage array is named buffer because buf is a reserved Verilog keyword.

    // Read operation: capture in_data while in_read is asserted
    always @(posedge clk or negedge rst) begin
      if (rst == 0) begin
        count <= 0;
      end else begin
        if (in_clr == 1) begin
          count <= 0;
        end else begin
          if (in_read & ~out_full) begin
            buffer[count] <= in_data;
            count <= count + 1;
          end
        end
      end
    end

    // Bit address generator: bit in_bitpos of each word forms the 4-bit ROM address.
    // Combinational logic, so it uses @(*) and blocking assignments (Verilog-2001).
    always @(*) begin
      out_addr[3] = buffer[0][in_bitpos];
      out_addr[2] = buffer[1][in_bitpos];
      out_addr[1] = buffer[2][in_bitpos];
      out_addr[0] = buffer[3][in_bitpos];
    end
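The address formation can be modeled in isolation to produce expected values for a test bench. In this sketch the function name `bit_address` is hypothetical, and the bit ordering (word 0 supplies address bit 3) is taken from the `out_addr` assignments in the listing.

```python
def bit_address(words, bitpos):
    """4-bit ROM address: words[0] supplies address bit 3, words[3] address bit 0."""
    addr = 0
    for w in words:
        # Mask to 16 bits so negative values use their two's complement pattern.
        addr = (addr << 1) | (((w & 0xFFFF) >> bitpos) & 1)
    return addr
```

For example, if the selected bits of the four words are 1, 0, 1 and 1, the address is binary 1011, as in the slide's diagram.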
14
Step 3: Parallel to Serial
The data read from the ROM at the generated addresses are added and stored in a register, and the result is then shifted (multiplied by a factor of two, in two's complement).
[Diagram: Rom0..Rom7, R5, R6, S1, S0]
15
Step 3 (Verilog)

    // Accumulator: adds one ROM word per clock, counting bit_pos down from the MSB
    always @(posedge clk or negedge rst) begin
      if (rst == 0) begin
        out_data <= 0;
        bit_pos <= 15;
      end else begin
        if (in_clr == 1) begin
          out_data <= 0;
          bit_pos <= 15;
        end else begin
          if (~out_done) begin
            out_data <= out_data + in_data;
            bit_pos <= bit_pos - 1;
          end
        end // else: !if(in_clr==1)
      end
    end
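The shift-and-add accumulation described for Step 3 can be sketched as follows. This follows the slide's prose rather than the listing, which shows the addition but no shift. The function name is hypothetical, the ROM words are plain integers, and MSB-first ordering is assumed because `bit_pos` counts down from 15.

```python
def shift_accumulate(rom_words):
    """Combine per-bit ROM outputs, MSB first.

    rom_words[0] is the ROM output for the sign bit (bit 15), followed by
    the outputs for bits 14 down to 0.
    """
    acc = -rom_words[0]        # the sign bit carries a negative weight
    for w in rom_words[1:]:
        acc = 2 * acc + w      # shift left by one (multiply by two), then add
    return acc
```

With a single coefficient of 3 and an input of -5, feeding `3 * bit` for each bit of the 16-bit pattern returns -15, the same as the direct product.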
16
C Code Result
17
Conclusion & questions
:Implementing 2D DCT
:Roughly 24k transistor count
:Verilog needs debugging