1
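Hardware Description Language: Verilog Example Circuit Designs. Department of Electrical Engineering, National Chung Hsing University. 廖彥璋, 黃穎聰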
2
Introduction
Goal: become familiar with Verilog coding through a set of design examples
Classification of design examples:
  Combinational logic
  Data storage and memory
  Counter and finite state machine (FSM)
Each design example includes:
  Circuit function description
  Verilog coding
  Synthesis results (design symbol, synthesized circuit schematic, gate count report)
  Simulation results
3
Part A. Combinational Logic Design
4
Outline
  One-bit full adder
  4-bit full adder design (unsigned)
  4-bit adder/subtractor design
  8-input priority encoder
  BCD to binary converter design
  7-segment LED display decoder
  Odd parity checker
  Absolute value function
  Simplified 8-bit ALU design
5
1. One Bit Adder (1) A1. A one-bit full adder design; inputs a, b, cin; outputs sum, cout. a) Using behavioral modeling b) Using data flow modeling. Figure: basic circuit structure of a 1-bit adder.
6
1. One Bit Adder (2) a) Verilog design Using behavioral modeling
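The slide's code is not preserved in this transcript. A minimal behavioral sketch of such an adder, assuming the hypothetical module name `adder_1bit_behavioral` and Verilog-2001 style port declarations, might look like this:

```verilog
// Behavioral 1-bit full adder (sketch; the module name is an assumption).
module adder_1bit_behavioral (
    input  wire a,
    input  wire b,
    input  wire cin,
    output reg  sum,
    output reg  cout
);
    always @(*) begin
        // The 2-bit left-hand side holds the carry-out (MSB) and the sum (LSB).
        {cout, sum} = a + b + cin;
    end
endmodule
```

The arithmetic expression leaves the gate-level structure entirely to the synthesis tool.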
7
1. One Bit Adder (3) Gate-level design of the synthesis result. Figure: two XOR gates and one AO (AND-OR) gate, with inputs a, b and cin.
8
1. One Bit Adder (4) b) Using data flow modeling. Parentheses indicate the preferred grouping in the logic implementation.
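As an illustration only (the original code is not in the transcript), a data flow description with explicit parentheses could be sketched as follows. The module name `adder_1bit_dataflow` is taken from the synthesis report later in the deck; the exact expressions are an assumption.

```verilog
// Data flow 1-bit full adder (sketch under the assumptions stated above).
module adder_1bit_dataflow (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    // The parentheses ask for a and b to be XORed first.
    assign sum  = (a ^ b) ^ cin;
    // Carry is generated by a & b, or propagated when a ^ b is 1.
    assign cout = (a & b) | ((a ^ b) & cin);
endmodule
```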
9
1. One Bit Adder (5) “a” and “b” are XORed first because of the parentheses in the expression. Although behavioral and data flow modeling yield the same logic structure, i.e., two XOR gates and one AO gate, the input orderings differ.
10
1. One Bit Adder (6) Simulation results
Waveforms: sum and cout for the behavioral model; sum and cout for the data flow model.
Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U XOR2X slow
U AO22X slow
U XOR2X slow
Total 3 cells
11
2. Four Bit Adder (1) A2. A 4-bit full adder design (unsigned)
a) Using structural-level modeling, constructing the adder from four 1-bit adders
b) Using behavioral modeling
Part a) cascades the module obtained in problem 1 as shown in the figure below. Part b) is written directly at the behavioral level.
12
2. Four Bit Adder (2) Using structural level modeling and constructing the adders with four 1-bit adders Use name mapping
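A sketch of the structural approach with name mapping is given below. The 1-bit cell is repeated so the block is self-contained; the top-level name `adder_4bit_structural`, the presence of a `cin` port and the carry names `c1` to `c3` are assumptions.

```verilog
// 1-bit full adder cell (assumed data flow form).
module adder_1bit_dataflow (
    input  wire a, b, cin,
    output wire sum, cout
);
    assign sum  = (a ^ b) ^ cin;
    assign cout = (a & b) | ((a ^ b) & cin);
endmodule

// 4-bit ripple-carry adder built from four 1-bit cells.
module adder_4bit_structural (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire       cin,
    output wire [3:0] sum,
    output wire       cout
);
    wire c1, c2, c3;  // internal carries between stages

    // Name mapping: each port is connected by name, so order does not matter.
    adder_1bit_dataflow u0 (.a(a[0]), .b(b[0]), .cin(cin), .sum(sum[0]), .cout(c1));
    adder_1bit_dataflow u1 (.a(a[1]), .b(b[1]), .cin(c1),  .sum(sum[1]), .cout(c2));
    adder_1bit_dataflow u2 (.a(a[2]), .b(b[2]), .cin(c2),  .sum(sum[2]), .cout(c3));
    adder_1bit_dataflow u3 (.a(a[3]), .b(b[3]), .cin(c3),  .sum(sum[3]), .cout(cout));
endmodule
```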
13
2. Four Bit Adder (3) structural modeling Symbol view
Symbol view will show all the interface signals in module declaration
14
2. Four Bit Adder (4) Synthesis Report by Design Vision
Cell Reference Library Area Attributes
u adder_1bit_dataflow_ h
u adder_1bit_dataflow_ h
u adder_1bit_dataflow_ h
u adder_1bit_dataflow_ h
Total 4 cells
***** End Of Report *****
Note that the area of the 1-bit full adder is 33.94. The area of the 4-bit adder is roughly 4 times larger.
15
2. Four Bit Adder (5) Using behavioral modeling
Inputs are 4-bit vectors
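For comparison, a behavioral version with 4-bit vector inputs might be sketched as follows; the module name and the `cin` port are assumptions.

```verilog
// Behavioral 4-bit unsigned adder (sketch; names are assumptions).
module adder_4bit_behavioral (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire       cin,
    output reg  [3:0] sum,
    output reg        cout
);
    always @(*) begin
        // The 5-bit left-hand side captures the carry out of bit 3.
        {cout, sum} = a + b + cin;
    end
endmodule
```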
16
2. Four Bit Adder (6) Synthesized result of behavioral modeling. A ripple-carry structure is synthesized: the carry chain runs through c1, c2 and c3 to cout, alongside the sum logic.
17
2. Four Bit Adder (7) Simulation results
18
2. Four Bit Adder (8) Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U OR2X slow
U XNOR2X slow
U XNOR2X slow
U XNOR2X slow
U XNOR2X slow
U XNOR2X slow
U XNOR2X slow
U XOR2X slow
U XOR2X slow
U OAI2BB1X slow
U OAI21XL slow
U AO22X slow
U OAI2BB1X slow
U OAI21XL slow
U OAI2BB1X slow
U OAI21XL slow
Total 16 cells
***** End Of Report *****
Structural modeling leads to a smaller area than the synthesis result of behavioral modeling (16 cells for behavioral versus 4x3 cells for structural). This is because explicit structural information is available in structural modeling.
19
3. Adder & Subtractor (1)
A4. A 4-bit adder/subtractor design. Inputs A[3:0], B[3:0]; function select s = 1 (add), s = 0 (subtract); output Y[4:0]. Figures: one form of adder/subtractor; symbol view. Input resources are shared because the add and subtract functions are mutually exclusive.
20
3. Adder & Subtractor (2) A default option is added to avoid inferring a latch.
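One way to write this, assuming a `case` on the select input and the hypothetical module name `addsub_4bit`, is sketched below. The `default` branch assigns Y for every remaining value of s (such as x or z), so no latch is inferred.

```verilog
// 4-bit adder/subtractor (sketch): s = 1 adds, s = 0 subtracts.
module addsub_4bit (
    input  wire [3:0] A,
    input  wire [3:0] B,
    input  wire       s,
    output reg  [4:0] Y
);
    always @(*) begin
        case (s)
            1'b1:    Y = A + B;   // 5-bit result keeps the carry
            1'b0:    Y = A - B;   // 5-bit two's complement difference
            default: Y = 5'b0;    // covers x/z on s; avoids latch inference
        endcase
    end
endmodule
```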
21
3. Adder & Subtractor (3) Synthesized circuit
Exclusive-OR gates provide 1’s complement control of the input operand.
22
3. Adder & Subtractor (4) Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U XNOR2X slow
U OAI22XL slow
U CLKINVX slow
U AND2X slow
U XOR2X slow
U XOR2X slow
U OA21XL slow
U OAI2BB1X1 slow
U XOR2X slow
U XOR2X slow
U XOR2X slow
U OA21XL slow
U OAI2BB1X1 slow
U XOR2X slow
U XOR2X slow
U XOR2X slow
U AOI2BB2X slow
U NAND2X slow
U XOR2X slow
U XOR2X slow
U XOR2X slow
U XOR2X slow
Total 22 cells
***** End Of Report *****
The area is larger than that of a pure adder design.
23
4. Priority Encoder (1) A6. 8-input priority encoder. Input a[7:0]; outputs q[2:0], y. The 8-bit input has decreasing priority from MSB to LSB. Output q gives the position of the highest-priority input bit that equals 1, and y is set to 1. If no input bit equals 1, q is set to 3’b0 and y is set to 0. Figures: symbol view, truth table.
24
4. Priority Encoder (2) Coding 1: Use if-else-if sequence
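A sketch of the if-else-if coding, assuming the hypothetical module name `priority_encoder_if`, could read:

```verilog
// 8-input priority encoder using an if-else-if chain (sketch).
module priority_encoder_if (
    input  wire [7:0] a,
    output reg  [2:0] q,
    output reg        y
);
    always @(*) begin
        y = 1'b1;                      // assume a valid input; cleared in the final else
        if      (a[7]) q = 3'd7;       // MSB has the highest priority
        else if (a[6]) q = 3'd6;
        else if (a[5]) q = 3'd5;
        else if (a[4]) q = 3'd4;
        else if (a[3]) q = 3'd3;
        else if (a[2]) q = 3'd2;
        else if (a[1]) q = 3'd1;
        else if (a[0]) q = 3'd0;
        else begin
            q = 3'b0;                  // no input bit is 1
            y = 1'b0;
        end
    end
endmodule
```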
25
4. Priority Encoder (3) Synthesized circuit
26
4. Priority Encoder (4) Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U NAND4BBXL slow
U NOR2X slow
U CLKINVX slow
U OAI211X slow
U CLKINVX slow
U OR2X slow
U OAI211X slow
U AOI21X slow
U NAND3X slow
U CLKINVX slow
U NOR4X slow
U CLKINVX slow
Total 12 cells
***** End Of Report *****
27
4. Priority Encoder (5) Coding 2: Use casex Same result
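The casex coding can be sketched in the same spirit; the module name `priority_encoder_casex` is an assumption. Each pattern uses x for the lower-priority bits that are ignored.

```verilog
// 8-input priority encoder using casex (sketch).
module priority_encoder_casex (
    input  wire [7:0] a,
    output reg  [2:0] q,
    output reg        y
);
    always @(*) begin
        y = 1'b1;
        casex (a)
            8'b1xxxxxxx: q = 3'd7;
            8'b01xxxxxx: q = 3'd6;
            8'b001xxxxx: q = 3'd5;
            8'b0001xxxx: q = 3'd4;
            8'b00001xxx: q = 3'd3;
            8'b000001xx: q = 3'd2;
            8'b0000001x: q = 3'd1;
            8'b00000001: q = 3'd0;
            default: begin             // all inputs are 0
                q = 3'b0;
                y = 1'b0;
            end
        endcase
    end
endmodule
```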
28
4. Priority Encoder (6) Synthesized circuit
Smaller than the result of the if-else-if coding.
29
5. BCD to Binary Conversion (1)
A9. A 2-digit BCD to binary converter design. Input a[3:0] (MSD), b[3:0] (LSD), output y[6:0]
30
5. BCD to Binary Conversion (2)
y = 10*a + b = 8a + 2a + b
Left shift by 3 bits = 8a
Left shift by 1 bit = 2a
Note: use shifts instead of multiplication to reduce the logic complexity. Modern synthesis tools are capable of synthesizing constant multiplication with shifters.
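The shift-and-add form can be sketched with concatenations, which are equivalent to constant left shifts. The module name `bcd_to_binary` is an assumption; the ports follow the problem statement.

```verilog
// 2-digit BCD to binary: y = 10*a + b = (a << 3) + (a << 1) + b (sketch).
module bcd_to_binary (
    input  wire [3:0] a,   // most significant digit
    input  wire [3:0] b,   // least significant digit
    output wire [6:0] y
);
    // {a, 3'b000} is 8a and {a, 1'b0} is 2a; the maximum value 99 fits in 7 bits.
    assign y = {a, 3'b000} + {a, 1'b0} + b;
endmodule
```

With a = 9 and b = 9 this gives 72 + 18 + 9 = 99.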
31
5. BCD to Binary Conversion (3)
The synthesized circuit is simply too complicated to verify manually! Simulation check: hexadecimal 63 = 6x16+3 = 99.
32
5. BCD to Binary Conversion (4)
Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U XOR2X slow
U XNOR2X slow
U NOR2X slow
U NAND2X slow
U XOR2X slow
U NOR2X slow
U XNOR2X slow
U OA21XL slow
U OAI2BB1X1 slow
U XOR2X slow
U AOI21X slow
U OA21XL slow
U XOR2X slow
U XNOR2X slow
U NAND2X slow
U XOR2X slow
U XNOR2X slow
U CLKINVX slow
U XOR2X slow
U OAI21XL slow
U OAI2BB1X slow
U CLKINVX slow
U XOR2X slow
U XNOR2X slow
U NAND2X slow
U XOR2X slow
Total 26 cells
***** End Of Report *****
33
6. Seven-Segment Decoder (1)
A10. A 7-segment LED display decoder. Input x[3:0]; outputs a, b, c, d, e, f, g. An LED segment turns on when its control signal equals 1.
34
6. Seven-Segment Decoder (2)
A case description is equivalent to writing down the truth table. There is no need to derive the Boolean equations of a ~ g yourself. The synthesis tool can perform sophisticated logic minimization efficiently to obtain the Boolean functions. Note: to perform multiple-output Boolean logic minimization by hand, you may resort to the Quine-McCluskey algorithm.
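A truth-table style sketch is shown below. It assumes the conventional segment layout (a on top, then clockwise b to f, g in the middle), active-high outputs as stated in the problem, and blank output for inputs 10 to 15; the module name and the blanking choice are assumptions.

```verilog
// 7-segment decoder written as a truth table (sketch).
module seven_seg_decoder (
    input  wire [3:0] x,
    output reg        a, b, c, d, e, f, g
);
    always @(*) begin
        case (x)
            //                          abcdefg
            4'd0:    {a,b,c,d,e,f,g} = 7'b1111110;
            4'd1:    {a,b,c,d,e,f,g} = 7'b0110000;
            4'd2:    {a,b,c,d,e,f,g} = 7'b1101101;
            4'd3:    {a,b,c,d,e,f,g} = 7'b1111001;
            4'd4:    {a,b,c,d,e,f,g} = 7'b0110011;
            4'd5:    {a,b,c,d,e,f,g} = 7'b1011011;
            4'd6:    {a,b,c,d,e,f,g} = 7'b1011111;
            4'd7:    {a,b,c,d,e,f,g} = 7'b1110000;
            4'd8:    {a,b,c,d,e,f,g} = 7'b1111111;
            4'd9:    {a,b,c,d,e,f,g} = 7'b1111011;
            default: {a,b,c,d,e,f,g} = 7'b0000000;  // blank for 10..15 (assumption)
        endcase
    end
endmodule
```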
35
6. Seven-Segment Decoder (3)
36
6. Seven-Segment Decoder (4)
37
6. Seven-Segment Decoder (5)
Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U NAND2X slow
U OAI21XL slow
U NAND3BX slow
U OA21XL slow
U NAND3X slow
U MXI2X slow
U NOR2X slow
U OA21XL slow
U OAI211X slow
U CLKINVX slow
U CLKINVX slow
U OAI221XL slow
U NOR2X slow
U NOR3X slow
U NAND4X slow
U NAND3BX slow
U CLKINVX slow
U CLKINVX slow
U NOR2X slow
U CLKINVX slow
U NAND2X slow
U NOR2X slow
U NAND2X slow
U NOR2X slow
U CLKINVX slow
Total 25 cells
***** End Of Report *****
38
7. Odd Parity Checker (1)
A7. Odd parity checker. Input x[7:0]; output y = 1 if there is an odd number of 1’s in input x. Figures: odd parity, even parity. Note: for an odd parity bit generator, the parity bit should be 1 if there is an even number of 1’s.
39
7. Odd Parity Checker (2) Reduction XOR
0100_0101 (three 1’s)
0100_1101 (four 1’s)
1111_1101 (seven 1’s)
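The reduction XOR operator collapses all eight bits into one. A minimal sketch, with the hypothetical module name `odd_parity_checker`:

```verilog
// Odd parity checker: y = 1 when x contains an odd number of 1's (sketch).
module odd_parity_checker (
    input  wire [7:0] x,
    output wire       y
);
    // ^x is x[7] ^ x[6] ^ ... ^ x[0].
    assign y = ^x;
endmodule
```

For the three values on the slide, 0100_0101 (three 1's) and 1111_1101 (seven 1's) give y = 1, while 0100_1101 (four 1's) gives y = 0.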
40
7. Odd Parity Checker (3)
41
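7. Odd Parity Checker (4) Structure 1 and structure 2 (figures): both structures produce the same result, but their delays differ considerably. With the same seven XOR gates, structure 1 has only three levels, whereas structure 2 has seven levels, so structure 2 makes the circuit slower.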
42
7. Odd Parity Checker (5) Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U XOR2X slow
U XOR2X slow
U XNOR2X slow
U XNOR2X slow
U XOR2X slow
U XNOR2X slow
U XNOR2X slow
Total 7 cells
***** End Of Report *****
43
8. Absolute value function (1)
A13. ABS function. Input a[7:0]; return the absolute value of a
44
8. Absolute value function (2)
Verilog-1995: the keyword "signed" is not available. 1001_1110 = -98, 0110_0010 = 98
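Without `signed`, the sign bit has to be tested explicitly and the two's complement formed by hand. A sketch in Verilog-1995 port style, with the hypothetical module name `abs_v1995` and an assumed output name `y`:

```verilog
// Absolute value without the signed keyword (Verilog-1995 style sketch).
module abs_v1995 (a, y);
    input  [7:0] a;
    output [7:0] y;

    // a[7] is the sign bit; negate by inverting and adding 1.
    assign y = a[7] ? (~a + 8'd1) : a;
endmodule
```

For a = 1001_1110 (-98) this yields 0110_0010 (98). Note that -128 (1000_0000) maps to itself, since +128 does not fit in 8 signed bits.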
45
8. Absolute value function (3)
Verilog-2001: the keyword "signed" is available.
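With `signed` ports, the comparison and negation can be written directly. A sketch, with the hypothetical module name `abs_v2001` and an assumed output name `y`:

```verilog
// Absolute value using signed ports (Verilog-2001 sketch).
module abs_v2001 (
    input  wire signed [7:0] a,
    output wire signed [7:0] y
);
    // Both operands of < are signed, so this is a signed comparison.
    assign y = (a < 0) ? -a : a;
endmodule
```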
46
8. Absolute value function (4)
Verilog 1995 coding synthesis result
47
8. Absolute value function (5)
Verilog 2001 coding synthesis result is identical to that of Verilog 1995
48
8. Absolute value function (7)
Synthesis Report by Design Vision
Cell Reference Library Area Attributes
U AO22X slow
U AO22X slow
U AO22X slow
U AO22X slow
U AO22X slow
U AO22X slow
U AO22X slow
U NAND2X slow
U CLKINVX slow
U XOR2X slow
U NAND2BX1 slow
U CLKINVX slow
U XNOR2X slow
U NOR2BX slow
U XNOR2X slow
U NOR3BXL slow
U XOR2X slow
U NAND2BX1 slow
U XNOR2X slow
U NOR3X slow
U XNOR2X slow
U NOR2X slow
U XOR2X slow
Total 23 cells
49
9. ALU design (1) A17. A simplified 8-bit ALU design with cmd[1:0] as a 2-bit opcode, A[7:0] and B[7:0] as two 8-bit input operands, and Y[7:0] as an 8-bit output. It also has 2 flags. Flag z = 1 if Y == 0. Flag c = 1 if a carry out of the MSB occurs when performing the addition.
cmd   operation
00    Y = A + B
01    Y = A - B
10    Y = A or B
11    Y = A and B
50
9. ALU design (2) Use “case” to describe different functions performed by the ALU Flag update
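A sketch of such a case-based ALU follows. The module name `alu_8bit` is an assumption, as is the choice to clear c for every operation other than addition.

```verilog
// Simplified 8-bit ALU with zero and carry flags (sketch).
module alu_8bit (
    input  wire [7:0] A,
    input  wire [7:0] B,
    input  wire [1:0] cmd,
    output reg  [7:0] Y,
    output reg        z,
    output reg        c
);
    always @(*) begin
        c = 1'b0;                       // carry is only meaningful for addition (assumption)
        case (cmd)
            2'b00:   {c, Y} = A + B;    // carry out of the MSB goes to c
            2'b01:   Y = A - B;
            2'b10:   Y = A | B;
            2'b11:   Y = A & B;
            default: Y = 8'b0;
        endcase
        z = (Y == 8'b0);                // flag update after Y is known
    end
endmodule
```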
51
9. ALU design (3) Synthesized ALU circuit
52
9. ALU design (4) Synthesis report Total 32 cells 628.038015
Cell Reference Library Area Attributes
U NOR2X slow
U NAND4X slow
U NAND4X slow
U NOR3BXL slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U CLKINVX slow
U AOI222XL slow
U AO21X slow
U NOR2X slow
U CLKINVX slow
U NOR2BX slow
r addsub
Total 32 cells
***** End Of Report *****
53
Part B. Data Storage and Memory
54
Outline
  8-bit Shift Register
  Multiply-and-Accumulate module
  256x16 Single Port RAM
55
1. 8-bit Shift Register (1) B2. 8-bit shift register, positive edge triggered; inputs din[7:0], cmd[1:0]; output q[7:0]
Command   Operation
00        Load register
01        Shift left, LSB takes in a “0”
10        Logical shift right, MSB takes in a “0”
11        Arithmetic shift right, MSB is sign bit extension*
56
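1. 8-bit Shift Register (2) Asynchronous reset, active high.
A case statement selects the command. * Verilogger does not support the 2001 standard, so the arithmetic shift operators cannot be used and only a logical shift right is implemented. The arithmetic shift operators are >>> and <<<.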
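The sketch below is one possible coding, not the slide's original. It assumes port names `clk` and `rst`, and it builds the arithmetic right shift by concatenating the sign bit, which avoids the `>>>` operator.

```verilog
// 8-bit shift register with asynchronous active-high reset (sketch).
module shift_reg_8bit (
    input  wire       clk,
    input  wire       rst,
    input  wire [7:0] din,
    input  wire [1:0] cmd,
    output reg  [7:0] q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            q <= 8'b0;
        else begin
            case (cmd)
                2'b00:   q <= din;               // load
                2'b01:   q <= {q[6:0], 1'b0};    // shift left, LSB takes 0
                2'b10:   q <= {1'b0, q[7:1]};    // logical shift right
                2'b11:   q <= {q[7], q[7:1]};    // arithmetic shift right (sign extension)
                default: q <= q;
            endcase
        end
    end
endmodule
```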
57
1. 8-bit Shift Register (3)
(1) Load C0 => 1100_0000
(2) 1100_0000 << 1 => 1000_0000 (80)
(3) 1000_0000 >> 1 => 0100_0000 (40)
(4) 0100_0000 >> 1 => 0010_0000 (20)
58
1. 8-bit Shift Register (4)
59
1. 8-bit Shift Register (5) Area Report: Net Report: Fanout: the total number of objects driven by a net. For example, this design has 8 DFFs, so the fanout of clk is 8.
60
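1. 8-bit Shift Register (6) Timing Report:
(1) Start and end points of the critical path; in this example it runs from q[1] to q[0].
(2) The critical path itself.
(3) Setup time of the flip-flop.
(4) Slack: the larger the better. A positive value means the flip-flop setup/hold time is met; a negative value means it is not, and the clock rate must be lowered.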
61
2. Multiply-and-Accumulate module(1)
B3. Multiply-and-Accumulate module Perform y[17:0] = a[7:0]*b[7:0] + acc[17:0], a and b are two input operands, and acc is the output of an accumulating register. Y is then loaded to the accumulating register on the rising edge of the clock.
62
2. Multiply-and-Accumulate module(2)
Asynchronous Reset, Active High Feedback
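A sketch of the feedback structure is given below. Port names `clk` and `rst` and the module name `mac` are assumptions; the widths follow the problem statement.

```verilog
// Multiply-and-accumulate with asynchronous active-high reset (sketch).
module mac (
    input  wire        clk,
    input  wire        rst,
    input  wire [7:0]  a,
    input  wire [7:0]  b,
    output wire [17:0] y,
    output reg  [17:0] acc
);
    // Combinational multiply-add; acc is fed back from the register.
    assign y = a * b + acc;

    always @(posedge clk or posedge rst) begin
        if (rst)
            acc <= 18'b0;
        else
            acc <= y;    // load the new sum on the rising clock edge
    end
endmodule
```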
63
2. Multiply-and-Accumulate module(3)
(1) Start of input data
(2) A(10) * 14(20) = C8(200)
(3) F(15) * 19(25) + C8(200) = 23F(575)
(4) 8(8) * 6(6) + 23F(575) = 26F(623)
(5) 1(1) * 2(2) + 26F(623) = 271(625)
64
2. Multiply-and-Accumulate module(4)
Figure labels: adder, output register (acc or y), multiplier.
65
2. Multiply-and-Accumulate module(5)
Area Report: Net Report:
66
2. Multiply-and-Accumulate module(6)
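Timing Report: the slack is close to 0, so timing is only just met. The circuit can be adjusted to leave a larger margin, which makes the later stages of the design flow easier.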
67
3. 256x16 Single Port RAM (1) B5. 256x16 single-port memory.
The memory module is addressed by an 8-bit address addr[7:0] and has a bidirectional data port data[15:0]. The module has 2 control signals:
rw: write if rw = 1, read if rw = 0
cs: active-low chip select; the RAM functions only if cs = 0. The data port is high impedance if cs = 1.
Figure label: inout port.
68
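3. 256x16 Single Port RAM (2) The code on the left declares a bidirectional inout port; it must be paired with the assign at the lower left, which decides when the port acts as an output and when as an input.
The port is an output when rw = 0 (read) and cs = 0 (chip enabled); otherwise it is high impedance, and the high-impedance state is used for input. When rw = 0 (read), the data at the given ADDR is output. The for loop is synthesizable, but note that it is used differently from C: it only expands a regular pattern of statements. In this example it expands automatically into: ram[0] <= 8’d0; ram[1] <= 8’d0; ram[2] <= 8’d0; ……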
69
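3. 256x16 Single Port RAM (3) When cs = 1, all functions are suspended and the registers do not change.
When rw = 1, data is written to the location selected by the address. When rw = 0, the registers keep their values.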
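The behavior described on these slides can be sketched as follows. The `clk` and `rst` ports, the asynchronous reset style and the module name `ram_256x16` are assumptions, since the transcript does not preserve the original code.

```verilog
// 256x16 register-based single-port RAM with a bidirectional data port (sketch).
module ram_256x16 (
    input  wire        clk,
    input  wire        rst,     // assumed reset used to clear the array
    input  wire        cs,      // active-low chip select
    input  wire        rw,      // 1 = write, 0 = read
    input  wire [7:0]  addr,
    inout  wire [15:0] data
);
    reg [15:0] ram [0:255];
    integer i;

    // Drive the port only when selected and reading; otherwise high impedance.
    assign data = (!cs && !rw) ? ram[addr] : 16'bz;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            // The loop is unrolled by synthesis into 256 parallel assignments.
            for (i = 0; i < 256; i = i + 1)
                ram[i] <= 16'd0;
        end else if (!cs && rw) begin
            ram[addr] <= data;   // write to the addressed location
        end
        // cs = 1 or rw = 0: all registers keep their values
    end
endmodule
```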
70
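3. 256x16 Single Port RAM (4) Area Report: the total area is much larger than in the previous example.
Building a RAM out of registers is usually not a good choice. Taking 1 Kbit as the dividing line, above that size a DRAM has a lower area and is the better option.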
71
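3. 256x16 Single Port RAM (4) Net Report:
The three red boxes on the left mark nets with fanout > 1000. Excessive fanout increases the capacitance that must be driven and lowers the speed, so it should be avoided where possible. Constraints can be given to the compiler to limit the maximum fanout. The figure on the right shows that nets N527 and N528 are driven by ADDR[0] & ADDR[1].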
72
3. 256x16 Single Port RAM (5) Timing Report:
73
Part C. Counter And Finite State Machine
74
Outline
  Clock Frequency Divider (divide by 4)
  4-bit Universal Counter
  PWM Module
  Debouncing Circuit Module
  ADD-XOR Compute
75
1. Clock Frequency Divider(x0.25)(1)
C4. A clock frequency divider (divide by 4) using a 2-bit counter. Note that the duty cycle is 50%, i.e., the time the divided clock is 1 equals 50% of the total (divided) clock period. Figure: duty cycle 50%, where the 1 and 0 portions each take half of one clock period.
76
1. Clock Frequency Divider(x0.25)(2)
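A frequency divider can be built from a counter and a simple comparison. In this example, 4 clock cycles are merged into 1: a 2-bit counter (4 states) is used, and the output is 0 when counter = 0 or 1 and 1 when counter = 2 or 3.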
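A sketch of this divider is shown below. The names `clk_i`, `clk_o` and `counter` appear on the simulation slide; the `rst` port and the module name `clk_div4` are assumptions.

```verilog
// Divide-by-4 clock divider with 50% duty cycle (sketch).
module clk_div4 (
    input  wire clk_i,
    input  wire rst,      // assumed asynchronous active-high reset
    output wire clk_o
);
    reg [1:0] counter;

    always @(posedge clk_i or posedge rst) begin
        if (rst)
            counter <= 2'b00;
        else
            counter <= counter + 2'b01;   // wraps from 3 back to 0
    end

    // Output 0 for counter = 0, 1 and 1 for counter = 2, 3.
    assign clk_o = (counter >= 2'd2);
endmodule
```

The comparison `counter >= 2` is logically the same as taking the counter's most significant bit.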
77
1. Clock Frequency Divider(x0.25)(3)
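(1) Counting starts.
(2) The counter advances but is still less than 2, so the output is 0.
(3) The counter advances and is now greater than or equal to 2, so the output is 1.
(4) Counting ends and the counter returns to zero.
One period of clk_o spans 4 periods of clk_i.
Internal signal: counter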
78
1. Clock Frequency Divider(x0.25)(4)
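The synthesis result resembles example C1: it is a simple 2-bit counter. Analyzing the counter behavior (table at lower right) shows that outputting reg[1] directly gives the divided clock.
Counter   Output
00        0
01        0
10        1
11        1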
79
1. Clock Frequency Divider(x0.25)(5)
Area Report: Net Report:
80
1. Clock Frequency Divider(x0.25)(6)
Timing Report:
81
2. 4-bit Universal Counter (1)
C6. 4-bit universal counter. The counter has a 4-bit input data[3:0] and a 4-bit output count[3:0]. It also has the following control inputs, in decreasing priority:
Reset: synchronous reset, active high (i.e., reset when 1)
Load: sets the output count[3:0] to the input data[3:0]; synchronous load
Enable: the counter counts only if enable is set to 1
Up_Down: counts up if set to 1, counts down if set to 0
82
2. 4-bit Universal Counter (2)
The four signals have a priority order: Reset > Load > Enable > Up_Down. This circuit is a simplified version of the homework assignment.
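The priority order maps naturally onto a nested if-else chain. A sketch, assuming lower-case port names and the hypothetical module name `universal_counter`:

```verilog
// 4-bit universal counter; priority: reset > load > enable > up_down (sketch).
module universal_counter (
    input  wire       clk,
    input  wire       reset,     // synchronous, active high
    input  wire       load,
    input  wire       enable,
    input  wire       up_down,   // 1 = count up, 0 = count down
    input  wire [3:0] data,
    output reg  [3:0] count
);
    always @(posedge clk) begin
        if (reset)
            count <= 4'b0;
        else if (load)
            count <= data;
        else if (enable) begin
            if (up_down)
                count <= count + 4'd1;
            else
                count <= count - 4'd1;
        end
        // enable = 0: count holds its value
    end
endmodule
```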
83
2. 4-bit Universal Counter (3)
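Synthesis of the Verilog if statement: an if statement is generally synthesized into multiplexers with a priority order. The priority encoder in Part A exploits this property; the figure shows the synthesis result of if / else if / else. In practice, depending on the synthesizer and the behavior of the code, parallel multiplexers may be synthesized instead; the upper and lower figures implement the same circuit behavior.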
84
2. 4-bit Universal Counter (4)
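(1) Load and enable are 1 and reset is 0, so the value C (12) is loaded.
(2) Enable is 0, so nothing happens.
(3) Enable and up_down are 1, so the count increments: C+1 = D (13).
(4) Same situation, so the count increments: 0+1 = 1 (1).
(5) Enable is 1 but up_down is 0, so the count decrements: 1-1 = 0 (0).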
85
2. 4-bit Universal Counter (5)
86
2. 4-bit Universal Counter (6)
Area Report: Net Report:
87
2. 4-bit Universal Counter (7)
Timing Report:
88
3. PWM Module (1) C8. A PWM (pulse width modulation) module to control the brightness level of an LED. Inputs clk, ctrl[1:0]; output y. ctrl is the control signal; the output y takes the waveforms shown below according to the control signal.
89
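3. PWM Module (2) Introduction to PWM: PWM (pulse width modulation) is a technique that converts an analog signal into a pulse train. The period of the resulting pulses is generally fixed, while the duty cycle varies with the magnitude of the analog signal. In many analog circuits, voltage and current can be used directly for control, for example the volume control in household appliances or the brightness control of an LED lamp. In general the modulation frequency required by the load must be above 10 Hz; in practical applications it is roughly between 1 kHz and 200 kHz.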
90
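3. PWM Module (3) This problem can be solved in the same way as the frequency divider. The duty cycles 0%, 25%, 50% and 75% divide the period into 4 equal parts, so a 2-bit counter is used; it can be viewed as the state register and next-state logic of an FSM.
A case statement implements a multiplexer, with ctrl as the select signal choosing which brightness level (duty cycle) to output; this can be viewed as the output logic of the FSM. The duty cycle is controlled in a way similar to the earlier frequency divider and VGA sync signal generation.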
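A sketch of the counter-plus-case structure follows. The mapping of ctrl values 0 to 3 onto 0%, 25%, 50% and 75%, the placement of the high phase at the start of each period, and the initialized counter (convenient for simulation) are assumptions.

```verilog
// PWM with four duty-cycle levels selected by ctrl (sketch).
module pwm (
    input  wire       clk,
    input  wire [1:0] ctrl,
    output reg        y
);
    // Free-running 2-bit counter: state register plus next-state logic.
    reg [1:0] counter = 2'b00;   // initial value is an assumption for simulation

    always @(posedge clk)
        counter <= counter + 2'b01;

    // Output logic: a multiplexer selected by ctrl.
    always @(*) begin
        case (ctrl)
            2'b00:   y = 1'b0;               // 0%
            2'b01:   y = (counter < 2'd1);   // 25%: high for 1 of 4 cycles
            2'b10:   y = (counter < 2'd2);   // 50%
            2'b11:   y = (counter < 2'd3);   // 75%
            default: y = 1'b0;
        endcase
    end
endmodule
```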
91
3. PWM Module (4) Each interval, (1) to (4), is 4 clock cycles long. Ctrl = 0 gives duty cycle = 0%.
92
3. PWM Module (5) Figure labels: state register, next-state logic, PWM output, output logic.
93
3. PWM Module (6) Area Report: Net Report:
94
3. PWM Module (7) Timing Report:
95
4. Debouncing Circuit Module(1)
C10. Design a debouncing circuit module. Bouncing is often caused by a mechanical switch that takes time to settle when switching occurs. The debouncing circuit samples the input signal at the rising edges of the clock and changes its output state only when a consistent signal is sampled in 3 consecutive clock cycles. The debounce circuit removes the bouncing produced by a mechanical button. The example outputs a high level only when the input has been High for 3 consecutive clock cycles; the bounces at the arrows do not affect the output.
96
4. Debouncing Circuit Module(2)
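There are 4 states:
1. WAIT: the state while the input is 0.
2. DETECT_1: the input is detected High for the first time.
3. DETECT_2: the second clock with the input held High.
4. DETECT_3: the third clock with the input held High; if the input stays High, the FSM does not return to WAIT.
State register: 4 states, hence 2 bits.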
97
4. Debouncing Circuit Module(3)
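Next-state logic: the input in is checked continuously to decide whether to move to another state; if the input stays HIGH, the FSM eventually remains in DETECT_3. Output logic: the output is asserted only in DETECT_3; states that are not listed fall to the default, which sets the output to 0.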
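The three FSM parts described on these slides can be sketched as below. The state names come from the slides; the sequential state encoding values, the `clk` and `rst` ports and the module name `debounce` are assumptions.

```verilog
// Debounce FSM: out goes high after in is high for 3 consecutive clocks (sketch).
module debounce (
    input  wire clk,
    input  wire rst,   // assumed asynchronous active-high reset
    input  wire in,
    output reg  out
);
    localparam WAIT     = 2'd0,
               DETECT_1 = 2'd1,
               DETECT_2 = 2'd2,
               DETECT_3 = 2'd3;

    reg [1:0] state, next_state;

    // State register: 4 states, 2 bits.
    always @(posedge clk or posedge rst) begin
        if (rst) state <= WAIT;
        else     state <= next_state;
    end

    // Next-state logic: advance while in stays high, otherwise return to WAIT.
    always @(*) begin
        case (state)
            WAIT:     next_state = in ? DETECT_1 : WAIT;
            DETECT_1: next_state = in ? DETECT_2 : WAIT;
            DETECT_2: next_state = in ? DETECT_3 : WAIT;
            DETECT_3: next_state = in ? DETECT_3 : WAIT;
            default:  next_state = WAIT;
        endcase
    end

    // Output logic: asserted only in DETECT_3; the default covers all other states.
    always @(*) begin
        case (state)
            DETECT_3: out = 1'b1;
            default:  out = 1'b0;
        endcase
    end
endmodule
```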
98
4. Debouncing Circuit Module(4)
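(1) Although in is detected High, out is still Low; the state at this point should be DETECT_1.
(2) Same as above; the FSM now enters DETECT_2.
(3) The input persists, so the FSM enters DETECT_3 and out goes HIGH.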
99
4. Debouncing Circuit Module(6)
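Figure labels: output, input. Comparing the figure above with the earlier examples shows that FSMs all share much the same architecture, consisting of a state register, next-state logic and output logic. The circuit grows as the number of states and outputs increases, and the number of registers also depends on the state encoding. Common encodings are:
One-hot: each state has exactly one bit set to 1; uses the most registers, but the logic is simpler.
Gray code: adjacent states differ in only 1 bit; highly stable.
Sequential: states are encoded as ordinary binary numbers; all examples here use this method.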
100
4. Debouncing Circuit Module(5)
Area Report: Net Report:
101
4. Debouncing Circuit Module(7)
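Timing Report: in general, the FSM is not the critical path of a circuit; the critical path usually lies in complex operations such as addition, subtraction, multiplication and division.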
102
5. ADD-XOR Compute(1) C12. Verilog design for the data path shown below: add first, then XOR all bits of the sum together.
103
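5. ADD-XOR Compute(2) The design follows the architecture diagram exactly: the inputs are stored in registers, their sum is stored in a register, and finally the XOR result is stored in a register.
This type of operator was used in Part A; the following operators are of the same type, and each takes only one operand.
Operator  Description
^a        XOR of all bits
|b        OR of all bits
&c        AND of all bits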
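A sketch of the registered data path follows. The 8-bit width matches the simulation values; the register names `reg_a` to `reg_c`, the `clk` and `rst` ports, the module name `add_xor` and the dropped carry bit are assumptions.

```verilog
// ADD-XOR data path: register inputs, add, then reduction XOR (sketch).
module add_xor (
    input  wire       clk,
    input  wire       rst,   // assumed asynchronous active-high reset
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg        d      // Reg D: XOR of all sum bits
);
    reg [7:0] reg_a, reg_b;  // input registers
    reg [7:0] reg_c;         // sum register (carry out is dropped here)

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            reg_a <= 8'b0;
            reg_b <= 8'b0;
            reg_c <= 8'b0;
            d     <= 1'b0;
        end else begin
            reg_a <= a;
            reg_b <= b;
            reg_c <= reg_a + reg_b;   // stage 2: add the registered inputs
            d     <= ^reg_c;          // stage 3: reduction XOR of the sum
        end
    end
endmodule
```

Each result appears three clock edges after its operands are applied, one register stage per edge.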
104
5. ADD-XOR Compute(3)
1. A+8 = 12 (0001_0010), ^(12) = 0
2. 8D+4D = DA (1101_1010), ^(DA) = 1
Waveform labels: Reg A, Reg B, Reg C, Reg D
105
5. ADD-XOR Compute(4) The XOR gates are arranged in a structure similar to the Odd Parity Checker (Part A). Figure labels: Reg A, Reg B, adder, Reg C, Reg D.
106
5. ADD-XOR Compute(5) Area Report: Net Report:
107
5. ADD-XOR Compute(6) Timing Report:
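Although the slack is still sufficient, it is much smaller than in the earlier examples; a close look at the critical path shows that it is concentrated in the adder. Compared with problem 3 of Part B, the structure is similar, but the added registers raise the slack and allow a higher speed. This concept is pipelining.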
108
Thanks for your Listening !!