Published by John Cameron. Modified over 6 years ago.
1
UNIT VI SUBSYSTEM DESIGN PROCESSES AND ILLUSTRATION
2
UNIT – VI SUBSYSTEM DESIGN PROCESSES AND ILLUSTRATION
INTRODUCTION
Objectives:
- Design considerations, problems and solutions
- Design processes
- Basic digital processor structure
- Datapath
- Bus architecture
- Design of a 4-bit shifter
- Design of an ALU subsystem
- Adders
- Multipliers
3
GENERAL CONSIDERATIONS
- Lower unit cost
- Higher reliability
- Lower power dissipation, lower weight and lower volume
- Better performance
- Enhanced repeatability
- Possibility of reduced design/development periods
4
SOME PROBLEMS
1. How to design complex systems in a reasonable time and with reasonable effort.
2. The nature of architectures best suited to take full advantage of VLSI and the technology.
3. The testability of large or complex systems once implemented on silicon.
5
SOME SOLUTIONS
Problems 1 and 3 are greatly reduced if the following aspects of standard practice are accepted:
a) Top-down design approach with adequate CAD tools to do the job
b) Partitioning the system sensibly
c) Aiming for simple interconnections
d) High regularity within each subsystem
e) Generate and then verify each section of the design
In addition:
- Devote a significant portion of the total chip area to test and diagnostic facilities.
- Select architectures that allow the design objectives to be met with high regularity in realization.
6
ILLUSTRATION OF DESIGN PROCESSES
- Structured design begins with the concept of hierarchy.
- Any complex function can be divided into less complex subfunctions, down to the level of leaf cells. This process is known as top-down design.
- As a system's complexity increases, its organization changes as different factors become relevant to its creation.
- Coupling can be used as a measure of how much submodules interact.
- Components that interact with high frequency should be physically proximate, since long, high-bandwidth interconnects carry severe penalties.
- Concurrency should be exploited: it is desirable that all gates on the chip do useful work most of the time.
- Because technology changes so fast, adaptation to a new process must occur in a short time.
7
ILLUSTRATION OF DESIGN PROCESSES Approaches used at Different Stages
- Conventional circuit symbols
- Logic symbols
- Stick diagrams
- Any mixture of logic symbols and stick diagrams that is convenient at a given stage
- Mask layouts
- Architectural block diagrams and floor plans
8
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.1: Basic digital processor structure
9
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.2: Communication strategy for the datapath
10
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.3: Subunits and basic interconnection for datapath
11
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.4: One bus architecture
Sequence:
1. The 1st operand is sent from the registers to the ALU and stored there.
2. The 2nd operand is sent from the registers to the ALU and added.
3. The result is passed through the shifter and stored in the registers.
12
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.5: Two bus architecture
Sequence:
1. Two operands (A and B) are sent from the register(s) to the ALU and operated upon; the result (S) is held in the ALU.
2. The result is passed through the shifter and stored in the registers.
13
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.6: Three bus architecture
Sequence:
1. Two operands (A and B) are sent from the registers and operated upon, and the shifted result (S) is returned to another register, all in the same clock period.
14
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Figure 6.7: Tentative floor plan for 4-bit datapath
15
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit Arithmetic Processor
Points to be noted for design:
- Metal can cross poly or diffusion.
- Poly crossing diffusion forms a transistor.
- Whenever lines touch on the same level, an interconnection is formed.
- Simple contacts can be used to join diffusion or poly to metal.
- Buried contacts or butting contacts can be used to join diffusion and poly.
- Some processes use a 2nd metal layer.
- The 1st and 2nd metal layers may be joined using a via.
- Each layer has particular electrical properties which must be taken into account.
- For CMOS layouts, p- and n-diffusion wires must not directly join each other, nor may they cross either a p-well or an n-well boundary.
16
ILLUSTRATION OF DESIGN PROCESSES Design of a 4-bit Shifter
Any general-purpose n-bit shifter should be able to shift incoming data by up to (n - 1) places in a right-shift or left-shift direction. If all shifts are further specified to be on an end-around basis, so that any bit shifted out at one end of a data word is shifted in at the other end, then the problem of right shift versus left shift is greatly eased: a left shift by k places is the same as a right shift by (n - k) places.
The shifter must have:
- input from a four-line parallel data bus
- four output lines for the shifted data
- a means of transferring input data to the output lines with any shift from 0 to 3 bits
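The end-around requirement can be sketched in software. The Python below is a minimal illustration, not the transistor-level design: the function name `barrel_shift`, the bit ordering and the one-hot `select` list are assumptions made here, modelling a 4 x 4 switch array in which exactly one shift-control line is active.

```python
def barrel_shift(bits, select):
    """End-around right shift of a 4-bit word, modelled on a 4 x 4 switch array.

    bits   -- list of 4 input bits, index 0 = bit 0 (assumed convention)
    select -- one-hot list of shift-control lines; select[s] == 1 means shift by s
    """
    n = len(bits)
    if sum(select) != 1:
        raise ValueError("exactly one shift-control line must be active")
    out = [0] * n
    for s in range(n):              # one control line per shift amount
        if select[s]:
            for i in range(n):      # switch (i, s) connects input (i + s) mod n to output i
                out[i] = bits[(i + s) % n]
    return out

word = [1, 0, 0, 0]                       # only bit 0 set
print(barrel_shift(word, [0, 1, 0, 0]))   # shift by 1 -> [0, 0, 0, 1]
```

Because the shift wraps around, selecting a shift of 3 in this model gives the same result as a left shift by 1, so a single set of four control lines covers both directions.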
17
ILLUSTRATION OF DESIGN PROCESSES Design of a 4-bit Shifter
Figure 6.8: 4 × 4 crossbar switch using MOS
18
ILLUSTRATION OF DESIGN PROCESSES Design of a 4-bit Shifter
Figure 6.9: 4 × 4 barrel shifter
19
ILLUSTRATION OF DESIGN PROCESSES Summary of Design Processes
1. Set out the specifications.
2. Partition the architecture into subsystems.
3. Set a tentative floor plan.
4. Determine the interconnects.
5. Choose layers for the bus and control lines.
6. Conceive a regular architecture.
7. Develop stick diagrams.
8. Produce mask layouts for the standard cells.
9. Cascade and replicate the standard cells as required to complete the design.
20
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Figure 6.10: 4-bit data path for processor
21
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Design of 4-bit adder: From the truth table, one form of the equations is:
Sum: $S_k = H_k C_{k-1}' + H_k' C_{k-1}$
New carry: $C_k = A_k B_k + H_k C_{k-1}$
where the half sum is $H_k = A_k' B_k + A_k B_k'$
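As a quick sanity check, the sum and carry equations can be evaluated for all eight input combinations and compared with ordinary integer addition. This is a minimal sketch; the function name `adder_element` is chosen here for illustration.

```python
from itertools import product

def adder_element(a, b, c_in):
    """One-bit adder from the equations above; returns (sum, carry_out)."""
    h = ((1 - a) & b) | (a & (1 - b))          # half sum H_k = A_k'B_k + A_kB_k'
    s = (h & (1 - c_in)) | ((1 - h) & c_in)    # S_k = H_k C_{k-1}' + H_k' C_{k-1}
    c_out = (a & b) | (h & c_in)               # C_k = A_k B_k + H_k C_{k-1}
    return s, c_out

# Exhaustive check against integer addition for all 8 input combinations.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = adder_element(a, b, c_in)
    assert 2 * c_out + s == a + b + c_in
```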
22
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Adder element requirement: The truth table reveals that the adder requirement may be stated as follows.
For the sum $S_k$:
- If $A_k = B_k$ then $S_k = C_{k-1}$
- Else $S_k = C_{k-1}'$
For the carry $C_k$:
- If $A_k = B_k$ then $C_k = A_k = B_k$
- Else $C_k = C_{k-1}$
Thus the standard adder element for 1 bit is as shown in Figure 6.11.
Figure 6.11: Adder element
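The if/else statement of the requirement maps directly onto a selection between two inputs, which is what makes a multiplexer-based element natural. A minimal Python sketch follows; the names `mux_adder_element` and `ripple_add` and the least-significant-bit-first ordering are assumptions made for this example.

```python
def mux_adder_element(a, b, c_in):
    """One-bit adder written as the two selections stated above."""
    if a == b:
        return c_in, a            # S_k = C_{k-1}, C_k = A_k = B_k
    return 1 - c_in, c_in         # S_k = C_{k-1}', C_k = C_{k-1}

def ripple_add(a_bits, b_bits, c_in=0):
    """Cascade the element; bit lists are least-significant bit first."""
    s_bits, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = mux_adder_element(a, b, carry)
        s_bits.append(s)
    return s_bits, carry

# 0b0110 (6) + 0b0111 (7) = 13 = 0b1101, with no carry out of bit 3
print(ripple_add([0, 1, 1, 0], [1, 1, 1, 0]))   # ([1, 0, 1, 1], 0)
```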
23
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Adder element requirement: Figure 6.12: Multiplexer based adder
24
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Adder element requirement: Figure 6.13: CMOS based adder
25
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Standard cells required for adder: Figure 6.14: Multiplexer cell with or without cut
26
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Standard cells required for adder: Figure 6.15: NMOS (butting contact) inverters
27
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Standard cells required for adder: Figure 6.16: NMOS (buried contact) inverters
28
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Standard cells required for adder: Figure 6.17: CMOS inverter design
29
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Adder element bounding box: Figure 6.18: Approximate bounding box and floor plan for CMOS adder element
30
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Adder element bounding box: Figure 6.19: 4-bit adder element
31
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Implementing ALU functions with an adder:
The adder equations are:
Sum: $S_k = H_k C_{k-1}' + H_k' C_{k-1}$
New carry: $C_k = A_k B_k + H_k C_{k-1}$
Half sum: $H_k = A_k' B_k + A_k B_k'$
Consider the sum output first. If the previous carry is held at logical 0, then
$S_k = H_k \cdot 1 + H_k' \cdot 0 = H_k = A_k' B_k + A_k B_k'$ (an Ex-OR operation).
If $C_{k-1}$ is held at logical 1, then
$S_k = H_k \cdot 0 + H_k' \cdot 1 = H_k'$ (an Ex-NOR operation).
Next, consider the carry output of each element. If $C_{k-1}$ is held at logical 0, then
$C_k = A_k B_k + H_k \cdot 0 = A_k B_k$ (an AND operation).
If $C_{k-1}$ is held at logical 1, then
$C_k = A_k B_k + H_k \cdot 1 = A_k B_k + A_k' B_k + A_k B_k' = A_k + B_k$ (an OR operation).
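The four cases can be checked mechanically by forcing the carry input of a one-bit adder to 0 or 1 and comparing the outputs with the corresponding logic operations. The sketch below is illustrative only; `adder_element` is a name chosen here, and the sum is written as an XOR, which is equivalent to the sum equation above.

```python
def adder_element(a, b, c_in):
    """One-bit adder; returns (sum, carry_out)."""
    h = a ^ b                       # half sum H_k
    s = h ^ c_in                    # equals H_k C_{k-1}' + H_k' C_{k-1}
    c_out = (a & b) | (h & c_in)    # new carry C_k
    return s, c_out

for a in (0, 1):
    for b in (0, 1):
        s0, c0 = adder_element(a, b, 0)   # carry-in forced to 0
        s1, c1 = adder_element(a, b, 1)   # carry-in forced to 1
        assert s0 == a ^ b                # sum output gives Ex-OR
        assert s1 == 1 - (a ^ b)          # sum output gives Ex-NOR
        assert c0 == a & b                # carry output gives AND
        assert c1 == a | b                # carry output gives OR
```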
32
COMPUTATIONAL ELEMENTS Design of an ALU Subsystem
Implementing ALU functions with an adder: Figure 6.20: 1-bit adder element and 4-bit ALU
33
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Generation: The principle of generation allows the system to take advantage of the occurrences $A_k = B_k$, in which the carry out $C_k$ equals $A_k$ whatever the incoming carry.
Propagation: If we are able to localize a chain of bits $A_k A_{k+1} \ldots A_{k+p}$ and $B_k B_{k+1} \ldots B_{k+p}$ for which $A_i \neq B_i$ for all i in [k, k+p], then the output carry bit of this chain is equal to the input carry bit of the chain.
These remarks constitute the principle of generation and propagation used to speed up the addition of two numbers. All adders which use this principle calculate, in a first stage:
$P_k = A_k \oplus B_k$
$G_k = A_k B_k$
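A short sketch of that first stage, followed by the carry recurrence $C_k = G_k + P_k C_{k-1}$ (the earlier carry equation with $H_k$ renamed $P_k$). The function name `pg_carries` and the least-significant-bit-first ordering are assumptions made for this example; the loop evaluates the carries one after another and is not meant to model the parallel hardware.

```python
def pg_carries(a_bits, b_bits, c_in=0):
    """Return (P, G, carries) for two equal-length bit lists, LSB first."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate: P_k = A_k XOR B_k
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate:  G_k = A_k B_k
    carries, c = [], c_in
    for pk, gk in zip(p, g):
        c = gk | (pk & c)     # C_k = G_k + P_k C_{k-1}
        carries.append(c)
    return p, g, carries

# A = 0b0101, B = 0b1010: A_k != B_k for every bit, so the whole word propagates
p, g, carries = pg_carries([1, 0, 1, 0], [0, 1, 0, 1], c_in=1)
print(p, g, carries)   # [1, 1, 1, 1] [0, 0, 0, 0] [1, 1, 1, 1]
```

In the example every bit position propagates, so the input carry appears unchanged at the output of the chain, which is the situation a carry-skip path exploits.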
34
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Figure 6.21: CMOS adder element and using pass/generate concept
35
COMPUTATIONAL ELEMENTS Further Consideration of Adder
The Manchester Carry Chain: If the carry path is precharged to VDD, the transmission gate reduces to a simple NMOS transistor. In the same way, the PMOS transistors of the carry generation are removed. A single Manchester cell is very fast, but a large set of such cascaded cells would be slow: the distributed RC effect and the body effect make the propagation time grow with the square of the number of cells.
Figure 6.22: Manchester carry-chain element
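The quadratic growth can be illustrated with a first-order Elmore delay estimate. This is a rough sketch under stated assumptions: each cascaded pass transistor is modelled as a series resistance `r` feeding a node capacitance `c`, both with hypothetical unit values, and the body effect is ignored.

```python
def chain_delay(n_cells, r=1.0, c=1.0):
    """Elmore delay estimate at the end of n_cells cascaded pass transistors.

    The capacitance at node i is charged through i series resistances,
    so the total is r * c * (1 + 2 + ... + n_cells).
    """
    return sum(i * r * c for i in range(1, n_cells + 1))

for n in (4, 8, 16):
    print(n, chain_delay(n))   # 10.0, 36.0, 136.0: roughly 4x per doubling
```

Doubling the chain length roughly quadruples the estimated delay, which is why long chains are broken up with buffers.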
36
COMPUTATIONAL ELEMENTS Further Consideration of Adder
The Manchester Carry Chain: Figure 6.23: Cascaded Manchester carry-chain elements with buffering
37
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry select adders: Figure 6.24: Carry select adder structure
38
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry select adders: Figure 6.25: Carry select adder structure (6-bit)
39
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry select adders: Optimization of the carry select adder:
- Computational time of a plain n-bit adder: $T = k_1 n$, where $k_1$ is the delay through one adder cell.
- Dividing the adder into blocks with 2 parallel paths: $T = k_1 n/2 + k_2$, where $k_2$ is the time needed by the multiplexer of the next block to select the actual output carry.
- For an n-bit adder of M blocks, each containing P adder cells in series: $T = P k_1 + (M - 1) k_2$, with $n = M P$.
- T is minimum when $M = \sqrt{k_1 n / k_2}$.
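The optimum block count follows from substituting $P = n/M$ and setting the derivative with respect to M to zero. The derivation below treats M as a continuous variable, which is an idealization; in practice M would be rounded to a value that divides n.

```latex
\begin{aligned}
T(M) &= \frac{n}{M}\,k_1 + (M-1)\,k_2 \\
\frac{dT}{dM} &= -\frac{n\,k_1}{M^2} + k_2 = 0 \\
M_{\text{opt}} &= \sqrt{\frac{k_1 n}{k_2}}, \qquad
T_{\min} = 2\sqrt{n\,k_1 k_2} - k_2
\end{aligned}
```

The delay therefore grows with the square root of n at the optimum, rather than linearly as in the plain adder.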
40
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry skip adders: Figure 6.26: Carry skip adder structure
41
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry skip adders: Figure 6.27: Carry skip adder structure
42
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry skip adders: Figure 6.28: Carry skip adder structure (24-bit)
43
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry skip adders: Optimization of the carry skip adder:
- Let the total adder be made of N adder cells, arranged as M blocks of P adder cells, so that $N = M P$.
- The time needed by the carry signal to propagate through P adder cells is $T = k_1 P$.
- The time needed by the carry signal to skip through M adder blocks is $T' = k_2 M$.
- The problem is to minimize the worst-case delay, $T_{worst} = 2(P - 1) k_1 + (M - 2) k_2$, where $P = N/M$.
- $T_{worst}$ is minimum when $M = \sqrt{2 N k_1 / k_2}$.
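A small numerical check of the optimum, taking the worst-case delay as 2(P - 1)k1 + (M - 2)k2 with P = N/M. The delay values and the function names are hypothetical, chosen only for illustration; the search is restricted to block counts that divide N exactly.

```python
import math

def worst_case_delay(n, m, k1, k2):
    """T_worst = 2(P - 1)k1 + (M - 2)k2 with P = n / m."""
    return 2 * (n / m - 1) * k1 + (m - 2) * k2

def best_block_count(n, k1, k2):
    """Search the divisors of n for the block count with the smallest delay."""
    divisors = [m for m in range(1, n + 1) if n % m == 0]
    return min(divisors, key=lambda m: worst_case_delay(n, m, k1, k2))

n, k1, k2 = 24, 1.0, 1.0                 # hypothetical delays, arbitrary units
m_formula = math.sqrt(2 * n * k1 / k2)   # continuous optimum from the formula
print(round(m_formula, 2), best_block_count(n, k1, k2))
```

For these values the formula gives a block count just under 7, and the two nearest divisors of 24 (6 and 8) both give the same minimum delay of 10 units, so the search returns the first of them.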
44
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry skip adders: Optimization of the carry skip adder: Figure 6.29: Worst case carry propagation carry skip adder
45
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry skip adders: Optimization of the carry skip adder: Figure 6.30: Block propagation carry skip adder
46
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry look-ahead (CLA) adders: Figure 6.31: Carry look-ahead adder structure
47
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry look-ahead (CLA) adders: Figure 6.32: Carry look-ahead and ripple through compromise
48
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry look-ahead (CLA) adders: Figure 6.33: 4-bit Carry look-ahead adder unit
49
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry look-ahead (CLA) adders: Figure 6.34: 16-bit, 4 × 4 block Carry look-ahead adder unit
50
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry look-ahead (CLA) adders: Figure 6.35: Generation of carry out (from 4-bits and carry in)
51
COMPUTATIONAL ELEMENTS Further Consideration of Adder
Adder Enhancement Techniques: Carry look-ahead (CLA) adders: Figure 6.36: Four-cell Manchester carry-chain
52
Introduction to CPLDs and FPGAs
53
CPLD Families
54
CPLD Block Diagram
[Figure: CPLD block diagram. Labels: inputs (I/Ps), outputs (O/Ps), function blocks (FBs), flip-flops (FF), crossbar switch]
- Function block: roughly a PLA with one output that can be registered in a flip-flop.
- Crossbar switch: a programmable switch for interconnecting the various FBs.
- An individual switch in a crossbar is a diamond switch.
55
CPLD Function Block
[Figure: CPLD function block. Labels: literal inputs (e.g., a, b, c), PLA-like AND array, extra function inputs (e.g., g, h) for the OR term, D flip-flop (D-FF), 2:1 mux]
Example function: f = ab + bc' + g + h
56
Field Programmable Gate Arrays (FPGAs)
57
FPGA Types (Anti-fuse technology)
58
FPGA Families
59
SRAM-type FPGA Interconnect Architecture
[Figure: SRAM-type FPGA interconnect. Labels: diamond switch, horizontal routing (interconnect) channel, vertical routing channels, PSM, CLB]
- PSM: Programmable Switch Matrix, for making connections between interconnects of different channels. The structure shown only allows i-to-i connections.
- CLB: Configurable Logic Block (programmable logic cell)
60
SRAM-type FPGA Interconnect Architecture (contd)
Cell Connection Matrix (CCM) PSM
61
Configurable Logic Block (CLB)
- A 5-input function is implemented using the G, F and H LUTs (Look-Up Tables) via Shannon's expansion:
  $p(a,b,c,d,e) = a\,p(1,b,c,d,e) + a'\,p(0,b,c,d,e) = a\,q(b,c,d,e) + a'\,r(b,c,d,e)$
- q is implemented using LUT G, r using LUT F, and $p = a\,q + a'\,r$ using LUT H.
- The LUT outputs can go through a flip-flop (for sequential circuit design) or bypass it for a combinational output.
- This is called technology mapping: mapping the logic to CLB logic components.
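The decomposition can be sketched by modelling each LUT as a small truth table. The example function `p` and the helper `make_lut` are hypothetical, chosen only to show that two 4-input tables plus one 3-input table reproduce a 5-input function.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Tabulate func over all input combinations, like the RAM bits of a LUT."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}

def p(a, b, c, d, e):
    """Hypothetical 5-input function chosen only for illustration."""
    return (a & b) ^ (c | d) ^ e

lut_g = make_lut(lambda b, c, d, e: p(1, b, c, d, e), 4)      # q: cofactor with a = 1
lut_f = make_lut(lambda b, c, d, e: p(0, b, c, d, e), 4)      # r: cofactor with a = 0
lut_h = make_lut(lambda a, g, f: (a & g) | ((1 - a) & f), 3)  # a.q + a'.r

def p_mapped(a, b, c, d, e):
    """Evaluate p using only table lookups in the three LUTs."""
    return lut_h[(a, lut_g[(b, c, d, e)], lut_f[(b, c, d, e)])]

# The mapped version agrees with the original on all 32 input combinations.
assert all(p(*x) == p_mapped(*x) for x in product((0, 1), repeat=5))
```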
62
Technology Mapping
63
Programming a CLB (contd)
65
Components of Modern FPGAs
66
Digital System: Implementation Spectrum
[Figure: implementation spectrum. Labels: Microprocessor, Reconfigurable Hardware, ASIC; Software, Firmware, Hardware]
- An ASIC gives high performance at the cost of inflexibility.
- A processor is very flexible but not tuned to the application.
- Reconfigurable hardware is a nice compromise.
67
Simplified FPGA Logic Element
68
High-level Compilers & FPGAs
- It is difficult to estimate hardware resources.
- Some parts of a program are more appropriate for a processor (hardware/software codesign).
- The compiler must parallelize computation across many resources.
- Engineers like to write in C/VHDL/Verilog rather than pushing little blocks around.
  for (i = 0; i < n; i++) { c[i] = a[i] + b[i]; }
- Some success stories
69
Translating a Design to an FPGA
[Figure: RTL (C = A + B), circuit (adder with inputs A, B and output C), array]
- CAD to translate a circuit from a text description to a physical implementation is well understood.
- Most current FPGA designers use register-transfer level specification (allocation and scheduling).
- The basic steps are the same as in ASIC design.
70
Circuit Compilation & Implementation: Basic Steps
1. Technology mapping.
2. Placement: assign a logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.
4. Convert all implementation "details" to FPGA programming information (configuration bits): LUT RAM bits, CCM and PSM FF/SRAM bits, etc.
- The configuration bits can be stored on disk or in ROM and loaded into the FPGA as needed.
- The FPGA can thus implement multiple digital systems (at different times, or sometimes simultaneously in different FPGA partitions).
71
Technology Mapping: A Simple Example
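[Figure: the adder A + B = D is made of full adders (FA) with inputs A, B, Ci and outputs S, Co; each output is mapped to one LUT with inputs A, B, Ci]
The logic synthesis tool reduces the circuit to SOP form:
$S = A'B'C_i + A'BC_i' + AB'C_i' + ABC_i$
$C_o = A'BC_i + AB'C_i + ABC_i' + ABC_i$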
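Each SOP expression for the full adder has four product terms, one per input combination for which the output is 1, so each output fits a single 3-input LUT. The sketch below lists the LUT contents; the helper name `lut_bits` and the (A, B, Ci) input ordering are assumptions made for this example.

```python
from itertools import product

def lut_bits(func):
    """Truth-table column for a 3-input function, inputs ordered (A, B, Ci)."""
    return [func(a, b, ci) for a, b, ci in product((0, 1), repeat=3)]

sum_lut = lut_bits(lambda a, b, ci: a ^ b ^ ci)
carry_lut = lut_bits(lambda a, b, ci: (a & b) | (a & ci) | (b & ci))

print(sum_lut)     # [0, 1, 1, 0, 1, 0, 0, 1]
print(carry_lut)   # [0, 0, 0, 1, 0, 1, 1, 1]
```

The positions of the 1s in each list are the minterms of the corresponding SOP expression, and these eight bits per output are what would be loaded into the LUT RAM.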
72
Processor + FPGA Three possibilities
[Figure labels: daughtercard, Proc, FPGA, chip, backplane bus (e.g. PCI)]
1. The FPGA serves as a coprocessor for data-intensive applications (possible project).
2. The FPGA serves as an embedded digital system for lower-latency processing.
3. "Reconfigurable Functional Unit"