Introduction and Overview

Chapter One Introduction and Overview Dr. Chuck Lillie

From Essentials of Computer Architecture by Douglas E. Comer. ISBN © 2005 Pearson Education, Inc. All rights reserved.

Fundamentals of Digital Logic
CSC 3650 Introduction to Computer Architecture Time: 3:30 to 6:30 Meeting Days: W Location: Oxendine 1237B Textbook: Essentials of Computer Architecture, Author: Douglas E. Comer, 2005, Pearson Prentice Hall Spring 2011 Chapter Two Fundamentals of Digital Logic Dr. Chuck Lillie

Boolean Functions AND, OR, XOR, NOT
(a) x AND y is true only if x is true and y is true; otherwise it is false.
(b) x OR y is false only when x is false and y is false; otherwise it is true.
(c) x XOR y is true when x is false and y is true or when x is true and y is false; otherwise it is false.
(d) NOT x is true when x is false, and false when x is true.

Boolean functions form the basis for computer systems. A computer operates on single bits, each of which can be on or off, true or false, 1 or 0. Table 1-1 gives the truth tables for four Boolean functions: AND, OR, XOR (exclusive OR), and NOT. You will need to master Boolean functions in order to understand computer architecture. Although the tables show AND, OR, and XOR with just two inputs, each may have more than two inputs.
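The four truth tables can be generated mechanically. The following is a minimal Python sketch, using 1 for true and 0 for false; the function names are illustrative and do not come from the textbook.

```python
from itertools import product

# Two-input Boolean functions on single bits (1 = true, 0 = false).
def AND(x, y):
    return x & y

def OR(x, y):
    return x | y

def XOR(x, y):
    return x ^ y

def NOT(x):
    return 1 - x

# One row per input combination, in binary order 00, 01, 10, 11.
print("x y | AND OR XOR")
for x, y in product((0, 1), repeat=2):
    print(f"{x} {y} |  {AND(x, y)}   {OR(x, y)}   {XOR(x, y)}")

print("x | NOT")
for x in (0, 1):
    print(f"{x} |  {NOT(x)}")
```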

Boolean Functions NAND, NOR, XNOR
(a) x NAND y is false only if x is true and y is true; otherwise it is true.
(b) x NOR y is true only when x is false and y is false; otherwise it is false.
(c) x XNOR y is false when x is false and y is true or when x is true and y is false; otherwise it is true.

AND, OR, and XOR have complementary counterparts: NAND, NOR, and XNOR. Table 1-2 gives the truth tables for NAND, NOR, and XNOR. Although the table shows them with two inputs, each may have more than two inputs.

All Possible Boolean Functions
These are all the possible Boolean functions you will need to build computer systems. Notice the symbols for the Boolean operators:

x ^ y or xy => x AND y
x v y or x + y => x OR y
x ⊕ y => x XOR y
x’ => NOT x

Many different symbols are used to represent Boolean operators. This textbook uses the ones noted above.

Truth Table for xy’ + yz
This is a truth table for the Boolean function xy’ + yz. The table is initialized to include all possible combinations of true or false (1 or 0) for x, y, and z. Since there are three variables, the number of combinations is 8. Note that the x, y, and z values are arranged in binary numeric order from 000 (= 0) to 111 (= 7). The first term of the equation, x AND NOT y (xy’), is calculated, then the next term, y AND z (yz). Finally, xy’ OR yz is calculated to get the result.
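The same procedure (enumerate the rows in binary order, compute each term, then combine) can be sketched in Python. The function name f and the variable table are illustrative only.

```python
from itertools import product

def f(x, y, z):
    """Evaluate xy' + yz on single bits."""
    term1 = x & (1 - y)   # xy'
    term2 = y & z         # yz
    return term1 | term2  # xy' + yz

# product() yields the rows in binary order 000, 001, ..., 111.
table = [(x, y, z, f(x, y, z)) for x, y, z in product((0, 1), repeat=3)]

print("x y z | xy' + yz")
for x, y, z, result in table:
    print(f"{x} {y} {z} |    {result}")
```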

Truth Table for (x + y’)(y + z)
This is a more complicated function, (x + y’)(y + z). Three variables give 2^3 = 8 possible combinations. x OR NOT y is calculated, then y OR z, and finally (x + y’) AND (y + z).

Equivalent Function for (xy’ + yz)’
(xy’ + yz)’ = (xy’)’(yz)’ by DeMorgan’s Law. A truth table can be used to prove that two Boolean functions are equivalent. For (xy’ + yz)’ the truth table has the columns x, y, z, xy’, yz, xy’ + yz, and (xy’ + yz)’. Note that the Boolean values for (xy’ + yz)’, that is 1, 1, 1, 0, 0, 0, 1, 0, are exactly the same as the values for (xy’)’(yz)’, (x’ + y)(y’ + z’), and x’y’ + x’z’ + yz’. Therefore, the four expressions are equivalent.

DeMorgan’s Law
Converts an AND function to an equivalent OR function and vice versa.
Used to reduce Boolean functions, minimize logic, or generate the complement of a function.
(ab)’ = a’ + b’
(a + b)’ = a’b’
This can be verified by constructing a truth table.
From the previous chart:
(xy’ + yz)’ = (xy’)’(yz)’          by (a + b)’ = a’b’
            = (x’ + y)(y’ + z’)    by (ab)’ = a’ + b’
            = x’y’ + x’z’ + yy’ + yz’    by a Boolean algebra property (distribution)
            = x’y’ + x’z’ + yz’    Note: yy’ = 0
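Each step of this derivation can be checked by brute force over all eight input combinations. The sketch below is one way to do that in Python; the dictionary of lambda expressions is an illustrative encoding, not something from the textbook.

```python
from itertools import product

def n(a):
    """NOT on a single bit."""
    return 1 - a

# The four expressions from the derivation, written with bit operators.
forms = {
    "(xy' + yz)'":       lambda x, y, z: n((x & n(y)) | (y & z)),
    "(xy')'(yz)'":       lambda x, y, z: n(x & n(y)) & n(y & z),
    "(x' + y)(y' + z')": lambda x, y, z: (n(x) | y) & (n(y) | n(z)),
    "x'y' + x'z' + yz'": lambda x, y, z: (n(x) & n(y)) | (n(x) & n(z)) | (y & n(z)),
}

# One output column per expression, rows in binary order 000..111.
columns = {
    name: [g(x, y, z) for x, y, z in product((0, 1), repeat=3)]
    for name, g in forms.items()
}

for name, column in columns.items():
    print(f"{name:20s} {column}")
```

If all four printed columns are identical, the expressions are equivalent.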

Voltage between two points represents potential force
Current represents flow of electrons along a wire

Logic Symbols for Gates
Here are the symbols used to represent the gates: (a) AND, (b) OR, (c) Exclusive OR or XOR, (d) NOT, (e) NAND, (f) NOR, (g) Exclusive NOR or XNOR.

7400: four nand gates 7402: four nor gates 7406: six inverters
These are combinatorial circuits; values change only when the input changes.

4 to 1 Multiplexer Circuit
We start our investigation of more complex circuits, that is, circuits with more than a single gate, by looking at combinatorial logic circuits. Later we will look at sequential logic circuits. The output of a combinatorial logic circuit depends solely on its current inputs: when the input values are set, the circuit outputs the values corresponding to those inputs, and when the input values change, the output values change. A multiplexer is a combinatorial logic circuit. A multiplexer has n inputs and one output. In this example, there are 4 inputs and one output. The output is selected based on the selection lines S1 and S0. This is the circuit diagram for a multiplexer, that is, the basic logic gates that make up the multiplexer.

4 to 1 Multiplexer Schematic with Active High Enable
This is the schematic for a multiplexer with four inputs and one output. Since it has four inputs, it needs two selection lines to determine which input to send to the output. The number of input lines = 2^n, where n is the number of selection lines. For an active high enable, the output is shown in the truth table. If the enable (E) is 0, it doesn’t matter what the inputs or the selection lines are: the output is high impedance (in essence, nothing is allowed to pass). Enable (E) must be high for anything to happen. Notice S1 and S0. For S1S0 = 00, line Input 0 is selected as output; for 01, Input 1 is selected; for 10, Input 2; and for 11, Input 3. (Note: S1S0 counts in binary from 0 to 3, which corresponds to lines Input 0 to Input 3.) The diagram on the left will be used to represent a multiplexer.
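The behavior in the truth table can be modeled in a few lines. This is a minimal sketch, not a hardware description: the function name mux4 is illustrative, and returning None to stand in for the high-impedance output is an assumption of the sketch.

```python
def mux4(inputs, s1, s0, enable):
    """4-to-1 multiplexer with an active-high enable.

    inputs is a sequence of four bits (Input 0 .. Input 3).
    Returns None to represent a high-impedance output (an assumption of
    this sketch; real hardware has no such value).
    """
    if enable == 0:
        return None              # E = 0: nothing passes, whatever S1 S0 are
    index = (s1 << 1) | s0       # S1 S0 read as a binary number 0..3
    return inputs[index]

lines = [0, 1, 1, 0]
for s1 in (0, 1):
    for s0 in (0, 1):
        print(f"E=1 S1S0={s1}{s0} -> {mux4(lines, s1, s0, 1)}")
print(f"E=0 S1S0=11 -> {mux4(lines, 1, 1, 0)}")
```

An active low enable would only invert the test on enable.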

4 to 1 Multiplexer Schematic with Active Low Enable
This active low enable is the same as the previous explanation, except the enable must be low to allow the multiplexer to be active.

4 to 1 Multiplexer Constructed Using 2 to 1 Multiplexers
Multiplexers can be combined to handle more inputs. In the first two multiplexers, S0 is used to select one of two inputs. The outputs of the first two multiplexers are the inputs to the third multiplexer, where S1 selects between them.
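The two-level arrangement can be sketched as follows. The function names mux2 and mux4_from_mux2 are illustrative, and the enable input is left out to keep the example short.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: output a when s = 0, b when s = 1."""
    return b if s else a

def mux4_from_mux2(i0, i1, i2, i3, s1, s0):
    """4-to-1 multiplexer built from three 2-to-1 multiplexers."""
    low = mux2(i0, i1, s0)     # first multiplexer: Input 0 or Input 1
    high = mux2(i2, i3, s0)    # second multiplexer: Input 2 or Input 3
    return mux2(low, high, s1) # third multiplexer: S1 picks between them

# Distinct input values make it easy to see which line was selected.
for s1 in (0, 1):
    for s0 in (0, 1):
        print(f"S1S0={s1}{s0} -> {mux4_from_mux2('I0', 'I1', 'I2', 'I3', s1, s0)}")
```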

Data and Program Representation
Chapter Three Data and Program Representation Dr. Chuck Lillie

The Variety of Processors and Computational Engines
Chapter Four The Variety of Processors and Computational Engines Dr. Chuck Lillie

State Tables
State tables describe the activities of a state diagram. This table shows (a) a generic state table, which consists of the present state, the inputs, the next state based on the inputs and the present state, and the outputs based on the inputs and the present state; (b) the state table for the alarm clock problem; and (c) the state table for the alarm clock problem with inaction. For a present state of Asleep, if the alarm is on (whether or not it is a weekday), the next state is Awake in bed and the output is to turn off the alarm. If you are in state Awake in bed, the alarm is off, and it is a weekday, the next state is to remain awake and get out of bed, and the output is not to turn off the alarm. If you are in state Awake in bed, the alarm is off, and it is not a weekday, the next state is to go back to sleep, and the output is not to turn off the alarm.

State Diagrams for Alarm Clock Example
(a) Moore Machine
State diagram for the alarm clock problem modeled as a Moore machine. The circles or ellipses are the states and the arrows are the transitions. The labels on the arrows are the inputs. The outputs are associated with the states: 1 indicates that turn off alarm is yes and 0 indicates that turn off alarm is no.
(b) Mealy Machine
State diagram for the alarm clock problem modeled as a Mealy machine. For the Mealy machine the outputs are associated with the transitions, so the labels on the directed arrows indicate input/output.

State Tables for JK Flip-Flop
The flip-flop has two states, Y and Z, and two inputs, J and K. For a given present state and a given input, the next state is determined along with the output Q. For a present state of Y and an input of 0,0, the next state is Y and the output is 0. For a present state of Z and an input of 0,0, the next state is Z and the output is 1. (b) is a reduced version of (a) that shows the don’t care conditions; that is, X indicates that we don’t care about the input on K or J. For example, look at the first two lines in (a): the next state and output are the same whether K is 0 or 1, as long as J is 0. The same holds for the third and fourth lines: the input on K doesn’t matter as long as J is 1 and the current state is Y; the next state will be Z and the output will be 1.

State Tables for JK Flip-Flop
Reduced version of the previous state table, using don’t care conditions.
When in Y, the value of K doesn’t matter: for J = 0 the next state is Y with output 0, and for J = 1 the next state is Z with output 1.
When in Z, the value of J doesn’t matter: the next state and output are determined by K.

State Diagrams for J-K Flip-Flop
If you are in state Y and J = 0 and K = X (don’t care), you go back to state Y and output 0. If in state Y and J = 1 and K = X, you go to state Z and output 1. If in state Z and K = 0 and J = X, you go back to state Z and output 1. If in state Z and K = 1 and J = X, you go to state Y and output 0. This is a Moore machine model because the output is part of the states, not the transitions.
The conditions on the arrows can be written in terms of J and K (J and J’, K and K’) or with the values of JK (1X, 0X, X1, and X0).
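The four transitions translate directly into code. Below is a minimal Python sketch of this Moore machine; the names jk_next, OUTPUT, and run are illustrative.

```python
# Moore outputs: the output Q belongs to the state, not to the transition.
OUTPUT = {"Y": 0, "Z": 1}

def jk_next(state, j, k):
    """Next state of the JK flip-flop described above."""
    if state == "Y":
        return "Z" if j == 1 else "Y"   # in Y only J matters (K is don't care)
    return "Y" if k == 1 else "Z"       # in Z only K matters (J is don't care)

def run(state, inputs):
    """Apply a sequence of (J, K) pairs; return (state, output) after each."""
    trace = []
    for j, k in inputs:
        state = jk_next(state, j, k)
        trace.append((state, OUTPUT[state]))
    return trace

for step in run("Y", [(0, 0), (1, 0), (0, 0), (0, 1), (1, 1), (1, 1)]):
    print(step)
```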

JK Flip-Flop with Modified Outputs
(a) State Table (b) State Diagram
In this example, the output is 1 if the state changes and 0 if the machine remains in the same state. This is best modeled with a Mealy machine, because the output occurs with the transition. For this to be modeled with a Moore machine, additional states would have to be present to represent this system.

Modulo 6 Counter Counts from 0 to 5 and starts over
U is the input: 0 indicates no change, 1 increments the counter.
C indicates when the counter goes from 101 to 000.
Values 110 and 111 are not used.

This is the state diagram for a modulo 6 counter. A modulo 6 counter is a three-bit counter that counts through the sequence 000 → 001 → 010 → 011 → 100 → 101 → 000 → 001 → … The counter counts from 0 to 5 and starts over with 0. We need 3 bits because the binary representation of 5 is 101. The values 110 and 111 are not used. We have six states, S0, S1, S2, S3, S4, S5, and they progress S0 → S1 → S2 → S3 → S4 → S5 → S0 … The input value U controls the counter. When U = 1, the counter is incremented on the up-tick of the clock, but if U = 0 the counter does not change its value (state). The counter starts in state S0. The C output designates when the counter goes from 101 to 000, and C remains 1 until the counter goes to 001. In state S0 with an input of U = 0, the counter remains in state S0 and outputs C = 1 and V0V1V2 = 000. In state S0 with an input of U = 1, the counter goes to state S1 and outputs C = 0 and V0V1V2 = 001. In state S1 with an input of U = 0, the counter remains in state S1 and outputs C = 0 and V0V1V2 = 001. In state S1 with an input of U = 1, the counter goes to state S2 and outputs C = 0 and V0V1V2 = 010. The rest of the states behave similarly.
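The counter's behavior can be simulated with a short Python sketch. The states S0 to S5 are represented by the integers 0 to 5, and the function names are illustrative; the output is treated as belonging to the state (the Moore view).

```python
def next_state(state, u):
    """U = 1 advances the counter (5 wraps to 0); U = 0 holds the state."""
    return (state + 1) % 6 if u else state

def outputs(state):
    """Return (C, V) for a state: C is 1 only in state 0, V is the 3-bit count."""
    c = 1 if state == 0 else 0
    return c, format(state, "03b")

state = 0                      # the counter starts in S0
for u in (0, 1, 1, 0, 1, 1, 1, 1, 1):
    state = next_state(state, u)
    c, v = outputs(state)
    print(f"U={u} -> S{state}  C={c} V={v}")
```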

(a) State Diagram for Modulo 6 Counter Modeled as Mealy Machine
With the Mealy machine the outputs are associated with the transitions. Going from S0 to S1 requires an input of 1 and produces an output of 0001, so the values on the arc of the diagram are 1/0001, corresponding to U/CV0V1V2.

(b) State Diagram for Modulo 6 Counter Modeled as Moore Machine
The Moore machine state is where the output occurs. So for the modulo 6 counter the Moore machine is represented in the above figure. In this case, we use the notation U to represent U = 1 and U’ to represent U = 0. Each state represents the output so C = 1 V = 000 is the output for state S0 and C = 0 V = 001 is the output for S1.

State Table for String Checker
A string checker inputs a string of bits, one bit per clock cycle. When the previous three bits form the pattern 110, it sets output M = 1; otherwise M = 0. The system checks bits 1, 2, 3, then 2, 3, 4, then 3, 4, 5, etc. There are eight possible states, S0 to S7. Each state represents the current bit pattern: 101 is represented by S5, 111 by S7. If you are in state S5 (101), the left 1 is shifted out (giving you 01_) and you will go to state S2 (010) if the input is 0 and to state S3 (011) if the input is 1.
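Because each state is just the last three bits, the next state is a shift. Here is a minimal Python sketch; the function names are illustrative, and starting in S0 (000) is an assumption of the sketch.

```python
def next_state(state, bit):
    """Shift the new bit in on the right and drop the leftmost bit."""
    return ((state << 1) | bit) & 0b111

def check(bits):
    """Return the output M after each input bit, starting from S0 (assumed)."""
    state, outputs = 0, []
    for bit in bits:
        state = next_state(state, bit)
        outputs.append(1 if state == 0b110 else 0)   # M = 1 only in S6 (110)
    return outputs

print(next_state(0b101, 0), next_state(0b101, 1))    # S5 goes to S2 or S3
print(check([1, 1, 0, 1, 1, 0]))
```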

State Diagrams for String Checker
These two state diagrams represent the Moore machine and the Mealy machine for the string checker.

State Table for String Checker with Revised State Assignments.
Only 4 states are needed. This is a more efficient state table for the string checker. Eight states are not necessary; four are enough if each state records how much of the desired string has been seen so far. In state S0 no useful bits have been seen, so an input of 0 sends you back to S0, but an input of 1 sends you to state S1 because you have the beginning of the desired string. In state S1 an input of 0 gives you 10; whatever the next input is, the pattern cannot be completed, so you go back to S0 to reset. If the input is 1, then you have the first two bits set to 1 and go to state S2.
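A four-state checker can be sketched as a transition table. The S0 and S1 rows follow the description above; the S2 and S3 rows (S2 means "11 seen", S3 means "110 seen", with M = 1 in S3) are an assumption of this sketch, filled in the way such a checker is usually completed, and the names below are illustrative.

```python
# (present state, input bit) -> next state
TRANSITIONS = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S0", ("S1", 1): "S2",
    ("S2", 0): "S3", ("S2", 1): "S2",   # assumed: 11 then 0 completes 110
    ("S3", 0): "S0", ("S3", 1): "S1",   # assumed: start looking again
}

def check4(bits):
    """Return the Moore output M after each input bit (M = 1 only in S3)."""
    state, outputs = "S0", []
    for bit in bits:
        state = TRANSITIONS[(state, bit)]
        outputs.append(1 if state == "S3" else 0)
    return outputs

print(check4([1, 1, 0, 1, 1, 1, 0, 0]))
```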

Practical Perspective: Mealy and Moore State Diagrams for Revised String Checker
These are the revised state diagrams for the revised string checker. Notice the simplicity compared to the state diagrams in Figure 2.6.

States for the Toll Booth Controller
R: light is red
G: light is green
A: alarm sounded

This example is a toll booth controller. There are 10 states:
SNOCAR: the toll booth is empty.
S0: a car is in the toll booth but the driver has not made a payment.
S5: the car is in the toll booth and the driver has paid 5 cents of the 35 cent toll.
S10: the driver has paid 10 cents.
S15: the driver has paid 15 cents.
S20: the driver has paid 20 cents.
S25: the driver has paid 25 cents.
S30: the driver has paid 30 cents.
SPAID: the driver has paid at least the required amount.
SCHEAT: the driver has left the toll booth without paying the required amount.

State Table for Toll Booth Controller
When a nickel is paid, I1I0 = 01. When a dime is paid, I1I0 = 10. When a quarter is paid, I1I0 = 11. When nothing new is paid, I1I0 = 00. When C = 0 the car leaves the toll booth. When C = 1 a car is in the toll booth. Notice there are three inputs for each state, and each state except SNOCAR, SCHEAT, and SPAID has four options. When the toll booth is in state S0 and C = 0 (no car), it doesn’t matter what I1I0 is: the next state is SCHEAT because the car departed the toll booth without paying, and the controller outputs a red light (R = 1) and activates the alarm (A = 1). When the toll booth is in state S20, C = 1 (the car is still in the toll booth), and the driver deposits 25 cents (I1I0 = 11), the next state is SPAID and the green light is lit (G = 1), with no red light (R = 0) and no alarm (A = 0). Step through each state, evaluating the input to determine the next state and the output (RGA).
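The payment states follow a simple rule: add the coin, and compare the total with the toll. The sketch below covers only the states S0 through S30, because the transitions out of SNOCAR, SPAID, and SCHEAT are not spelled out here. The names are illustrative, and G = 0 in SCHEAT is an assumption.

```python
TOLL = 35                                       # cents
COIN = {0b00: 0, 0b01: 5, 0b10: 10, 0b11: 25}   # I1I0 -> cents deposited

def next_state(paid, c, i1i0):
    """Next state from payment state S<paid> (paid is 0, 5, ..., 30 cents).

    c is the car sensor (1 = car present), i1i0 is the two-bit coin input.
    """
    if c == 0:
        return "SCHEAT"                 # car left before the toll was paid
    total = paid + COIN[i1i0]
    return "SPAID" if total >= TOLL else f"S{total}"

# Outputs (R, G, A) for the two states described above.
OUTPUT_RGA = {
    "SPAID": (0, 1, 0),
    "SCHEAT": (1, 0, 1),                # G = 0 here is an assumption
}

print(next_state(0, 0, 0b00))    # leaves without paying
print(next_state(20, 1, 0b11))   # 20 cents paid, then a quarter
print(next_state(5, 1, 0b01))    # 5 cents paid, then a nickel
```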

State Diagram for Moore Machine for Toll Booth Controller
This is the state diagram for the toll booth controller. This is the Moore machine, so the output is associated with each state. Note that an output is shown on the diagram only when it is 1. Also note that if I1I0 is XX it is not included on the diagram. Compare this to the state table, and trace each state in the state table to the corresponding state and transition in the state diagram.

(a) State Diagram for Mealy Machine for the Modulo 6 Counter
This is the Mealy machine for the modulo 6 counter from an earlier example. Notice that the state names have been changed from S0 to 000, S1 to 001, S2 to 010, etc. Three bits are used to identify the state because there are six states, which requires 3 bits (2^3 = 8, so we could have up to 8 states). This naming system is used because it identifies the states based on the output. Notice that state 100 is reached whenever the output is 0100. This helps minimize the logic needed to generate the output and next state values (basically, the same logic is used for both).

(b) State Diagram for the Moore Machine for the Modulo 6 Counter
Same thing for the Moore machine, that is, the state identification is the same as the output for that state.

(a) Generic Mealy Machine
This is a generic Mealy machine. The register is used to store the current state value. Both the inputs and the current state are used to set the output and select the next state. So the next state logic is driven by the values of the inputs and the current state, and the outputs are generated from the inputs and the current state.

(b) Mealy Machine for Modulo 6 Counter
The input for the modulo 6 counter is U (0 or 1) and the outputs are C and V2, V1, V0. The notation /3 indicates that the path is a 3-bit path. The register is loaded on the up-tick of the clock. Since we chose our state names to correspond to the output values, the logic for both is the same, so we can use the same logic circuit.

(a) Generic Moore Machine
This is a generic Moore machine. The inputs and the current state determine the next state. The next state determines the output (recall that in a Moore machine the output is identified with a state). The current state is stored in the register. The output logic and next state logic are two different circuits in the Moore machine.

(b) Moore Machine Implementation of Modulo 6 Counter
The Moore machine requires that circuits be designed for both next state logic and output logic. The input U along with the current state determines the next state. The outputs V2, V1, V0 and C are determined by next state. If we choose the next state identifier to be the same as the output then no circuitry is needed.

State Table for Modulo 6 Counter
This is the state table for the modulo 6 counter. There are two possible inputs for U, 0 or 1, so each state has two choices. If U = 0, stay in the current state; if U = 1, go to the next state. P2P1P0 represents the current state. N2N1N0 represents the next state. From the state diagram in Figure 2.8, state 000 goes to 000 with U = 0 and to state 001 with U = 1. The inputs are P2P1P0 and U.

Karnaugh Maps (K-maps) for the Next State of Modulo 6 Counter
Using the state table in Table 2.7, build the K-maps depicted above. Notice that P2P1 runs along the left side and P0U along the top. The 0 in the upper left cell of the K-map for N2 is there because N2 in the state table is 0 for P2P1P0U = 0000. The 1 in the lower right cell of the K-map for N2 is there because N2 in the state table is 1 for P2P1P0U = 1010. The X in the third cell down on the left of the K-map for N2 is there because N2 is don’t care (X) for P2P1P0U = 1100 (in fact, 1100 is not part of the state table because it will never occur). The X entries are used to reduce the equations. Grouping the 1s and Xs into groups of two, four, or eight gives the groupings shown in the above diagram. The equations for each K-map are:
N2 = P2P0’ + P2U’ + P1P0U
N1 = P1P0’ + P1U’ + P2’P1’P0U
N0 = P0’U + P0U’
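One way to gain confidence in equations read off a K-map is to evaluate them for every valid state and compare against the intended behavior (hold when U = 0, advance modulo 6 when U = 1). The following Python sketch does that; the function and variable names are illustrative.

```python
def n(bit):
    """NOT on a single bit."""
    return 1 - bit

def next_state_logic(p2, p1, p0, u):
    """Next-state equations for the modulo 6 counter, as listed above."""
    n2 = (p2 & n(p0)) | (p2 & n(u)) | (p1 & p0 & u)
    n1 = (p1 & n(p0)) | (p1 & n(u)) | (n(p2) & n(p1) & p0 & u)
    n0 = (n(p0) & u) | (p0 & n(u))
    return n2, n1, n0

mismatches = []
for state in range(6):                    # only the six used states
    for u in (0, 1):
        p2, p1, p0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
        n2, n1, n0 = next_state_logic(p2, p1, p0, u)
        expected = (state + 1) % 6 if u else state
        if (n2 << 2) | (n1 << 1) | n0 != expected:
            mismatches.append((state, u))

print("mismatches:", mismatches)
```

An empty list means the equations reproduce the state table for all used states.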

Next State Logic for Modulo 6 Counter
These are the NOT gates, AND gates, and OR gates for the modulo 6 counter designed on previous chart. The inputs are on top, P2, P1, P0, U and the next state is to the right with N2, N1, N0.

Two State Tables for the Modulo 6 Counter Divided by Input U
Another approach is to implement the next state logic using multiplexers. Each input to the multiplexer corresponds to the next state under one possible value of the system inputs. The inputs drive the select signals of the multiplexer. These are the truth tables for the new design, using the input U to split the state table into two truth tables. These tables are put into K-maps (see next chart) to produce the equations. The equations are:
For U = 0:
N2 = P2
N1 = P1
N0 = P0
For U = 1:
N2 = P2P0’ + P1P0
N1 = P1P0’ + P2’P1’P0
N0 = P0’
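The multiplexer view can be sketched in Python: compute both candidate next states from the two sets of equations and let U select between them. The names below are illustrative.

```python
def n(bit):
    """NOT on a single bit."""
    return 1 - bit

def next_state_mux(p2, p1, p0, u):
    """Next state chosen by a multiplexer whose select line is U."""
    hold = (p2, p1, p0)                                   # U = 0 equations
    count = ((p2 & n(p0)) | (p1 & p0),                    # U = 1: N2
             (p1 & n(p0)) | (n(p2) & n(p1) & p0),         # U = 1: N1
             n(p0))                                       # U = 1: N0
    return count if u else hold

for state in range(6):
    bits = (state >> 2) & 1, (state >> 1) & 1, state & 1
    print(bits, "U=0 ->", next_state_mux(*bits, 0), " U=1 ->", next_state_mux(*bits, 1))
```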

K-Maps for Modulo 6 Counter for Multiplexer
These are the K-maps for the modulo 6 counter for multiplex implementation. Notice how the equations are derived for each K-map.

Preliminary Implementation of the Next State Logic for the Modulo 6 Counter Using a Multiplexer
This is the multiplexer to implement the modulo 6 counter. U can be 0 or 1. If U is 0, the top input is selected for output. If U is 1, the bottom input is selected for output.

Final Implementation of the Next State Logic for Modulo 6 Counter Using Multiplexer
This is the diagram with the implementation for the modulo 6 counter. The inputs to the multiplexer are determined by the circuit to its left. Based on the value of U, the upper or lower input is selected: when U = 0 the upper input is selected, and when U = 1 the lower input is selected.

Modulo 6 Counter Implemented with a Look-Up ROM
The present state value and all inputs (P2, P1, P0, U) are connected to the address inputs of the ROM. In chart (b), each address is P2P1P0U: address 0 is 0000, address 8 is 1000, and address 11 is 1011. If address 0 is selected (0000), the output is 000, which is the next state for present state 000 with U = 0. If address 5 is selected (0101), the output is 011, which is the next state for present state 010 with U = 1. If address 6 is selected (0110), the output is 011, which is the next state for present state 011 with U = 0.
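A look-up ROM is simply a table indexed by P2P1P0U. The sketch below builds one as a Python list; the name ROM is illustrative, and storing 000 at the four unused addresses (12 to 15) is an assumption, since those locations are never read in normal operation.

```python
# 16 locations, addressed by P2 P1 P0 U; unused addresses keep 0 (assumed).
ROM = [0] * 16
for state in range(6):
    for u in (0, 1):
        address = (state << 1) | u              # P2 P1 P0 followed by U
        ROM[address] = (state + 1) % 6 if u else state

for address in range(12):
    print(f"address {address:2d} ({address:04b}) -> next state {ROM[address]:03b}")
```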

Output for Modulo 6 Counter
This is the output table for the modulo 6 counter. Note that the Moore machine output table is smaller than the Mealy machine output table, because in the Moore machine the output is tied to the state while in the Mealy machine the output is tied to the transition. See the next page for the K-maps and equation derivation.

K-maps for Output of Modulo 6 Counter – Mealy Machine
These are the K-maps for both the Mealy machine and Moore machine. By this point you should be able to retrieve the reduced equations from the K-maps.

Mealy Machine Implementation of Modulo 6 Counter
This is the Mealy machine implementation of the modulo 6 counter. Notice that the next state logic is the logic that was developed earlier. The output is the same as the next state, plus the value for C, which indicates when the counter goes from 5 to 0 (C is set to 1 when the counter goes from 5 to 0 and remains set to 1 until the counter goes to 1). V2V1V0 has the same value as the next state because we chose our next state identity to be the same as the output. This way we can use the same circuit for the next state as for the output.

Moore Machine Implementation of Modulo 6 Counter
This is the Moore machine implementation of the modulo 6 counter. Note that the next state logic is the same as the logic developed earlier. This implementation uses registers to store the next state, which becomes the current state.

(a) Moore Machine Implementation of Modulo 6 Counter Using Lookup ROM
The data stored in the ROM is the address of the next state as well as the output. The current state is used to determine the address on the ROM to retrieve the data. The next page contains a chart showing the data stored at each address.

(b)ROM Data for Moore Machine Implementation of Modulo 6 Counter
This is the ROM layout for the modulo 6 counter. Notice that the address is made up of the current state identifier and U (the count / do not count variable). For example, address 3 (0011) generates output 0100 (N2N1N0C).

(a) Alternative Design using Counter and Decoder to Implement Modulo 6 Counter

(b) Moore Machine Implementation of Modulo 6 Counter Using Counter and Decoder
In this case, the counter is used to store the next state. The next state is determined by incrementing the counter or clearing the counter (to go from 5 to 0). The decoder has three inputs and selects one of its outputs. If the input is 000, output 0 is selected, which sets C = 1 and V2V1V0 to 000, and a signal is placed on the INC line of the counter. The counter is cleared when the selected decoder output is line 5 (this resets the counter to 000).

State Table for String Checker
K-Maps for Next State and Output M
There are four inputs, P2P1P0 (present state) and I. The K-maps that generate the equations for next state and output are to the left of the state table. The output M equals 1 only when the machine is in state S6.

Moore Machine Implementation of Eight-State String Checker
This is the Moore machine implementation of the string checker. N0 becomes I, N1 becomes what was on the P0 line, and N2 becomes what was on the P1 line.

(a) Moore Modulo 6 Counter Including Unused States
So far we have ignored what would happen if the machine entered an unused state (also known as an unknown state or undefined state). If the machine does inadvertently enter an unused state, it could cause problems. This could happen during power-up. This figure shows one way to handle the problem. If the machine does end up in either state 110 or state 111, it simply traverses those states and goes on to a valid state. In the worst case, the machine may take two cycles to get to an acceptable state.
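To see where a don't-care-based design sends the unused states, the next-state equations can simply be evaluated for 110 and 111. The sketch below assumes the equations N2 = P2P0’ + P2U’ + P1P0U, N1 = P1P0’ + P1U’ + P2’P1’P0U, and N0 = P0’U + P0U’ from the earlier K-maps; whether the figure on this slide was drawn from exactly those equations is an assumption, and the function names are illustrative.

```python
def n(bit):
    """NOT on a single bit."""
    return 1 - bit

def step(state, u):
    """Evaluate the assumed next-state equations for any 3-bit state."""
    p2, p1, p0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    n2 = (p2 & n(p0)) | (p2 & n(u)) | (p1 & p0 & u)
    n1 = (p1 & n(p0)) | (p1 & n(u)) | (n(p2) & n(p1) & p0 & u)
    n0 = (n(p0) & u) | (p0 & n(u))
    return (n2 << 2) | (n1 << 1) | n0

# Follow each unused state for two clock ticks with U = 1.
for start in (0b110, 0b111):
    first = step(start, 1)
    second = step(first, 1)
    print(f"{start:03b} -> {first:03b} -> {second:03b}")
```

With U = 0 these equations hold whatever state the machine is in, so the machine only leaves an unused state when it is told to count.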

(b) Invalid Design for Modulo 6 Counter with Unused States
This diagram illustrates an unacceptable design. In this case, if the machine ends up in an unknown state, it will stay there forever. That is not acceptable.

Dummy States for Unused State Values
In this design, dummy states are established to account for the unused states just in case the machine somehow is in those states. In these cases, the machine just goes to the 000 (or start) state.

State Table for Modulo 6 Counter with Unused States
This is the state table for the design accounting for the unused states. As you will see, a more complex circuit is derived. The following page has the K-maps for this design.

K-Maps for Modulo 6 Counter with Unused States
These are the K-maps for the modulo 6 counter with the unused states. Notice that the equations are more complex. NOTE: the equation in the textbook, page 86, N2 = P2P1’P0’ + P2P1’U’ + P1P0U is incorrect, it should be N2 = P2P1’P0’ + P2P1’U’ + P2’P1P0U
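The correction can be checked by evaluating both versions of the N2 equation over all sixteen input combinations. This sketch assumes the dummy-state design in which the unused states 110 and 111 go to 000 for either value of U; the function names are illustrative.

```python
def n(bit):
    """NOT on a single bit."""
    return 1 - bit

def n2_textbook(p2, p1, p0, u):
    return (p2 & n(p1) & n(p0)) | (p2 & n(p1) & n(u)) | (p1 & p0 & u)

def n2_corrected(p2, p1, p0, u):
    return (p2 & n(p1) & n(p0)) | (p2 & n(p1) & n(u)) | (n(p2) & p1 & p0 & u)

def n2_expected(state, u):
    """High bit of the intended next state; unused states go to 000 (assumed)."""
    if state >= 6:
        return 0
    nxt = (state + 1) % 6 if u else state
    return (nxt >> 2) & 1

def errors(equation):
    """List the (state, U) pairs where an equation disagrees with the table."""
    bad = []
    for state in range(8):
        for u in (0, 1):
            p2, p1, p0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
            if equation(p2, p1, p0, u) != n2_expected(state, u):
                bad.append((state, u))
    return bad

print("textbook :", errors(n2_textbook))
print("corrected:", errors(n2_corrected))
```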

Generic Asynchronous Moore Machine
At times it is desirable to have an asynchronous machine to implement our design. Since it does not depend on clock input, this is a much faster circuit. But the problem is that one input may be ready before the others, and incorrect states will be realized.

A preliminary diagram showing an asynchronous Moore machine for the modulo 6 counter. The timing problem has not been addressed.

State Values and State Diagram for Asynchronous Modulo 6 Counter
One way to solve the timing problem is to use Gray codes to number the states. This way, only one variable changes at a time. These are the state values and state diagram for the new design using Gray codes. Notice that the states progress 000 → 001 → 011 → 111 → 110 → 100 → 000 and only one bit changes at a time.
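The one-bit-at-a-time property is easy to check: XOR each state value with its successor and count the 1 bits. A minimal Python sketch, with illustrative names:

```python
# State values in the order the counter visits them.
SEQUENCE = [0b000, 0b001, 0b011, 0b111, 0b110, 0b100]

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Compare each state with the next one, wrapping from the last to the first.
changes = [
    bits_changed(SEQUENCE[i], SEQUENCE[(i + 1) % len(SEQUENCE)])
    for i in range(len(SEQUENCE))
]
print(changes)
```

For comparison, the plain binary step from 011 to 100 changes all three bits.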

State Table for Modulo 6 Counter with Revised State Values
This is the state table for the asynchronous design. The K-maps to derive the equations are on the next page.

K-Maps for Modulo 6 Counter Asynchronous Design

Hardware Implementation of Modulo 6 Counter – Asynchronous Design
This design will work properly without a clock, but it does not know when to stop. Since it is not dependent on the clock, as soon as the data is available at the input, it will be processed. This problem can be resolved, but at the expense of adding complexity to the circuit. The state diagram with the extra states is on the next page.

Revised State Diagram for Modulo 6 Counter
The extra states are added to ensure that the states are changed only when U is properly set. In this case, a state will be changed not when U = 1 but on the rising edge when U is changed.

Converting a Mealy State Diagram to a Moore State Diagram

Processor Types and Instruction Sets
Chapter Five Processor Types and Instruction Sets Dr. Chuck Lillie

Instruction Set
Set of operations the hardware recognizes
Representation the hardware uses for each operation
The set of operations a processor provides represents a tradeoff among the cost of the hardware, the convenience for a programmer, and engineering considerations such as power consumption

Program Flow in Computer

Flowchart to execute assembly language program

Java Program Execution

Instruction Set Illustration

Instruction Formats

Variable-Length vs Fixed-Length Instructions
Variable-Length: makes optimal use of memory; requires complex hardware to decode. Fixed-Length: requires less complex hardware; processor can operate at higher speeds; can fetch and decode an instruction without examining the opcode.

Registers General Purpose Fixed size Supports fetch and store
Acts as temporary storage facility Small number of registers, < 100 Usually large enough to hold an integer Processor does 32 bit arithmetic, registers have 32 bits Numbered from 0 to N-1

Registers Programming with Registers
Operands are stored in general purpose registers, and results are placed in general purpose registers. Values must be moved into registers and back out of registers. For example, to add X and Y and store the result in Z: load a copy of X into register 3; load a copy of Y into register 6; add the value in register 3 to the value in register 6 and place the result in register 7; store a copy of the value in register 7 in Z.

Operands from an instruction must come from different banks

Since operands must come from different banks, this presents a problem
X and Y must be in separate banks. Z and X must be in different banks. So either Y or Z will have to be moved to complete T.

Complex and Reduced Instruction Sets
Complex Instruction Set Computer (CISC): includes many instructions (hundreds); each instruction can perform an arbitrarily complex computation. Intel’s Pentium is CISC: it provides hundreds of instructions, including complex instructions that require a long time to complete, such as instructions that manipulate graphics in memory and instructions to compute sine and cosine functions.

Complex and Reduced Instruction Sets
Reduced Instruction Set Computer (RISC): a minimum set of instructions sufficient for all computations, around 32; each instruction performs a basic computation; instructions are fixed size; an instruction executes in one clock cycle. The MIPS processor, a RISC design, had 32 instructions, each taking only one clock cycle.

Steps of the fetch-execute cycle: fetch instruction, examine opcode, fetch operands, perform operation, store result. Although a RISC processor cannot perform all steps of the fetch-execute cycle in a single clock cycle, an instruction pipeline with parallel hardware provides approximately the same performance: once the pipeline is full, one instruction completes on every clock cycle.

Other Causes of Stalls
Any instruction that delays processing or disrupts the normal flow: accesses external storage; invokes a coprocessor; branches to a new location; calls a subroutine.

Delay D ← subtract E C until C is available

Add a feature to the processor to detect the stall
Sends the output from Instruction K directly to Instruction K + 1

Types of Operations Instructions are divided into basic categories
Arithmetic instructions (integer arithmetic); logical instructions (also called Boolean); data access and transfer instructions; conditional and unconditional branch instructions; floating point instructions; processor control instructions.

Data movement instructions for the 8085 microprocessor

Data Operation instructions for the 8085 microprocessor

Program Control instructions for the 8085 microprocessor

Program Counter, Fetch-Execute, and Branching
Program counter: used to store the location of the next instruction in memory. Start the fetch-execute cycle by getting the address of the next instruction in memory from the program counter. Once the instruction is fetched, update the program counter.

Algorithm used to move through the fetch-execute cycle
Assign the program counter an initial program address.
Repeat forever {
    Fetch: access the next step of the program from the location given by the program counter.
    Set an internal address register, A, to the address beyond the instruction that was just fetched.
    Execute: perform the step of the program.
    Copy the contents of address register A to the program counter.
}
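The algorithm above can be sketched as a small C loop. The two-byte instruction format and the opcodes `OP_HALT`, `OP_ADDI`, and `OP_JUMP` are hypothetical, invented only for this illustration. A halt instruction is added so that the "repeat forever" loop can terminate in a test. Note how a branch simply overwrites address register A before it is copied to the program counter.

```c
#include <stdint.h>

/* Hypothetical instruction format: one opcode byte, one operand byte. */
enum { OP_HALT = 0, OP_ADDI = 1, OP_JUMP = 2 };

/* Run a program starting at `start`; returns the accumulator at OP_HALT. */
int run_program(const uint8_t *memory, uint16_t start) {
    uint16_t pc = start;   /* program counter */
    int acc = 0;           /* accumulator modified by OP_ADDI */

    for (;;) {
        /* Fetch: read the instruction the program counter points at. */
        uint8_t opcode = memory[pc];
        uint8_t operand = memory[pc + 1];

        /* A = address just beyond the instruction that was fetched. */
        uint16_t a = (uint16_t)(pc + 2);

        /* Execute: perform the step of the program. */
        switch (opcode) {
        case OP_ADDI:
            acc += operand;
            break;
        case OP_JUMP:
            a = operand;   /* a branch replaces the value in A */
            break;
        default:           /* OP_HALT or an unknown opcode stops the loop */
            return acc;
        }

        /* Copy A to the program counter. */
        pc = a;
    }
}
```

For example, a program that adds 5, jumps over an "add 100" instruction, adds 7, and halts leaves 12 in the accumulator.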

Subroutine Calls, Arguments, and Register Windows
Two basic methods to pass parameters. Store them in memory, e.g., put them on a stack: could be slow. Use registers: faster, but there is a limited number, which may cause conflict with operands. Could use a register window: a subset of registers used to pass parameters.

Registers are numbered from 0 through the window size – 1
Program places the parameters in registers 4 – 7. Subroutine gets the parameters from its registers 0 – 3. xi only available to main program, In only to subroutine.

Operand Addressing and Instruction Representation
Chapter Six Operand Addressing and Instruction Representation Dr. Chuck Lillie

Logic Symbols for Buffers
A tri-state buffer is used to control whether a signal passes through a line. The buffer passes its input through to its output only while its enable line is asserted; otherwise the output is in a high impedance state. These are the symbols for the buffers: Regular buffer: always allows the signal through. Tri-state buffer with active high enable: passes the input when the enable is high (1). Tri-state buffer with active low enable: passes the input when the enable is low (0).

Truth Tables for Buffers
Truth table for regular buffer: a 0 input produces a 0 output; a 1 input produces a 1 output. In essence, whatever goes in comes out unimpeded with a boosted current. Truth table for tri-state buffer with active high enable: if E is 1 (high voltage), the buffer is activated and the value is passed through (if input is 1, 1 is output; if input is 0, 0 is output); if E is 0, the buffer is disabled and the output is a high impedance state Z (in essence nothing is allowed through), regardless of the input (which is indicated as X, meaning we don’t care what the input is). Truth table for tri-state buffer with active low enable: if E is 0 (low voltage), the buffer is activated and the value is passed through. If E is 1, a high impedance state is output.

(a) 4 to 1 Multiplexer Circuit (n inputs, 1 output)
We will start our investigation of more complex circuits, that is, more than just a gate, by looking at combinatorial logic circuits. Later we will look at sequential logic circuits. A combinatorial logic circuit depends solely on its current inputs. When certain input values are set, a combinatorial logic circuit outputs values corresponding to the input values. When the input values change, the output values change. A multiplexer is a combinatorial logic circuit. A multiplexer has n inputs and one output. In this example, there are 4 inputs and one output. The output is selected based on the selection lines S1 and S0. This is the circuit diagram for a multiplexer, that is, the basic logic gates that make up the multiplexer.

(b) 4 to 1 Multiplexer Schematic with Active High Enable
This is the schematic for a multiplexer with four inputs and one output. Since it has four inputs, it needs two selection lines to determine which input to send to the output. The number of input lines = 2^n, where n is the number of selection lines. For an active high enable, the output is shown in the truth table. If the enable (E) is 0, it doesn’t matter whether the inputs are 0 or 1; a high impedance output (in essence, nothing is allowed to pass) is selected. Notice S1 and S0. For S1, S0 = 0, 0, line Input 0 is selected as output. For 0, 1, line Input 1 is selected; for 1, 0, line Input 2 is selected; and for 1, 1, line Input 3 is selected. (Note: S1 and S0 count in binary from 0 to 3, which corresponds to lines Input 0 to Input 3.) Enable (E) must be high for anything to happen. The diagram on the left will be used to represent a multiplexer.
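The selection behavior can be sketched in C in two ways: a behavioral version that indexes by the binary value of S1 S0, and a gate-level version written as a sum of AND terms. The names `mux4`, `mux4_gates`, and the `HIGH_Z` sentinel are illustrative assumptions; all signals are assumed to be 0 or 1.

```c
/* Sentinel standing in for the high impedance state (illustrative only). */
#define HIGH_Z (-1)

/* Behavioral 4 to 1 multiplexer with active high enable. */
int mux4(const int input[4], int s1, int s0, int enable) {
    if (!enable) {
        return HIGH_Z;              /* E = 0: nothing is passed through */
    }
    return input[(s1 << 1) | s0];   /* S1 S0 read as a binary number 0..3 */
}

/* Gate-level form: one AND term per input, combined with OR. */
int mux4_gates(const int input[4], int s1, int s0) {
    return (input[0] & !s1 & !s0) |
           (input[1] & !s1 &  s0) |
           (input[2] &  s1 & !s0) |
           (input[3] &  s1 &  s0);
}
```

Both versions agree whenever the enable is high, since exactly one AND term can be active for a given S1 S0 pair.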

(c) 4 to 1 Multiplexer Schematic with Active Low Enable
This active low enable is the same as the previous explanation, except the enable must be low to allow the multiplexer to be active.

4 to 1 Multiplexer Constructed Using 2 to 1 Multiplexers
Multiplexers can be combined to handle multiple inputs. In the first two multiplexers, S0 is used to select one of two inputs. Then S1 is used to select the outputs for the first two multiplexers as input to the third multiplexer.

(a) Circuit for 2 to 4 Decoder (n inputs, 2^n outputs)
Selects a value and decodes it. It has n inputs and 2^n outputs. Each output represents one minterm of the inputs. A decoder with three inputs and 8 outputs will activate output 6 with an input of 110. For the 2 to 4 decoder, input 11 activates output line 3.

(b) 2 to 4 Decoder with Active High Enable
This decoder evaluates inputs S1, S0 and selects one of four output lines. The truth table is on the right. If E is 0 nothing happens no matter the input. If E = 1, then output line is selected based on S1 and S0.
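A minimal C sketch of this decoder follows. The function name `decode2to4` is hypothetical, and driving all outputs to 0 when E is 0 is an assumption about what "nothing happens" means here.

```c
/* 2 to 4 decoder with active high enable.
   out[i] is 1 only for the line selected by S1 S0, and only when enabled. */
void decode2to4(int s1, int s0, int enable, int out[4]) {
    int selected = (s1 << 1) | s0;   /* S1 S0 read as a binary number */
    for (int i = 0; i < 4; ++i) {
        out[i] = (enable && i == selected) ? 1 : 0;
    }
}
```

Each output line corresponds to one minterm of S1 and S0, so at most one line is active at a time.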

(c) 2 to 4 Decoder with Active Low Enable
Same as previous Decoder, except E must be 0 for anything to happen.

(a) 4 to 2 Encoder (only one input active at a time)
An encoder receives 2^n inputs and outputs an n-bit value. One restriction is that only one input can be active at a time. If more than one input can be active at the same time, a priority encoder is needed. For input 0010, output S1 = 1, S0 = 0.

(b) 4 to 2 Encoder with Active High Enable
V is needed to indicate whether any of the input values are activated. Notice that if the input is 0000 (that is, no activated input value) the output (00) is the same as when input 0 is the activated input. The V output is needed to tell us whether there were any input values.

(c) 4 to 2 Encoder with Active Low Enable
This is just the active low enable encoder.

(a) 4 to 2 Priority Encoder (if more than one input active at a time)
A priority encoder is used if more than one input can be active at the same time. If more than one input is active, the output is set to correspond to the highest input.
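The priority rule, together with the valid output V described for the plain encoder, can be sketched in C. The type `encoded_t`, the function `priority_encode4`, and the convention that bit i of `inputs` stands for input line i are all assumptions made for this illustration.

```c
/* Outputs of a 4 to 2 priority encoder: two select bits plus a valid bit. */
typedef struct {
    int s1;
    int s0;
    int v;   /* 1 if at least one input line is active */
} encoded_t;

/* Bit i of `inputs` represents input line i (lines 0..3). */
encoded_t priority_encode4(unsigned inputs) {
    encoded_t result = {0, 0, 0};
    for (int i = 3; i >= 0; --i) {       /* scan from the highest line down */
        if (inputs & (1u << i)) {
            result.s1 = (i >> 1) & 1;
            result.s0 = i & 1;
            result.v = 1;
            break;                        /* lower lines are ignored */
        }
    }
    return result;
}
```

With lines 1 and 2 both active, the result corresponds to line 2. With no line active, the select bits are 00 just as for line 0, and only V distinguishes the two cases.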

(b) and (c) 4 to 2 Priority Encoder
(b) is just another implementation of the priority encoder on the previous chart.

One-bit Comparator
Compares Xi to Yi and selects one of the output lines. This is the circuit for a one-bit comparator and the truth table. One of three lines is selected: X>Y, X=Y, or X<Y. Walk through the circuit: if X = 0 and Y = 0, the middle XNOR gate has the only high output (the top AND gate will be 0 because X is 0; the bottom AND gate will be 0 because Y is 0). For X = 1 and Y = 1, the top AND gate will be 0 because NOT Y will be 0 and the bottom AND gate will be 0 because NOT X will be 0. If X > Y then X must be 1 and Y must be 0, so the top AND gate will output a 1 but the bottom AND gate will output a 0, while the middle XNOR gate will also output a 0 (look up the truth table for the XNOR gate).

1-Bit Comparator with Propagated Inputs
The output from a previous one-bit comparison is used as input to the next bits in the comparison.

n-Bit Comparator Constructed Using 1-Bit Comparators
The comparators must be initialized on the left with inputs of 0, 1, 0, indicating that the inputs are equal. Try this for two three-bit numbers, for example 101 compared to 100. The first comparator (on the left) will have an output of 0, 1, 0 because Xn-1 = Yn-1. Output from the second comparator (in the middle) will be 0, 1, 0 because Xn-2 = Yn-2. The final output from the last comparator (on the right) will be X>Y because X0 is greater than Y0.

Half Adder
Inputs two one-bit values and outputs their two-bit sum, carry (C) and sum (S). Walk through the circuit. When X = 0 and Y = 0 the AND gate will set C = 0 and the XOR gate will set S = 0. For X = 0 and Y = 1, the AND gate will set C = 0 (because X = 0) and the XOR gate will set S = 1 (check the truth table for XOR). For X = 1 and Y = 1, C = 1 (1 AND 1 = 1) and S = 0 (1 XOR 1 = 0), which is what we want: 1 + 1 = 10 in binary, that is, S = 0 with C = 1.

Full Adder (add more than 1-bit numbers)
We need to add more than 1-bit numbers. The full adder allows us to do that. This is a circuit diagram, schematic, and truth table for a full adder. Once again we have two inputs X and Y, but this time we need to consider the carry bit from a previous operation (that is, carry in Cin). Since we have three variables (X, Y, Cin) as input, there are 8 possible conditions (2^3 = 8). We need a C and S output for each input condition.

Four-Bit Adder Using Full Adder
Combined 4 one-bit adders to create a 4-bit adder. The schematic for a four-bit adder. Note the input X and Y with a / and 4. That indicates that this input line represents four bits. The output also does the same thing because this schematic represents an output of four bits. There is a carry in Cin and a carry out Cout as well. The initial carry in (on right) is 0.
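The chain from half adder to four-bit adder can be sketched in C. Building the full adder from two half adders and an OR gate is one common construction and is an assumption here; the circuit on the slide may be wired differently. The function names are illustrative.

```c
/* Half adder: carry = X AND Y, sum = X XOR Y. */
void half_adder(int x, int y, int *c, int *s) {
    *c = x & y;
    *s = x ^ y;
}

/* Full adder built from two half adders and an OR gate. */
void full_adder(int x, int y, int cin, int *cout, int *s) {
    int c1, s1, c2;
    half_adder(x, y, &c1, &s1);
    half_adder(s1, cin, &c2, s);
    *cout = c1 | c2;
}

/* Four-bit adder: the carry out of each bit feeds the next bit's carry in. */
unsigned add4(unsigned x, unsigned y, int cin, int *cout) {
    unsigned sum = 0;
    int carry = cin;
    for (int i = 0; i < 4; ++i) {
        int s;
        full_adder((int)((x >> i) & 1u), (int)((y >> i) & 1u), carry, &carry, &s);
        sum |= (unsigned)s << i;
    }
    *cout = carry;
    return sum;
}
```

For example, 9 + 8 with an initial carry in of 0 gives a four-bit sum of 0001 with carry out 1, which together represent 17.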

Full Subtracter
Note that with a subtracter we have a borrow (Bi+1) instead of a carry bit. Same logic as a full adder.

Memory Chip
Memory is a group of circuits used to store data. It has a number of memory locations, each of which stores a binary value (0 or 1). Depending on the material used to build the chip, memory can be read only or read/write. Read only memory chips are referred to as ROM and read/write chips are referred to as RAM (although both types of chips can be randomly accessed). The address inputs of a memory chip choose one of its locations, and the data at that location is placed on the output lines.
The ROM chip (Read Only Memory): once the information is placed in the chip it cannot be changed or updated. ROM chips are used for things like basic input/output systems (BIOS) or things that will not change once they are set. This chip has an address space consisting of n bits, thus the n input lines, and a data storage capability of m bits, thus the m output lines. The two control lines are CE, chip enable, which causes the chip to be selected, and OE, output enable, which causes the data to be placed on the output lines. The address is broadcast to every chip in a particular memory group, but only the chips with the CE line high will be selected for reading, and only the data with OE high will be allowed to pass through the output buffer.
The RAM chip (Random Access Memory): information can be changed and updated many times. RAM is also referred to as read/write memory and is considered volatile (it will lose information when power is removed). This chip is used for cache and primary memory. This chip also has an address space of n bits and a data storage capability of m bits. However, the two control lines are CS for chip select and R/W’ for read (when the line is high) and write (when the line is low, the W’). The address is broadcast to every chip in a particular memory group, but only the chips with the CS line high will be selected for activity (either read or write); a read is performed if R/W’ is high, otherwise a write is performed.

7-Segment LED Display
This is an example of combinatorial logic. It is used in circuits to display decimal digits from binary coded decimal data. The figure shows a layout of the segments, that is, all possible lights, which are labeled a to g; which lights are lit for each decimal number from 0 to 9; and the truth table for the decimal numbers 0 to 9. Input lines are x3, x2, x1, x0. So to display 0, x3 = 0, x2 = 0, x1 = 0 and x0 = 0. The lights that must be activated are a, b, c, d, e, f, and they get an entry of 1 in the first line of the table. The light that must be off is g, so it has an entry of 0 in the first line of the table. To display the number 5 (which is 0101 binary) lights a, c, d, f, g must be lit (set to 1) and lights b, e must be off (set to 0). See line 6 in the table. We need four bits to represent 10 entries (0 to 9). But four bits allow up to 16 entries (2^4 = 16), so we have six don’t care conditions. This will help us reduce our equations when we get to the K-Map phase.

(a) Design for LED segments
Here are two K-Maps for segments b and c of the LED problem. Notice the entries in the K-Map for segment b. For each x3x2x1x0 value associated with a segment b LED, a 1, 0, or X is entered in the K-Map that corresponds to the value in the truth table on the previous chart. Also notice that X entries (don’t care conditions) are for x3x2x’1x’0, x3x2x’1x0, x3x2x1x0, x3x2x1x’0, x3x’2x1x0, x3x’2x1x’0. This will be the same for all K-Maps. The 1s are grouped in the largest groups possible that are powers of 2. For segment b there are two groupings of 4 and one grouping of 8 (notice the wrap around to get the grouping of 8). For segment c there are three groupings of 8.
Segment b equation is: x’2 + x’1x’0 + x1x0
Segment c equation is: x’1 + x0 + x2

(b) Circuits to implement Segments b and c
To light segments b and c, inputs from x2, x1, and x0 are needed. Segment b (x’2 + x’1x’0 + x1x0) contains one OR gate, two AND gates, and three NOT gates. Segment c (x’1 + x0 + x2) contains one OR gate and one NOT gate.
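The two equations can be checked against the digits 0 to 9 with a short C sketch. The expected tables assume the standard 7-segment layout, in which segment b is off only for 5 and 6 and segment c is off only for 2. The function names are illustrative, and inputs are assumed to be 0 or 1.

```c
/* Segment b: x2' + x1'x0' + x1x0 */
int segment_b(int x2, int x1, int x0) {
    return !x2 | (!x1 & !x0) | (x1 & x0);
}

/* Segment c: x1' + x0 + x2 */
int segment_c(int x2, int x1, int x0) {
    return !x1 | x0 | x2;
}

/* Count digits 0..9 where an equation disagrees with the expected table. */
int count_segment_mismatches(void) {
    /* Expected values assume the standard 7-segment digit shapes. */
    static const int b_expected[10] = {1, 1, 1, 1, 1, 0, 0, 1, 1, 1};
    static const int c_expected[10] = {1, 1, 0, 1, 1, 1, 1, 1, 1, 1};
    int mismatches = 0;
    for (int digit = 0; digit < 10; ++digit) {
        int x2 = (digit >> 2) & 1;
        int x1 = (digit >> 1) & 1;
        int x0 = digit & 1;
        if (segment_b(x2, x1, x0) != b_expected[digit]) ++mismatches;
        if (segment_c(x2, x1, x0) != c_expected[digit]) ++mismatches;
    }
    return mismatches;
}
```

Neither equation needs x3, because the don’t care entries for inputs 10 to 15 let the groupings absorb the rows where x3 = 1.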

(a) Two Input Compare and Swap
X and Y are four bit numbers. Both numbers are sent to the 4-bit comparator as well as to both 4-bit multiplexers. If X is greater than Y, the upper MUX selects input line 1, which is the X value, to be output, and the lower MUX selects the Y value: because the X<Y line out of the comparator is set to 0, the lower MUX selects the 0 line, which is the Y value. The callout shows the makeup of the 4-bit comparator.

(b) Four Input Data Sorter
Each box in this diagram (Comparator 1 through Comparator 5) contains the circuitry from the previous chart, that is, the comparator and the MUXes. By arranging them in the order shown on the chart, four 4-bit numbers can be arranged in order with the largest on the top, the next to largest second from the top, the next to smallest second from the bottom, and the smallest on the bottom. Try inputting the numbers a = 1, b = 2, c = 3, d = 4. From Comparator 1, 2 goes to X in Comparator 3 and 1 goes to X in Comparator 4. From Comparator 2, 3 goes to Y in Comparator 4 and 4 goes to Y in Comparator 3. From Comparator 3, 4 goes to the Max output and 2 goes to input X in Comparator 5. From Comparator 4, 1 goes to the Min output and 3 goes to input Y in Comparator 5. From Comparator 5, 3 goes to its Max output and 2 goes to its Min output.
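The data sorter can be sketched in C by treating each box as a compare and swap function and wiring five of them in the order the notes describe. The function names and the output array layout (largest first) are assumptions made for this illustration.

```c
/* One box: a comparator plus two MUXes, producing a max and a min. */
void compare_swap(unsigned x, unsigned y, unsigned *max, unsigned *min) {
    *max = (x > y) ? x : y;
    *min = (x > y) ? y : x;
}

/* Four input data sorter: out[0] is the largest, out[3] the smallest. */
void sort4(unsigned a, unsigned b, unsigned c, unsigned d, unsigned out[4]) {
    unsigned max1, min1, max2, min2, c3_min, c4_max;

    compare_swap(a, b, &max1, &min1);              /* Comparator 1 */
    compare_swap(c, d, &max2, &min2);              /* Comparator 2 */
    compare_swap(max1, max2, &out[0], &c3_min);    /* Comparator 3: overall max */
    compare_swap(min1, min2, &c4_max, &out[3]);    /* Comparator 4: overall min */
    compare_swap(c3_min, c4_max, &out[1], &out[2]); /* Comparator 5: middle pair */
}
```

Because the wiring is fixed, the same five comparisons are performed for every input, which is what makes this arrangement suitable for hardware.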

A Typical Clock Sequence
We will now move to sequential components. Unlike combinatorial components, sequential components can retain their output values even when their input values change. Most sequential components have a clock input. This diagram is a typical clock sequence, which alternates between 0 and 1 based on the clock speed of the computer. The clock is used to synchronize the flow of data in a digital system. A clock cycle typically starts on the up-tick of the clock (which, by the way, is not as vertical as this diagram makes it out to be) and ends just before the next up-tick. The time from the start of the clock cycle to the end of the clock cycle is the clock period, and it determines the speed of the computer. So a computer with a speed of 1 GHz (1 × 10^9 Hz) has a clock period of 1/10^9 seconds, or 1 nanosecond. So the fastest this 1 GHz processor can execute an instruction is 1 nanosecond, and that is only if the instruction takes only one step. Most instructions require more than one step, and some (like multiply) could require up to 50 steps, thus 50 nanoseconds.

(a) Positive Edge Triggered D Flip Flop
This is a positive edge triggered D flip flop. A value is placed on the input line D. The positive edge of the clock (clock changes from 0 to 1) activates the flip flop, and if LD (the load signal) is high, the input from D is made available to the output via Q (the value input) or Q’ (the complement of the value input). The truth table is next to the diagram. Notice that as long as LD is 0, the output remains the same, Q0. And as long as the clock is not in the up-tick (not ↑), the output remains the same, Q0. The output will only change if LD is 1 and the clock is in the up-tick (↑). For a D flip flop, the clock must be in the up-tick and LD must be 1 for the output to change to the value of the input; otherwise the output value remains the same.

(b) Positive Level Triggered D Latch
A latch works similarly to a flip flop, except that as long as the LD and clock inputs are 1, the value of the output is set to the value of the input D, even if D changes while the clock and LD are 1. Notice the difference: the latch will continue to change as long as the clock and LD are 1, while the flip flop is set to the input value only when LD is 1 and the clock is in the up-tick.

Positive Level Triggered D Latch with Set and Clear
Some latches and flip flops have set and clear lines. SET makes the output Q = 1 and CLR (clear) makes the output Q = 0. For SET to make Q = 1, the values of the other inputs do not matter: when SET = 1 the output is set to 1. However, for CLR to set the output to 0, CLR must be 1 and SET must be 0; the values of the other inputs (including the clock) do not matter. As long as SET and CLR are 0, the latch works as the one in the previous slide.

SR Latch
The SR (or RS) latch (S for Set, R for Reset) will set the output Q to 0 if S = 0 and R = 1; it will set the output Q to 1 if S = 1 and R = 0. There is no change to Q (thus the Q0) if S = 0 and R = 0. The output is undefined if S = 1 and R = 1. The SR latch does not have a clock; the inputs are always active.

JK Flip Flop
The JK flip flop is driven by the clock. If the clock is not in the up-tick (not ↑), the inputs J and K do not matter; the output will not change. If the clock is in the up-tick (↑) and J = 0, K = 0, there is no change to the output (it remains the same as before). If J = 0 and K = 1, the output is reset to 0 when the clock is in the up-tick. If J = 1 and K = 0, the output is set to 1 when the clock is in the up-tick. If J = 1 and K = 1 when the clock is in the up-tick, the output is complemented. This takes care of the undefined condition of the SR latch.
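The JK behavior amounts to a next-state function, sketched below in C. The function name `jk_next` and the `rising_edge` flag (1 when the clock is in the up-tick) are illustrative assumptions.

```c
/* Next value of Q for a JK flip flop, given the current Q and the inputs. */
int jk_next(int q, int j, int k, int rising_edge) {
    if (!rising_edge) {
        return q;        /* no up-tick: output holds regardless of J and K */
    }
    if (j && k) {
        return !q;       /* J = 1, K = 1: complement */
    }
    if (j) {
        return 1;        /* J = 1, K = 0: set */
    }
    if (k) {
        return 0;        /* J = 0, K = 1: reset */
    }
    return q;            /* J = 0, K = 0: no change */
}
```

Tying J and K together to a single input T gives the toggle behavior of a T flip flop: hold when T = 0, complement when T = 1.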

T Flip Flop
The T flip flop (or toggle flip flop) complements the output value when T = 1 on the up-tick of the clock; otherwise the output stays the same.

4-Bit D Flip Flop
Flip flops are used to store data, but we need to store more than one bit of data at a time. This is a four-bit flip flop used to store a four-bit number. Notice that the SET, CLK, LD, and CLR signals are sent to all four flip flops at the same time, so they are activated together and act as a single unit. (b) is the schematic diagram for this four-bit flip flop.

Four-Bit Counter Using D Flip Flops
This is a 4-bit counter (flip flops FF3, FF2, FF1, FF0) along with its truth table. Start the process by putting a 1 on the CLR line, which will reset all flip flops to 0, so X0 = 0, X1 = 0, X2 = 0, and X3 = 0, and the input on D = 1 (from Q’). Nothing changes until CLK and INC are both 1. This makes X0 = 1 and Q’ = 0 in FF0, so D = 0. CLK in FF1 will be 0 (from Q’ in FF0) and nothing will change in FF2 and FF3 because FF1 does not change. The next time INC is 1 and CLK is in the up-tick, FF0 will change X0 to 0 and Q’ to 1, which will cause CLK on FF1 to be 1 and change X1 to 1, while Q’ in FF1 will be 0, so FF2 and FF3 will not change. The result of the second INC is X3X2X1X0 = 0010 = 2. You should be able to step through the rest of the counting until 1111 = 15. The counter will then start over with 0000.

4-Bit Up/Down Counter with Parallel Load
This is an up/down counter with parallel load, that is, all flip flops can be loaded at the same time with a value to initialize the counter. The input is at D, the CLK must be in the up-tick, LD is used to load the flip flops with the values on D, CLR is used to reset to 0, COUNT is used to increment, and U/D’ determines an up increment or a down increment. U/D’ = 1 is an up increment; U/D’ = 0 (D’) is a down increment. The truth table gives you all the details. Remember that D and Q are four-bit lines, so there are four flip flops that make up this up/down counter.

Four-Bit Left Shift Register
This is a 4-bit left shift register made with D flip flops. The control signals are CLK and the SHL line (shift left). When SHL is 1 and CLK is in the up-tick, the value on the D input of each flip flop is entered, changing the values of X0, X1, X2, and X3. In essence, the bit on the Xin line is moved into the first flip flop and each flip flop shifts its bit to the next flip flop, with the leftmost bit shifted out. Try it by starting with X0 = 0, X1 = 0, X2 = 0, X3 = 0, and Xin = 1.

Shift Operations
Here are different shift operations.
Linear shift left moves each bit to the left and a new bit into the rightmost position.
Linear shift right moves each bit to the right and a new bit into the leftmost bit position.
Circular shift left moves each bit to the left and the leftmost bit to the rightmost bit position.
Circular shift right moves each bit to the right and the rightmost bit to the leftmost bit position.
Arithmetic shift left moves each bit to the left except the leftmost bit, which remains the same, and shifts a new bit into the rightmost bit position.
Arithmetic shift right moves each bit to the right, and the leftmost bit remains the same while the rightmost bit is discarded.
Arithmetic shift is used when dealing with two’s complement numbers (that is, signed arithmetic).
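The six operations can be sketched in C on 4-bit values held in the low bits of an `unsigned`. The function names and the `MASK4` constant are illustrative assumptions. The arithmetic shift left follows the description above, keeping the leftmost bit in place and discarding the bit next to it.

```c
#define MASK4 0xFu   /* keep results to four bits */

/* Linear shifts: `in` supplies the new bit that enters the register. */
unsigned linear_shl4(unsigned v, unsigned in) {
    return ((v << 1) | (in & 1u)) & MASK4;
}
unsigned linear_shr4(unsigned v, unsigned in) {
    return ((v & MASK4) >> 1) | ((in & 1u) << 3);
}

/* Circular shifts: the bit that falls off one end re-enters at the other. */
unsigned circular_shl4(unsigned v) {
    v &= MASK4;
    return ((v << 1) | (v >> 3)) & MASK4;
}
unsigned circular_shr4(unsigned v) {
    v &= MASK4;
    return (v >> 1) | ((v & 1u) << 3);
}

/* Arithmetic shifts: the leftmost (sign) bit stays where it is. */
unsigned arith_shl4(unsigned v, unsigned in) {
    v &= MASK4;
    return (v & 0x8u) | ((v << 1) & 0x6u) | (in & 1u);
}
unsigned arith_shr4(unsigned v) {
    v &= MASK4;
    return (v & 0x8u) | (v >> 1);
}
```

Starting from 1011, the linear shift left with a new bit of 0 gives 0110, the circular shift left gives 0111, and the arithmetic shift right gives 1101.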

Programmable Logic Array
Programmable Logic Arrays (PLA) are prearranged gates that are used to build circuits. An x indicates that the input is sent to the gate; no x indicates an open circuit. This PLA is programmed for the equation X2’ + X1’X0’ + X1X0 and the equation X2 + X1’ + X0.

Programmable Array of Logic
A programmable array of logic (PAL) is a little different from a PLA in that the PLA’s OR gate inputs can be selected while the PAL’s OR gate inputs cannot be selected; they are preset. This PAL implements the same equations as the PLA.

CPUs: Microcode, Protection, and Processor Modes
Chapter Seven CPUs: Microcode, Protection, and Processor Modes Dr. Chuck Lillie

Instruction Code Formats
Instruction format: opcode followed by operands. Opcodes: Add = 1010, Move = 1000, Load = 0000, Store = 0001, Push B = 0101, Push C = 0110, Pop A = 1100. Less complex with fewer operands, but more difficult to program.

Instruction Set Architecture (ISA) Design
What should the ISA and its processor be able to do?
Completeness: does the instruction set have all the instructions a program needs to perform its required tasks? A PC needs a rich set; a microwave oven needs a limited set.
Orthogonality: no overlap of instructions. Minimizing overlap provides the necessary functions with a minimum of instructions.
Register set: more registers speed up CPU operations, but unused registers take up needed space.
Does the processor have to be backward compatible with others? What types and sizes of data will the microprocessor deal with? Need floating point instructions? Need character instructions? Are interrupts necessary? Are conditional instructions necessary?

A Relatively Simple Instruction Set Architecture
Memory module: 64K (= 2^16) bytes of memory with each byte having 8 bits (64K × 8). I/O is treated as memory access (memory mapped I/O).
3 registers:
Accumulator AC: 8 bit register used as one of the operands; data is loaded from memory to AC and stored from AC to memory.
R: 8 bit general purpose register.
Z: 1-bit zero flag, set when an arithmetic or logic instruction is executed; a result of 0 sets the flag to 1.

Instruction Set for Relatively Simple CPU
M[Γ] is the memory location at Γ, where Γ is a 16 bit memory address. An instruction with a memory reference requires 3 bytes. A 3-byte instruction stores the address bytes with the low order byte first and the high order byte next. For example, the instruction JUMP 1234H at location 25 is stored as:
25: (JUMP)
26: (34H)
27: (12H)
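The byte order of a 3-byte instruction can be sketched in C. The function names are illustrative, and the opcode value used in the example is a placeholder, since the notes do not give the numeric opcode for JUMP.

```c
#include <stdint.h>

/* Placeholder opcode; the real encoding of JUMP is not given in the notes. */
#define OPCODE_JUMP_PLACEHOLDER 0x05u

/* Lay out a 3-byte instruction: opcode, low address byte, high address byte. */
void encode_with_address(uint8_t opcode, uint16_t address, uint8_t out[3]) {
    out[0] = opcode;
    out[1] = (uint8_t)(address & 0xFFu);   /* low order byte first */
    out[2] = (uint8_t)(address >> 8);      /* high order byte next */
}

/* Recover the 16 bit address from the two bytes that follow the opcode. */
uint16_t decode_address(const uint8_t bytes[3]) {
    return (uint16_t)(bytes[1] | (bytes[2] << 8));
}
```

Encoding the address 1234H this way places 34H in the second byte and 12H in the third, matching the memory layout shown above.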

Instruction Format for Relatively Simple CPU
3-byte Format 1-byte Format

Data Movement Instruction for 8085 Microprocessor

Instruction Format for 8085 Processor

Data Operation Instruction for 8085 Microprocessor

Program Control Instruction for 8085 Microprocessor

Assembly Language and Programming Paradigm
Chapter Eight Assembly Language and Programming Paradigm Dr. Chuck Lillie

Physical Memory and Physical Addressing
Chapter Ten Physical Memory and Physical Addressing Dr. Chuck Lillie

Pipelining vs. Parallel processing
In both cases, multiple “things” are processed by multiple “functional units”.
Pipelining: each thing is broken into a sequence of pieces, where each piece is handled by a different (specialized) functional unit.
Parallel processing: each thing is processed entirely by a single functional unit.
We will briefly introduce the key ideas behind parallel processing: instruction level parallelism and thread-level parallelism.

It is all about dependences!

Exploiting Parallelism
Of the computing problems for which performance is important, many have inherent parallelism.
Best example: computer games. Graphics, physics, sound, AI, etc. can be done separately. Furthermore, there is often parallelism within each of these: the color of each pixel on the screen can be computed independently; non-contacting objects can be updated/simulated independently; the artificial intelligence of non-human entities can be done independently.
Another example: Google queries. Every query is independent. Google is read-only!!

Parallelism at the Instruction Level
    add $2 <- $3, $6
    or $2 <- $2, $4
    lw $6 <- 0($4)
    addi $7 <- $6, 0x5
    sub $8 <- $8, $4
Dependences? RAW, WAW, WAR. When can we reorder instructions? When should we reorder instructions?
Superscalar Processors: multiple instructions executing in parallel at the *same* stage.
Second listing (sub and addi reordered):
    add $2 <- $3, $6
    or $5 <- $2, $4
    lw $6 <- 0($4)
    sub $8 <- $8, $4
    addi $7 <- $6, 0x5

OoO Execution Hardware

Exploiting Parallelism at the Data Level
Consider adding together two arrays:
    void array_add(int A[], int B[], int C[], int length) {
        int i;
        for (i = 0; i < length; ++i) {
            C[i] = A[i] + B[i];
        }
    }
Operating on one element at a time.

Exploiting Parallelism at the Data Level (SIMD)
Consider the same array_add loop for adding together two arrays, but now operating on MULTIPLE elements at a time: Single Instruction, Multiple Data (SIMD).

Intel SSE/SSE2 as an example of SIMD
Added new 128 bit registers (XMM0 – XMM7). Each can store:
4 single precision FP values (SSE), 4 * 32b
2 double precision FP values (SSE2), 2 * 64b
16 byte values (SSE2), 16 * 8b
8 word values (SSE2), 8 * 16b
4 double word values (SSE2), 4 * 32b
1 128-bit integer value (SSE2), 1 * 128b
[Figure: element-wise addition of two registers, each holding four 32-bit values]

Is it always that easy? Not always… a more challenging example:
    unsigned sum_array(unsigned *array, int length) {
        unsigned total = 0;
        for (int i = 0; i < length; ++i) {
            total += array[i];
        }
        return total;
    }
Is there parallelism here?

We first need to restructure the code
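    unsigned sum_array2(unsigned *array, int length) {
        unsigned total, i;
        unsigned temp[4] = {0, 0, 0, 0};
        /* Four independent partial sums. The parentheses are required:
           < binds tighter than &, so without them the loop never runs. */
        for (i = 0; i < (length & ~0x3); i += 4) {
            temp[0] += array[i];
            temp[1] += array[i+1];
            temp[2] += array[i+2];
            temp[3] += array[i+3];
        }
        total = temp[0] + temp[1] + temp[2] + temp[3];
        /* Leftover elements when length is not a multiple of 4. */
        for ( ; i < length; ++i) {
            total += array[i];
        }
        return total;
    }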

Then we can write SIMD code for the hot part
The code is the same sum_array2 as on the previous slide. The hot part is the first loop: each iteration adds four consecutive array elements into temp[0] through temp[3], and these four independent additions are the part that can be written as SIMD code.

Thread level parallelism: Multi-Core Processors
Two (or more) complete processors, fabricated on the same silicon chip
Execute instructions from two (or more) programs/threads at the same time
[Figure: IBM Power5 chip with processors #1 and #2 labeled.]

Multi-Cores are Everywhere
Intel Core Duo in new Macs: 2 x86 processors on same chip
Xbox 360: 3 PowerPC cores
Sony PlayStation 3: Cell processor, an asymmetric multi-core with 9 cores (1 general-purpose, 8 special-purpose SIMD processors)

Why Multi-cores Now? Number of transistors we can put on a chip growing exponentially…

… and performance growing too…
But power is growing even faster!! Power has become the limiting factor in current chips.

As programmers, do we care?
What happens if we run a program on a multi-core?
    void array_add(int A[], int B[], int C[], int length) {
        int i;
        for (i = 0; i < length; ++i) {
            C[i] = A[i] + B[i];
        }
    }
[Figure: two processors, #1 and #2.]

What if we want a program to run on both processors?
We have to explicitly tell the machine exactly how to do this
This is called parallel programming or concurrent programming
There are many parallel/concurrent programming models
We will look at a relatively simple one: fork-join parallelism
- POSIX threads and explicit synchronization

Fork/Join Logical Example
1. Fork N-1 threads
2. Break work into N pieces (and do it)
3. Join (N-1) threads
    /* Pseudocode: fork(N-1) and join() are logical operations, not the POSIX calls. */
    void array_add(int A[], int B[], int C[], int length) {
        cpu_num = fork(N-1);
        int i;
        for (i = cpu_num; i < length; i += N) {
            C[i] = A[i] + B[i];
        }
        join();
    }
How good is this with caches?
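The logical example above could be mapped onto POSIX threads roughly as follows. This is only a sketch: the worker count N = 4, the struct work record, and the names add_slice and array_add_parallel are assumptions made for illustration, and the work is split into the same interleaved pieces as the pseudocode.

```c
#include <assert.h>
#include <pthread.h>

#define N 4  /* assumed number of workers, including the calling thread */

/* Everything one worker needs to know (hypothetical helper type). */
struct work {
    const int *A;
    const int *B;
    int *C;
    int length;
    int cpu_num;  /* this worker's piece: indices cpu_num, cpu_num+N, ... */
};

static void *add_slice(void *arg) {
    const struct work *w = arg;
    for (int i = w->cpu_num; i < w->length; i += N) {
        w->C[i] = w->A[i] + w->B[i];
    }
    return NULL;
}

void array_add_parallel(const int A[], const int B[], int C[], int length) {
    assert(length >= 0);
    pthread_t threads[N];
    struct work work[N];
    int started[N] = {0};

    for (int t = 0; t < N; ++t) {
        work[t] = (struct work){A, B, C, length, t};
    }

    /* Fork: start N-1 helper threads for pieces 1 .. N-1. */
    for (int t = 1; t < N; ++t) {
        started[t] = (pthread_create(&threads[t], NULL, add_slice, &work[t]) == 0);
    }

    add_slice(&work[0]);  /* the calling thread does piece 0 */

    /* Join: wait for the helpers; run any piece whose thread failed to start. */
    for (int t = 1; t < N; ++t) {
        if (started[t]) {
            pthread_join(threads[t], NULL);
        } else {
            add_slice(&work[t]);
        }
    }
}
```

On the cache question: with interleaved indices, neighbouring elements of C that share a cache line are written by different threads. Giving each thread one contiguous block of indices is a common alternative that avoids this sharing.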

How does this help performance?
Parallel speedup measures improvement from parallelization:
$$\text{speedup}(p) = \frac{\text{time for best serial version}}{\text{time for version with } p \text{ processors}}$$
What can we realistically expect?

Reason #1: Amdahl’s Law
In general, the whole computation is not (easily) parallelizable: some regions of the program remain serial.

Reason #1: Amdahl’s Law
Suppose a program takes 1 unit of time to execute serially. A fraction of the program, s, is inherently serial (unparallelizable). On P processors:
$$\text{New Execution Time} = \frac{1-s}{P} + s, \qquad \text{Speedup} = \frac{1}{\text{New Execution Time}}$$
For example, consider a program that, when executing on one processor, spends 10% of its time in a non-parallelizable region. How much faster will this program run on a 3-processor system?
$$\text{New Execution Time} = \frac{0.9T}{3} + 0.1T = 0.4T, \qquad \text{Speedup} = \frac{T}{0.4T} = 2.5$$
What is the maximum speedup from parallelization? As P grows, the parallel term vanishes and the speedup approaches 1/s, which is 10 for this program.
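The formula can be checked numerically with a small helper. This is a sketch only; the names amdahl_time and amdahl_speedup are chosen here for illustration, with s the serial fraction and p the processor count as in the slide.

```c
#include <assert.h>

/* Execution time on p processors, relative to a serial time of 1. */
double amdahl_time(double s, int p) {
    assert(s >= 0.0 && s <= 1.0);
    assert(p >= 1);
    return (1.0 - s) / p + s;
}

/* Speedup over the serial version. */
double amdahl_speedup(double s, int p) {
    return 1.0 / amdahl_time(s, p);
}
```

With s = 0.1, amdahl_speedup(0.1, 3) gives about 2.5, and no value of p pushes the result to 10 or beyond.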

Reason #2: Overhead
    /* Pseudocode: fork(N-1) and join() are logical operations, not the POSIX calls. */
    void array_add(int A[], int B[], int C[], int length) {
        cpu_num = fork(N-1);
        int i;
        for (i = cpu_num; i < length; i += N) {
            C[i] = A[i] + B[i];
        }
        join();
    }
Forking and joining is not instantaneous
- Involves communicating between processors
- May involve calls into the operating system
- Depends on the implementation
$$\text{New Execution Time} = \frac{1-s}{P} + s + \text{overhead}(P)$$

Programming Explicit Thread-level Parallelism
As noted previously, the programmer must specify how to parallelize
But, want path of least effort
Division of labor between the Human and the Compiler
- Humans: good at expressing parallelism, bad at bookkeeping
- Compilers: bad at finding parallelism, good at bookkeeping
Want a way to take serial code and say “Do this in parallel!” without:
- Having to manage the synchronization between processors
- Having to know a priori how many processors the system has
- Deciding exactly which processor does what
- Replicating the private state of each thread
OpenMP: an industry standard set of compiler extensions
- Works very well for programs with structured parallelism.
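As an illustration of the “Do this in parallel!” style, the array_add loop from earlier might be annotated with an OpenMP directive as sketched here. The function name array_add_omp is hypothetical. The sketch assumes the compiler is invoked with its OpenMP option (for example -fopenmp with GCC); without it the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

void array_add_omp(const int A[], const int B[], int C[], int length) {
    assert(length >= 0);
    int i;
    /* The compiler and runtime decide how many threads to use and which
       iterations each one runs; the loop variable i is private per thread. */
    #pragma omp parallel for
    for (i = 0; i < length; ++i) {
        C[i] = A[i] + B[i];
    }
}
```

The iterations are independent of one another, which is what makes this loop a good fit for structured parallelism.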

Performance Optimization
Until you are an expert, first write a working version of the program
Then, and only then, begin tuning: first collect data, then iterate
- Otherwise, you will likely optimize what doesn’t matter
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” -- Donald Knuth

Summary
Multi-core is having more than one processor on the same chip.
- Soon most PCs/servers and game consoles will be multi-core
- Results from Moore’s law and power constraint
Exploiting multi-core requires parallel programming
- Automatically extracting parallelism too hard for compiler, in general.
- But, can have compiler do much of the bookkeeping for us: OpenMP
Fork-Join model of parallelism
- At parallel region, fork a bunch of threads, do the work in parallel, and then join, continuing with just one thread
- Expect a speedup of less than P on P processors
- Amdahl’s Law: speedup limited by serial portion of program
- Overhead: forking and joining are not free

Lecture 18: Pipelining I

Laundry Pipelining Example
Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
- Washer takes 30 minutes
- Dryer takes 30 minutes
- “Folder” takes 30 minutes
- “Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM in 30-minute slots; loads A, B, C, D in task order, each finishing all four stages before the next load starts.]
Sequential laundry takes 8 hours for 4 loads

Pipelined Laundry
[Figure: timeline starting at 6 PM in 30-minute slots; loads A, B, C, D in task order, each starting one slot after the previous load.]
Pipelined laundry takes 3.5 hours for 4 loads!

General Definitions
Latency: time to completely execute a certain task
- for example, time to read a sector from disk is disk access time or disk latency
Throughput: amount of work that can be done over a period of time

Pipelining Lessons (1/2)
Pipelining doesn’t help latency of single task, it helps throughput of entire workload
Multiple tasks operating simultaneously using different resources
Potential speedup = Number pipe stages
Time to “fill” pipeline and time to “drain” it reduces speedup: 2.3X vs. 4X in this example
[Figure: pipelined laundry timeline, 6 PM to 9:30 PM, loads A, B, C, D in task order, 30-minute stages.]
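The 2.3X figure follows from counting 30-minute slots, and the same count shows the speedup approaching the number of stages as the workload grows. A minimal sketch, assuming a balanced pipeline in which every stage takes the same time (the function names are chosen here for illustration):

```c
#include <assert.h>

/* n tasks, k stages, one task at a time: n * k stage times. */
int sequential_time(int n, int k, int stage_time) {
    assert(n >= 1 && k >= 1 && stage_time > 0);
    return n * k * stage_time;
}

/* Balanced pipeline: k slots for the first task to drain through,
   then one more slot for each additional task. */
int pipelined_time(int n, int k, int stage_time) {
    assert(n >= 1 && k >= 1 && stage_time > 0);
    return (k + n - 1) * stage_time;
}

/* Speedup of pipelined over sequential; independent of stage_time. */
double pipeline_speedup(int n, int k) {
    assert(n >= 1 && k >= 1);
    return (double)(n * k) / (double)(k + n - 1);
}
```

For the laundry example (n = 4 loads, k = 4 stages, 30 minutes each) this gives 480 minutes sequentially and 210 minutes pipelined, a speedup of 16/7, or about 2.3. With 1000 loads the speedup is 4000/1003, close to the stage count of 4.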

Pipelining Lessons (2/2)
Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
Pipeline rate limited by slowest pipeline stage
Unbalanced lengths of pipe stages also reduces speedup
[Figure: pipelined laundry timeline, 6 PM to 9:30 PM, loads A, B, C, D in task order, 30-minute stages.]
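To make the bottleneck concrete, the sketch below computes two quantities for a pipeline with unequal stages: the latency of a single load (the sum of the stage times) and the steady-state interval between finished loads (the slowest stage). The function names and the array representation of stage times are assumptions for illustration.

```c
#include <assert.h>

/* Latency of one load: it passes through every stage in turn. */
int pipeline_latency(const int stage_minutes[], int stages) {
    assert(stages > 0);
    int total = 0;
    for (int i = 0; i < stages; ++i) {
        total += stage_minutes[i];
    }
    return total;
}

/* Steady-state interval between finished loads: the slowest stage. */
int pipeline_interval(const int stage_minutes[], int stages) {
    assert(stages > 0);
    int slowest = stage_minutes[0];
    for (int i = 1; i < stages; ++i) {
        if (stage_minutes[i] > slowest) {
            slowest = stage_minutes[i];
        }
    }
    return slowest;
}
```

With the new washer and stasher the stage times are {20, 30, 30, 20}: the latency of one load drops from 120 to 100 minutes, but the interval stays at 30 minutes because the dryer and folder are unchanged, so the rate at which loads finish does not improve.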

Steps in Executing MIPS
1) IFetch: Fetch Instruction, Increment PC
2) Decode Instruction, Read Registers
3) Execute:
   - Mem-ref: Calculate Address
   - Arith-log: Perform Operation
4) Memory:
   - Load: Read Data from Memory
   - Store: Write Data to Memory
5) Write Back: Write Data to Register

Pipelined Execution Representation
[Figure: instructions overlapped along a time axis, each passing through IFtch, Dcd, Exec, Mem, WB.]
Every instruction must take same number of steps, also called pipeline “stages”, so some will go idle sometimes

Review: Datapath for MIPS
[Figure: MIPS datapath with PC, instruction memory, +4 adder, register file (rs, rt, rd), imm, ALU, and data memory, divided into 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.]
Use datapath figure to represent pipeline: IFtch, Dcd, Exec, Mem, WB are drawn as I$, Reg, ALU, D$, Reg

Graphical Pipeline Representation
[Figure: pipeline diagram over time (clock cycles) for the instruction order Load, Add, Store, Sub, Or, each passing through I$, Reg, ALU, D$, Reg.]
(In Reg, right half highlight means read, left half means write)

Example
Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write; compute the instruction rate
Nonpipelined Execution:
- lw: IF + Read Reg + ALU + Memory + Write Reg = 2 + 1 + 2 + 2 + 1 = 8 ns
- add: IF + Read Reg + ALU + Write Reg = 2 + 1 + 2 + 1 = 6 ns
Pipelined Execution:
- Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns
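The arithmetic in this example can be reproduced with a few lines of C. This is a sketch using the timings stated above; the constant and function names are chosen here for illustration.

```c
#include <assert.h>

/* Timings from the example, in nanoseconds. */
enum { MEM_NS = 2, ALU_NS = 2, REG_NS = 1 };

static int sum_ns(const int *steps, int count) {
    assert(count > 0);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += steps[i];
    }
    return total;
}

static int max_ns(const int *steps, int count) {
    assert(count > 0);
    int longest = steps[0];
    for (int i = 1; i < count; ++i) {
        if (steps[i] > longest) {
            longest = steps[i];
        }
    }
    return longest;
}

/* lw: IF + Read Reg + ALU + Memory + Write Reg */
int lw_nonpipelined_ns(void) {
    const int steps[] = {MEM_NS, REG_NS, ALU_NS, MEM_NS, REG_NS};
    return sum_ns(steps, 5);
}

/* add: IF + Read Reg + ALU + Write Reg (no data memory access) */
int add_nonpipelined_ns(void) {
    const int steps[] = {MEM_NS, REG_NS, ALU_NS, REG_NS};
    return sum_ns(steps, 4);
}

/* Pipelined clock period: the slowest of the five stages. */
int pipelined_cycle_ns(void) {
    const int stages[] = {MEM_NS, REG_NS, ALU_NS, MEM_NS, REG_NS};
    return max_ns(stages, 5);
}
```

Once the pipeline is full, one instruction can complete every 2 ns, compared with 8 ns per lw or 6 ns per add without pipelining.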

Pipeline Hazard: Matching socks in later load
[Figure: pipelined laundry timeline from 6 PM to 2 AM in 30-minute slots, loads A through F in task order, with a bubble in the pipeline.]
A depends on D; stall since folder tied up

Limits to pipelining
Hazards prevent next instruction from executing during its designated clock cycle
- Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
- Control hazards: branches and other instructions that change the flow of control stall the pipeline until the hazard is resolved, leaving “bubbles” in the pipeline
- Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

Structural Hazard #1: Single Memory (1/2)
[Figure: pipeline diagram over time (clock cycles) for the instruction order Load, Instr 1, Instr 2, Instr 3, Instr 4.]
Read same memory twice in same clock cycle

Structural Hazard #1: Single Memory (2/2)
Solution: it is infeasible and inefficient to create a second memory, so handle this by having two Level 1 caches (a cache is a temporary, smaller copy of memory, usually holding the most recently used contents)
- Have both an L1 Instruction Cache and an L1 Data Cache
- Need more complex hardware to control when both caches miss

Structural Hazard #2: Registers (1/2)
[Figure: pipeline diagram over time (clock cycles) for the instruction order sw, Instr 1, Instr 2, Instr 3, Instr 4.]
Can’t read and write to registers simultaneously

Structural Hazard #2: Registers (2/2)
Fact: Register access is VERY fast: takes less than half the time of ALU stage
Solution: introduce convention
- always Write to Registers during first half of each clock cycle
- always Read from Registers during second half of each clock cycle
Result: can perform Read and Write during same clock cycle

Things to Remember
Optimal Pipeline
- Each stage is executing part of an instruction each clock cycle.
- One instruction finishes during each clock cycle.
- On average, execute far more quickly.
What makes this work?
- Similarities between instructions allow us to use same stages for all instructions (generally).
- Each stage takes about the same amount of time as all others: little wasted time.
