Computer Systems Organization and Architecture Topic 3: Processor Design.


 CPU must: ◦ Fetch instructions (suruhan ambil) ◦ Interpret _______ (tafsir _______) ◦ _______ data (_______ data) ◦ Process data (proses data) ◦ Write data (tulis data)

 Registers form the highest level of the memory hierarchy (hierarki ingatan) ◦ Small set of high speed storage locations ◦ ______ storage for data and control information  Two types of registers ◦ User-visible  May be referenced by assembly-level instructions (suruhan paras perhimpunan) and are thus “_______” to the user ◦ Control (kawalan) and _______ registers  Used to control the operation of the CPU  Most are not visible to the user

 General categories based on function ◦ General purpose (Serba guna)  Can be assigned a variety of functions  Ideally, they are defined _______ to the operations within the instructions ◦ _______  These registers only hold data ◦ Address (Alamat)  These registers only hold _______ information  Examples: general purpose address registers, segment pointers, stack pointers, index registers ◦ _______ codes (Kod _______)  Visible to the user but values set by the CPU as the result of performing operations  Example code bits: _______, _______, overflow (limpahan)  Bit values are used as the basis for conditional jump instructions (suruhan lompat bersyarat)

 Design trade off (tukar ganti) between general purpose and specialized registers ◦ General purpose registers _______ flexibility in instruction design ◦ _______ purpose registers permit implicit register specification in instructions - reduces register field size in an instruction ◦ No clear “best” design approach  How many registers are enough? ◦ More registers permit more operands (kendalian) to be held within the CPU - reducing memory bandwidth requirements to some extent ◦ More registers cause an _______ in the field sizes needed to specify registers in an instruction word ◦ Locality of reference may not support too many registers ◦ Most machines use _______registers

 How big (wide)? ◦ Address registers should be _______ enough to hold the longest address ◦ Data registers should be wide enough to hold most data types  Would not want to use _______-bit registers if the vast majority of data operations used 16 and 32-bit operands  Related to width of memory _______ bus  Concatenate registers together to store longer formats  B-C registers in the 8085  AccA-AccB registers in the 68HC11

 These registers are used during the _______, decoding (penyahkodan) and _______ of instructions ◦ Many are not visible to the user / programmer ◦ Some are visible but can not be (easily) modified  Typical registers ◦ _______ counter (PC)  Points to the next instruction to be executed ◦ _______ register (IR)  Contains the instruction being executed (most recently) ◦ Memory _______ register (MAR)  Contains the address of a location in memory ◦ Memory _______ / _______ register (MBR)  Contains a word of data to be written to memory or the word most recently read ◦ Program _______ word(s)  Superset of condition code register  Interrupt masks, supervisory modes, etc.  Status information

 A set of bits  Includes Condition Codes  _______ ◦ Contains the sign of the result of the last arithmetic operation  _______ ◦ Set when the result is 0  _______ ◦ Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit  _______ ◦ Set if a logical compare result is equality  _______ ◦ Used to indicate arithmetic overflow  Interrupt enable/disable ◦ Used to enable or disable interrupts  Supervisor ◦ Indicates whether the CPU is executing in supervisor or user mode

 _______ Cycle ◦ May require memory access to fetch operands ◦ _______ addressing requires more memory accesses ◦ Can be thought of as additional instruction ________

 Depends on CPU design  In general:  _______ ◦ PC contains _______ of next instruction ◦ Address moved to _______ ◦ Address placed on address bus ◦ Control unit requests memory read ◦ Result placed on _______ bus, copied to MBR, then to IR ◦ Meanwhile PC _______ by 1
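The fetch data flow above can be traced with a toy model. This is a minimal sketch, assuming a word-addressed memory held in a Python list; the class and method names (`ToyCPU`, `fetch`) are hypothetical, and real hardware performs these steps with control signals rather than function calls.

```python
class ToyCPU:
    """Toy register set for tracing the fetch cycle (names are illustrative)."""

    def __init__(self, memory):
        self.memory = memory  # word-addressed main memory (assumption)
        self.pc = 0           # program counter
        self.mar = 0          # memory address register
        self.mbr = 0          # memory buffer register
        self.ir = 0           # instruction register

    def fetch(self):
        self.mar = self.pc                # address of next instruction goes to MAR
        self.mbr = self.memory[self.mar]  # memory read: word arrives in MBR
        self.pc += 1                      # PC steps to the next word (in parallel in hardware)
        self.ir = self.mbr                # MBR is copied to IR for decoding
        return self.ir


cpu = ToyCPU([0x1A, 0x2B, 0x3C])
first = cpu.fetch()
second = cpu.fetch()
```

After two calls, `cpu.pc` points at the third word and `cpu.ir` holds the second.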

 IR is examined  If indirect addressing, indirect cycle is _______ ◦ Right most N bits of _______ transferred to _______ ◦ Control unit requests memory _______ ◦ Result (address of _______) moved to MBR

 May take many forms  Depends on _______ being executed  May include ◦ _______ read/write ◦ Input/Output ◦ _______ transfers ◦ _______ operations

 _______  Current PC saved to allow resumption after interrupt  Contents of PC copied to MBR  Special memory location (e.g. _______ pointer) loaded to MAR  MBR written to _______  PC loaded with address of interrupt handling routine  Next instruction (first of _______ handler) can be fetched

 Prefetch ◦ Fetch accessing main _______ ◦ Execution usually does not _______ main memory ◦ Can fetch next instruction during execution of current instruction ◦ Called instruction _______  Improved Performance ◦ But not doubled:  Fetch usually _______ than execution  Prefetch more than one instruction?  Any jump or _______ means that prefetched instructions are not the required instructions ◦ Add more _______ to improve performance

 The Central Processing Unit (CPU) is the _______ combination (kombinasi lojik) of the _______ _______ _______ (ALU) and the system’s control unit  In this sub-section, we focus on the ALU and its operation ◦ Overview of the ALU ◦ Data representation (Perwakilan data) ◦ Computer Arithmetic and its hardware implementation

 The ALU is that part of the computer that actually performs _______ and _______ operations on data  All other elements of the computer system are there mainly to bring _______ to the ALU for processing or to take _______ from the ALU  Registers are used as _______ and _______ for most ALU operations  In early machines, _______ and _______ determined the overall structure of the CPU and its ALU ◦ Result was that machines were built around a single register, known as the __________ (penumpuk) ◦ The __________ was used in almost all ALU related _________

 The _______ and _______ of the CPU and the ALU are improved through increases in the complexity of the hardware ◦ Use _______ register sets to store operands, addresses and results ◦ _______ the capabilities of the ALU ◦ Use special hardware to support _______ of execution between points in a program ◦ _______ functional units within the ALU to permit concurrent operations  Problem: design a minimal cost yet fully functional ALU ◦ What building block components would be included?

 Solution: ◦ Only 2 basic _______ are required to produce a fully functional ALU  A bit-wide _______ _______ unit  A 2-input _______ gate ◦ NAND is a functionally complete logic operation ◦ Similarly, if you can add, all other arithmetic operations can be derived from addition. ◦ To conduct operations on _______ bit words is clearly tedious (menjemukan)! ◦ Goal then is to develop arithmetic and logic circuitry that is algorithmically _______ while remaining cost effective
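The claim that NAND is functionally complete can be checked directly. The sketch below builds NOT, AND, OR and XOR from a single `nand` function and then wires a one-bit full adder from them; all function names are hypothetical and bits are modelled as the integers 0 and 1.

```python
def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor_(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))


def full_adder(a, b, carry_in):
    """One-bit full adder built only from the NAND-derived gates above."""
    partial = xor_(a, b)
    total = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out
```

Every call path ends in `nand`, so the adder uses no other primitive.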

 _______-_______ format ◦ Positional representation using n bits ◦ Leftmost bit position is the sign bit  0 for _______ number  1 for _______ number ◦ Remaining n-1 bits represent the _______ ◦ Range: $-(2^{n-1}-1)$ to $+(2^{n-1}-1)$ ◦ Problems:  Sign must be considered during arithmetic operations  Dual representation of zero (-0 and +0)
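A short sketch of this encoding, assuming bit strings as the representation; the helper names `encode_sm` and `decode_sm` are hypothetical. It also shows the dual-zero problem: two different bit patterns decode to zero.

```python
def encode_sm(value, n=8):
    """Encode an integer as an n-bit string: sign bit followed by the magnitude."""
    limit = 2 ** (n - 1) - 1
    if not -limit <= value <= limit:
        raise ValueError("value does not fit in n bits")
    sign = 1 if value < 0 else 0
    return format((sign << (n - 1)) | abs(value), f"0{n}b")


def decode_sm(bits):
    """Decode a bit string produced by the scheme above."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude
```

For example, `encode_sm(-18)` gives `"10010010"`, and both `"00000000"` and `"10000000"` decode to 0.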

 Ones ______________ format ◦ Binary case of diminished (menyusut) _______ complement ◦ Negative numbers are represented by a bit-by-bit ______________ of the (positive) magnitude (the process of negation) ◦ Sign bit interpreted as in sign-magnitude format ◦ Examples (8-bit words): +42 = 00101010, -42 = 11010101 ◦ Still have a _______ representation for zero (all zeros and all ones)
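A minimal sketch of the bit-by-bit negation, assuming 8-bit words by default; `ones_complement` is a hypothetical helper name.

```python
def ones_complement(value, n=8):
    """Return the n-bit ones complement bit string for an integer."""
    limit = 2 ** (n - 1) - 1
    if not -limit <= value <= limit:
        raise ValueError("value does not fit in n bits")
    mask = (1 << n) - 1
    pattern = abs(value)
    if value < 0:
        pattern = ~pattern & mask  # invert every bit of the magnitude
    return format(pattern, f"0{n}b")


# The second pattern for zero is the inversion of all zeros.
negative_zero = format(~0 & 0xFF, "08b")
```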

 Twos ______________ format ◦ Binary case of radix complement ◦ Negative numbers, -X, are represented by the pseudo-positive number $2^n - |X|$ ◦ With $2^n$ symbols  $2^{n-1}-1$ _______ numbers  $2^{n-1}$ _______ numbers ◦ Given the representation for +X, the representation for -X is found by taking the 1s complement of +X and adding 1 ◦ Caution: avoid confusion with “2s complement _______” (representation) and the 2s complement _______
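Both descriptions of negation above (the pseudo-positive number $2^n - |X|$, and "invert, then add 1") can be compared in a few lines. This is a sketch with hypothetical helper names, using bit strings for readability.

```python
def twos_complement(value, n=8):
    """Return the n-bit twos complement bit string for an integer."""
    if not -(2 ** (n - 1)) <= value <= 2 ** (n - 1) - 1:
        raise ValueError("value does not fit in n bits")
    # For negative values, value % 2**n equals 2**n - |value|.
    return format(value % (2 ** n), f"0{n}b")


def negate(bits):
    """Negate a twos complement bit string: ones complement, then add 1."""
    n = len(bits)
    mask = (1 << n) - 1
    inverted = ~int(bits, 2) & mask
    return format((inverted + 1) & mask, f"0{n}b")
```

`negate(twos_complement(42))` and `twos_complement(-42)` produce the same pattern, `"11010110"`.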

◦ Converting between two word lengths (e.g., convert an 8-bit format into a 16-bit format) requires a sign extension:  The _______ bit is extended from its current location up to the new location  All bits in the extension take on the value of the old _______ bit  Examples: +18 = 00010010 (8 bits) = 0000000000010010 (16 bits); -18 = 11101110 (8 bits) = 1111111111101110 (16 bits)
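Sign extension is simple enough to express in one line. A sketch, assuming twos complement bit strings; `sign_extend` is a hypothetical name.

```python
def sign_extend(bits, new_width):
    """Widen a twos complement bit string by replicating its leftmost bit."""
    if new_width < len(bits):
        raise ValueError("new width is narrower than the input")
    return bits[0] * (new_width - len(bits)) + bits
```

The value is unchanged: `sign_extend("11101110", 16)` is still -18 when read as a 16-bit twos complement number.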

 Use of a single _______ adder is the simplest hardware ◦ Must implement an n-repetition for-loop for an n-bit addition ◦ This is lots of _______ for a typical addition  Use a _______ adder unit instead ◦ n full adder units cascaded together ◦ In adding X and Y together, unit i adds $X_i$ and $Y_i$ to produce $SUM_i$ and $CARRY_i$ ◦ Carry out of each stage is the carry in to the next stage ◦ Worst case add time is n times the delay of each unit -- despite the _______ operation of each adder unit -- O(n) delay ◦ With signed numbers, watch out for _______: when adding 2 positive or 2 negative numbers, _______ has occurred if the result has the _______ sign
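The cascade of full adders and the sign-based check for signed numbers can be modelled as a loop over bit positions. This is a software sketch only (hardware evaluates the stages as the carry ripples through); `ripple_add` is a hypothetical name and operands are n-bit patterns held in Python integers.

```python
def ripple_add(x, y, n=8):
    """Add two n-bit patterns stage by stage.

    Returns (sum_pattern, carry_out, signed_overflow).
    """
    carry = 0
    result = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        sum_bit = xi ^ yi ^ carry
        carry = (xi & yi) | (carry & (xi ^ yi))  # carry out feeds the next stage
        result |= sum_bit << i
    sign_x = (x >> (n - 1)) & 1
    sign_y = (y >> (n - 1)) & 1
    sign_r = (result >> (n - 1)) & 1
    # Operands share a sign but the result does not.
    overflow = sign_x == sign_y and sign_r != sign_x
    return result, carry, overflow
```

For example, adding 100 and 100 as 8-bit signed values yields the pattern for -56 with the overflow flag set.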

 Alternatives to the ripple adder ◦ Must allow for the worst case delay in a ripple adder ◦ In most cases, _______ signals do not propagate through the entire adder ◦ Provide additional hardware to detect where carries will occur or when the carry _______ is completed ◦ Carry Completion Sensing Adders use additional circuitry to detect the time when all carries are completed  Signal control unit that add is finished  Essentially an ______________ device  Typical add times are O(log n)

◦ Carry ___________ Adders  Predict in advance what adder stage of a ripple adder will generate a carry out  Use prediction to avoid the carry propagation delays -- generate all of the carries at once  Add time is a _______, regardless of the width, n, of the word -- O(1)  Problem: prediction in stage i requires information from all previous stages -- gates to implement this require large numbers of inputs, making this adder impractical for even moderate values of n
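The idea of generating all carries at once can be sketched with the usual generate and propagate signals, $g_i = x_i y_i$ and $p_i = x_i \oplus y_i$. The code below deliberately evaluates each carry from its fully expanded sum-of-products form, which makes the growth in the number of gate inputs per stage visible. The function names are hypothetical, and the nested loops stand in for what would be wide parallel gates in hardware.

```python
def lookahead_carries(x, y, n=4, c0=0):
    """Compute every carry directly from generate/propagate signals."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]  # stage i generates a carry
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # stage i propagates a carry
    carries = [c0]
    for i in range(n):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]c0
        value = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            value |= term
        chain = c0
        for k in range(i + 1):
            chain &= p[k]
        carries.append(value | chain)
    return carries, p


def lookahead_add(x, y, n=4):
    """Return (sum_pattern, carry_out) using the carries computed above."""
    carries, p = lookahead_carries(x, y, n)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n]
```

The widest term for carry i+1 has i+2 inputs, which is the fan-in problem noted above.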

 To perform X-Y, realize that X-Y = X+(-Y)  Therefore, the following hardware is “typical”
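The figure for this slide is not part of the transcript. As a stand-in, the identity X-Y = X+(-Y) can be sketched in twos complement: invert Y, add 1, and feed the result to the adder. `subtract` is a hypothetical name and the operands are n-bit patterns.

```python
def subtract(x, y, n=8):
    """Compute x - y on n-bit patterns by adding the twos complement of y."""
    mask = (1 << n) - 1
    negated_y = (~y + 1) & mask  # ones complement of y, plus 1
    return (x + negated_y) & mask
```

`subtract(18, 42)` returns 232, which is the 8-bit pattern for -24.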

 A number of methods exist to perform integer multiplication ◦ Repeated _______: add the multiplicand to itself “multiplier” times ◦ Shift and add -- traditional “pen and paper” way of multiplying (extended to binary format) ◦ High speed (special purpose) hardware multipliers  _______ addition ◦ Least sophisticated method ◦ Just use adder over and over again ◦ If the multiplier is n bits, can have as many as $2^n$ iterations of addition -- O($2^n$)! ◦ Not used in an _______

 Shift and add ◦ Computer’s version of the pen and paper approach:

      1011   (11)
    x 1101   (13)
  ========
      1011
     0000      Partial products
    1011
   1011
  ========
  10001111   (143)

◦ The computer version accumulates the partial products into a running (partial) sum as the algorithm progresses ◦ Each partial product generation results in an _______ and _______ operation

Shift and add hardware for unsigned integers

Shift and add flowchart for unsigned integers
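The flowchart image is not part of the transcript. The sketch below follows the common textbook register formulation, assuming a carry bit C, an accumulator A, a multiplier register Q and a multiplicand register M, each n bits wide except C. The function name is hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Multiply two unsigned n-bit integers; returns the 2n-bit product."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    q = multiplier & mask
    a = 0
    c = 0
    for _ in range(n):
        if q & 1:                     # Q0 = 1: add the multiplicand into A
            total = a + m
            c = total >> n            # carry out of the n-bit add
            a = total & mask
        # Shift C, A, Q right one position as a single combined register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q
```

With the slide's operands, `shift_add_multiply(0b1011, 0b1101)` returns `0b10001111`, i.e. 143.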

 To multiply signed numbers (2s ____________) ◦ Normal shift and add does not work (problem in the basic algorithm of no sign extension to 2n bits) ◦ ________ all numbers to their positive magnitudes, multiply, then figure out the correct sign ◦ Use a method that works for both positive and negative numbers  ________ algorithm is popular (recoding the multiplier) ◦ ________ algorithm  As in S&A, strings of 0s in the ________ only require shifting (no addition steps)  “Recode” strings of 1s to permit similar ________  String of 1s from $2^u$ down to $2^v$ is treated as $2^{u+1} - 2^v$

 In other words, - At the right end of a string of 1s in the multiplier, perform a ________ - At the left end of the string perform an ________ - For all of the 1s in between, just do ________  Hardware modifications required in the figure “Shift and add hardware for unsigned integers” - Ability to perform ________ - Ability to perform ________ shifting rather than logical shifting (for sign extension) - A flip flop for bit $Q_{-1}$  To determine ________ (add and shift, subtract and shift, shift) examine the bits $Q_0 Q_{-1}$ - 00 or 11: just shift - 10: ________ and shift - 01: ________ and shift

Booth’s algorithm for multiplication
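The figure for this slide is not part of the transcript. The following is a sketch of the standard register-level formulation of Booth's algorithm, with registers A, Q, M and the extra bit Q-1 as described above. The function name is hypothetical. One stated limitation: with an n-bit A register, the most negative multiplicand ($-2^{n-1}$) is not handled, so the sketch assumes the multiplicand lies in $-(2^{n-1}-1)$ to $2^{n-1}-1$.

```python
def booth_multiply(multiplicand, multiplier, n=8):
    """Multiply two signed n-bit integers; returns the signed 2n-bit product.

    Assumes multiplicand != -2**(n-1) (see the note above).
    """
    mask = (1 << n) - 1
    m = multiplicand & mask
    a = 0
    q = multiplier & mask
    q_minus_1 = 0
    for _ in range(n):
        pair = (q & 1, q_minus_1)
        if pair == (1, 0):        # right end of a string of 1s: subtract
            a = (a - m) & mask
        elif pair == (0, 1):      # left end of a string of 1s: add
            a = (a + m) & mask
        # Arithmetic shift right of A, Q, Q-1 (the sign bit of A is kept).
        q_minus_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):  # reinterpret the 2n-bit pattern as signed
        product -= 1 << (2 * n)
    return product
```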

 Advantages of Booth: - Treats positive and negative numbers ________ - Strings of 1s and 0s can be skipped over with shift operations for faster ________ time  High performance multipliers ◦ ________ the computation time by employing more hardware than would normally be found in a S&A-type multiplier unit ◦ Not generally found in general-purpose processors due to expense ◦ Examples  Combinational hardware multipliers  Pipelined Wallace Tree adders from Carry-Save Adder units

 Once you have committed to implementing multiplication, implementing division is a relatively easy next step that utilizes much of the same hardware  Want to find quotient, Q, and remainder, R, such that D = Q x V + R  Restoring division for ________ integers ◦ Algorithm adapted from the traditional “pen and paper” approach ◦ Algorithm is of time complexity O(n) for n-bit dividend ◦ Uses essentially the same ALU hardware as the ________ multiplication algorithm  Adder / subtractor unit  ________ wide shift register AQ that can be shifted to the left  ________ for the divisor  Control logic

Restoring division algorithm for unsigned integers
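The figure for this slide is not part of the transcript. A sketch of restoring division under the usual register layout follows: A starts at zero, Q holds the dividend, M holds the divisor. The function name is hypothetical, and a Python integer stands in for the sign of A after the trial subtraction (hardware would use an extra bit).

```python
def restoring_divide(dividend, divisor, n=8):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    mask = (1 << n) - 1
    a = 0
    q = dividend & mask
    m = divisor & mask
    for _ in range(n):
        # Shift the combined AQ register left one position.
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= m                  # trial subtraction
        if a < 0:
            a += m              # did not fit: restore A, quotient bit stays 0
        else:
            q |= 1              # fit: set quotient bit Q0
    return q, a
```

One pass of the loop runs per quotient bit, which matches the O(n) time noted earlier.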

 For two’s complement numbers, must deal with the ________ extension “problem”  Algorithm: ◦ Load M with divisor, AQ with dividend (using sign bit extension) ◦ ________ AQ left 1 position ◦ If M and A have same sign, A ← A-M, otherwise A ← A+M ◦ $Q_0$ ← 1 if sign bit of A has not changed or (A=0 AND Q=0), otherwise $Q_0$ ← 0 and restore A ◦ Repeat ________ and +/- operations for all bits in Q ◦ Remainder is in A, quotient in Q  If the signs of the divisor and the dividend were the same, quotient is correct, otherwise, Q is the 2’s complement of the quotient

2’s complement division examples

 ________ fixed point schemes do not have the ability to represent very large or very small numbers  Need the ability to dynamically ________ the decimal point to a convenient location  Format: $\pm M \times R^{\pm E}$  Significands (mantissas) are stored in a ________ format ◦ Either 1.xxxxx or 0.1xxxxx ◦ Since the 1 is required, don’t need to explicitly store it in the data word -- insert it for calculations only  Exponents can be positive or negative values ◦ Use ________ (Excess coding) to avoid operating on negative exponents ◦ ________ is added to all exponents to store as positive numbers

 For a fixed n-bit representation length, there are $2^n$ combinations of symbols ◦ If floating point ________ the range of numbers in the format (compared to integer representation) then the “spacing” between the numbers must increase  This causes a ________ in the format’s precision ◦ If more bits are allocated to the exponent, range is ________ at the expense of decreased precision ◦ Similarly, more significand bits increase the ________ and reduce the range ◦ The ________ is chosen at design time and is not explicitly represented in the format  Small -- smaller range  Large -- increased range but loss of significant bits as a result of mantissa alignment when normalizing

 Problems to deal with in the format ◦ Representation of ________ ◦ Over and ________ and how to detect ◦ ________ operations  IEEE 754 format ◦ Defines single and double ________ formats (32 and 64 bits) ◦ Standardizes formats across many different platforms ◦ Radix 2 ◦ Single  Range approximately $10^{-38}$ to $10^{+38}$  8-bit exponent with 127 bias  23-bit mantissa ◦ Double  Range approximately $10^{-308}$ to $10^{+308}$  11-bit exponent with 1023 bias  52-bit mantissa

IEEE 754 Formats
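The format figure is not part of the transcript. The single-precision field layout listed above (1 sign bit, 8-bit exponent with bias 127, 23-bit mantissa with a hidden leading 1) can be inspected with Python's standard `struct` module. The helper names are hypothetical, and `rebuild_normalized` only covers normalized numbers, not zeros, subnormals, infinities or NaNs.

```python
import struct


def decode_single(x):
    """Split a float, stored as IEEE 754 single precision, into its fields."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, biased_exponent, fraction


def rebuild_normalized(sign, biased_exponent, fraction):
    """Recompute the value of a normalized number from its three fields."""
    significand = 1 + fraction / 2 ** 23      # re-insert the hidden 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - 127)
```

For -6.5 = -1.625 x 2^2, the stored exponent is 2 + 127 = 129 and the fraction field holds 0.625.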

 Floating point arithmetic operations ◦ Addition and subtraction  ________ significand  Add or subtract significand  Post ________ ◦ Multiplication  ________ exponents  Multiply significand  Post normalize ◦ Division  ________ exponents  Divide significand  Post normalize
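The steps for addition and multiplication can be sketched on (significand, exponent) pairs. This sketch leans on `math.frexp` and `math.ldexp` from Python's standard library to split and reassemble floats, so it illustrates the sequence of steps rather than a bit-level hardware design. Note that `frexp` normalizes the significand to the range 0.5 to 1 rather than 1.xxxxx. The function names are hypothetical.

```python
import math


def fp_add(x, y):
    """Add via: align significands, add, post-normalize."""
    mx, ex = math.frexp(x)          # x = mx * 2**ex
    my, ey = math.frexp(y)
    if ex < ey:                     # keep the larger exponent in (mx, ex)
        mx, ex, my, ey = my, ey, mx, ex
    my = math.ldexp(my, ey - ex)    # align the smaller operand's significand
    m, shift = math.frexp(mx + my)  # post-normalize the sum
    return math.ldexp(m, ex + shift)


def fp_multiply(x, y):
    """Multiply via: add exponents, multiply significands, post-normalize."""
    mx, ex = math.frexp(x)
    my, ey = math.frexp(y)
    m, shift = math.frexp(mx * my)  # post-normalize the product
    return math.ldexp(m, ex + ey + shift)
```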

 In this section, we have focused on the operation of the CPU ◦ Registers and their use ◦ Instruction execution  Looked at the basic concepts associated with computer arithmetic ◦ Number representation ◦ Basic ALU construction ◦ Hardware and software implementations of multiplication and division operations ◦ Floating point numbers and operations

 Computer Organization and Architecture, 6th Edition. Stallings, W. Prentice Hall.  Computer Organization and Design. David A. Patterson, John L. Hennessy. Morgan Kaufmann