John Kubiatowicz Electrical Engineering and Computer Sciences

CS252 Graduate Computer Architecture Lecture 20 Cyclic Redundancy Checking, I/O and Buses
John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

Review: Error Correction
Motivation:
  DRAM is dense, so signals are easily disturbed
  High capacity ⇒ higher probability of failure
Approach: Redundancy
  Add extra information so that we can recover from errors
  Can we do better than just creating complete copies?
Block Codes: data coded in blocks
  k data bits coded into n encoded bits
  Measure of overhead: rate of code, k/n
  Often called an (n,k) code
  Consider data as vectors in GF(2) [i.e. vectors of bits]
Code space is the set of all 2^n vectors, data space the set of 2^k vectors
  Encoding function: C = f(d)
  Decoding function: d = f(C')
  Not all possible code vectors, C, are valid!
Systematic Codes: original data appears within coded data
2/25/2019 cs252-S07, Lecture 20

General Idea: Code Vector Space
[Figure: code space, with data word d0 mapped to code word C0 = f(d0) and the code distance (Hamming distance) between valid code words]
Not every vector in the code space is valid
Hamming Distance (d): minimum number of bit flips to turn one code word into another
Number of errors that we can detect: (d-1)
Number of errors that we can fix: ⌊(d-1)/2⌋
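The distance rule above can be checked on toy codes. The following is a minimal sketch; the two example codes (a 3-bit repetition code and a 3-bit even-parity code) and all function names are illustrative assumptions, not taken from the lecture.

```python
from itertools import combinations

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")

def code_distance(codewords) -> int:
    """Minimum Hamming distance over all pairs of valid code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

def capabilities(codewords):
    """Return (d, detectable errors, correctable errors) for a code."""
    d = code_distance(codewords)
    return d, d - 1, (d - 1) // 2

# Hypothetical example codes, chosen only for illustration.
repetition = [0b000, 0b111]                  # each data bit sent three times
even_parity = [0b000, 0b011, 0b101, 0b110]   # two data bits plus a parity bit

print(capabilities(repetition))   # (3, 2, 1)
print(capabilities(even_parity))  # (2, 1, 0)
```

The parity code detects a single flipped bit but cannot say which bit it was, which is why its correctable count is zero.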

Review: Code Types
Linear Codes: code is generated by G and lies in the null space of H
Hamming Codes: design the H matrix
  d = 3 ⇒ columns nonzero, distinct
  d = 4 ⇒ columns nonzero, distinct, odd-weight
Erasure Codes vs Random Codes
  In an erasure code, you know where the errors are.
Reed-Solomon codes:
  Based on polynomials in GF(2^k) (i.e. k-bit symbols)
  Data as coefficients, code space as values of polynomial:
    P(x) = a_0 + a_1 x + … + a_(k-1) x^(k-1)
  Coded: P(0), P(1), P(2), …, P(n-1)
  Can recover the polynomial as long as we get any k of n
  Alternatively: as long as no more than n-k coded symbols are erased, can recover data.
Digital Fountain Codes:
  Sparse matrix representation rather than dense matrices
  Faster to encode, no fixed number of code words

Another Example: Redundant Check
Send a message M and a “check” word C
Simple function on <M,C> to determine if both received correctly (with high probability)
Example: XOR all the bytes in M and append the “checksum” byte, C, at the end
  Receiver XORs <M,C>
  What should the result be?
  What errors are caught?
[Figure: bit i of the checksum is the XOR of the ith bit of each byte]
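A small sketch of the XOR checksum may help answer the two questions on this slide. The function names and the sample message are assumptions made for illustration.

```python
from functools import reduce

def xor_checksum(message: bytes) -> int:
    """XOR of all bytes; bit i of the result is the XOR of bit i of every byte."""
    return reduce(lambda acc, byte: acc ^ byte, message, 0)

def append_checksum(message: bytes) -> bytes:
    """Sender side: append the checksum byte C to the message M."""
    return message + bytes([xor_checksum(message)])

def looks_valid(frame: bytes) -> bool:
    """Receiver side: XOR over <M,C> should be zero if nothing changed."""
    return xor_checksum(frame) == 0

frame = append_checksum(b"CS252")   # sample message, chosen arbitrarily

one_flip = bytearray(frame)
one_flip[0] ^= 0x04                 # a single bit error is caught

two_flips = bytearray(frame)
two_flips[0] ^= 0x04                # the same bit position flipped in two bytes
two_flips[1] ^= 0x04                # cancels out and goes undetected

print(looks_valid(frame), looks_valid(bytes(one_flip)), looks_valid(bytes(two_flips)))
```

The receiver's XOR should be zero. Any error pattern that flips each bit position an odd number of times is caught, while an even number of flips in the same bit position slips through.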

Example: TCP Checksum
[Figure: TCP packet format, alongside the protocol stack: Application (HTTP, FTP, DNS) 7; Transport (TCP, UDP) 4; Network (IP) 3; Data Link (Ethernet, b) 2; Physical 1]
TCP Checksum: a 16-bit checksum, consisting of the one's complement of the one's complement sum of the contents of the TCP segment header and data, is computed by the sender and included in the transmitted segment (note the end-around carry).
Summing all the words, including the checksum word, should yield zero in one's complement arithmetic (the all-ones pattern 0xFFFF, which is negative zero).
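The one's complement sum with end-around carry can be sketched as follows. This is only an illustration of the arithmetic: the function names and sample bytes are assumptions, and the pseudo-header that real TCP also includes in the sum is omitted.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of big-endian words, with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def checksum16(data: bytes) -> int:
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

segment = b"CS252 Lecture 20"              # arbitrary even-length sample bytes
check = checksum16(segment)
received = segment + check.to_bytes(2, "big")

print(hex(ones_complement_sum(received)))  # 0xffff
print(checksum16(received))                # 0
```

Summing the data together with its checksum gives all ones, so the receiver's complemented result is zero when nothing was corrupted.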

Example: Ethernet CRC-32
[Figure: protocol stack: Application (HTTP, FTP, DNS) 7; Transport (TCP, UDP) 4; Network (IP) 3; Data Link (Ethernet, b) 2; Physical 1]

CRC concept
I have a msg polynomial M(x) of degree m
We both have a generator poly G(x) of degree n
Let r(x) = remainder of M(x) x^n / G(x)
  M(x) x^n = G(x) p(x) + r(x)
  r(x) is of degree less than n
What is (M(x) x^n - r(x)) / G(x)?
So I send you M(x) x^n - r(x)
  A polynomial of degree m+n
  You divide by G(x) to check
  M(x) is just the m+1 most significant coefficients, r(x) the lower n
An (m+1)-bit message is viewed as the coefficients of a degree-m polynomial over binary numbers
[Figure: the message followed by n bits of zero at the end; tack on the n bits of remainder instead of the zeros]
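The scheme above can be sketched with Python integers standing in for polynomials over GF(2), where bit i is the coefficient of x^i and subtraction is XOR. The helper names and the tiny message and generator are assumptions chosen for illustration, not values from the lecture.

```python
def poly_mod(dividend: int, divisor: int) -> int:
    """Remainder of GF(2) polynomial division (bit i = coefficient of x^i)."""
    if divisor == 0:
        raise ValueError("generator must be non-zero")
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        dividend ^= divisor << shift       # subtract (XOR) an aligned copy of G
    return dividend

def crc_encode(message: int, generator: int) -> int:
    """Form M(x) x^n - r(x): shift in n zeros, then replace them with r(x)."""
    n = generator.bit_length() - 1         # degree of G(x)
    shifted = message << n
    return shifted ^ poly_mod(shifted, generator)

def crc_check(received: int, generator: int) -> bool:
    """A valid code word is an exact multiple of G(x)."""
    return poly_mod(received, generator) == 0

M = 0b1101        # x^3 + x^2 + 1 (illustrative message)
G = 0b1011        # x^3 + x + 1   (illustrative generator, n = 3)
codeword = crc_encode(M, G)

print(bin(codeword))                       # 0b1101001
print(crc_check(codeword, G))              # True
print(crc_check(codeword ^ 0b10000, G))    # False
```

Because M(x) x^n - r(x) = G(x) p(x), the receiver's division leaves remainder zero, and the message is recovered by dropping the low n bits.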

Review: Galois Fields GF(2^n)
Consider polynomials whose coefficients come from GF(2).
Each term of the form x^n is either present or absent.
Examples: 0, 1, x, x^2, and x^7 + x^6 + 1
  = 1·x^7 + 1·x^6 + 0·x^5 + 0·x^4 + 0·x^3 + 0·x^2 + 0·x^1 + 1·x^0
With addition and multiplication these form a field:
“Add”: XOR each element individually with no carry:
    x^4 + x^3       + x + 1
  + x^4       + x^2 + x
  = x^3 + x^2 + 1
“Multiply”: multiplying by x^n is like shifting to the left:
    (x^2 + x + 1) × (x + 1)
  = (x^2 + x + 1) + (x^3 + x^2 + x)
  = x^3 + 1
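The add and multiply rules can be tried out with integers as bit vectors of coefficients. This is a minimal sketch; the function names and the sample operands are assumptions made for illustration.

```python
def gf2_add(a: int, b: int) -> int:
    """Add two GF(2) polynomials: coefficient-wise XOR, no carries."""
    return a ^ b

def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials: shift for each x^i term, XOR the partials."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1        # multiplying by x is a left shift
        b >>= 1
    return product

def poly_str(p: int) -> str:
    """Render a bit vector as a polynomial, highest power first."""
    terms = []
    for power in range(p.bit_length() - 1, -1, -1):
        if (p >> power) & 1:
            terms.append("1" if power == 0 else "x" if power == 1 else f"x^{power}")
    return " + ".join(terms) or "0"

# (x^4 + x^3 + x + 1) + (x^4 + x^2 + x)
print(poly_str(gf2_add(0b11011, 0b10110)))   # x^3 + x^2 + 1
# (x^2 + x + 1) * (x + 1)
print(poly_str(gf2_mul(0b111, 0b11)))        # x^3 + 1
```

No reduction modulo a prime polynomial is done here, so products can grow in degree.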

So what about division (mod)
(x^4 + x^2) / x = x^3 + x with remainder 0
(x^4 + x^2 + 1) / (x + 1) = x^3 + x^2 with remainder 1
Long division for the second example:
                x^3 + x^2 + 0x + 0
  x + 1 ) x^4 + 0x^3 + x^2 + 0x + 1
          x^4 +  x^3
                 x^3 + x^2
                 x^3 + x^2
                       0x^2 + 0x
                              0x + 1
  Remainder 1
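The two divisions on this slide can be reproduced with a short long-division routine over GF(2). This is a sketch under the usual bit-vector encoding (bit i is the coefficient of x^i); the function name is an assumption.

```python
def gf2_divmod(dividend: int, divisor: int):
    """Long division of GF(2) polynomials; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        quotient |= 1 << shift             # next quotient term is x^shift
        dividend ^= divisor << shift       # subtract (XOR) divisor * x^shift
    return quotient, dividend

# (x^4 + x^2) / x
print(gf2_divmod(0b10100, 0b10))   # (10, 0)  -> x^3 + x, remainder 0
# (x^4 + x^2 + 1) / (x + 1)
print(gf2_divmod(0b10101, 0b11))   # (12, 1)  -> x^3 + x^2, remainder 1
```

Each loop iteration matches one row of the hand-written long division: line up the divisor under the leading term, XOR, and continue with what is left.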

Polynomial division
[Figure: shift-register divider working through an example]
When MSB is zero, just shift left, bringing in the next bit
When MSB is 1, XOR with the divisor and shift left

CRC encoding
[Figure: worked CRC encoding example, ending with the message sent]

CRC decoding
[Figure: CRC decoding example]

Galois Fields - The theory behind LFSRs
These polynomials form a Galois (finite) field if we take the results of this multiplication modulo a prime polynomial p(x).
  A prime polynomial is one that cannot be written as the product of two non-trivial polynomials q(x)r(x)
  Perform the modulo operation by subtracting a (polynomial) multiple of p(x) from the result. If the multiple is 1, this corresponds to XOR-ing the result with p(x).
For any degree, there exists at least one prime polynomial.
With it we can form GF(2^n)
Additionally, …
Every Galois field has a primitive element, α, such that all non-zero elements of the field can be expressed as a power of α. By raising α to powers (modulo p(x)), all non-zero field elements can be formed.
Certain choices of p(x) make the simple polynomial x the primitive element. These polynomials are called primitive, and one exists for every degree.
For example, x^4 + x + 1 is primitive. So α = x is a primitive element and successive powers of α will generate all non-zero elements of GF(16). Example on next slide.

Galois Fields – Primitives
0 = 1 = x 2 = x2 3 = x3 4 = x + 1 5 = x2 + x 6 = x3 + x2 7 = x x + 1 8 = x 9 = x x 10 = x2 + x + 1 11 = x3 + x2 + x 12 = x3 + x2 + x + 1 13 = x3 + x 14 = x 15 = Note this pattern of coefficients matches the bits from our 4-bit LFSR example. In general finding primitive polynomials is difficult. Most people just look them up in a table, such as: 4 = x4 mod x4 + x + 1 = x4 xor x4 + x + 1 = x + 1 2/25/2019 cs252-S07, Lecture 20

Primitive Polynomials
x^2 + x + 1
x^3 + x + 1
x^4 + x + 1
x^5 + x^2 + 1
x^6 + x + 1
x^7 + x^3 + 1
x^8 + x^4 + x^3 + x^2 + 1
x^9 + x^4 + 1
x^10 + x^3 + 1
x^11 + x^2 + 1
x^12 + x^6 + x^4 + x + 1
x^13 + x^4 + x^3 + x + 1
x^14 + x^10 + x^6 + x + 1
x^15 + x + 1
x^16 + x^12 + x^3 + x + 1
x^17 + x^3 + 1
x^18 + x^7 + 1
x^19 + x^5 + x^2 + x + 1
x^20 + x^3 + 1
x^21 + x^2 + 1
x^22 + x + 1
x^23 + x^5 + 1
x^24 + x^7 + x^2 + x + 1
x^25 + x^3 + 1
x^26 + x^6 + x^2 + x + 1
x^27 + x^5 + x^2 + x + 1
x^28 + x^3 + 1
x^29 + x + 1
x^30 + x^6 + x^4 + x + 1
x^31 + x^3 + 1
x^32 + x^7 + x^6 + x^2 + 1

Galois Field Hardware
Multiplication by x ⇔ shift left
Taking the result mod p(x) ⇔ XOR-ing with the coefficients of p(x) when the most significant coefficient is 1
Obtaining all 2^n-1 non-zero elements by evaluating x^k for k = 1, …, 2^n-1 ⇔ shifting and XOR-ing 2^n-1 times

Building an LFSR from a Primitive Poly
For a k-bit LFSR, number the flip-flops with FF1 on the right.
The feedback path comes from the Q output of the leftmost FF.
Find the primitive polynomial of the form x^k + … + 1.
The x^0 = 1 term corresponds to connecting the feedback directly to the D input of FF1.
Each term of the form x^n corresponds to connecting an xor between FF n and FF n+1.
4-bit example, uses x^4 + x + 1:
  x^4 ⇔ FF4's Q output
  x ⇔ xor between FF1 and FF2
  1 ⇔ FF1's D input
To build an 8-bit LFSR, use the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 and connect xors between FF2 and FF3, FF3 and FF4, and FF4 and FF5.
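The wiring rules above can be simulated one flip-flop at a time. In this sketch the list index i holds the Q output of FF(i+1), and `taps` is the set of exponents n (other than 0 and k) that appear in the polynomial; both conventions and the function names are assumptions for illustration.

```python
def lfsr_step(q, taps):
    """One clock edge: compute every flip-flop's next state from the wiring rules."""
    k = len(q)
    feedback = q[k - 1]              # Q output of the leftmost flip-flop, FFk
    nxt = [feedback]                 # x^0 term: feedback drives FF1's D input
    for n in range(1, k):            # D input of FF(n+1) comes from FF n ...
        d = q[n - 1]
        if n in taps:                # ... through an xor if the x^n term is present
            d ^= feedback
        nxt.append(d)
    return nxt

def period(k, taps):
    """Count clock edges until the register returns to its starting state."""
    start = [1] + [0] * (k - 1)      # FF1 = 1, all others 0
    state = lfsr_step(start, taps)
    count = 1
    while state != start:
        state = lfsr_step(state, taps)
        count += 1
    return count

print(period(4, {1}))          # x^4 + x + 1             -> 15
print(period(8, {2, 3, 4}))    # x^8 + x^4 + x^3 + x^2 + 1 -> 255
```

A period of 2^k - 1 means the register visits every non-zero state before repeating, which is the behavior a primitive polynomial is chosen for.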

Generating Polynomials
CRC-16: G(x) = x^16 + x^15 + x^2 + 1
  Detects single and double bit errors
  All errors with an odd number of bits
  Burst errors of length 16 or less
  Most errors for longer bursts
CRC-32: G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
  Used in Ethernet
  Also 32 bits of 1 added on front of the message
    Initialize the LFSR to all 1s
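A bit-at-a-time CRC-32 using this generator can be compared against Python's `zlib.crc32`. Note the assumptions: the sketch follows the conventions of common software implementations (bits processed least-significant first, so the polynomial appears bit-reversed as 0xEDB88320, and the result is complemented at the end), which the slide does not spell out. The all-ones initial value corresponds to the slide's "initialize the LFSR to all 1s".

```python
import zlib

POLY_REFLECTED = 0xEDB88320   # CRC-32 generator with its bit order reversed

def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32, mimicking a shift register one bit per step."""
    crc = 0xFFFFFFFF                      # register starts as all 1s
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:                   # bit shifted out is 1: XOR in the generator
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:                         # otherwise just shift
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final complement (assumed convention)

sample = b"123456789"
print(hex(crc32_bitwise(sample)))         # 0xcbf43926
print(crc32_bitwise(sample) == zlib.crc32(sample))
```

Production code normally uses a table-driven or hardware-assisted version; the loop here is only meant to mirror the shift-and-XOR structure.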

Motivation: Who Cares About I/O?
CPU Performance: 60% per year
I/O system performance limited by mechanical delays (disk I/O)
  < 10% per year (I/Os per sec or MB per sec)
Amdahl's Law: system speed-up limited by the slowest part!
  10% I/O & 10x CPU => about 5x performance (lose 50%)
  10% I/O & 100x CPU => about 10x performance (lose 90%)
I/O bottleneck:
  Diminishing fraction of time in CPU
  Diminishing value of faster CPUs
Ancestor of Java had no I/O
CPU vs. Peripheral
Primary vs. Secondary
What makes portables and PDAs exciting?
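The two Amdahl's Law figures can be reproduced directly. This sketch assumes the slide's split of 10% I/O time (not sped up) and 90% CPU time (sped up); the function and parameter names are illustrative.

```python
def amdahl_speedup(unaffected_fraction: float, speedup: float) -> float:
    """Overall speed-up when only (1 - unaffected_fraction) of the time is accelerated."""
    return 1.0 / (unaffected_fraction + (1.0 - unaffected_fraction) / speedup)

for cpu_speedup in (10, 100):
    overall = amdahl_speedup(0.10, cpu_speedup)
    print(f"10% I/O, {cpu_speedup}x CPU -> {overall:.2f}x overall")
```

The results are roughly 5.26x and 9.17x, which the slide rounds to 5x and 10x. With the CPU infinitely fast, the speed-up could still not exceed 10x.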

I/O Systems
[Figure: processor with cache on a memory-I/O bus, together with main memory and I/O controllers for graphics, disks, and network; the controllers signal the processor with interrupts]

What is a bus?
A Bus Is:
  shared communication link
  single set of wires used to connect multiple subsystems
A Bus is also a fundamental tool for composing large, complex systems
  systematic means of abstraction
[Figure: processor (control and datapath), memory, input, and output connected by a bus]

Advantages of Buses
[Figure: processor, memory, and three I/O devices sharing one bus]
Versatility:
  New devices can be added easily
  Peripherals can be moved between computer systems that use the same bus standard
Low Cost:
  A single set of wires is shared in multiple ways
The two major advantages of the bus organization are versatility and low cost. By versatility, we mean new devices can easily be added. Furthermore, if a device is designed according to an industry bus standard, it can be moved between computer systems that use the same bus standard. The bus organization is a low-cost solution because a single set of wires is shared in multiple ways.

Disadvantage of Buses
[Figure: processor, memory, and three I/O devices sharing one bus]
It creates a communication bottleneck
  The bandwidth of that bus can limit the maximum I/O throughput
The maximum bus speed is largely limited by:
  The length of the bus
  The number of devices on the bus
  The need to support a range of devices with:
    Widely varying latencies
    Widely varying data transfer rates
The major disadvantage of the bus organization is that it creates a communication bottleneck. When I/O must pass through a single bus, the bandwidth of that bus can limit the maximum I/O throughput. The maximum bus speed is also largely limited by: (a) the length of the bus, (b) the number of I/O devices on the bus, and (c) the need to support a wide range of devices with widely varying latencies and data transfer rates.

The General Organization of a Bus
[Figure: a bus as a set of control lines and a set of data lines]
Control lines:
  Signal requests and acknowledgments
  Indicate what type of information is on the data lines
Data lines carry information between the source and the destination:
  Data and addresses
  Complex commands
A bus generally contains a set of control lines and a set of data lines. The control lines are used to signal requests and acknowledgments and to indicate what type of information is on the data lines. The data lines carry information between the source and the destination. This information may consist of data, addresses, or complex commands. A bus transaction includes two parts: (a) sending the address and (b) then receiving or sending the data.

Master versus Slave
[Figure: bus master issues a command to the bus slave; data can go either way]
A bus transaction includes two parts:
  Issuing the command (and address): request
  Transferring the data: action
Master is the one who starts the bus transaction by:
  issuing the command (and address)
Slave is the one who responds to the address by:
  Sending data to the master if the master asks for data
  Receiving data from the master if the master wants to send data
The bus master is the one who starts the bus transaction by sending out the address. The slave is the one who responds to the master, either by sending data to the master if the master asks for data or by receiving data from the master if the master wants to send data. In most simple I/O operations, the processor will be the bus master, but as I will show you later in today's lecture, this is not always the case.

Types of Buses
Processor-Memory Bus (design specific)
  Short and high speed
  Only needs to match the memory system
    Maximize memory-to-processor bandwidth
  Connects directly to the processor
  Optimized for cache block transfers
I/O Bus (industry standard)
  Usually is lengthy and slower
  Needs to match a wide range of I/O devices
  Connects to the processor-memory bus or backplane bus
Backplane Bus (standard or proprietary)
  Backplane: an interconnection structure within the chassis
  Allows processors, memory, and I/O devices to coexist
  Cost advantage: one bus for all components
Buses are traditionally classified as one of three types: processor-memory buses, I/O buses, or backplane buses. The processor-memory bus is usually design specific, while the I/O and backplane buses are often standard buses. In general, processor-memory buses are short and high speed. Such a bus tries to match the memory system in order to maximize the memory-to-processor bandwidth and is connected directly to the processor. An I/O bus is usually lengthy and slow because it has to match a wide range of I/O devices, and it usually connects to the processor-memory bus or backplane bus. The backplane bus receives its name because it was often built into the backplane of the computer; it is an interconnection structure within the chassis. It is designed to allow processors, memory, and I/O devices to coexist on a single bus, so it has the cost advantage of having only one bus for all components.

A Computer System with One Bus: Backplane Bus
[Figure: processor, memory, and I/O devices on a single backplane bus]
A single bus (the backplane bus) is used for:
  Processor to memory communication
  Communication between I/O devices and memory
Advantages: simple and low cost
Disadvantages: slow, and the bus can become a major bottleneck
Example: IBM PC-AT
Here is an example showing a single bus, the backplane bus, used to provide communication between the processor and memory as well as communication between I/O devices and memory. The advantage here is of course low cost. One disadvantage of this approach is that the bus, with so many things attached to it, will be lengthy and slow. Furthermore, the bus can become a major communication bottleneck if everybody wants to use the bus at the same time. The IBM PC is an example that uses only a backplane bus for all communication.

A Two-Bus System
[Figure: processor and memory on a processor-memory bus, with bus adaptors connecting I/O buses]
I/O buses tap into the processor-memory bus via bus adaptors:
  Processor-memory bus: mainly for processor-memory traffic
  I/O buses: provide expansion slots for I/O devices
Apple Macintosh-II
  NuBus: processor, memory, and a few selected I/O devices
  SCSI bus: the rest of the I/O devices
Right before the break, I showed you a system with one bus only. Here is an example using two buses, where multiple I/O buses tap into the processor-memory bus via bus adaptors. The processor-memory bus is used mainly for processor-memory traffic, while the I/O buses are used to provide expansion slots for the I/O devices. The Apple Macintosh-II adopts this organization, where the NuBus is used to connect the processor, memory, and a few selected I/O devices together. The rest of the I/O devices reside on an industry standard bus, the SCSI bus, which is connected to the NuBus via a bus adaptor.

A Three-Bus System (+ backside cache)
[Figure: processor, memory, processor-memory bus, bus adaptors, I/O buses, and an L2 cache on a backside cache bus]
A small number of backplane buses tap into the processor-memory bus
  Processor-memory bus is only used for processor-memory traffic
  I/O buses are connected to the backplane bus
Advantage: loading on the processor bus is greatly reduced
Finally, in a three-bus system, a small number of backplane buses (in our example here, just one) tap into the processor-memory bus. The processor-memory bus is used mainly for processor-memory traffic, while the I/O buses are connected to the backplane bus via bus adaptors. An advantage of this organization is that the loading on the processor-memory bus is greatly reduced because of the small number of taps into the high-speed processor-memory bus.

Main components of Intel Chipset: Pentium 4
Northbridge:
  Handles memory
  Graphics
Southbridge: I/O
  PCI bus
  Disk controllers
  USB controllers
  Audio
  Serial I/O
  Interrupt controller
  Timers
