1
CpE 242 Computer Architecture and Engineering Interconnection Networks
Start: X:40
2
Recap: Advantages of Buses
[Diagram: processor, memory, and several I/O devices sharing a single bus]
Versatility: new devices can be added easily, and peripherals can be moved between computer systems that use the same bus standard.
Low cost: a single set of wires is shared in multiple ways.
The two major advantages of the bus organization are versatility and low cost. By versatility, we mean new devices can easily be added. Furthermore, if a device is designed according to an industry bus standard, it can be moved between computer systems that use the same bus standard. The bus organization is a low cost solution because a single set of wires is shared in multiple ways. +1 = 7 min. (X:47)
3
Recap: Disadvantages of Buses
[Diagram: processor, memory, and several I/O devices sharing a single bus]
It creates a communication bottleneck: the bandwidth of that one bus can limit the maximum I/O throughput.
The maximum bus speed is largely limited by:
the length of the bus
the number of devices on the bus
the need to support a range of devices with widely varying latencies and widely varying data transfer rates
The major disadvantage of the bus organization is that it creates a communication bottleneck. When all I/O must pass through a single bus, the bandwidth of that bus can limit the maximum I/O throughput. The maximum bus speed is also largely limited by: (a) the length of the bus, (b) the number of I/O devices on the bus, and (c) the need to support a wide range of devices with widely varying latencies and data transfer rates. +2 = 9 min. (X:49)
4
Recap: Types of Buses Processor-Memory Bus (design specific)
Short and high speed
Only needs to match the memory system
Maximize memory-to-processor bandwidth
Connects directly to the processor
I/O Bus (industry standard)
Usually lengthy and slower
Needs to match a wide range of I/O devices
Connects to the processor-memory bus or backplane bus
Backplane Bus (industry standard)
Backplane: an interconnection structure within the chassis
Allows processors, memory, and I/O devices to coexist
Cost advantage: one single bus for all components
Buses are traditionally classified as one of three types: processor-memory buses, I/O buses, or backplane buses. The processor-memory bus is usually design specific, while the I/O and backplane buses are often standard buses. In general, the processor-memory bus is short and high speed. It tries to match the memory system in order to maximize the memory-to-processor bandwidth, and it is connected directly to the processor. The I/O bus is usually lengthy and slow because it has to match a wide range of I/O devices, and it usually connects to the processor-memory bus or backplane bus. The backplane bus receives its name because it was often built into the backplane of the computer--it is an interconnection structure within the chassis. It is designed to allow processors, memory, and I/O devices to coexist on a single bus, so it has the cost advantage of having only one single bus for all components. +2 = 16 min. (X:56)
5
Recap: Increasing the Bus Bandwidth
Separate versus multiplexed address and data lines: address and data can be transmitted in one bus cycle if separate address and data lines are available. Cost: (a) more bus lines, (b) increased complexity.
Data bus width: by increasing the width of the data bus, transfers of multiple words require fewer bus cycles. Example: the SPARCstation 20's memory bus is 128 bits wide. Cost: more bus lines.
Block transfers: allow the bus to transfer multiple words in back-to-back bus cycles. Only one address needs to be sent at the beginning, and the bus is not released until the last word is transferred. Cost: (a) increased complexity, (b) degraded response time for other requests.
Our handshaking example in the previous slide used the same wires to transmit the address as well as data. The advantage is a saving in signal wires; the disadvantage is that it takes multiple cycles to transmit address and data. By having separate lines for addresses and data, we can increase the bus bandwidth by transmitting address and data in the same cycle, at the cost of more bus lines and increased complexity. This (1st bullet) is one way to increase bus bandwidth. Another way is to increase the width of the data bus so multiple words can be transferred in a single cycle. For example, the SPARCstation memory bus is 128 bits, or 16 bytes, wide. The cost of this approach is more bus lines. Finally, we can also increase the bus bandwidth by allowing the bus to transfer multiple words in back-to-back bus cycles without sending an address or releasing the bus. The cost of this last approach is an increase in complexity in the bus controller as well as worse response time for other parties who want to get onto the bus. +2 = 33 min. (Y:13)
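To make these trade-offs concrete, here is a small back-of-the-envelope sketch in C. The one-cycle address and data costs are assumed round numbers chosen only for illustration, not figures from the lecture.

```c
#include <stdio.h>

/* Hypothetical cycle counts, chosen only to illustrate the trade-offs
 * discussed above; a real bus's numbers depend on its protocol. */
#define ADDR_CYCLES 1   /* cycles to send an address        */
#define DATA_CYCLES 1   /* cycles to send one word of data  */

int main(void) {
    int words = 8;      /* size of the block being transferred */

    /* Multiplexed address/data lines: every word pays for an
     * address cycle followed by a data cycle.                 */
    int multiplexed = words * (ADDR_CYCLES + DATA_CYCLES);

    /* Separate address and data lines: address and data can go
     * out in the same cycle, so each word costs one cycle.     */
    int separate = words * DATA_CYCLES;

    /* Block transfer: one address at the start, then the data
     * words back to back (the bus is held until the last word). */
    int block = ADDR_CYCLES + words * DATA_CYCLES;

    printf("multiplexed: %d cycles\n", multiplexed);
    printf("separate:    %d cycles\n", separate);
    printf("block:       %d cycles\n", block);
    return 0;
}
```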
6
Bus Summary: Bus arbitration schemes:
Daisy chain arbitration: simple, but it cannot assure fairness
Centralized parallel arbitration: faster, but requires a central arbiter
I/O device notifying the operating system:
Polling: simple, but it can waste a lot of processor time
I/O interrupt: similar to an exception except it is asynchronous
Delegating I/O responsibility from the CPU:
Direct memory access (DMA)
I/O processor (IOP)
Let's summarize what we learned today. First we talked about three types of buses: the processor-memory bus, which is usually the shortest and fastest; the I/O bus, which has to deal with a large range of I/O devices and is usually the longest and slowest; and the backplane bus, a general interconnect built into the chassis of the machine. The processor-memory bus, which runs at high speed, is usually synchronous, while the I/O and backplane buses can be either synchronous or asynchronous. As far as bus arbitration schemes are concerned, I showed you two in detail. The daisy chain scheme is simple, but it is slow and cannot assure fairness--that is, a low priority device may never get to use the bus at all. The centralized parallel arbitration scheme is faster, but it requires a centralized arbiter, so I also showed you how to build a simple arbiter using simple AND gates and JK flip flops. When we talked about the OS's role, we discussed two ways an I/O device can notify the operating system when data is ready or something goes wrong. Polling is simple to implement, but it may end up wasting a lot of processor cycles. An I/O interrupt is similar to an exception, but it is asynchronous with respect to instruction execution, so we can pick our own convenient point in the pipeline to handle it. Finally we talked about two ways you can delegate I/O responsibility from the CPU: direct memory access and the I/O processor. +3 = 77 min. (Y:57)
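As a toy illustration of the polling-versus-interrupt contrast in this summary, here is a minimal C sketch. The device model (a status flag plus a data register) is invented purely for illustration and does not correspond to any particular hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* A toy device model: a status flag and a data register, standing in
 * for real memory-mapped I/O registers.                              */
typedef struct {
    volatile uint32_t status;   /* bit 0 = data ready */
    volatile uint32_t data;
} device_t;

/* Polling: the processor repeatedly reads the status register until the
 * device is ready.  Simple, but the loop can waste many processor cycles. */
uint32_t read_by_polling(device_t *dev) {
    while ((dev->status & 1u) == 0u)
        ;                       /* busy-wait */
    return dev->data;
}

/* Interrupt-driven: the processor runs other work, and this handler is
 * invoked asynchronously when the device signals completion, much like
 * an exception except that it is not tied to a particular instruction. */
void device_interrupt_handler(device_t *dev, uint32_t *dest) {
    *dest = dev->data;          /* service the device, then return */
}

int main(void) {
    device_t dev = { .status = 1u, .data = 42u };   /* pretend data arrived */
    printf("polled value: %u\n", (unsigned)read_by_polling(&dev));

    uint32_t v = 0;
    device_interrupt_handler(&dev, &v);             /* as if an IRQ fired */
    printf("interrupt value: %u\n", (unsigned)v);
    return 0;
}
```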
7
Outline of Today’s Lecture
Recap and Introduction (5 minutes)
Introduction to Buses (15 minutes)
Bus Types and Bus Operation (10 minutes)
Bus Arbitration and How to Design a Bus Arbiter (15 minutes)
Operating System's Role (15 minutes)
Delegating I/O Responsibility from the CPU (5 minutes)
Summary (5 minutes)
Here is an outline of today's lecture. We will spend the first half of the lecture talking about buses, and then after the break, we will talk about the operating system's role in all of this. We will also talk about how to delegate I/O responsibility from the CPU. Finally we will summarize today's lecture. +1 = 4 min. (X:44)
8
Networks Goal: Communication between computers
Eventual Goal: treat a collection of computers as if it were one big computer
Theme: different computers must agree on many things => overriding importance of standards
Warning: buzzword-rich environment
9
Current Major Networks
10
Networks
Facets people talk a lot about:
direct vs. indirect
topology
routing algorithm
switching
wiring
What matters:
latency
bandwidth
cost
reliability
11
ABCs of Networks Starting Point: Send bits between 2 computers
FIFO queue on each end
Can send both ways ("full duplex")
Rules for communication? "protocol"
Inside a computer? Loads/stores: Request (Address) & Response (Data)
Need both Request & Response
Name for a standard group of bits sent: Packet
12
A Simple Example What is format of packet? Fixed? Number bytes?
Request/Response: 1 bit (0 = "Please send data from Address", 1 = "Data corresponding to request")
Address/Data: 32 bits (the address in a request, the data in a response)
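One way to picture this packet in software is the small C sketch below. The struct and function names are illustrative only, and the 1-bit field is widened to a whole byte for convenience.

```c
#include <stdint.h>
#include <stdio.h>

/* One possible encoding of the 33-bit packet sketched above; the field
 * and type names here are illustrative, not part of any real protocol. */
typedef struct {
    uint8_t  is_response;   /* 1 bit:  0 = "please send data from address",
                                        1 = "data corresponding to request" */
    uint32_t addr_or_data;  /* 32 bits: address in a request, data in a response */
} simple_packet;

/* Build a read request for a given address. */
simple_packet make_request(uint32_t address) {
    simple_packet p = { 0u, address };
    return p;
}

/* Build the matching response carrying the data. */
simple_packet make_response(uint32_t data) {
    simple_packet p = { 1u, data };
    return p;
}

int main(void) {
    simple_packet req = make_request(0x1000u);
    simple_packet rsp = make_response(0xDEADBEEFu);
    printf("request : type=%u addr=0x%x\n", (unsigned)req.is_response, (unsigned)req.addr_or_data);
    printf("response: type=%u data=0x%x\n", (unsigned)rsp.is_response, (unsigned)rsp.addr_or_data);
    return 0;
}
```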
13
Questions about Simple Example
What if more than 2 computers want to communicate? Need a computer address field in the packet?
What if a packet is garbled in transit? Add an error detection field to the packet?
What if a packet is lost? More elaborate protocols to detect loss?
What if there are multiple processes per machine? A queue per process?
Questions such as these lead to more complex protocols and packet formats.
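The sketch below shows one hypothetical way such fields might be added to the simple packet. The field widths, names, and the additive checksum are all assumptions made for illustration; real networks use stronger codes such as CRCs.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* A richer (still hypothetical) packet that answers the questions above:
 * a destination field so more than two computers can share the network,
 * a sequence number so lost packets can be noticed and retransmitted,
 * and a checksum so garbled packets can be rejected.                    */
typedef struct {
    uint16_t dest;        /* which computer (or process queue) this is for */
    uint16_t seq;         /* sequence number, used to detect lost packets  */
    uint16_t length;      /* number of valid payload bytes                 */
    uint16_t checksum;    /* simple error-detection code over the payload  */
    uint8_t  payload[64]; /* the data being carried                        */
} net_packet;

/* A very simple additive checksum, purely illustrative. */
uint16_t compute_checksum(const uint8_t *buf, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return (uint16_t)(sum & 0xFFFFu);
}

/* The receiver recomputes the checksum and compares; a mismatch means the
 * packet was garbled in transit and should be dropped or re-requested.   */
int packet_is_intact(const net_packet *p) {
    return compute_checksum(p->payload, p->length) == p->checksum;
}

int main(void) {
    net_packet p = { .dest = 3, .seq = 1, .length = 5 };
    const uint8_t msg[5] = { 'h', 'e', 'l', 'l', 'o' };
    for (size_t i = 0; i < sizeof msg; i++)
        p.payload[i] = msg[i];
    p.checksum = compute_checksum(p.payload, p.length);
    printf("packet intact? %s\n", packet_is_intact(&p) ? "yes" : "no");
    return 0;
}
```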
14
Protocol Stacks
15
Interconnection Networks
Examples
MPP networks (CM-5): 1000s of nodes; ≤ 25 meters per link
Local Area Networks (Ethernet): 100s of nodes; ≤ 1000 meters
Wide Area Network (ATM): 1000s of nodes; ≤ 5,000,000 meters
16
Interconnection Network Issues
Implementation Issues
Performance Measures
Architectural Issues
Practical Issues
17
Implementation Issues
Interconnect              MPP          LAN                        WAN
Example                   CM-5         Ethernet                   ATM
Maximum length            25 m         500 m; ≤ 5 repeaters       copper: 100 m
  between nodes                                                   optical: 1000 m
Number of data lines      4            1                          1
Clock rate                40 MHz       10 MHz                     155 MHz
Shared vs. switch         Switch       Shared                     Switch
Maximum number of nodes   > 10,000
Media material            Copper       Twisted-pair copper wire   Twisted-pair copper wire
                                       or coaxial cable           or optical fiber
18
Media
Twisted pair: several Mb/s, up to a kilometer; more with shielded twisted pair; Category 5: 4 wires. Why twisted?
Coaxial cable: 10 Mb/s at 1 km; more at shorter lengths; tapped with a T-junction or "vampire" tap. [Cross-section: plastic covering, braided outer conductor, insulator, copper core]
Fiber optics: Gb/s at 1 km. Transmitter (LED or laser diode light source) and receiver (photodiode); light is kept in the silica fiber by total internal reflection at the air/silica boundary. Multimode: many rays bouncing at different angles. Single mode: diameter of the fiber is less than one wavelength, so it acts like a wave guide.
Line of sight (microwave): 2-40 GHz
19
Implementation Issues
Advantages of serial vs. parallel lines:
No synchronizing signals
Higher clock rate and longer distance than parallel lines (e.g., 60 MHz x 256 bits x 0.5 m vs. 155 MHz x 1 bit x 100 m)
Imperfections in the copper wires or integrated circuit pad drivers can cause skew in the arrival of signals, limiting the clock rate and the length and number of the parallel lines.
Switched vs. shared media: with a switch, pairs communicate at the same time over "point-to-point" connections.
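A quick calculation in C, using only the numbers quoted in the example above, shows the raw-bandwidth side of the trade:

```c
#include <stdio.h>

int main(void) {
    /* The example from the slide: a wide, short parallel bus versus a
     * narrow, long serial link.  Raw bandwidth = clock rate x width.  */
    double parallel_bps = 60e6  * 256.0;  /* 60 MHz x 256 bits, 0.5 m  */
    double serial_bps   = 155e6 * 1.0;    /* 155 MHz x 1 bit, 100 m    */

    printf("parallel bus: %.1f Gb/s over 0.5 m\n", parallel_bps / 1e9);
    printf("serial link:  %.3f Gb/s over 100 m\n", serial_bps / 1e9);
    /* The parallel bus wins on raw bandwidth (~15.4 vs ~0.155 Gb/s),
     * but skew across its 256 lines limits its clock rate and length;
     * the serial link trades width for a faster clock and 200x the reach. */
    return 0;
}
```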
20
Network Performance Measures
Overhead: the latency of the interface (time spent by the processor and software per message)
Latency: the time spent in the network itself
21
Example Performance Measures
Interconnect           MPP             LAN              WAN
Example                CM-5            Ethernet         ATM
Bisection BW           N x 5 MB/s      1.25 MB/s        N x 10 MB/s
Interface/Link BW      20 MB/s         1.25 MB/s        10 MB/s
Latency                5 µs            15 µs            50 to 10,000 µs
HW overhead to/from    0.5/0.5 µs      6/6 µs           6/6 µs
SW overhead to/from    1.6/12.4 µs     200/241 µs       207/360 µs
                                       (TCP/IP on LAN/WAN)
22
Importance of Overhead (+ Latency)
Ethernet / SS10: 9 Mb/s BW, µsecs ovhd
ATM Synoptics: 78 Mb/s BW, 1,250 µsecs ovhd
NFS trace over 1 week: 95% of messages < 200 bytes
Link bandwidth is a little like "peak MIPS" -- it doesn't tell you very much about delivered performance.
If you plug in your 155 Mb/s ATM today to replace your 10 Mb/s Ethernet, don't set your heart on a 15-fold improvement.
First off, the current network interface HW and communication software have trouble even driving what the link is capable of.
However, the real issue is that the overhead (and a little bit of latency) is huge. This seriously impacts the performance actually delivered, because the vast majority of messages are small. For every big transfer of a block, several little messages go back and forth to set up the transfer. To get the benefit of higher bandwidth we need the overhead to drop. What we see is that it is rising! For example, in a week's trace of NFS traffic in our department, 95% of the messages are under 200 bytes. If you look at the total work to move this volume of messages, ATM does shrink the raw transmission time. However, with the increase in the already large overhead, there is only about a 20% gain, not 1500%.
Link bandwidth is as misleading as MIPS.
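A rough model of the delivered time per message makes the point. The sketch below uses the software and hardware overheads from the measurement table earlier and assumes a 200-byte message (as in the NFS trace) and link bandwidths of roughly 1.25 MB/s for Ethernet and 10 MB/s for ATM; these are simplifying assumptions, not measured results.

```c
#include <stdio.h>

/* Total time to deliver one message = software overhead (send + receive)
 * + hardware overhead (send + receive) + bytes / link bandwidth.        */
static double msg_time_us(double sw_ovhd_us, double hw_ovhd_us,
                          double bytes, double link_MBps) {
    return sw_ovhd_us + hw_ovhd_us + bytes / link_MBps; /* 1 MB/s = 1 byte/us */
}

int main(void) {
    double bytes = 200.0;   /* a typical small message from the NFS trace */

    /* Ethernet: 200/241 us SW overhead, 6/6 us HW overhead, ~1.25 MB/s link */
    double eth = msg_time_us(200.0 + 241.0, 6.0 + 6.0, bytes, 1.25);

    /* ATM:      207/360 us SW overhead, 6/6 us HW overhead, ~10 MB/s link   */
    double atm = msg_time_us(207.0 + 360.0, 6.0 + 6.0, bytes, 10.0);

    printf("Ethernet: %.0f us per 200-byte message\n", eth);
    printf("ATM:      %.0f us per 200-byte message\n", atm);
    /* The raw transmission time shrinks from 160 us to 20 us, but the
     * ~0.5 ms of overhead dominates, so delivered performance barely moves. */
    return 0;
}
```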
23
Example Performance Measures
Interconnect          MPP             LAN               WAN
Example               CM-5            Ethernet          ATM
Topology              "Fat" tree      Line              Variable, constructed
                                                        from multistage switches
Connection based?     No              No                Yes
Data transfer size    Variable:       Variable:         Fixed:
                      4 to 20 B       0 to 1500 B       48 B
24
Topology Structure of the interconnect Determines
degree: number of links from a node
diameter: maximum number of links crossed between any two nodes
average distance: number of hops to a random destination
bisection: minimum number of links that, when cut, separate the network into two halves
Warning: these three-dimensional drawings must be mapped onto chips and boards, which are essentially two-dimensional media.
A topology that is elegant when sketched on the blackboard may look awkward when constructed from chips, cables, boards, and boxes.
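For concreteness, here are the standard closed-form values of these metrics for two of the topologies pictured on the next slide, computed in a tiny C program; the mesh and hypercube sizes are arbitrary examples.

```c
#include <stdio.h>

/* Closed-form values of the metrics defined above for a k x k 2-D mesh
 * and an n-dimensional hypercube, using the standard textbook formulas. */
int main(void) {
    int k = 4;                       /* 4 x 4 mesh: 16 nodes */
    printf("2-D mesh %dx%d: degree <= 4, diameter %d, bisection %d\n",
           k, k, 2 * (k - 1), k);

    int n = 4;                       /* 4-dimensional hypercube: 16 nodes */
    printf("hypercube dim %d: degree %d, diameter %d, bisection %d\n",
           n, n, n, 1 << (n - 1));
    return 0;
}
```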
25
Important Topologies 1D mesh Ring 2D mesh 2D torus Hypercube
26
Fat Tree
27
Connection based vs. Connectionless
Telephone: an operator sets up a connection between the caller and the receiver; once the connection is established, the conversation can continue for hours.
Share transmission lines over long distances by using switches to multiplex several conversations on the same lines.
"Time division multiplexing": divide the bandwidth of the transmission line into a fixed number of slots, with each slot assigned to a conversation.
Problem: the lines are busy based on the number of conversations, not the amount of information sent.
Connectionless: every package of information must have an address => packets.
Each packet is routed to the destination by looking at its address (e.g., the postal system).
Split-phase buses send packets.
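The toy C sketch below contrasts the two styles; the slot count, conversation names, and packet fields are made up purely for illustration.

```c
#include <stdio.h>

#define SLOTS 4   /* a toy TDM frame with a fixed number of slots */

int main(void) {
    /* Connection-based (circuit switched): each conversation owns one
     * slot of the line's bandwidth for its whole lifetime, whether or
     * not it has anything to say.  The slot index identifies the call. */
    const char *slot_owner[SLOTS] = { "call A", "call B", "(idle)", "(idle)" };
    for (int s = 0; s < SLOTS; s++)
        printf("TDM slot %d carries %s\n", s, slot_owner[s]);

    /* Connectionless (packet switched): nothing is reserved; every
     * packet carries its destination address and is routed on demand,
     * like a letter in the postal system.                             */
    struct { int dest; const char *data; } pkt = { 7, "hello" };
    printf("packet for node %d: \"%s\"\n", pkt.dest, pkt.data);
    return 0;
}
```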
28
Packet formats Fields: Destination, Checksum(C), Length(L), Type(T)
Data/Header sizes in bytes: CM-5 (4 to 20)/4, Ethernet (0 to 1500)/26, ATM 48/5
29
Example: Ethernet (IEEE 802.3)
Essentially a 10 Mb/s, one-wire bus with no central control
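The 26-byte header/trailer figure quoted on the packet-format slide can be accounted for by the classic 802.3 field sizes; the sketch below simply tallies them (preamble, destination and source addresses, length, and CRC).

```c
#include <stdio.h>

/* Field sizes of a classic IEEE 802.3 frame, which add up to the 26 bytes
 * of header/trailer quoted earlier, wrapped around 0 to 1500 data bytes. */
enum {
    ETH_PREAMBLE = 8,   /* preamble + start-of-frame delimiter */
    ETH_DEST     = 6,   /* destination address                 */
    ETH_SRC      = 6,   /* source address                      */
    ETH_LENGTH   = 2,   /* length field                        */
    ETH_CRC      = 4,   /* frame check sequence                */
    ETH_DATA_MAX = 1500
};

int main(void) {
    int overhead = ETH_PREAMBLE + ETH_DEST + ETH_SRC + ETH_LENGTH + ETH_CRC;
    printf("header + trailer: %d bytes, data: 0 to %d bytes\n",
           overhead, ETH_DATA_MAX);
    return 0;
}
```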
30
Example: ATM (Asynchronous Transfer Mode)
Asynchronous Transfer Mode (155 Mb/s, 622 Mb/s in the future)
Point-to-point, dedicated, switched
5+48 byte fixed-size cells
Connection oriented, using virtual channels
Bandwidth guarantees
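A cell can be pictured as a 53-byte unit: 5 bytes of header plus 48 bytes of payload. The C layout below is an illustrative sketch, not the exact bit-level header format.

```c
#include <stdint.h>
#include <stdio.h>

/* An ATM cell: 5 header bytes (carrying the virtual channel/path
 * identifiers used by the switches) plus a fixed 48-byte payload. */
typedef struct {
    uint8_t header[5];    /* virtual channel/path identifiers, etc. */
    uint8_t payload[48];  /* fixed payload; short data is padded    */
} atm_cell;

int main(void) {
    printf("ATM cell size: %zu bytes\n", sizeof(atm_cell));
    return 0;
}
```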
31
Towards the Killer Network
High bandwidth, scalable (switched) LANs
Repackaged MPP backplane (single-chip switch): TMC, Intel, . . . ; IBM SP-2; Myrinet (Seitz & Cohen)
Research ATM efforts: DEC AN2 (ATM switch capable of Gb/s links); commercial ATM products "off the curve," but catching up
Ethernet successors: 100 Mbit/s Fast Ethernet (Sun et al); 100VG-AnyLAN (HP et al); Switched Ethernet; switched 100 Mbit/s Ethernet
[Diagram: MPP, TelCo, and LAN technologies converging on the "killer network"]
Where will the killer net come from? Any of the MPP vendors could in theory repackage their switch technology, with some small penalty due to longer physical connections. We have played with prototypes of this sort. IBM has essentially made this move with the SP-1/SP-2, which are workstations connected to the network from the Vulcan project. A small company, run by Chuck Seitz and Danny Cohen, has spun out of Caltech to take their MPP switch technology into the LAN arena. It is fast and cheap. Of course, the big hoopla is the ATM efforts coming from the big players in the telecommunications and LAN industry. The "research" efforts, such as DEC's AN2, are very interesting. The first-generation commercial offerings have been disappointing by comparison, but the momentum there gives us some confidence. A fast network and a fast processor are not enough; you need a fast connection between the two. In this respect there is cause for optimism. The Active Message work, due largely to Thorsten von Eicken's efforts, has provided an "existence proof".
32
Summary: Interconnections
Communication between computers
Packets for standards; protocols to cover normal and abnormal events
Implementation issues: length, width, media
Performance issues: overhead, latency, bisection BW
Topologies: many to choose from, but (SW) overheads make them all look alike; cost issues in topologies