1 Lecture 3: Caches and Buses

2 Caches A cache is a small, fast memory that holds copies of some main memory locations. Access to a location held in the cache is fast, while access to a location that is not cached is slow. A cache speeds up average memory access when used properly. Caching works because, at any one time, the CPU uses only a small set of locations; these active locations form the working set.

3 Caches and CPUs [Figure: the CPU issues addresses and exchanges data with main memory through the cache; a cache controller manages the cache.]

4 Cache operation Many main memory locations are mapped onto one cache entry. May have caches for: instructions; data; data + instructions (unified). Memory access time is no longer deterministic.

5 Terms Cache hit: required location is in cache.
Cache miss: required location is not in cache. Working set: set of locations used by program in a time interval.

6 Types of misses Compulsory (cold): location has never been accessed.
Capacity: working set is too large. Conflict: multiple locations in working set map to same cache entry.

7 Memory system performance
h = cache hit rate; t_cache = cache access time; t_main = main memory access time. Average memory access time: t_av = h * t_cache + (1 - h) * t_main
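A quick numerical sketch of this formula in C (the hit rate and access times below are illustrative values, not from the lecture):

    #include <stdio.h>

    /* Average memory access time for a single-level cache.
       h: hit rate; t_cache, t_main: access times in the same unit. */
    static double amat(double h, double t_cache, double t_main) {
        return h * t_cache + (1.0 - h) * t_main;
    }

    int main(void) {
        /* Assumed example: 95% hit rate, 1 ns cache, 50 ns main memory. */
        printf("t_av = %.2f ns\n", amat(0.95, 1.0, 50.0));   /* prints 3.45 ns */
        return 0;
    }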

8 Multiple levels of cache
[Figure: the CPU accesses a small L1 cache, which is backed by a larger L2 cache in front of main memory.]

9 Multi-level cache access time
h1 = L1 hit rate; h2 = fraction of accesses that miss in L1 but hit in L2. Average memory access time: t_av = h1 * t_L1 + h2 * t_L2 + (1 - h1 - h2) * t_main
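Extending the single-level sketch above to two levels (again with assumed numbers):

    /* Two-level version of amat(): h1 and h2 are the fractions of all
       accesses served by L1 and by L2, respectively. */
    static double amat2(double h1, double t_l1,
                        double h2, double t_l2, double t_main) {
        return h1 * t_l1 + h2 * t_l2 + (1.0 - h1 - h2) * t_main;
    }
    /* Example (assumed values): amat2(0.90, 1.0, 0.08, 5.0, 50.0) = 2.3 ns */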

10 Replacement policies Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location. Two popular strategies: Random. Least-recently used (LRU).
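A minimal sketch of LRU victim selection for one set of an N-way cache (the structure and field names are illustrative, not from the lecture):

    #include <stdint.h>

    #define WAYS 4

    struct way {
        int      valid;
        uint32_t tag;
        uint32_t last_used;   /* value of an access counter at last use */
    };

    /* Choose the entry to evict: prefer an invalid way; otherwise
       take the least-recently-used one (smallest last_used). */
    static int choose_victim(const struct way set[WAYS]) {
        int victim = 0;
        for (int i = 0; i < WAYS; i++) {
            if (!set[i].valid)
                return i;
            if (set[i].last_used < set[victim].last_used)
                victim = i;
        }
        return victim;
    }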

11 Cache organizations Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented). Direct-mapped: each memory location maps onto exactly one cache entry. N-way set-associative: each memory location maps to one set and can be stored in any of the N blocks (ways) of that set.

12 Cache performance benefits
Keep frequently-accessed locations in fast cache. Cache retrieves more than one word at a time. Sequential accesses are faster after first access.

13 Write operations Write-through: every write updates both the cache and main memory. Write-back: if the location is in the cache, write only to the cache, and write to main memory only when the block is removed from the cache.
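A rough sketch of the two policies for a single cache block, using a toy main memory array (all names and sizes here are assumptions for illustration):

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 32

    static uint8_t main_mem[1 << 16];   /* toy main memory */

    struct block {
        int      valid, dirty;
        uint32_t base;                  /* address of the block's first byte */
        uint8_t  data[BLOCK_SIZE];
    };

    /* Write-through: update the cached copy (on a hit) and always
       write main memory as well. */
    static void write_through(struct block *b, uint32_t addr, uint8_t v) {
        if (b->valid && addr - b->base < BLOCK_SIZE)
            b->data[addr - b->base] = v;
        main_mem[addr] = v;
    }

    /* Write-back: update only the cached copy and mark it dirty
       (assumes the address hits in this block). */
    static void write_back(struct block *b, uint32_t addr, uint8_t v) {
        b->data[addr - b->base] = v;
        b->dirty = 1;
    }

    /* When a dirty block is evicted, write-back flushes it to memory. */
    static void evict(struct block *b) {
        if (b->valid && b->dirty)
            memcpy(&main_mem[b->base], b->data, BLOCK_SIZE);
        b->valid = b->dirty = 0;
    }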

14 Direct-mapped cache locations
Many locations map onto the same cache block, so conflict misses are easy to generate. Example: the cache has 1024 blocks, each holding one location. Array a[] uses locations 0, 1, 2, …; array b[] uses locations 1024, 1025, 1026, …. Then a[i] and b[i] always map to the same block, so computing a[i] + b[i] generates conflict misses.
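A sketch of that access pattern in C (the layout assumption that b[] starts exactly 1024 words after a[] matches the example above):

    #define N 1024

    int a[N], b[N];   /* assume a[] occupies words 0..1023 and b[] words 1024..2047 */
    int sum[N];

    /* In a direct-mapped cache with 1024 one-word blocks, a[i] and b[i]
       map to the same block, so each iteration evicts the other array's
       word: every access misses. */
    void add_arrays(void) {
        for (int i = 0; i < N; i++)
            sum[i] = a[i] + b[i];
    }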

15 Direct-mapped cache Tag: specifies the memory location represented by a block. Valid: indicates a valid block. Data: the cached memory data. The address is split into tag, index, and offset fields: the index selects the cache block, the offset selects a byte within the block's data, and the stored tag is compared with the address tag to produce the hit signal. [Figure: a cache block with valid bit, tag (e.g. 0xabcd), and data bytes; the address fields tag/index/offset drive block selection and the tag comparator.]
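A sketch of how the three address fields might be extracted; the 32-byte-block, 1024-block geometry is an assumption for illustration:

    #include <stdint.h>

    #define OFFSET_BITS 5    /* 32-byte blocks (assumed)  */
    #define INDEX_BITS  10   /* 1024 blocks (assumed)     */

    struct addr_fields { uint32_t tag, index, offset; };

    static struct addr_fields split_address(uint32_t addr) {
        struct addr_fields f;
        f.offset = addr & ((1u << OFFSET_BITS) - 1);                  /* byte within the block */
        f.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* which cache block     */
        f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);                /* compared for a hit    */
        return f;
    }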

16 Set-associative cache
A set-associative cache behaves like several direct-mapped caches (ways, or banks) operating in parallel. A cache request is broadcast to all ways simultaneously; if any way holds the location, a hit is reported. Each memory location therefore maps to a set of N blocks rather than a single block, so several locations that would compete for the same block in a direct-mapped cache can be cached at the same time. [Figure: request broadcast to ways 1..n; hit and data outputs combined.]

17 Direct-mapped vs. set-associative
Set-associative caches provide a higher hit rate, because conflicts among a small number of locations can be resolved. They are slower, since a lookup requires more operations. They are also less predictable: conflicts in a set-associative cache are harder to analyze, so the worst-case cache-miss penalty is harder to determine, whereas a direct-mapped cache is predictable.

18 Example: direct-mapped vs. set-associative

19 Direct-mapped cache behavior
[Tables: cache state (block, tag, data) after the 001 access and after the 010 access.]

20 Direct-mapped cache behavior, cont’d.
[Tables: cache state (block, tag, data) after the 011 access and after the 100 access.]

21 Direct-mapped cache behavior, cont’d.
[Tables: cache state (block, tag, data) after the 101 access and after the 111 access.]

22 4-set-associative cache behavior
Final state of the cache (twice as big as the direct-mapped one) after the 001, 010, 011, 100, 101, and 111 accesses: [Table: per set, block 0 tag/data and block 1 tag/data.]

23 2-set-associative cache behavior
Final state of the cache (same size as the direct-mapped one): [Table: per set, block 0 tag/data and block 1 tag/data.]

24 Generic bus structure [Figure: a generic bus with address, data, and control lines of widths n, m, and c bits.]

25 Bus network Advantages: well-understood; easy to program; many standards.
Disadvantages: contention; significant capacitive load.

26 Buses The system interconnect uses a Memory Address Bus (MAB) and a Memory Data Bus (MDB).

27 USCI The universal serial communication interface (USCI) supports multiple serial communication modes with one hardware module: UART mode, I2C mode, and SPI mode.

28 Serial vs. Parallel Bus The communication links across which computers talk to one another may be either serial or parallel. A parallel link transmits several streams of data (perhaps representing particular bits of a stream of bytes) along multiple channels (wires, printed circuit tracks, optical fibres, etc.); a serial link transmits a single stream of data. At first sight it would seem that a serial link must be inferior to a parallel one, because it can transmit less data on each clock tick. However, it is often the case that serial links can be clocked considerably faster than parallel links, and achieve a higher data rate. A number of factors allow serial to be clocked at a greater rate:

29 1. Serial bus does not have clock skew between the channels
Clock skew: in circuit design, clock skew (sometimes timing skew) is a phenomenon in synchronous circuits in which the clock signal (sent from the clock circuit) arrives at different components at different times. This can be caused by many different things, such as differences in wire-interconnect length, temperature variations, etc. As the clock rate of a circuit increases, timing becomes more critical and less variation can be tolerated if the circuit is to function properly.

30 2. A serial connection requires fewer interconnecting cables (e.g. wires/fibres) and hence occupies less space. The extra space allows better isolation of the channel from its surroundings. In many cases serial is also cheaper to implement: many circuits have serial interfaces rather than parallel ones so that they need fewer pins and are therefore less expensive.

31 3. Crosstalk is less of an issue, because there are fewer conductors in proximity.
In electronics, crosstalk (XT) is any phenomenon by which a signal transmitted on one circuit or channel of a transmission system creates an undesired effect in another circuit or channel.

32 Bus Master Bus mastering enables a device connected to the bus to initiate transactions. It is also called "first-party DMA", in contrast to "third-party DMA", where the system DMA controller actually performs the transfer. Some types of bus allow only one device (typically the CPU, or its proxy) to initiate transactions. Most modern bus architectures allow multiple devices to act as bus masters, because this significantly improves performance for general-purpose operating systems.

33 Bus Master Some real-time operating systems prohibit peripherals from becoming bus masters, because the scheduler can no longer arbitrate for the bus and hence cannot provide deterministic latency. While bus mastering theoretically allows one peripheral device to directly communicate with another, in practice almost all peripherals master the bus exclusively to perform DMA to main memory. If multiple devices are able to master the bus (multi-master bus), there needs to be an arbitration scheme to prevent multiple devices attempting to drive the bus simultaneously.

34 I2C I²C (Inter-Integrated Circuit) is a multi-master serial computer bus invented by Philips that is used to attach low-speed peripherals to a CPU, embedded system, or cell phone.

35 I2C I²C has a 7-bit address space (128 addresses) with 16 reserved addresses, so a maximum of 112 nodes can communicate on the same bus.

36 I2C The bus has two roles for nodes: master and slave:
The bus is a multi-master bus, which means any number of master nodes can be present. Additionally, master and slave roles may be changed between messages (after a STOP is sent). There are four potential modes of operation for a given bus device, although most devices only use a single role and its two modes:
master transmit: master node is sending data to a slave
master receive: master node is receiving data from a slave
slave transmit: slave node is sending data to a master
slave receive: slave node is receiving data from the master

37 I2C The master begins in master transmit mode by sending a START condition followed by the 7-bit address of the slave it wishes to communicate with, followed by a single bit indicating whether it wishes to write (0) to or read (1) from the slave. If the slave exists on the bus, it responds with an ACK bit (active low) for that address. The master then continues in either transmit or receive mode (according to the read/write bit it sent), and the slave continues in the complementary mode (receive or transmit, respectively). The address and the data bytes are sent most significant bit first. The START condition is indicated by a high-to-low transition of SDA while SCL is high; the STOP condition is indicated by a low-to-high transition of SDA while SCL is high.

38 I2C If the master wishes to write to the slave, it repeatedly sends a byte, with the slave sending an ACK bit after each one. (In this situation the master is in master transmit mode and the slave is in slave receive mode.) If the master wishes to read from the slave, it repeatedly receives a byte from the slave, sending an ACK bit after every byte except the last one. (In this situation the master is in master receive mode and the slave is in slave transmit mode.) The master then ends the transaction with a STOP condition, or it may send another START condition if it wishes to retain control of the bus for another transfer.
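A rough sketch of the write transaction described above, as a bit-banged master in C (the sda_*/scl_* open-drain GPIO helpers are hypothetical and platform-specific; timing delays are omitted):

    /* Hypothetical helpers: *_low() drives the line low, *_release() lets
       the pull-up resistor pull it high, sda_read() samples the line. */
    void sda_low(void);  void sda_release(void);  int sda_read(void);
    void scl_low(void);  void scl_release(void);

    /* START: SDA goes high-to-low while SCL is high. */
    static void i2c_start(void) { sda_release(); scl_release(); sda_low(); scl_low(); }

    /* STOP: SDA goes low-to-high while SCL is high. */
    static void i2c_stop(void)  { sda_low(); scl_release(); sda_release(); }

    /* Send one byte MSB first; return 1 if the slave ACKed (pulled SDA low). */
    static int i2c_write_byte(unsigned char b) {
        for (int i = 7; i >= 0; i--) {
            if (b & (1 << i)) sda_release(); else sda_low();
            scl_release();                  /* bit is valid while SCL is high */
            scl_low();
        }
        sda_release();                      /* let the slave drive the ACK bit */
        scl_release();
        int ack = (sda_read() == 0);        /* ACK is active low */
        scl_low();
        return ack;
    }

    /* Write one byte to a register of a 7-bit-addressed slave (hypothetical device). */
    static int i2c_write_reg(unsigned char addr7, unsigned char reg, unsigned char val) {
        i2c_start();
        int ok = i2c_write_byte((unsigned char)((addr7 << 1) | 0))   /* address + write bit */
              && i2c_write_byte(reg)
              && i2c_write_byte(val);
        i2c_stop();
        return ok;
    }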

39 I2C arbitration Every master monitors the bus for START and STOP conditions and does not start a message while another master is keeping the bus busy. However, two masters may start transmission at about the same time; in this case, arbitration occurs. Slave transmit mode can also be arbitrated, when a master addresses multiple slaves, but this is less common. In contrast to protocols (such as Ethernet) that use random back-off delays before retrying, I²C has a deterministic arbitration policy: each transmitter checks the level of the data line (SDA) and compares it with the level it expects; if they do not match, that transmitter has lost arbitration and drops out of this protocol interaction.
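A small sketch of the arbitration rule for one bit time, reusing the hypothetical GPIO helpers from the I2C sketch above:

    /* While transmitting, each master samples SDA after driving it.
       Releasing the line (sending a 1) but reading back 0 means another
       master is driving 0 at the same time: this master has lost
       arbitration and must stop driving the bus. */
    static int drive_bit_and_check(int bit) {
        if (bit) sda_release(); else sda_low();
        scl_release();
        int lost = bit && (sda_read() == 0);
        scl_low();
        return lost;   /* nonzero: back off and retry the message later */
    }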

40 Serial Peripheral Interface (SPI) Bus
A synchronous, single-master serial data link standard named by Motorola that operates in full-duplex mode. Devices communicate in master/slave fashion, where the master device initiates the data frame. Multiple slave devices are allowed, with individual slave select (chip select) lines. SPI is sometimes called a "four-wire" serial bus, in contrast with three-, two-, and one-wire serial buses. All devices operate at the same voltage level. The SPI bus operates with a single master device and one or more slave devices.

41 SPI logic signals The SPI bus specifies four logic signals.
SCLK — Serial Clock (output from master)
MOSI/SIMO — Master Output, Slave Input (output from master)
MISO/SOMI — Master Input, Slave Output (output from slave)
SS — Slave Select (active low; output from master)
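A sketch of one full-duplex byte transfer as a bit-banged master (SPI clock mode 0 assumed; the GPIO helpers and the register-read usage pattern are hypothetical):

    /* Hypothetical GPIO helpers for the four SPI signals. */
    void sclk_set(int level);  void mosi_set(int level);
    int  miso_read(void);      void ss_set(int level);   /* SS is active low */

    /* Shift one byte out on MOSI while shifting one byte in from MISO
       (mode 0: data set up while SCLK is low, sampled on the rising edge). */
    static unsigned char spi_transfer(unsigned char out) {
        unsigned char in = 0;
        for (int i = 7; i >= 0; i--) {
            mosi_set((out >> i) & 1);                        /* present next bit, MSB first */
            sclk_set(1);                                     /* rising edge: both sides sample */
            in = (unsigned char)((in << 1) | (miso_read() & 1));
            sclk_set(0);                                     /* falling edge: shift */
        }
        return in;
    }

    /* Example usage for a hypothetical device that returns a register's
       contents in the byte following the register address. */
    static unsigned char spi_read_reg(unsigned char reg) {
        ss_set(0);                       /* assert SS (active low) to select the slave */
        spi_transfer(reg);
        unsigned char val = spi_transfer(0x00);
        ss_set(1);                       /* deselect */
        return val;
    }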

42 PCI The Peripheral Component Interconnect (PCI) standard specifies a computer bus for attaching peripheral devices to a CPU. These devices can take either of the following forms: an integrated circuit fitted onto the motherboard itself (called a planar device in the PCI specification), or an expansion card that fits into a socket.

43 PCI provides two separate 32-bit or 64-bit address spaces corresponding to the memory and I/O port address spaces of the CPU. Addresses in these address spaces are assigned by software. A third address space, called the PCI Configuration Space, uses a fixed addressing scheme and allows software to determine the amount of memory and I/O address space needed by each device. Each device can request up to six areas of memory space or I/O port space via its configuration space registers. In a typical system, the firmware (or operating system) queries all PCI buses at startup (via the PCI Configuration Space) to find out which devices are present and what system resources (memory space, I/O space, interrupt lines, etc.) each needs. It then allocates the resources and tells each device what its allocation is. Any device can act as a bus master once the bus arbiter grants it the bus.
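A sketch of the startup enumeration step described above, using the legacy x86 configuration mechanism (ports 0xCF8/0xCFC); the outl/inl port-I/O helpers are assumed to be provided by the platform:

    #include <stdint.h>

    /* Hypothetical port-I/O helpers (platform/OS specific). */
    void     outl(uint16_t port, uint32_t value);
    uint32_t inl(uint16_t port);

    /* Read a 32-bit register from PCI Configuration Space via the
       legacy CONFIG_ADDRESS (0xCF8) / CONFIG_DATA (0xCFC) mechanism. */
    static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev,
                                   uint8_t fn, uint8_t offset) {
        uint32_t addr = (1u << 31)             /* enable bit           */
                      | ((uint32_t)bus << 16)
                      | ((uint32_t)dev << 11)
                      | ((uint32_t)fn  << 8)
                      | (offset & 0xFC);       /* dword-aligned offset */
        outl(0xCF8, addr);
        return inl(0xCFC);
    }

    /* Scan bus 0: register 0 holds the vendor ID (low 16 bits) and device ID;
       a vendor ID of 0xFFFF means no device responds in that slot. */
    static void scan_bus0(void) {
        for (uint8_t dev = 0; dev < 32; dev++) {
            uint32_t id = pci_cfg_read32(0, dev, 0, 0);
            if ((id & 0xFFFF) != 0xFFFF) {
                /* device present: vendor = id & 0xFFFF, device = id >> 16 */
            }
        }
    }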

44 SCSI Small Computer System Interface is a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, and electrical and optical interfaces. SCSI is most commonly used for hard disks and tape drives, but it can connect a wide range of other devices, including scanners and CD drives. The SCSI standard defines command sets for specific peripheral device types; the presence of "unknown" as one of these types means that in theory it can be used as an interface to almost any device, but the standard is highly pragmatic and addressed toward commercial requirements.

