SoC Architecture - Lecture 2 A. Jantsch / Z. Lu / I. Sander


1 SoC Architecture - Lecture 2 A. Jantsch / Z. Lu / I. Sander
4/24/2017 Buses A. Jantsch / Z. Lu / I. Sander Dally: Chapter 22, 18 (c) A. Jantsch, I. Sander, Z. Lu

2 Some History…

3 ENIAC 60 years ago… ENIAC, short for Electronic Numerical Integrator and Computer, was the first large-scale, electronic, digital computer capable of being reprogrammed to solve a full range of computing problems, although earlier computers had been built with some of these properties. Source: April 24, 2017 SoC Architecture

4 ENIAC 60 years ago…
ENIAC was designed and built to calculate artillery firing tables for the U.S. Army's Ballistic Research Laboratory. The first problems run on ENIAC, however, were related to the design of the hydrogen bomb.

5 ENIAC 60 years ago…
ENIAC contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 capacitors and around 5 million hand-soldered joints. It weighed 27 tons, measured roughly 2.4 m by 0.9 m by 30 m, and took up 167 m².

6 ENIAC 60 years ago…
ENIAC consumed 150 kW of power, or about the same as 2000 Pentium 4 chips. That's 500 million times the power used by some of TI's MSP430 processors when operating at 1 MHz (200 times the ENIAC's clock rate of 5 kHz). It took up to 29 milliseconds to do a division. The Pentium 4 is a million times faster and twice as precise.

7 ENIAC 60 years ago…
The machine cost $500k in 1946 dollars, equivalent to $5,134,000 today. Now some variants of Microchip's PIC10 go for $0.39.

8 Classic Bus

9 Introduction
Buses are the simplest and most widely used interconnection networks.
A number of modules are connected via a single shared channel.
[Figure: a digital signal processor, an input/output device, a microcontroller and a memory attached to one shared bus]

10 Bus Properties: Serialization
Only one component can send a message at any given time.
There is a total order of messages.
[Figure: Modules 1 to 4 on a shared bus, with messages labelled 1 and 2]

11 Bus Properties: Broadcast
A module can send a message to several other components at no extra cost.
[Figure: Modules 1 to 4 on a shared bus]

12 Bus Hardware
Principle for hardware to access the bus:
Bus transmit: ET active
Bus receive: ER active
[Figure: Modules 1 to 3, each connected to the bus through a register (Reg) with ER and ET control signals]

13 Bus Transmitter Interfaces
[Figure: three ways for a transmitter T, enabled by ET, to drive the bus: dotted-emitter driver, tri-state driver, open-drain driver]

14 Cycles, Messages and Transactions
Buses operate in units of cycles, messages and transactions.
Message: a logical unit of information (a read message contains an address and the control signals for a read).
Cycle: a message requires a number of cycles to be sent from sender to receiver over the bus.
Transaction: a sequence of messages that together complete one operation (a memory read requires a memory read message and a reply with the requested data).

15 Synchronous Bus
Includes a clock in the control lines.
Uses a fixed communication protocol that is defined relative to the clock.
Advantage: involves very little logic and can run very fast.
Disadvantages:
Every device on the bus must run at the same clock rate.
To limit clock skew, a fast synchronous bus cannot be long.
[Figure: timing diagram with signals CLK, READ, ADR, DATA]

16 Asynchronous Bus
It is not clocked.
It can accommodate a wide range of devices.
It can be lengthened without worrying about clock skew.
It requires a handshaking protocol:
1. The master puts the address on the bus and asserts READ when the address is stable.
2. The memory puts the data on the bus and asserts ACK when the data is stable.
3. The master deasserts READ when it has read the data.
4. The memory deasserts ACK.
[Figure: timing diagram with signals READ, ADR, DATA, ACK]
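
The four handshake steps of the asynchronous read can be traced in a few lines of Python. This is only a behavioural sketch: the function name `async_read`, the dictionary-based bus and the trace format are illustrative assumptions, not part of the lecture.

```python
def async_read(memory, address):
    """Trace one asynchronous read; returns (data, trace of bus states)."""
    bus = {"READ": 0, "ADR": None, "DATA": None, "ACK": 0}
    trace = []

    def record(step):
        # Store a snapshot of all bus signals after each handshake step.
        trace.append((step, dict(bus)))

    # 1. Master drives the address, then asserts READ once it is stable.
    bus["ADR"] = address
    bus["READ"] = 1
    record("master asserts READ")

    # 2. Memory drives the data, then asserts ACK once it is stable.
    bus["DATA"] = memory[bus["ADR"]]
    bus["ACK"] = 1
    record("memory asserts ACK")

    # 3. Master latches the data and deasserts READ.
    data = bus["DATA"]
    bus["READ"] = 0
    bus["ADR"] = None
    record("master deasserts READ")

    # 4. Memory releases the data lines and deasserts ACK.
    bus["DATA"] = None
    bus["ACK"] = 0
    record("memory deasserts ACK")
    return data, trace
```

Each step waits for the previous one by construction, which is why no clock is needed; a slow memory would simply take longer before step 2.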

17 Bus Arbitration

18 Bus Arbitration
Since only one bus master can use the bus at a given time, bus arbitration is needed.
An arbiter collects the requests of all bus masters and grants only one module the right to access the bus (bus grant).
[Figure: an arbiter with Req and Grant lines to Modules 1 to 3, which share the bus]

19 Importance of Arbiters
Arbiters are used not only in bus systems, but wherever several devices request a shared resource.
In a network-on-chip, for instance, arbitration is needed if two or more packets want to enter the same channel.

20 Arbiter Interfaces
This arbiter interface can be used to give a bus grant for a fixed number of cycles: (a) 1 cycle, (b) 4 cycles.

21 Arbiter Interfaces
This arbiter allows for variable-length grants.
The grant is held as long as the “hold” line (controlled by the client) is asserted.
In cycle 2 requester 0 gets the bus for 3 cycles.
In cycle 5 requester 1 gets the bus for 2 cycles.
In cycle 7 requester 1 gets the bus for one cycle.

22 Fairness
Fairness is a key property of an arbiter. Some definitions:
Weak fairness: every request is eventually served.
Strong fairness: requests are served equally often.
Weighted strong fairness: the number of times requester i is served is proportional to its weight w_i.
FIFO fairness: requests are served in the order in which they were made.

23 Local Fairness vs. Global Fairness
Even if each arbiter is locally fair, a system built from several such arbiters may not be fair.
Though each arbiter A_i allocates 50% of its bandwidth to each of its two inputs, r0 gets only 12.5% of the total bandwidth, while r3 gets 50%.
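
The 12.5% and 50% figures can be reproduced with a short calculation. The sketch below assumes a topology that is not visible in the transcript: a chain of two-input arbiters in which r0 and r1 share the first arbiter and each later arbiter merges the previous arbiter's output with one new requester. The function name `chained_shares` is hypothetical.

```python
from fractions import Fraction

def chained_shares(num_requesters):
    """Bandwidth share per requester for a chain of locally fair 2-input arbiters."""
    shares = [Fraction(0)] * num_requesters
    remaining = Fraction(1)  # bandwidth leaving the last arbiter
    # Walk from the last arbiter back to the first; each stage halves what is left.
    for i in range(num_requesters - 1, 0, -1):
        remaining /= 2
        shares[i] = remaining  # requester i enters the chain directly at this stage
    shares[0] = remaining      # r0 shares the first arbiter equally with r1
    return shares

print(chained_shares(4))  # r0 .. r3
```

With four requesters this yields 1/8, 1/8, 1/4 and 1/2, so the requester closest to the shared resource is favoured even though every stage is fair in isolation.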

24 Fixed-Priority Arbiter
A fixed-priority arbiter can be constructed as an iterative circuit.
Each cell receives a request input r_i and a carry input c_i, and generates a grant output g_i and a carry output c_{i+1}.
The resulting arbiter is not fair: a continuously asserted request r0 means that none of the other requests will ever be served.
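
A minimal software model of the iterative circuit is sketched here, assuming the usual cell equations g_i = r_i AND c_i and c_{i+1} = c_i AND NOT r_i with c_0 = 1. These equations are not spelled out on the slide, and the function name is illustrative.

```python
def fixed_priority_arbiter(requests):
    """Return one grant per request; index 0 has the highest priority."""
    grants = []
    carry = True  # c_0: no higher-priority request has been seen yet
    for r in requests:
        grants.append(r and carry)  # g_i = r_i AND c_i
        carry = carry and not r     # c_{i+1} = c_i AND NOT r_i
    return grants

print(fixed_priority_arbiter([True, True, False]))   # request 0 wins
print(fixed_priority_arbiter([False, False, True]))  # request 2 wins only if 0 and 1 are idle
```

As long as request 0 stays asserted, the carry is killed in the first cell and no later cell can ever produce a grant, which is the unfairness the slide points out.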

25 Fair Arbiters
A fair arbiter can be built by changing the priority from cycle to cycle.
Depending on how the priority is generated, different arbitration schemes and degrees of fairness can be achieved.
The priority input is one-hot: exactly one input p_i has the value 1, and all other inputs p_j have the value 0.

26 Fair Arbiters: Oblivious Arbiters
If p_i is generated without knowledge of r_i and g_i, the result is an oblivious arbiter.
Examples are:
Randomly generated p_i
Rotating priorities (generated by a shift register)
Oblivious arbiters provide weak fairness, but not strong fairness.

27 Oblivious Arbiters
Oblivious arbiters provide weak fairness but not strong fairness, for example if only r0 and r1 are constantly asserted.
Request r1 wins the arbitration only when p1 is true; in all other cases r0 gets the grant.
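
The imbalance can be checked with a small simulation. The sketch assumes a four-input arbiter whose one-hot priority rotates by one position every cycle, and that the search for a winner starts at the prioritized input and wraps around; the function names are illustrative.

```python
def variable_priority_arbiter(requests, top):
    """Grant the first asserted request at or after index `top`, wrapping around."""
    n = len(requests)
    for offset in range(n):
        i = (top + offset) % n
        if requests[i]:
            return i
    return None  # nobody is requesting

def rotating_priority_grants(requests, cycles):
    """Count grants per requester when the priority rotates obliviously."""
    n = len(requests)
    counts = [0] * n
    for cycle in range(cycles):
        winner = variable_priority_arbiter(requests, top=cycle % n)
        if winner is not None:
            counts[winner] += 1
    return counts

# Only r0 and r1 request, in every cycle.
print(rotating_priority_grants([True, True, False, False], cycles=8))
```

Over 8 cycles r0 collects 6 grants and r1 only 2: r1 wins solely in the cycles where p1 is set, while the priority positions of the idle inputs 2 and 3 fall through to r0.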

28 Round-Robin Arbiter
A round-robin arbiter achieves strong fairness.
A request that was just served gets the lowest priority.
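
A behavioural sketch of this rule follows: after each grant, the highest priority moves to the input just after the winner, so the winner becomes lowest priority. The class name and interface are assumptions made for illustration.

```python
class RoundRobinArbiter:
    """Round-robin arbiter: the requester served last gets the lowest priority."""

    def __init__(self, num_inputs):
        self.n = num_inputs
        self.top = 0  # index that currently has the highest priority

    def arbitrate(self, requests):
        for offset in range(self.n):
            i = (self.top + offset) % self.n
            if requests[i]:
                # The winner becomes lowest priority: start after it next time.
                self.top = (i + 1) % self.n
                return i
        return None  # no request, priority pointer stays where it is

arbiter = RoundRobinArbiter(4)
winners = [arbiter.arbitrate([True, True, False, False]) for _ in range(8)]
print(winners)
```

With only inputs 0 and 1 requesting, the grants alternate 0, 1, 0, 1, so both are served equally often, unlike with an obliviously rotating priority.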

29 Weighted Round-Robin Arbiter
A weighted round-robin arbiter gives some requesters a larger number of grants than others in a controlled fashion.
If three devices have the weights 1, 2 and 3, they get 1/6, 1/3 and 1/2 of the grants.
The preset line is activated periodically, every N cycles (here N = 6), to load each counter with its weight.
If some modules do not issue any requests during that interval, the shared resource will remain idle until the next preset cycle.
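
The counter-and-preset mechanism can be modelled as follows. The sketch assumes that a request is only eligible while its counter is non-zero, that each grant decrements the winner's counter, and that eligible requests are served round-robin; the last point and the function name are assumptions beyond what the slide states.

```python
def weighted_round_robin(weights, requests_per_cycle):
    """Return the winner (or None) for each cycle of a weighted round-robin arbiter."""
    n = len(weights)
    period = sum(weights)      # preset interval N
    counters = list(weights)
    top = 0                    # round-robin pointer among eligible requesters
    grants = []
    for cycle, requests in enumerate(requests_per_cycle):
        if cycle % period == 0:
            counters = list(weights)  # preset: reload every counter with its weight
        winner = None
        for offset in range(n):
            i = (top + offset) % n
            if requests[i] and counters[i] > 0:
                winner = i
                break
        if winner is not None:
            counters[winner] -= 1     # one grant of this interval is used up
            top = (winner + 1) % n
        grants.append(winner)
    return grants

all_busy = [[True, True, True]] * 6
print(weighted_round_robin([1, 2, 3], all_busy))
only_first = [[True, False, False]] * 6
print(weighted_round_robin([1, 2, 3], only_first))
```

With all three devices requesting, the six cycles are split 1 : 2 : 3. With only device 0 requesting, it is served once and the resource then idles for five cycles until the next preset, which is the drawback noted on the slide.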

30 Matrix Arbiter
A matrix arbiter implements a least-recently-served priority scheme by maintaining a triangular array of state bits w_ij for all i < j.
If w_ij is true, then request i takes priority over request j.
Each state bit is set on a column grant and reset on a row grant, so a grant g_i gives requester i the lowest priority in the next cycle.
Only the upper triangular portion needs to be maintained.
The matrix arbiter has to be properly initialized.
The matrix arbiter is very well suited to a small number of inputs, since it is fast, easy to implement and provides strong fairness (Exercise Dally 18-3).
[Figure: one matrix cell, where g_j sets and g_i resets state bit w_ij]
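
A behavioural sketch of the matrix arbiter is given below. For readability it stores the full matrix rather than only the upper triangle (hardware would derive w_ji as the complement of w_ij), and it is initialized to the order 0 > 1 > 2 > ...; the class name and this initial order are assumptions.

```python
class MatrixArbiter:
    """Least-recently-served arbiter; w[i][j] True means request i beats request j."""

    def __init__(self, num_inputs):
        self.n = num_inputs
        # Proper initialization: a consistent total order 0 > 1 > ... > n-1.
        self.w = [[i < j for j in range(num_inputs)] for i in range(num_inputs)]

    def arbitrate(self, requests):
        for i in range(self.n):
            beaten = any(requests[j] and self.w[j][i]
                         for j in range(self.n) if j != i)
            if requests[i] and not beaten:
                for j in range(self.n):
                    if j != i:
                        self.w[i][j] = False  # row i reset: i no longer beats anyone
                        self.w[j][i] = True   # column i set: everyone now beats i
                return i
        return None

arbiter = MatrixArbiter(3)
print([arbiter.arbitrate([True, True, True]) for _ in range(4)])
```

With all three inputs requesting, the grants cycle 0, 1, 2, 0, because each winner drops to the lowest priority while the relative order of the others is preserved.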

31 Grant-Hold Circuit
Allows uninterrupted access to a resource for several cycles.
Extends the duration of a grant.
As long as hold is asserted, further arbitration is disabled.

32 Queuing Arbiter
A queuing arbiter provides FIFO fairness.
It assigns each request a time stamp when it is asserted.
The request with the earliest time stamp receives the grant.
Cost is determined by the size of the time stamp:
$w_i = \log_2(\Delta t / t_a)$, with $\Delta t = 2 n T_{max}$
w_i … number of bits for the time stamp
Δt … time stamp range
t_a … arrival interval
n … number of inputs to the arbiter
T_max … maximum service time
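
The time stamp formula and the grant rule can be tried out with the sketch below. The rounding up to whole bits, the tie-break by lower input index, and the absence of time stamp wrap-around handling are simplifying assumptions; the function names are hypothetical.

```python
import math

def timestamp_bits(num_inputs, t_max, t_arrival):
    """Bits per time stamp: log2(delta_t / t_a) with delta_t = 2 * n * T_max."""
    delta_t = 2 * num_inputs * t_max
    return math.ceil(math.log2(delta_t / t_arrival))  # rounded up (assumption)

def queuing_grant(request_times):
    """Grant the pending request with the earliest time stamp.

    request_times[i] is the cycle in which request i was asserted, or None.
    Ties go to the lower index; wrap-around of time stamps is ignored here.
    """
    pending = [(t, i) for i, t in enumerate(request_times) if t is not None]
    return min(pending)[1] if pending else None

print(timestamp_bits(num_inputs=4, t_max=8, t_arrival=1))  # delta_t = 64 cycles
print(queuing_grant([5, 2, None, 7]))                      # request 1 arrived first
```

For 4 inputs, a maximum service time of 8 cycles and one arrival per cycle, the time stamp range is 64 cycles and 6 bits per time stamp suffice.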

33 Bus Bridge

34 Bus Bridges
Bus bridges are used to separate high-performance devices from low-performance devices.
All communication between the high-performance bus and a low-performance device goes via the bridge.

35 AHB to ISA Bus Bridge

36 AHB Basic Transfer

37 AHB and ISA Timing

38 Bridge Implementation

39 Bus Protocols

40 Low Performance Bus Protocol
Without a special bus protocol, the bus is not used efficiently.
In the example, module 2 requests the bus in cycle 2 but must wait until cycle 6 to receive the grant.

41 Bus Pipelining
A memory access consists of several cycles (including arbitration).
Since the bus is not used in all cycles, pipelining can be used to increase the performance.
Write access: AR, ARB, AG, RQ, ACK
Read access: AR, ARB, AG, RQ, P, RPLY
Only one transaction can receive the grant during a given cycle, and only one can use the bus during a given cycle.
[Figure: pipeline stage resources for both access types: arb request, arbiter, arb grant, bus]

42 Bus Pipelining
Pipelining leads to an efficient use of the bus.
Stalls are inserted since only one transaction can use the bus at a time.
Sometimes (cycle 12) two transactions can overlap.
However, this cannot be done in cycle 5 (2. Write), since otherwise RPLY and ACK would overlap in cycle 6.
[Timing chart over cycles 1 to 15 with a bus-busy row; stage sequence of each transaction:]
1. Read: AR ARB AG RQ P RPLY
2. Write: AR ARB AG Stall Stall RQ ACK
3. Write: AR ARB Stall Stall AG Stall RQ ACK
4. Read: AR Stall Stall ARB Stall AG Stall RQ P RPLY
5. Read: AR Stall ARB Stall AG RQ P RPLY
6. Read: AR Stall ARB AG Stall Stall RQ

43 Split-Transaction Bus
In a split-transaction bus, a transaction is split into two transactions:
a ”request” transaction
a ”reply” transaction
Both transactions have to compete for the bus by arbitration.

44 Split-Transaction Buses
[Timing chart over cycles 1 to 15; stage sequence of each transaction:]
1. Read: AR ARB AG RQ
2. Write: AR ARB AG RQ
3. Write: AR ARB AG RQ
4. Read: AR ARB AG RQ
1. Reply: AR ARB AG RPLY
5. Read: AR ARB Stall Stall Stall Stall AG RQ
2. Reply: AR ARB AG RPLY
6. Read: AR ARB Stall Stall Stall Stall AG RQ
3. Reply: AR ARB AG RPLY
4. Reply: AR ARB AG RPLY
5. Reply: AR ARB AG
6. Reply: AR ARB
Bus busy: 1 2 3 4 1 2 3 4 5 6

45 Split-Transaction Buses
The advantages of the split-transaction bus are evident if there is a variable delay for requests.
[Timing chart over cycles 1 to 14, comparing both bus types:]
Pipelined bus:
1. Trans: RQ A RP A
2. Trans: RQ B RP B
3. Trans: RQ C RP C
Split-transaction bus:
1. Trans: RQ A RP A
2. Trans: RQ B RP B
3. Trans: RQ C RP C

46 Burst Messages
There is a considerable amount of overhead in a bus transaction:
Arbitration
Addressing
Acknowledgement
[Figure: four single-word requests, each consisting of ARB followed by Cmd, Adr, Data]
Efficiency = Transmitted Words / Message Size = 1/3

47 Burst Messages
The overhead can be reduced if messages are sent as blocks (bursts).
[Figure: four single-word requests (ARB, then Cmd, Adr, Data each) compared with one burst request (ARB, then Cmd, Adr, Data, Data, Data, Data)]
Efficiency = Transmitted Words / Message Size = 2/3
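
Both efficiency figures follow from the same ratio. The sketch assumes, as the slide numbers suggest, that each message carries two overhead words (Cmd and Adr) and that arbitration is not counted in the message size; the function name is illustrative.

```python
from fractions import Fraction

def bus_efficiency(data_words, overhead_words=2):
    """Efficiency = transmitted data words / total message size in words."""
    return Fraction(data_words, data_words + overhead_words)

print(bus_efficiency(1))   # single-word request: Cmd, Adr, Data
print(bus_efficiency(4))   # burst of four: Cmd, Adr, Data x 4
print(bus_efficiency(16))  # a longer burst approaches, but never reaches, 1
```

A single-word message gives 1/3 and a four-word burst gives 4/6 = 2/3; a 16-word burst would reach 8/9, which shows the diminishing returns of ever longer bursts.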

48 Burst Messages
The longer the burst, the better the efficiency, BUT other bus masters have to wait, which may be unacceptable in many systems (real-time).
Possible solutions:
Maximum length for a burst
Interruption of long messages, followed by restart or resume
[Figure: arbitration grants A (Gnt A), then B (Gnt B), then resumes A (Res A); message A (Cmd, Adr, Data, Data, Data, Data) is interrupted by message B (Cmd, Adr, Data) and continues with Adr, Data, Data, Data, Data]

49 Modern SoC buses

50 Embedded Buses
Current systems-on-chip are advanced enough to need a hierarchy of buses.
A new set of bus standards has been defined for use in SoCs, e.g.
ARM AMBA
Altera Avalon
OCP (Open Core Protocol)
These buses allow for higher performance than traditional tri-state buses.

51 Comparison: Multiplexer Bus and Tri-State Bus
Multiplexer bus:
Bus masters can send their requests, including address and data (for a write), at the same time.
The arbiter selects a bus master; the multiplexer avoids collisions.
Tri-state bus:
Only one bus master can output address or data at a time (otherwise there is a collision).
A bus grant is needed to output address or data.
[Figure: bus masters BM1 and BM2 send Req, Adr, Data; on the multiplexer bus an arbiter-controlled MUX selects between them, on the tri-state bus Adr1 and Adr2 driven at the same time collide]

52 AMBA Specification
The AMBA specification defines an on-chip communications standard for designing high-performance embedded microcontrollers.
Three buses are defined:
Advanced High-performance Bus (AHB)
Advanced System Bus (ASB)
Advanced Peripheral Bus (APB)
A test methodology is included within AMBA, which provides an infrastructure for modular macrocell test and diagnostic access.

53 System based on an AMBA Bus
An AMBA system typically contains a high-speed bus (ASB or AHB) for the CPU, fast memory and DMA, and a bus for peripherals (APB), which is connected to the high-speed bus via a bridge.

54 AMBA Buses
AMBA AHB (new standard):
High performance
Pipelined operation
Multiple bus masters
Burst transfers
Split transactions
AMBA ASB (older standard)
AMBA APB:
Low power
Latched address and control
Simple interface
Suitable for many peripherals

55 AMBA AHB System
AHB Master: A bus master is able to initiate read and write operations by providing address and control information. Only one bus master can use the bus at a time.
AHB Slave: A bus slave responds to a read or write operation within a given address-space range. The bus slave signals back to the active bus master the success, failure or waiting of the data transfer.

56 AMBA AHB System
AHB Arbiter: The bus arbiter ensures that only one bus master at a time is allowed to initiate data transfers. Even though the arbitration protocol is fixed, any arbitration algorithm, such as highest priority or fair access, can be implemented depending on the application requirements. An AHB system includes only one arbiter.
AHB Decoder: The AHB decoder decodes the address of each transfer and provides a select signal for the slave that is involved in the transfer. A single centralized decoder is required in all AHB implementations.

57 AMBA AHB Bus Interconnection

58 AMBA AHB Bus Interconnection
The AHB protocol is based on a central multiplexer interconnection scheme.
All bus masters send their requests in the form of address and control signals.
The arbiter chooses one master, whose address and control signals are routed to all slaves.
The decoder selects the signals from the slave that is involved in the transfer with the bus master.
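
The role of the central decoder in this multiplexer scheme can be sketched as follows. The address map, the slave names and both function names are hypothetical and chosen only for illustration; they are not taken from the AMBA specification.

```python
# Hypothetical address map: slave name -> (base address, size in bytes).
ADDRESS_MAP = {
    "sram": (0x0000_0000, 0x1_0000),
    "apb_bridge": (0x4000_0000, 0x1000),
}

def decode(address, address_map=ADDRESS_MAP):
    """Return one select flag per slave; at most one flag is True."""
    return {name: base <= address < base + size
            for name, (base, size) in address_map.items()}

def route_read_data(address, slave_read_data, address_map=ADDRESS_MAP):
    """Model the slave-to-master multiplexer: pass on the selected slave's data."""
    for name, selected in decode(address, address_map).items():
        if selected:
            return slave_read_data[name]
    return None  # unmapped address; a real system must handle this case explicitly

print(decode(0x4000_0010))
print(hex(route_read_data(0x20, {"sram": 0xAA, "apb_bridge": 0xBB})))
```

Because every slave sees the same address and control signals, only the select flags and the return multiplexer decide which slave actually takes part in the transfer.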

59 AMBA
ARM’s Advanced Microcontroller Bus Architecture
APB (Advanced Peripheral Bus)
ASB (Advanced System Bus)
Multiple masters
Pipelined operations
AMBA : AHB (Advanced High-performance Bus)
Burst transactions
Split transactions, multiple outstanding transactions
Single-cycle master hand-over
Exclusive bus control
Single centralized decoder

60 AMBA 3 - 2004
Multiple parallel connections
Pipelined bursts
Only 2-stage network
Central n x m switch matrix

61 Multi Layer AMBA Bus

62 Multi Layer AMBA Bus
Multiple slaves on one slave port
Local slaves

63 Multi Layer AMBA Bus
Multiple masters on one layer

64 Multi Layer AMBA Bus
Separate AHB subsystems

65 Multi Layer AMBA Bus Example

66 AXI: The new AMBA bus protocol
The objectives of the latest generation AMBA interface are to:
be suitable for high-bandwidth and low-latency designs
enable high-frequency operation without using complex bridges
meet the interface requirements of a wide range of components
be suitable for memory controllers with high initial access latency
provide flexibility in the implementation of interconnect architectures
be backward-compatible with existing AHB and APB interfaces

67 AXI: The new AMBA bus protocol
The key features of the AXI protocol are:
separate address/control and data phases
support for unaligned data transfers using byte strobes
burst-based transactions with only the start address issued
separate read and write data channels to enable low-cost Direct Memory Access (DMA)
ability to issue multiple outstanding addresses
out-of-order transaction completion
easy addition of register stages to provide timing closure

68 AXI Channels
AW: Write Address Channel
W: Write Data Channel
B: Write Response Channel
AR: Read Address Channel
R: Read Data Channel

69 AXI Ordering Model
AWID: The ID tag for the write address group of signals.
WID: The write ID tag for a write transaction. Must match the AWID.
BID: The ID tag for the write response. Must match the AWID and WID.
ARID: The ID tag for the read address group of signals.
RID: The read ID tag for a read transaction. Must match the ARID.
The interconnect appends the master ID to AWID, ARID and WID.

70 Ordering Rules
Transactions from different masters have no ordering restrictions. They can complete in any order.
Transactions from the same master, but with different ID values, have no ordering restrictions. They can complete in any order.
The data for a sequence of write transactions with the same AWID value must complete in the same order in which the master issued the addresses.
The data for a sequence of read transactions with the same ARID value must be returned in order, such that:
when reads with the same ARID are from the same slave, the slave must ensure that the read data returns in the same order in which the addresses are received;
when reads with the same ARID are from different slaves, the interconnect must ensure that the read data returns in the same order in which the master issued the addresses.
There are no ordering restrictions between read and write transactions with the same AWID and ARID. If a master requires an ordering restriction, it must ensure that the first transaction is fully completed before the second transaction is issued.
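
The same-ID rule can be expressed as a small checker. This is a simplified sketch for one master and one direction (only reads or only writes), assuming every issued transaction eventually completes; the function name and the data layout are illustrative and not part of the AXI specification.

```python
from collections import defaultdict

def respects_id_ordering(issued, completed):
    """Check that transactions sharing an ID complete in issue order.

    issued:    list of (transaction_name, id_tag) in issue order
    completed: list of transaction_name in completion order
    """
    id_of = dict(issued)
    issue_order = defaultdict(list)
    for txn, tag in issued:
        issue_order[tag].append(txn)
    completion_order = defaultdict(list)
    for txn in completed:
        completion_order[id_of[txn]].append(txn)
    # Different IDs may interleave freely; within one ID the order must match.
    return all(completion_order[tag] == order
               for tag, order in issue_order.items())

issued = [("a", 0), ("b", 1), ("c", 0)]
print(respects_id_ordering(issued, ["b", "a", "c"]))  # b has another ID, may overtake
print(respects_id_ordering(issued, ["c", "b", "a"]))  # c overtakes a with the same ID
```

The first completion order is allowed because only transactions with different IDs are reordered; the second is rejected because two transactions with ID 0 swap places.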

71 AMBA 3 - 2004
AXI - Advanced eXtensible Interface
Abstract interface protocol
Multiple parallel transactions
Multiple outstanding transactions
Transactions may complete out of order
IDs to group transactions for ordering control
Master/slave and read/write transaction based protocol
AMBA 4 – 20??:
More flexible and abstract protocol
Support for QoS
More information can be found on

72 Altera Avalon Bus Features
Up to 128-bit wide data
Synchronous operation
Open standard:
The specification defines the communication between master and switch fabric, and between slave and switch fabric.
Third-party vendors can develop their own Avalon devices.

73 Avalon Bus – Transfer Modes
The Avalon specification allows (among others) the following transfer modes:
Wait states: fixed or variable (slave only)
Pipeline: fixed or variable latency
Burst
Tristate (devices with a shared read/write channel)
Reference: Avalon Interface Specification, Avalon Switch Fabric

74 Avalon Switch Fabric
A master port only has to wait for access to a slave port if another master tries to access the same slave.
Multi-master access is resolved by weighted round-robin arbitration.
The designer can define share values, which determine how often a master is allowed to access a slave (relative to other masters).

75 Avalon Bus and SOPC Builder
The Avalon bus is generated automatically when a new Nios II core with peripherals is created in SOPC Builder.
Changes in the design of the architecture lead to a new structure of the Avalon Switch Fabric.
The user does not see the bus structure or the internal structure of the Avalon Switch Fabric.

76 Summary

77 Summary and Outlook
A bus is an excellent communication medium to connect several devices.
Since the bus is a shared communication medium, it is a bottleneck in the system.
Many different arbitration techniques exist, which lead to different behaviors of the system.

78 Summary and Outlook
Techniques like split transactions and bridges can increase the performance of a bus, but there is a limit.
Network-on-chip architectures aim to offer communication capabilities that are more general and flexible than buses.
Modern buses evolve and have more and more network-like capabilities!

