1
On-Chip Communication
A SoC Design Automation School of EECS Seoul National University
2
Introduction: System-on-Chip Design
- Computation for functional blocks in the application
  - HW modules
  - SW running on processors
- Communication (interface) between HW modules and SW processes
  - HW
    - Buses/bridges or networks
    - Wrappers
    - Buffers or memories
    - Decoders
    - DMA controllers
  - SW
    - Device drivers
    - Interrupt service routines
    - OS
    - Memory instructions
[Figure: block diagram with labels DSP, ASIC, Mem., Bridge, P]
3
On-Chip Communication
[Figure: layered view of on-chip communication, with labels:
 SW: Application SW, OS, Device drivers;
 HW: Processor (µP, DSP), Local memory w/ I/D caches, Processor local bus, DMA, Comm. Wrap., HW IP, Memory;
 Communication network (OCBs w/ bridges, Sonics, packet/circuit switch, etc.)]
4
On-Chip Communication
- Communication through an on-chip bus or an on-chip network
- On-chip bus
  - AMBA Advanced High-performance Bus (AHB) (ARM Inc.)
  - SiliconBackplane™ III MicroNetwork (Sonics Inc.)
- On-chip network (or Network-on-Chip)
  - Circuit-switch network: PROPHID (Philips)
  - Packet-switch network: Torus, Octagon, Mesh (Aethereal (Philips))
5
On-Chip Bus: AMBA
AMBA 2.0 specification, http://www.amba.com
6
On-Chip Bus: AMBA AHB (Advanced High Performance Bus)
- One of the most popular on-chip buses
- High performance
- Pipelined operation
- Multiple bus masters
- Burst transfers
- Split transactions
- Central multiplexer interconnection scheme (cf. tri-state implementation)
[Figure: AHB interconnection]
7
On-Chip Bus: AHB bus master-slave interface
- HRESP[1:0]: OKAY, ERROR, RETRY, SPLIT
- HTRANS[1:0]: IDLE, BUSY, NONSEQ, SEQ
- HSIZE[2:0]: 8-bit up to 1024-bit transfer
- HBURST[2:0]: single, 4, 8, 16-beat burst, incrementing or wrapping

HRESP:
- 00 OKAY: When HREADY is HIGH this shows the transfer has completed successfully. The OKAY response is also used for any additional cycles that are inserted, with HREADY LOW, prior to giving one of the three other responses.
- 01 ERROR: Signaled to the bus master so that it is aware the transfer has been unsuccessful (e.g. write to a ROM address).
- 10 RETRY: The transfer has not yet completed, so the bus master should retry the transfer. The master should continue to retry the transfer until it completes.
- 11 SPLIT: The transfer has not yet completed successfully. The bus master must retry the transfer when it is next granted access to the bus. The slave will request access to the bus on behalf of the master when the transfer can complete.

HTRANS:
- 00 IDLE: The bus master is granted the bus, but does not wish to perform a data transfer.
- 01 BUSY: The bus master is continuing with a burst of transfers, but the next transfer cannot take place immediately (due to some reason on the master side).
- 10 NONSEQ: Indicates the first transfer of a burst or a single transfer.
- 11 SEQ: The remaining transfers in a burst are SEQUENTIAL.

Transfer size = 2**HSIZE bytes (up to 2**7 bytes = 1024 bits).
Bursts must not cross a 1 kB address boundary.
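The encodings above can be written down as a small lookup, sketched here in Python. The enum and function names are illustrative choices for this note, not part of the AMBA specification, and the boundary check assumes an incrementing burst.

```python
from enum import IntEnum


class HTrans(IntEnum):
    """HTRANS[1:0] encodings as listed on the slide."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


class HResp(IntEnum):
    """HRESP[1:0] encodings as listed on the slide."""
    OKAY = 0b00
    ERROR = 0b01
    RETRY = 0b10
    SPLIT = 0b11


def transfer_size_bytes(hsize: int) -> int:
    """Transfer size in bytes for a 3-bit HSIZE value: 2**HSIZE."""
    if not 0 <= hsize <= 7:
        raise ValueError("HSIZE is a 3-bit field (0..7)")
    return 1 << hsize


def crosses_1kb_boundary(start_addr: int, beats: int, hsize: int) -> bool:
    """True if an incrementing burst would cross a 1 kB address boundary.

    Hypothetical helper: compares the 1 kB block of the first and last byte.
    """
    last_addr = start_addr + beats * transfer_size_bytes(hsize) - 1
    return (start_addr >> 10) != (last_addr >> 10)
```

For example, `transfer_size_bytes(7)` gives 128 bytes, which is the 1024-bit maximum quoted above.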
8
On-Chip Bus: AHB Decoder
9
On-Chip Bus: AHB Arbiter
10
On-Chip Bus: AHB operation
- The address phase lasts a single cycle.
- The data phase may require several cycles.
- The slave samples the address and control and then performs the read/write.
11
On-Chip Bus: AHB operation (cont'd)
- The address phase can overlap with the data phase of the previous data transfer due to pipelining.
- A wait state extends the data phase, which effectively extends the next address phase.
- APE HIGH: pipelined mode for AHB and ASB --> need to pipeline the address and control at the slave side
- APE LOW: depipelined mode for APB
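The effect of overlapping address and data phases can be estimated with a simple cycle count. This is a rough sketch under stated assumptions: transfers are issued back to back with no idle cycles, each address phase takes one cycle, and each data phase takes one cycle plus its wait states. The function names are made up for this note.

```python
def pipelined_cycles(wait_states):
    """Cycles for back-to-back transfers with overlapped address/data phases.

    One cycle for the first address phase, then one data phase per transfer;
    every later address phase is hidden under the previous data phase.
    """
    if not wait_states:
        return 0
    return 1 + sum(1 + w for w in wait_states)


def unpipelined_cycles(wait_states):
    """Cycles if each transfer's address phase could not overlap anything."""
    return sum(2 + w for w in wait_states)
```

With three transfers where the second needs one wait state, `pipelined_cycles([0, 1, 0])` is 5 while `unpipelined_cycles([0, 1, 0])` is 7.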
12
On-Chip Bus: Burst transfer
- Cache linefill
- LDM, STM (non-cached)
- Use of burst:
  - no interruption during the transaction except split
  - slave can exploit the info
[Figure: 4-beat wrapping burst]
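The address sequence of a wrapping burst is easy to get wrong, so here is a minimal sketch of how the beat addresses could be computed. It assumes the wrap boundary is (number of beats x transfer size) bytes and that the start address is aligned to the transfer size; the function name and signature are illustrative.

```python
def burst_addresses(start, beats, size_bytes, wrapping):
    """Return the address of each beat in an incrementing or wrapping burst."""
    if start % size_bytes != 0:
        raise ValueError("start address must be aligned to the transfer size")
    if not wrapping:
        return [start + i * size_bytes for i in range(beats)]
    block = beats * size_bytes          # size of the wrap window in bytes
    base = start - (start % block)      # aligned start of the wrap window
    offset = start - base
    return [base + (offset + i * size_bytes) % block for i in range(beats)]
```

A 4-beat wrapping burst of 4-byte words starting at 0x38 visits 0x38, 0x3C, 0x30, 0x34, which is the order a cache linefill would want when the critical word sits in the middle of the line.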
13
On-Chip Bus: Split transaction
- Improves bus utilization by separating the master request from the slave response
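One way to picture a split transaction is an arbiter that stops granting a master after a SPLIT response until the slave signals that it can complete the transfer. The class below is a simplified model for illustration only: the class and method names are hypothetical, and a fixed priority order is assumed rather than taken from the slides.

```python
class SplitAwareArbiter:
    """Toy fixed-priority arbiter that masks masters waiting on a split."""

    def __init__(self, priority):
        self.priority = list(priority)   # highest priority first (assumed policy)
        self.split_masked = set()        # masters waiting for a slave to finish

    def on_split_response(self, master):
        """A slave answered SPLIT: stop granting this master for now."""
        self.split_masked.add(master)

    def on_slave_ready(self, master):
        """The slave can now complete the transfer: unmask the master."""
        self.split_masked.discard(master)

    def grant(self, requests):
        """Return the highest-priority requesting master that is not masked."""
        for master in self.priority:
            if master in requests and master not in self.split_masked:
                return master
        return None
```

While a high-priority master is masked, lower-priority masters can use the bus, which is where the utilization gain comes from.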
14
On-Chip Bus: APB bridge
- Latches the address and holds it valid throughout the transfer
- Decodes the address and generates a peripheral select
- Drives data onto APB for a write transfer
- Drives APB data onto the system bus for a read transfer
- Generates a timing strobe for the transfer
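The bridge duties listed above can be sketched as a behavioral model of a write transfer. The address map and peripheral names below are invented for illustration; the signal names (PADDR, PWRITE, PWDATA, PSEL, PENABLE) and the SETUP/ENABLE cycles follow the AMBA 2.0 APB description, with PENABLE acting as the timing strobe.

```python
# Hypothetical APB address map: (peripheral name, first address, last address).
APB_MAP = [
    ("timer", 0x0000, 0x0FFF),
    ("uart", 0x1000, 0x1FFF),
    ("gpio", 0x2000, 0x2FFF),
]


def decode_psel(addr):
    """Decode an address into per-peripheral select lines (at most one True)."""
    return {name: lo <= addr <= hi for name, lo, hi in APB_MAP}


def apb_write(addr, data):
    """Return the two APB cycles of a write: SETUP, then ENABLE.

    The address, data, and select are latched once and held valid for both
    cycles; only the PENABLE strobe changes between them.
    """
    latched = {"PADDR": addr, "PWRITE": True, "PWDATA": data,
               "PSEL": decode_psel(addr)}
    return [
        {"phase": "SETUP", "PENABLE": False, **latched},
        {"phase": "ENABLE", "PENABLE": True, **latched},
    ]
```

Calling `apb_write(0x1004, 0xAB)` selects only the hypothetical `uart` peripheral and keeps PADDR and PWDATA unchanged across both cycles.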
15
On-Chip Bus: Interfacing APB to AHB
[Figure: write transfer from AHB]
16
On-Chip Bus: Multi-layer AHB
- An interconnection scheme based on the AHB protocol
- Enables parallel access paths between multiple masters and slaves in a system
17
On-Chip Bus: Sonics SiliconBackplane MicroNetwork
SiliconBackplane™ III MicroNetwork specification,
- On-chip bus
- Time-division multiple access (TDMA)
18
On-Chip Bus
- Pre-characterized interconnect helps timing convergence.
- Agents are placed near attached IP cores.
- Distributed multiplexed bus structure with OR-tree repeaters.
19
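On-Chip Bus: Two-step arbitration
- Step 1 (TDMA): the bus goes to the module originally assigned to the current time slot.
- Step 2 (priority-based): if the assigned module makes no bus access, the slot is arbitrated by priority.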
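The two steps can be sketched as a single arbitration function. This is an illustrative model only: the argument names, the module names in the usage example, and the use of a plain priority list for the second step are assumptions, not details of the Sonics implementation.

```python
def arbitrate(slot_owner, requests, priority):
    """Pick the module that gets the bus in the current time slot.

    slot_owner: module the TDMA schedule assigned to this slot
    requests:   set of modules requesting the bus this cycle
    priority:   modules ordered from highest to lowest priority
    """
    # Step 1: the TDMA owner wins if it actually wants the bus.
    if slot_owner in requests:
        return slot_owner
    # Step 2: otherwise the unused slot is handed out by priority.
    for module in priority:
        if module in requests:
            return module
    return None
```

For example, `arbitrate("dsp", {"cpu", "dma"}, ["dma", "cpu", "dsp"])` returns `"dma"`, because the slot owner is idle and the unused slot falls to the highest-priority requester.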
20
On-Chip Bus: Pipeline depth
- Based on memory target latency at the desired clock frequency
21
Bus Matrix Synthesis
S. Pasricha, N. Dutt, and M. Ben-Romdhane, "Constraint-driven bus matrix synthesis for MPSoC," ASP-DAC, 2006
[Figures: full bus matrix; partial bus matrix]
22
Bus Matrix Synthesis: Communication throughput graph and synthesis flow
- CTG (Communication Throughput Graph): G(V,A)
  - v: component in the system
  - a: communication constraints
- TCP (Throughput Constraint Path): a sub-graph of the CTG, consisting of a single master for which data throughput must be maintained and the other masters, slaves and memories that are in the critical path that impacts the maintenance of the throughput
- Synthesis flow (annotations recovered from the figure):
  - ... on bus clock to get data traffic statistics
  - remove unused buses & local slaves
  - generate all legal solutions; when generating solutions at step 3:
    - discard duplicate clusterings
    - discard if buses are not merged (no intersection between master sets accessing the slave sets)
    - discard if the slave clocks are not compatible
  - minimize design: minimize all bus clock speeds and out-of-order buffer sizes
  - ranked by #buses
[Figures: CTG example; synthesis flow]
23
Bus Matrix Synthesis: Case Studies
24
Network-on-Chip: Why NoC?
- Scalability
  - connection to neighbors --> cost increases linearly with number of components while maintaining the speed
- Performance
  - parallelized communications
  - high clock rate
- Design effort
  - modular
  - distributed
  - network independence
  - predictable wire delay
  - ...
- The International Organization for Standardization (ISO) developed the 7-layer Open Systems Interconnection (OSI) model to describe networks.
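As a rough illustration of the linear-cost claim, the sketch below counts the links in a 2D mesh where each router connects only to its nearest neighbors. The mesh shape and the function name are assumptions chosen for this note; other topologies would give different constants.

```python
def mesh_links(rows, cols):
    """Number of bidirectional neighbor links in a rows x cols 2D mesh."""
    horizontal = rows * (cols - 1)   # links between left/right neighbors
    vertical = cols * (rows - 1)     # links between up/down neighbors
    return horizontal + vertical
```

Since `mesh_links(n, n)` is `2 * n * (n - 1)`, the link count never exceeds two per node, so wiring cost grows in proportion to the number of components while each link stays short.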
25
Network-on-Chip: Quality of Service
- Guaranteed service
  - Requires resource reservation for worst-case scenarios
  - Resources are often underutilized
- Best-effort service
  - Designed for average-case scenarios
  - Unpredictable performance
26
Network-on-Chip: Switching mode
- Circuit switch network
  - Communication path is fixed before data transmission starts.
  - Advantages
    - QoS guaranteed
    - Suited to real-time systems
  - Disadvantages
    - Lower resource utilization
    - Connection setup overhead
- Packet switch network
  - Communication path is determined dynamically depending on network traffic.
  - Better adaptation of communication to varying network traffic
  - Better utilization of network resources
  - Poor QoS
27
Network-on-Chip: Routing mode (packet switch network only)
- Store-and-forward
  - An incoming packet is stored entirely before it is forwarded to the next node.
- Wormhole routing
  - An incoming packet is forwarded as soon as the packet header is evaluated.
  - In case the next hop is blocked, the packet tail remains in the network and blocks other resources.
- Virtual cut-through
  - In case the next hop is blocked, the packet tail is stored in a local buffer.
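The latency difference between the routing modes can be seen with a back-of-the-envelope model. The sketch assumes an uncontended path, one flit transferred per link per cycle, a one-flit header, and no router processing delay; the function names and the flit-based units are assumptions for illustration.

```python
def store_and_forward_latency(hops, flits):
    """Cycles when every node receives the whole packet before forwarding it."""
    return hops * flits


def cut_through_latency(hops, flits):
    """Cycles when the header is forwarded at once and the body streams behind.

    Applies to both wormhole and virtual cut-through as long as no hop is
    blocked; the two differ only in where the tail waits when blocking occurs.
    """
    if flits == 0:
        return 0
    return hops + (flits - 1)
```

For an 8-flit packet crossing 4 links, store-and-forward takes 32 cycles under these assumptions, while the pipelined modes take 11.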
28
Network-on-Chip: Circuit switch network
- Philips PROPHID architecture
J. Leijten, J. van Meerbergen, A. Timmer, and J. Jess, “Stream communication between real-time tasks in a high-performance multiprocessor,” in Proc. Design Automation and Test in Europe, 1998
29
Network-on-Chip: Packet switch network (Torus)
W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in Proc. Design Automation Conference, June 2001.
- Top 2 metal layers are used for the 2D folded torus topology.
- Each tile can have a processor, DSP, memory, I/O, etc.
- 256-bit data line
30
Network-on-Chip: Octagon
Faraydon Karim, Anh Nguyen, Sujit Dey, and Ramesh Rao, “On-chip communication architecture for OC-768 network processors,” in Proc. Design Automation Conference, June 2001.
31
Network-on-Chip: Mesh
Kees Goossens, John Dielissen, and Andrei Radulescu, “Aethereal network on chip: concepts, architectures, and implementations,” IEEE Design & Test of Computers, Sept.-Oct., 2005.
- Combination of circuit and packet switching
  - Circuit switching for guaranteed service
  - Best-effort service with packet switching
32
Interface Design
- Interface between HW modules and SW processes
- Interface between a HW module and a bus/network
  - Called a communication wrapper
  - May perform interface protocol conversion and/or system level buffering and caching
- SW interface
  - Device drivers
  - Interrupt service routine
  - OS (communication services)
  - Memory instructions
33
Interface Design: Communication wrapper design
- PA: Port adapter
- CA: Channel adapter
- Communication network: AMBA, MicroNetwork, …
[Figure: wrapper architecture, with labels IP, PA, External port, Internal port, Internal bus, CA, Communication Wrapper, Communication Network #1, Communication Network #2]
34
Interface Design: Wrapper internal bus architecture
[Figure: internal bus with labels Arbiter, Address Decoder, Mux, Port/Channel Adapters (Master), Port/Channel Adapter (Slave), External Port; signals MasterSel, nREQ, nGRNT, enable, Address, Data_bus, status]
35
Interface Design: Interface SW design
- Operating system
  - Communication services
    - Pipe, shared memory, semaphore, mutex, etc.
    - Supported as OS system calls
- Device driver and ISR
  - Device drivers depend on OS and processor
    - OS: preemptive or not, interrupt or not, synchronization services (semaphore, lock var, …)
    - Processor: bus width, register set, exception behavior, etc.
- Memory instructions
  - Load/store, load multiple/store multiple instructions
  - Cache/virtual memory instructions
36
Interface Design: Interface design flow
- Communication refinement
- Automatic generation of adapter architecture
[Figure: modules M1, M2, M3 connected by channels, refined into a wrapped IP and a wrapped µP with OS; labels wrapper, Internal Bus, MA, CA1, CA2, CA3, CA4, Physical Communication Network]
37
Interface Design: Interface generation flow
W. Cesario, Y. Paviot, A. Baghdadi, L. Gauthier, D. Lyonnard, G. Nicolescu, S. Yoo, A. A. Jerraya, and M. Diaz-Nava, "HW/SW Interfaces Design of a VDSL Modem using Automatic Refinement of a Virtual Architecture Specification into a Multiprocessor SoC: a Case Study", DATE 2002, Paris, France, March 2002.
38
Interface Standards
- VCI (Virtual Component Interface)
  - Standardized by OCB DWG of VSIA (Virtual Socket Interface Alliance)
- OCP (Open Core Protocol)
  - Proposed by Sonics Inc.
  - A functional superset of VCI, adding configurable sideband control signaling and test harness signals
  - OCP-IP has stewardship of VCI
- AMBA AXI (AMBA Advanced eXtensible Interface)
  - Proposed by ARM Inc.
  - Backward-compatible with existing AHB and APB interfaces
39
Interface Standards: VCI
Virtual Component Interface Standard version 2 (OCB2 2.0), On-Chip Bus Development Working Group, April 2001,
- Virtual Component Intellectual Property (IP) standard by the On Chip Bus Development Working Group (OCB DWG) of VSIA
- Goal
  - Maximum portability
  - Does not require modification of VCs
- Describes three different interface standards
  - Peripheral VCI
  - Basic VCI
  - Advanced VCI
40
Interface Standards: OCP
Open Core Protocol specification version 2.0, 2003,
- Specification
  - Point-to-point synchronous interface
  - Bus independence
  - Pipelined operation
  - Separate requests from responses for pipelining
  - Burst operation
  - Threads for concurrency and out-of-order processing
  - Interrupts, errors, and other sideband signaling
41
Interface Standards: OCP instances and wrapped bus
42
Interface Standards: OCP signals
43
Interface Standards: AMBA AXI
AMBA AXI protocol specification, ARM Inc. 2003,
- Backward-compatible with existing AHB and APB interfaces
- Unidirectional channel architecture for pipelined interconnect
- Separate address/control and data phases
- Supports unaligned data transfers
- Supports out-of-order transaction completion
- Supports low power operation
- Single interface definition for interfaces between
  - A master and the interconnect
  - A slave and the interconnect
  - A master and a slave
44
Interface Standards: Channel architecture
[Figures: read channel architecture; write channel architecture]
45
Interface Standards: Unaligned data transfer
46
Interface Standards: Burst
- Master provides the address of the first byte only
[Figure: overlapping read burst example]
47
Interface Standards: Out-of-order transaction completion
- All transactions with a given ID must be ordered
- No restriction on the ordering of transactions with different IDs
- IDs:
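The ordering rule can be expressed as a small check on a trace of issued and completed transactions. This is a sketch for illustration: representing a transaction as an `(id, tag)` pair and the function name are assumptions, not part of the AXI specification.

```python
def respects_id_ordering(issued, completed):
    """Check that transactions sharing an ID complete in the order issued.

    issued, completed: lists of (transaction_id, tag) pairs, where tag is any
    label that tells transactions with the same ID apart.
    """
    def group_by_id(trace):
        groups = {}
        for txn_id, tag in trace:
            groups.setdefault(txn_id, []).append(tag)
        return groups

    # Per-ID sequences must match exactly; interleaving across IDs is free.
    return group_by_id(issued) == group_by_id(completed)
```

A trace where ID 1 overtakes ID 0 passes the check, while swapping two transactions that share ID 0 fails it.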
48
Interface Standards: Low power operation
- System clock controller requests the peripheral to enter a low power state
- Peripheral performs its power-down function and sends an acknowledge