Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C.


1 Run-time Adaptive on-chip Communication Scheme 林孟諭 Dept. of Electrical Engineering National Cheng Kung University Tainan, Taiwan, R.O.C

2 Outline: Abstract; Introduction; Motivation Case Study; AdNoC Concept – Definitions; Algorithm; Hardware Implementation; Conclusion

3 Abstract Workloads and/or constraints that vary at run-time in embedded systems require run-time adaptivity to provide a high degree of efficiency in every operation mode or scenario. We present the first approach to an adaptive on-chip communication scheme. It provides an adaptive routing/path allocation algorithm to meet a required level of Quality of Service (QoS), namely guaranteed bandwidth.

4 Introduction (1/2) The proposed run-time adaptive network on chip (NoC) adapts the underlying interconnection infrastructure on demand in response to changing communication requirements imposed by an application. To provide on-demand interconnections, we present a novel adaptive routing/path allocation algorithm that meets QoS requirements (bandwidth).

5 Introduction (2/2) The scheme makes decisions locally at each router, depending on the bandwidth available in each direction to the neighboring router. Dynamic connections are realized by re-assigning a certain number of buffer blocks to different output ports of a router on demand. This on-demand buffer block configuration also increases resource utilization, especially buffer utilization.

6 Motivation Case Study (1/4) We motivate the need for an adaptive NoC by means of a very simple scenario. We study an MPEG decoder [1] and an Image Processing Line (IPL) [18] application. The task graphs are shown in Figures 1a and 1b. Assume that at time t0 the NoC is running the MPEG video decoder (Fig. 1c). At time t1, the IPL needs to be executed, so it is mapped onto the processing elements alongside the MPEG decoder. Once a mapping is performed, the routers attempt to set up meaningful routes (Fig. 1d).

7 Fig. 1. Motivation to use an adaptive communication architecture

8 Motivation Case Study (2/4) In this example, the Gauss task Gauss1 first establishes a route to its neighboring filter task Filter1. It then applies a deterministic XY routing algorithm, as used in QNoC, to reach Filter2. However, that attempt fails because of the limited bandwidth available. Consequently, the router at Gauss1 is forced to try another route, which succeeds (Fig. 1e).

9 Motivation Case Study (3/4) Along these routes, the routers supply a corresponding buffer block, allocating the buffer to output ports on demand. The second Gauss task, Gauss2, attempts the same action. However, it fails to find a route to Filter1 and Filter2. Thus it becomes necessary to invoke a re-mapping (Fig. 1f).

10 Motivation Case Study (4/4) Routing needs to be implemented through an algorithm which can identify feasible routes. After path selection, appropriate buffer blocks need to be assigned on demand to that path. If a path and buffer blocks are not available, the mapping function sends appropriate feedback to the upper layer. Therefore, a dynamic run-time application scenario needs an adaptive on-chip communication infrastructure which can build connections on demand to provide QoS.

11 AdNoC Concept The AdNoC architecture is proposed to support QoS-supported on-chip communication for a network exposed to varying system constraints. Like most NoCs, it utilizes packet-based communication. The architecture is pipelined and deploys wormhole routing because of its low latency in practice and small buffer space requirements.

12 Definitions (1/4)
Definition 1: An application task graph (TG) is a directed graph G_k = (T, F).
– T is the set of all tasks t_i used by an application.
– f_{i,j} ∈ F represents the connection from task t_i to t_j.
Definition 2: The Physical Network (PN) is a directed graph P = (N, V, B_t, r).
– N is a set of tiles n_i.
– v_{i,j} ∈ V represents an edge, the physical channel between n_i and n_j.
– Each tile has a current buffer configuration at time t; b_{i,t} ∈ B_t represents the state of the buffer assignment to individual output ports.
– r is a routing function which determines the paths taken.
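As a rough illustration of Definitions 1 and 2, the sketch below models a task graph and a physical network in Python. The class and field names (`TaskGraph`, `PhysicalNetwork`, `buffer_config`), the use of mesh coordinates as tile identifiers, and the `mesh` builder are assumptions made for this example; the slides only fix the mathematical tuples.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """G_k = (T, F): tasks and the directed connections between them."""
    tasks: set[str] = field(default_factory=set)
    flows: set[tuple[str, str]] = field(default_factory=set)

    def connect(self, src: str, dst: str) -> None:
        # f_{i,j} in F: a directed connection from task t_i to task t_j.
        self.tasks.update((src, dst))
        self.flows.add((src, dst))


Tile = tuple[int, int]  # assumption: a tile n_i is identified by (x, y) mesh coordinates


@dataclass
class PhysicalNetwork:
    """P = (N, V, B_t, r); the routing function r is left out of this sketch."""
    tiles: set[Tile]
    channels: set[tuple[Tile, Tile]]            # v_{i,j}: directed physical channels
    buffer_config: dict[Tile, dict[str, int]]   # b_{i,t}: output port -> buffer blocks


def mesh(width: int, height: int) -> PhysicalNetwork:
    """Build a 2D mesh (an assumed topology) with channels in both directions."""
    tiles = {(x, y) for x in range(width) for y in range(height)}
    channels = set()
    for (x, y) in tiles:
        for neighbor in ((x + 1, y), (x, y + 1)):
            if neighbor in tiles:
                channels.add(((x, y), neighbor))
                channels.add((neighbor, (x, y)))
    return PhysicalNetwork(tiles, channels, {t: {} for t in tiles})
```

For instance, `mesh(3, 3)` yields 9 tiles and 24 directed channels, and `TaskGraph().connect("Gauss1", "Filter1")` records one connection of the IPL example.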

13 Definitions (2/4)
Definition 3: The Logical Network (LN) at time t is a directed graph L_t = (M, W).
– M is a set of task groups m_i.
– w_{i,j} ∈ W represents the set of connections between two task groups m_i and m_j.
Definition 4: The Task Mapping Function is a function l_t : T' ⊆ T → L_t which maps a subset T' of each task graph's tasks T to the logical network LN.

14 Definitions (3/4)
Definition 5: The Network Mapping Function is a function p_t : L_t → S ⊆ P which maps a logical network onto a subset of the physical network.
Definition 6: A Routing Function r : N × N → V, r : (n_i, n_k) → v_{i,j}, returns a path v_{i,j} away from the current PE (n_i), given the input port for each transaction and the destination n_k.
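Definitions 4 and 5 compose: a task is first assigned to a task group, and the task group is then placed on a tile. A minimal Python sketch, assuming both mappings can be stored as plain dictionaries; the names `task_to_group`, `group_to_tile`, `tile_of`, and `flows_to_tile_pairs`, as well as the grouping and placement themselves, are hypothetical.

```python
# l_t: task -> task group (Definition 4); hypothetical grouping.
task_to_group = {"Gauss1": "m0", "Filter1": "m1", "Filter2": "m2"}

# p_t: task group -> tile of the physical network (Definition 5); hypothetical placement.
group_to_tile = {"m0": (0, 0), "m1": (1, 0), "m2": (1, 1)}


def tile_of(task: str) -> tuple[int, int]:
    """Apply l_t and then p_t to find the tile a task runs on."""
    return group_to_tile[task_to_group[task]]


def flows_to_tile_pairs(flows):
    """Translate task-level connections into (source tile, destination tile)
    pairs, i.e. the arguments a routing function r(n_i, n_k) would receive."""
    return [(tile_of(src), tile_of(dst)) for src, dst in flows]


pairs = flows_to_tile_pairs([("Gauss1", "Filter1"), ("Gauss1", "Filter2")])
print(pairs)
```

A re-mapping, as invoked in the motivation example, would amount to changing entries of `group_to_tile` at run-time while the task graph stays the same.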

15 Definitions (4/4)
Definition 7:
– The Buffer Configuration b_{i,t} is the current buffer configuration of tile n_i ∈ N.
– A Virtual Channel (VC) is a unidirectional logical or virtual connection between the tiles n_i and n_j.
– Each VC is realized by an independently managed pair of message buffers referred to as the Virtual Channel Buffer (VCB).

16 Definitions (4/4)
Definition 8: The System Monitor M is an infrastructure which is used to collect, aggregate, and process system statistics.
Definition 9: Our Adaptive Network on Chip AdNoC is defined as the tuple AdNoC = (P, M, L_t, G_i, p_t, l_t, r) with the parameters as given above.

17 Algorithm (1/11) To provide a bandwidth guarantee in an adaptive NoC, the underlying communication infrastructure needs to provide an adaptive path allocation strategy. Finding a path/routing for a given logical network and physical mapping of the application is therefore a major challenge. The run-time path allocation algorithm is given in Alg. 1.

18 Algorithm (2/11)

19 Algorithm (3/11) For a requesting transaction, the path is checked in every possible direction and the VCB is assigned accordingly on demand. The weighted XY algorithm wXY presented in Alg. 2 assigns each output port a weight based on the available bandwidth and on dx or dy between the current and the destination nodes. This ideally gives the packet a maximum number of sensible routing choices along its path. The weight is also proportional to the available bandwidth.

20 Algorithm (4/11)

21 Algorithm (5/11) The wXY route allocation strategy is described as follows. Given is the tuple ρ = {N, E, S, W, P}. Each i ∈ ρ has a weight w_i and an available bandwidth b_i with b_i ≤ b_max, where b_max is the maximum line bandwidth.

22 Algorithm (6/11) The current router coordinates are x, y. Each packet p has destination coordinates x_d, y_d and a required bandwidth b_p. The weights are assigned as follows:

23 Algorithm (7/11) The route r chosen is then: The router distributes the VCBs to any route as needed by assigning them to the corresponding output port.
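The weight and route-selection formulas themselves are not reproduced in this transcript, so the Python sketch below substitutes an assumed weighting purely for illustration: each port's weight is its available bandwidth (as a fraction of b_max) multiplied by the distance still to cover in that direction, ports with less than b_p available get weight 0, and the route is the port with the largest weight. The function names `wxy_weights` and `choose_route`, the axis convention (E = +x, N = +y), and the treatment of the local port P are all assumptions, not the authors' Alg. 2.

```python
def wxy_weights(x, y, xd, yd, bp, avail, b_max):
    """Assumed weighting: w_i = (b_i / b_max) * distance left in direction i.

    avail maps each port in {"N", "E", "S", "W", "P"} to its available
    bandwidth b_i. Ports that cannot carry the required bandwidth bp get 0.
    """
    dx, dy = xd - x, yd - y
    # Distance each port would help to cover (assumes E = +x and N = +y).
    progress = {
        "E": max(dx, 0),
        "W": max(-dx, 0),
        "N": max(dy, 0),
        "S": max(-dy, 0),
        "P": 1 if dx == 0 and dy == 0 else 0,  # local port only at the destination
    }
    weights = {}
    for port, dist in progress.items():
        b = min(avail.get(port, 0), b_max)  # enforce b_i <= b_max
        weights[port] = (b / b_max) * dist if b >= bp else 0.0
    return weights


def choose_route(weights):
    """Pick the port with the largest weight, or None if no port is feasible."""
    port = max(weights, key=weights.get)
    return port if weights[port] > 0 else None


# East is nearly saturated, so the packet heads north first instead of
# following a plain XY order.
avail = {"N": 80, "E": 10, "S": 100, "W": 100, "P": 100}
print(choose_route(wxy_weights(0, 0, 2, 1, bp=20, avail=avail, b_max=100)))
```

Returning `None` corresponds to the case in which no feasible route exists and the upper layer has to be informed.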

24 Algorithm (8/11) Our scheme to assign buffers on demand (at run-time) is given in Alg. 3. The benefit of such on-demand assignment is evident: buffers are only allocated when needed, meaning that virtual channels can be reused by different ports.
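The idea of a shared pool whose blocks are bound to output ports only while a transaction needs them can be sketched in Python as follows. This is not a transcription of Alg. 3; the class `VCBPool` and its methods are hypothetical names chosen for this illustration.

```python
class VCBPool:
    """A router's shared pool of virtual channel buffer (VCB) blocks."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))             # unassigned block indices
        self.assigned: dict[str, tuple[str, int]] = {}  # transaction -> (port, block)

    def assign(self, transaction: str, port: str) -> bool:
        """Bind a free block to an output port for this transaction, if one is left."""
        if transaction in self.assigned or not self.free:
            return False
        self.assigned[transaction] = (port, self.free.pop())
        return True

    def release(self, transaction: str) -> None:
        """Return the transaction's block to the pool so that any port can reuse it."""
        _port, block = self.assigned.pop(transaction)
        self.free.append(block)


pool = VCBPool(num_blocks=2)
pool.assign("T1", "N")
pool.assign("T2", "E")
print(pool.assign("T3", "S"))   # no block left
pool.release("T1")
print(pool.assign("T3", "S"))   # the block that served port N now serves port S
```

Because blocks are not tied to a port ahead of time, a router needs fewer blocks in total than a design that reserves a fixed number per port.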

25 Algorithm (9/11) Fig. 3 shows an exemplary scenario to showcase the run-time behavior using different transactions in one router.

26 Algorithm (10/11)
t0: All four directions are occupied by four different transactions; buffers are also assigned.
t1: Transaction T5 requests a path, and the weights are calculated until t_δ, which takes 4 hardware cycles. A buffer is also assigned to the calculated direction before t_δ.
t2: Transactions T1, T2, and T4 free their corresponding channels and assigned buffers.

27 Algorithm (11/11)
t3: Four new transactions T1, T2, T4, and T6 request processing and are granted resources.
t4: Transaction T7 requests a path and a buffer, but because no buffer resources are available, the request cannot be granted. The requesting transaction therefore has to wait or inform the upper layer through the system monitor.
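The timeline t0 to t4 can be replayed with a small Python sketch. Two points are assumptions rather than facts from the slides: the four transactions active at t0 are taken to be T1 to T4, and the router is given six buffer blocks, a value chosen only because it makes every step of the timeline come out as described.

```python
CAPACITY = 6  # assumed number of buffer blocks in the router (not stated on the slides)

events = [
    ("t0", "request", ["T1", "T2", "T3", "T4"]),  # assumed names of the initial four
    ("t1", "request", ["T5"]),
    ("t2", "release", ["T1", "T2", "T4"]),
    ("t3", "request", ["T1", "T2", "T4", "T6"]),
    ("t4", "request", ["T7"]),
]


def replay(events, capacity):
    """Return a log of (time, transaction, outcome) for a request/release trace."""
    active = set()  # transactions currently holding a buffer block
    log = []
    for time, kind, transactions in events:
        for t in transactions:
            if kind == "release":
                active.discard(t)
                outcome = "freed"
            elif len(active) < capacity:
                active.add(t)
                outcome = "granted"
            else:
                outcome = "denied"  # wait, or report through the system monitor
            log.append((time, t, outcome))
    return log


for entry in replay(events, CAPACITY):
    print(entry)
```

Under these assumptions, every request up to t3 is granted and the request by T7 at t4 is the only one denied, since T3, T5, and the four transactions granted at t3 hold all six blocks.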

28 Hardware Implementation Our hardware platform for the AdNoC is illustrated in Fig. 4. It consists mainly of two parts: the run-time path allocation part and the on-demand VCB assignment part. The path allocation part decides either from the lookup table or by calculation, depending on the type of the flit.
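One plausible reading of the path allocation part is sketched below in Python. It assumes the usual wormhole arrangement, in which only the head flit triggers a route calculation and the following body and tail flits reuse the stored decision from the lookup table; the slides do not spell this out. The class `PathAllocator`, the flit fields `tid`, `type`, and `dest`, and the stand-in route function are hypothetical.

```python
class PathAllocator:
    """Route head flits by calculation and all later flits by table lookup."""

    def __init__(self, compute_route):
        self.compute_route = compute_route  # e.g. a wXY-style function: dest -> port
        self.table = {}                     # lookup table: transaction id -> output port

    def output_port(self, flit):
        tid = flit["tid"]
        if flit["type"] == "head":
            # Only the head flit carries the destination and triggers a calculation.
            self.table[tid] = self.compute_route(flit["dest"])
        port = self.table[tid]
        if flit["type"] == "tail":
            # The tail flit closes the wormhole, so the entry can be dropped.
            del self.table[tid]
        return port


calls = []

def toy_route(dest):
    """Stand-in for the real route calculation; records how often it runs."""
    calls.append(dest)
    return "E" if dest[0] > 0 else "P"


allocator = PathAllocator(toy_route)
packet = [
    {"tid": "T5", "type": "head", "dest": (2, 1)},
    {"tid": "T5", "type": "body"},
    {"tid": "T5", "type": "tail"},
]
ports = [allocator.output_port(f) for f in packet]
print(ports, len(calls))
```

The route is calculated once per packet, while the remaining flits cost only a table lookup.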


30 Conclusion We have introduced the first approach to an adaptive on-chip communication architecture. It provides an adaptive path allocation algorithm to meet varying bandwidth guarantees. Run-time connections are realized by re-assigning a number of buffer blocks on demand. Our buffer allocation scheme increases buffer utilization and decreases overall buffer use.

