1
The Era of Many-Module SoC: Revisiting the NoC Mapping Problem
Isask’har (Zigi) Walter, Israel Cidon, Avinoam Kolodny, Daniel Sigalov
Technion – Israel Institute of Technology
December 2009
2
SoC Revolution
[Figure: a bus-based system and a NoC-based system, each with PE1-PE6; the NoC-based system attaches the PEs to routers (R)]
3
SoC Evolution
[Figure: NoC diagram with routers (R)]
4
Processor Evolution
[Figure: single core (CPU, cache), dual core (CPU1-CPU2 with caches), quad core (CPU1-CPU4 with caches)]
5
The Era of Many-Module SoC
What would such chips be like?
[Figure: properties placed on a scale from "High certainty" through "Most likely" to "Totally unknown": large number of modules, NoC interconnect, power still important, highly parallel, IP reuse, ease of design and verification, applications]
6
Future SoCs - Observation #1
Special-purpose cores replace general-purpose processors (power considerations)
Processing pipes are getting longer
[Figure: Tasks 1-5 on a general-purpose CPU with memory, compared with the same tasks on specialized cores (Pre. Proc., DSP, CPU, GPU) with memories]
7
Future SoCs - Observation #2
Large diversity: all modules are unique
Highly regular: replicated cores
Classes of standard modules (DSP, HW accelerators, cache banks, etc.)
8
The Era of Many-Module SoC
Observation #1: increased use of specialized cores; pipes are getting longer
Observation #2: replication of processing elements
How is the design flow affected?
This work: mapping of the NoC
9
Outline: The Era of Many-Module SoC; Revisiting the Mapping Problem; Cross-Entropy Optimization; Evaluation
10
NoC Mapping
Given:
- Traffic pattern(s): a set (or sets) of pair-wise bandwidth requirements and timing constraints
- Routing
- Topology
Goal: find an efficient mapping of cores to tiles
[Figure: several alternative placements of PE1-PE9 on the tiles]
11
Mapping Optimization
An important design step: mapping affects power and performance!
A difficult problem! Heuristic algorithms are often used
Common optimization goals:
- Minimize (dynamic) power
- Minimize power + maximize performance
- Minimize power subject to performance constraints
12
Modeling
Typical modeling: power and latency are proportional to distance
Cost function: [formula not preserved]
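The slide's formula did not survive in this transcript. A form that is common in the NoC mapping literature, and that matches the statement that power and latency are proportional to distance, weights each flow's bandwidth by the hop distance between the tiles its endpoints are mapped to. The symbols below (F for the set of flows, BW for bandwidth, dist for hop distance) are assumptions, not taken from the slide:

```latex
\mathrm{Cost}(\pi) \;=\; \sum_{(i,j)\in F} BW(i,j)\,\cdot\,\mathrm{dist}\bigl(\pi(i),\,\pi(j)\bigr)
```

Here \pi(i) is the tile that core i is mapped to.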
13
Calculating Mapping Cost
[Figure: two mappings, π1 and π2, of PE1-PE6 onto the tiles, with flows of bandwidth 100 and 30]
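A minimal sketch of such a cost calculation, assuming a 2x3 mesh, Manhattan hop distance, and a bandwidth-times-distance cost. The flow endpoints and both placements are hypothetical; only the bandwidths 100 and 30 come from the slide.

```python
def hops(a, b):
    """Manhattan distance between two (row, col) tiles of a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mapping_cost(flows, placement):
    """Sum of bandwidth * hop distance over all flows (assumed cost model)."""
    return sum(bw * hops(placement[src], placement[dst]) for src, dst, bw in flows)

# Hypothetical flows: (source core, destination core, bandwidth).
flows = [("PE1", "PE2", 100), ("PE2", "PE5", 30)]

# Two hypothetical placements on a 2x3 mesh: core -> (row, col).
pi1 = {"PE1": (0, 0), "PE2": (0, 1), "PE3": (0, 2),
       "PE4": (1, 0), "PE5": (1, 1), "PE6": (1, 2)}
pi2 = {"PE1": (0, 0), "PE6": (0, 1), "PE3": (0, 2),
       "PE4": (1, 0), "PE5": (1, 1), "PE2": (1, 2)}

cost1 = mapping_cost(flows, pi1)  # 100*1 + 30*1 = 130
cost2 = mapping_cost(flows, pi2)  # 100*3 + 30*1 = 330
```

Moving PE2 away from PE1 triples the distance of the heavy flow, which is why the second placement costs more under this model.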
14
Motivation - Example #1
[Figure: task graph with PE1-PE4, MEM1, and MEM2; every flow has bandwidth 1]
Optimal mapping (π1): [Figure: placement of PE1-PE4, MEM1, and MEM2 on the tiles]
15
Motivation - Example #1 (cont.)
Let the mapping algorithm assign the flows!
[Figure: task graph with PE1-PE4 and two interchangeable memories (2*MEM)]
Optimal mapping (π2): [Figure: placements π1 and π2 of PE1-PE4, MEM1, and MEM2]
Cost(π1) = 9, Cost(π2) = 7
16
Motivation - Example #1 (cont.)
[Figure: the task graph (PE1-PE4, MEM1, MEM2, flows of bandwidth 1) next to mapping π2, Cost(π2) = 7]
The mapping algorithm should be aware of replicated modules!
17
Classic Performance Constraints
Pair-wise point-to-point requirements
For example, in a 4-module system:
        PE1   PE2   PE3
PE2     2
PE3           1
PE4           1     1
18
Motivation - Example #2
Stream ID   PEs                      Timing Requirement
Stream 1    PE1 → PE2 → PE3 → PE4    4
Stream 2    PE2 → PE4                1
[Figure: PE1-PE4 with Stream 1 and Stream 2 drawn across them]
19
Example #2 – Pair-wise req.
        PE1   PE2   PE3
PE2     2
PE3     .     1
PE4           1     1
[Figure: candidate placements of PE1-PE4, with links labelled Req=2 and Req=1]
No feasible mapping!
20
Application-Level Requirements
Stream ID   PEs                      Requirement
Stream 1    PE1 → PE2 → PE3 → PE4    4
Stream 2    PE2 → PE4                1
[Figure: a placement with PE2 and PE4 in one row and PE1 and PE3 in the other; the hops of Stream 1 are labelled Req=1, Req=2, Req=1]
A feasible mapping does exist!
It’s better to work with the application-level requirements
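The contrast in Example #2 can be checked by brute force. The sketch below assumes a 2x2 mesh, requirements measured in hops, and the pair-wise budget split 2/1/1 for Stream 1 suggested by the Req labels. The grid size, the hop-based interpretation, and all function names are assumptions made for illustration.

```python
from itertools import permutations

TILES = [(0, 0), (0, 1), (1, 0), (1, 1)]   # assumed 2x2 mesh
CORES = ["PE1", "PE2", "PE3", "PE4"]

def hops(a, b):
    """Manhattan distance between two (row, col) tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Pair-wise requirements: each hop of a stream gets its own fixed budget.
PAIRWISE = {("PE1", "PE2"): 2, ("PE2", "PE3"): 1,
            ("PE3", "PE4"): 1, ("PE2", "PE4"): 1}

# Application-level requirements: one end-to-end budget per stream.
STREAMS = [(["PE1", "PE2", "PE3", "PE4"], 4), (["PE2", "PE4"], 1)]

def meets_pairwise(place):
    return all(hops(place[a], place[b]) <= req for (a, b), req in PAIRWISE.items())

def meets_end_to_end(place):
    return all(
        sum(hops(place[a], place[b]) for a, b in zip(path, path[1:])) <= req
        for path, req in STREAMS
    )

def count_feasible(predicate):
    """Count placements of the four cores that satisfy the predicate."""
    return sum(predicate(dict(zip(CORES, perm))) for perm in permutations(TILES))

pairwise_feasible = count_feasible(meets_pairwise)      # 0 placements
end_to_end_feasible = count_feasible(meets_end_to_end)  # 8 placements
```

The pair-wise version would need PE2, PE3, and PE4 to be mutually adjacent, which a mesh cannot provide. The end-to-end version lets one hop of Stream 1 take two hops as long as the total stays within 4.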
21
This Work
Find efficient mappings by extending the formulation of the mapping problem (adding degrees of freedom)
Degree of freedom #1: leverage the existence of replicated modules
Degree of freedom #2: replace p2p constraints with end-to-end, application-level requirements
22
Modifying the Formulation (1)
Leverage the existence of replicated modules
Allow the mapping algorithm to allocate flows to the best replicated module
Original flow table:
Flow            BW    Time Req.
PE1 → DSP3      100   3
PE2 → DSP4      200   12
PE2 → SRAM1     100   15
PE3 → SRAM2     100   5
…               …     …
Modified flow table (destination replica no longer fixed):
Flow            BW    Time Req.
PE1             100   3
PE2             200   12
PE2             100   15
PE3             100   5
…               …     …
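One way to picture this degree of freedom: a flow names a class of replicated modules rather than a specific instance, and the cost evaluation picks a replica for it. The sketch below is a simplified illustration, not the authors' algorithm. It greedily chooses the nearest replica, ignores any capacity limit per replica, and uses a hypothetical 2x3 placement with flows of bandwidth 1.

```python
def hops(a, b):
    """Manhattan distance between two (row, col) tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def cost_with_replicas(flows, placement, classes):
    """Cost when each flow may use any replica of its destination class.

    A destination that is not a class name is treated as a fixed module.
    Replica choice is greedy (nearest first); this is an assumption.
    """
    total, assignment = 0, []
    for src, dst, bw in flows:
        candidates = classes.get(dst, [dst])
        chosen = min(candidates, key=lambda m: hops(placement[src], placement[m]))
        total += bw * hops(placement[src], placement[chosen])
        assignment.append((src, chosen))
    return total, assignment

# Hypothetical placement on a 2x3 mesh: module -> (row, col).
placement = {"PE1": (0, 0), "MEM1": (0, 1), "PE3": (0, 2),
             "PE2": (1, 0), "MEM2": (1, 1), "PE4": (1, 2)}

# Classic formulation: every flow is bound to a specific memory.
fixed_flows = [("PE1", "MEM1", 1), ("PE2", "MEM1", 1),
               ("PE3", "MEM2", 1), ("PE4", "MEM2", 1)]
# Extended formulation: flows only name the class "MEM".
class_flows = [(src, "MEM", bw) for src, _, bw in fixed_flows]

fixed_cost, _ = cost_with_replicas(fixed_flows, placement, {})
free_cost, chosen = cost_with_replicas(class_flows, placement,
                                       {"MEM": ["MEM1", "MEM2"]})
```

With the same placement, the cost drops from 6 to 4 once PE2 and PE3 are allowed to use the memory next to them.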
23
Modifying the Formulation (2)
Replace p2p constraints with end-to-end, application-level requirements
P2P timing req. (triangular matrix, rows as extracted):
∞
∞2
443
∞313
∞4773
∞3∞24∞
4∞723∞5
∞∞2∞156∞
773∞∞53∞1
∞3∞3212∞31
E2E timing req.:
Stream ID   Stream’s PEs                                 E2E Req.
1           PE1 → PE3 → PE9 → PE4 → PE10                 23
2           PE5 → PE2 → PE3 → PE8 → PE7 → PE6 → PE10     12
3           PE5 → PE3 → PE9                              15
4           PE7 → PE8 → PE2 → PE3                        20
5           PE1 → PE2                                    2
…           …                                            …
In this paper: for synthetic task graphs
Did so for a real application too
24
Outline: The Era of Many-Module SoC; Revisiting the Mapping Problem; Cross-Entropy Optimization; Evaluation
25
Cross Entropy Optimization
Modern optimization heuristic
- Good at combinatorial optimization problems
- Akin to evolutionary algorithms: generation of new solutions is based on sampling and estimation
Inherently a global search method
- Reduced risk of getting trapped in a local minimum
26
Cross Entropy Optimization
1. Given an initial parameter vector v = v0, sample a random population of K solutions x1, x2, …, xK from the distribution given by f(x; v).
2. Evaluate the costs S(xi), i = 1, …, K.
3. Using the ρK (0 < ρ < 1) elite (lowest-cost) samples, obtain a new density function f(x; v) by calculating a new vector v via Maximum Likelihood (ML) estimation.
4. Repeat steps 1-3 with the new vector v unless the maximum number of iterations is reached or no improvement is obtained for a predefined number of iterations.
For example:
1. Generate 10 random mappings: π1, π2, …, π10
2. Find the 3 lowest-cost mappings: π2, π5, π7
3. Examine those 3 best mappings:
   A. For each tile, calculate the probability that core PEi is mapped to that tile
   B. Update the probabilities accordingly
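The steps above can be sketched as a small loop over a tile-by-core probability matrix. This is an illustration under stated assumptions rather than the authors' implementation: the cost is bandwidth times Manhattan distance, mappings are sampled tile by tile without reusing a core, the number of cores equals the number of tiles, and the smoothing factor `alpha`, the `patience` limit, and all function names are hypothetical.

```python
import random

def hops(a, b):
    """Manhattan distance between two (row, col) tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mapping_cost(mapping, flows, tiles):
    """mapping[t] is the core on tile t; cost = sum of bw * hop distance."""
    pos = {core: tiles[t] for t, core in enumerate(mapping)}
    return sum(bw * hops(pos[s], pos[d]) for s, d, bw in flows)

def sample_mapping(prob, rng):
    """Draw one core per tile from prob[tile][core], never reusing a core."""
    free = list(range(len(prob)))
    mapping = []
    for row in prob:
        weights = [row[c] for c in free]
        if sum(weights) <= 0:          # fall back to uniform if all mass is used up
            weights = [1.0] * len(free)
        core = rng.choices(free, weights=weights)[0]
        mapping.append(core)
        free.remove(core)
    return mapping

def ce_mapping(flows, rows, cols, K=200, rho=0.1, max_iters=50,
               patience=10, alpha=0.7, seed=0):
    rng = random.Random(seed)
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    n = len(tiles)
    prob = [[1.0 / n] * n for _ in range(n)]       # step 1: uniform start
    n_elite = max(1, int(rho * K))
    best, best_cost, stall = None, float("inf"), 0
    for _ in range(max_iters):
        population = [sample_mapping(prob, rng) for _ in range(K)]
        population.sort(key=lambda m: mapping_cost(m, flows, tiles))  # step 2
        elite = population[:n_elite]
        cost = mapping_cost(elite[0], flows, tiles)
        if cost < best_cost:
            best, best_cost, stall = elite[0], cost, 0
        else:
            stall += 1
            if stall >= patience:      # step 4: stop when no improvement
                break
        for t in range(n):             # step 3: frequencies among the elite
            for core in range(n):
                freq = sum(m[t] == core for m in elite) / n_elite
                prob[t][core] = alpha * freq + (1 - alpha) * prob[t][core]
    return best, best_cost

# Hypothetical usage: four cores on a 2x2 mesh with a chain of flows.
best, best_cost = ce_mapping([(0, 1, 100), (1, 2, 30), (2, 3, 10)], 2, 2)
```

The elite frequencies are the maximum-likelihood estimate for this kind of per-tile categorical distribution. Blending them with the previous matrix keeps every probability above zero, so early samples cannot permanently rule out a placement.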
27
CE Example
Prob(TileA ← PE1) = Prob(TileA ← PE2) = Prob(TileA ← PE3) = Prob(TileA ← PE4) = 0.25
Prob(TileB ← PE1) = Prob(TileB ← PE2) = Prob(TileB ← PE3) = Prob(TileB ← PE4) = 0.25
Prob(TileC ← PE1) = Prob(TileC ← PE2) = Prob(TileC ← PE3) = Prob(TileC ← PE4) = 0.25
Prob(TileD ← PE1) = Prob(TileD ← PE2) = Prob(TileD ← PE3) = Prob(TileD ← PE4) = 0.25
[Figure: ten sampled mappings, π1-π10, of PE1-PE4 onto Tiles A-D]
28
Updating Probabilities
[Figure: three elite mappings (labelled π2, π5, π3) of PE1-PE4 onto Tiles A-D]
Prob(TileA ← PE1) = 1
Prob(TileB ← PE2) = 2/3, Prob(TileB ← PE4) = 1/3
Prob(TileC ← PE3) = 2/3, Prob(TileC ← PE4) = 1/3
Prob(TileD ← PE2) = 1/3, Prob(TileD ← PE3) = 1/3, Prob(TileD ← PE4) = 1/3
The following iteration uses these updated probabilities
Gradually, the probabilities converge to 0/1
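The update amounts to counting, for each tile, how often each core appears there among the elite mappings. The three mappings in the sketch below are hypothetical: the figure itself is not preserved, so they were chosen only to be consistent with the probabilities listed on the slide.

```python
from collections import Counter
from fractions import Fraction

TILES = ["A", "B", "C", "D"]

# Hypothetical elite mappings (tile -> core), consistent with the slide's numbers.
elite = [
    {"A": "PE1", "B": "PE2", "C": "PE3", "D": "PE4"},
    {"A": "PE1", "B": "PE2", "C": "PE4", "D": "PE3"},
    {"A": "PE1", "B": "PE4", "C": "PE3", "D": "PE2"},
]

def update_probabilities(elite_mappings):
    """For each tile, the fraction of elite mappings that place each core there."""
    probs = {}
    for tile in TILES:
        counts = Counter(m[tile] for m in elite_mappings)
        probs[tile] = {core: Fraction(n, len(elite_mappings))
                       for core, n in counts.items()}
    return probs

probs = update_probabilities(elite)
# probs["A"] == {"PE1": 1}
# probs["B"] == {"PE2": 2/3, "PE4": 1/3}
```

Cores that never appear on a tile among the elite get no entry, which corresponds to probability 0 for the next round of sampling.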
29
Outline: The Era of Many-Module SoC; Revisiting the Mapping Problem; Cross-Entropy Optimization; Evaluation
30
Evaluation
Scenario:
- 6x6 mesh NoC
- Synthetic, randomized SoC
  - Task graphs (and task-to-core mapping)
  - Varying number of replicated modules
  - Varying timing constraints
  - (Real application in the DATE10 paper)
Compare with the best cost of classic mapping
Averaging multiple runs
31
Accounting for Replication
“Class”: a group of identical PEs
Total number of replicated cores = {number of classes} * {class size}
[Figure: cost reduction [%] versus total number of replicated modules [%] (axis ticks 10-50)]
32
Application-Level Requirements
SoCs with a pipeline data path and background P2P traffic
- Varying pipeline slack
- Different amounts of background constraints
[Figure: cost reduction [%] versus pipeline slack]
33
Conclusions and Future Work
We are entering the era of the “many-module SoC”
Extend the mapping to account for:
- Classes of replicated modules
- Application-level requirements
Meaningful power savings
But mapping is only one example: routing? Task assignment? Link design? Topology selection?
34
Thank you! Questions?
zigi@tx.technion.ac.il
QNoC Research Group