Statistical Approach to NoC Design Itamar Cohen, Ori Rottenstreich and Isaac Keslassy Technion (Israel)

Problem
• The traffic matrix in NoCs changes often and is unpredictable, which makes NoCs hard to design.

Road Capacities
• Goal: design road capacities between a city and its suburbs.
• Let's model the traffic matrices…
[Figure: city center]

Road Capacities
• Morning peak: most traffic heads towards the city.
[Figure: city center]

Road Capacities
• Morning peak: most traffic heads towards the city.
• Afternoon peak: most traffic leaves the city.
[Figure: city center]

Solution (1): Average-Case
• Plan for the average case, i.e. allocate a capacity of ~5 to each link.
• λ < μ
• Problem: traffic jams during many hours, every day.
[Figure: city center]

Solution (2): Worst-Case
• Plan for the worst case, i.e. allocate a capacity of 10 to each link.
[Figure: city center, link capacity 10]

Holidays
• Problem: traffic burst in the holidays → excessive resources
• Solution (3): statistical approach
  • Enough capacity for 99% of the time
  • Allow for occasional congestion
[Figure: city center]
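The three provisioning rules can be compared on a toy load distribution. The sketch below is illustrative only: the loads (1, 10, 50) and their probabilities are assumed values loosely modeled on the road example, and `capacity_for_guarantee` is a hypothetical helper name, not something from the slides.

```python
def capacity_for_guarantee(load_pmf, guarantee):
    """Smallest capacity c such that P(load <= c) >= guarantee."""
    cumulative = 0.0
    for load in sorted(load_pmf):
        cumulative += load_pmf[load]
        # Small tolerance guards against floating-point rounding in the sum.
        if cumulative >= guarantee - 1e-12:
            return load
    return max(load_pmf)

# Assumed link-load distribution: morning, afternoon, rare holiday burst.
load_pmf = {1: 0.495, 10: 0.495, 50: 0.01}

average_case = sum(load * p for load, p in load_pmf.items())  # Solution (1)
worst_case = max(load_pmf)                                    # Solution (2)
statistical = capacity_for_guarantee(load_pmf, 0.99)          # Solution (3)
```

Under these assumed numbers, the 99% rule needs far less capacity than the worst case while still covering every workday load.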

Back to the NoC world
• Similar problems arise in the NoC design process
  • City → shared cache
  • Suburbs → cores
• Many possible traffic matrices: writing, reading, etc.
[Figure: processor, cache, core]

Motivation
• Problem: the traffic matrix in NoCs is often-changing and unpredictable → makes NoCs hard to design
• "Easy" solution: worst-case guarantee
  • E.g., no-congestion guarantee for 100% of the time
  • Better when no congestion is allowed
• Objective: statistical guarantee
  • E.g., no-congestion guarantee for 99.99% of the time
  • Better when power/area are most important
• Tradeoff: more resources vs. better guarantee

Statistical Approach to NoC Design
• Given:
  • Traffic matrix distribution
  • Topology
  • Routing
  • Link capacities
• Compute the congestion guarantee, e.g. "95% of traffic matrices will receive enough capacity"
• Optimal capacity allocation
(Steps 1, 2 and 3 are marked on the slide.)

CMP (Chip Multi-Processor)
• How can we model the future traffic matrix distribution?

Traffic Matrix Distribution
• Measure on real large CMP networks?
• Problems
  • Their future applications are unknown
  • Their future traffic types are unknown: between processors? Cache accesses? Control traffic?
  • (Chicken-and-egg problem: traffic might depend on what the architecture offers)

• General model: any core can communicate with any core, but with a bounded core input/output rate.
• Example: each core may send/receive up to 1 Gbps.
• Used in CMPs [Murali et al. '07], but also in backbone networks [Dukkipati et al. '05], interconnection networks [Towles and Dally '02], VPN networks [Duffield et al. '99] and routers [McKeown et al. '96]
• Reminder: just one model among many…
[Figure: core 1 sends ≤ 1]
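A minimal sketch of drawing one traffic matrix that respects the bounded input/output rates. The slides do not say how matrices are distributed inside this model, so the sampler (random entries scaled down to fit) is purely an assumption; `sample_hose_matrix` and `satisfies_hose_model` are hypothetical names.

```python
import random

def satisfies_hose_model(matrix, rate=1.0, tol=1e-9):
    """True if every core sends at most `rate` and receives at most `rate`."""
    n = len(matrix)
    rows_ok = all(sum(row) <= rate + tol for row in matrix)
    cols_ok = all(sum(matrix[i][j] for i in range(n)) <= rate + tol
                  for j in range(n))
    return rows_ok and cols_ok

def sample_hose_matrix(n, rate=1.0, rng=None):
    """Random n x n matrix with no self-traffic, scaled to fit the rate bound."""
    rng = rng or random.Random()
    matrix = [[0.0 if i == j else rng.random() for j in range(n)]
              for i in range(n)]
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    peak = max(row_sums + col_sums)
    if peak == 0.0:
        return matrix  # nothing to scale (e.g. n == 1)
    # Assumption: scale so the busiest core input/output exactly hits `rate`.
    return [[value * rate / peak for value in row] for row in matrix]

matrix = sample_hose_matrix(12, rate=1.0, rng=random.Random(0))
```

Other samplers over the same set of admissible matrices would produce different T-Plots, so the choice of distribution is itself a modeling decision.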

Compute Congestion Guarantee
Show that "95% of all traffic matrices will receive enough capacity on link l":
1. Compute the Traffic-load distribution Plot (or T-Plot) for link l.
2. Show that the load on link l is less than its capacity for 95% of traffic matrices.
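Step 2 amounts to measuring how much of the load distribution lies below the link capacity. A minimal sketch, assuming the T-Plot is available as a list of sampled loads (one per traffic matrix); the load values and the name `congestion_guarantee` are made up for illustration.

```python
def congestion_guarantee(sampled_loads, capacity):
    """Fraction of sampled traffic matrices whose load on the link is below capacity."""
    if not sampled_loads:
        raise ValueError("need at least one sampled load")
    fitting = sum(1 for load in sampled_loads if load < capacity)
    return fitting / len(sampled_loads)

# Hypothetical loads on link l, one per sampled traffic matrix.
loads = [0.2, 0.4, 0.4, 0.5, 0.5, 0.6, 0.6, 0.6, 0.7, 0.7,
         0.7, 0.8, 0.8, 0.8, 0.9, 0.9, 0.9, 0.95, 0.95, 1.3]
guarantee = congestion_guarantee(loads, capacity=1.0)
```

With these sample values, 19 of the 20 loads fit under a capacity of 1.0.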

Link T-Plot
[Figure: three link T-Plots (PDF vs. link load) around the city center. Two plots show the morning and afternoon peaks, each with probability 0.5; the third shows workday (probability 0.99) and holiday (probability 0.01).]

Global T-Plot
• We want to provide statistical guarantees on the whole network, i.e. on all links.
• Global T-Plot: for each traffic matrix T, measure the maximum load among all links.
[Figure: the three link T-Plots (PDF vs. link load)]

Global T-Plot
[Figure: the three link T-Plots (PDF vs. link load) and the resulting global T-Plot (PDF vs. maximum link load)]

Link T-Plots in NoCs
• Given:
  • Traffic matrices T ∈ S
  • 3x4 mesh topology
  • XY routing
  • Link l
• Find the T-Plot on l
[Figure: traffic matrix set S, matrix T, link l; T-Plot (PDF vs. link load)]
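For a single traffic matrix, the load on a link under XY routing is the sum of the rates of all flows whose route crosses it. A minimal sketch, assuming nodes are addressed as `(x, y)` coordinates and links are directed node pairs; these conventions and the function names are illustrative assumptions.

```python
def xy_route(src, dst):
    """Directed links used when routing along x first, then along y."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    while x != dst_x:
        step = 1 if dst_x > x else -1
        links.append(((x, y), (x + step, y)))
        x += step
    while y != dst_y:
        step = 1 if dst_y > y else -1
        links.append(((x, y), (x, y + step)))
        y += step
    return links

def link_loads(traffic):
    """Map each directed link to its total load for one traffic matrix.

    `traffic` maps (source, destination) pairs to flow rates.
    """
    loads = {}
    for (src, dst), rate in traffic.items():
        for link in xy_route(src, dst):
            loads[link] = loads.get(link, 0.0) + rate
    return loads

# Two hypothetical flows on a mesh; both cross the link (1,0) -> (2,0).
traffic = {((0, 0), (2, 1)): 0.5, ((1, 0), (2, 0)): 0.25}
loads = link_loads(traffic)
```

Repeating this for every T in S and collecting the load of one fixed link gives the samples from which that link's T-Plot is built.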

T-Plot: Gaussian?
• Close-up view:
  • Worst-case traffic load = 2 [Towles and Dally, '02; Towles, Dally and Boyd, '03]
  • 99.99% of traffic matrices bring the load under [value lost in transcript], a [value lost]% gain
[Figure: T-Plot (PDF vs. link load)]

Are T-Plots Gaussian?
• Some T-Plots may be well approximated as Gaussian.
• Intuition (Central Limit Theorem): as N grows, the suitably normalized sum of N i.i.d. (independent and identically distributed) random variables with finite variance converges in distribution to a Gaussian.

Gaussian T-Plot Example
• Example: n x n mesh with XY routing.
• Assume i.i.d. traffic from i to j.
• Theorem: as n grows, the T-Plot for l converges to a Gaussian.
[Figure: mesh with nodes i, j and link l; T-Plot (PDF vs. link load)]

Computing the T-Plot
• Theorem: for an arbitrary graph and routing, computing the T-Plot is #P-complete.
• #P-complete problems are at least as hard as NP-complete problems.
  • NP: "Is there a solution?"
  • #P: "How many solutions?"
[Figure: traffic matrix set T; T-Plot (PDF vs. link load)]

Computing the T-Plot
• Idea: use Monte Carlo simulations to approximate T-Plots.
• Sample many traffic matrices, plot one point per matrix, and count the points in each load interval to approximate the density.
[Figure: traffic matrix set T; T-Plot (PDF vs. link load)]
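The Monte Carlo idea can be sketched as a normalized histogram of sampled loads. In the sketch below, `toy_link_load` is only a stand-in for "draw a traffic matrix and compute the load on link l"; its distribution, the bin count and the sample count are assumptions chosen for illustration.

```python
import random

def monte_carlo_tplot(sample_load, num_samples, num_bins, max_load):
    """Approximate a T-Plot: histogram of sampled loads, normalized to a density."""
    width = max_load / num_bins
    counts = [0] * num_bins
    for _ in range(num_samples):
        load = sample_load()
        # Clamp so that a load equal to max_load falls in the last bin.
        index = min(int(load / width), num_bins - 1)
        counts[index] += 1
    return [count / (num_samples * width) for count in counts]

# Toy stand-in: the link load is the sum of 4 independent flows,
# each uniform in [0, 0.5], so the worst-case load is 2.
rng = random.Random(0)

def toy_link_load():
    return sum(rng.uniform(0.0, 0.5) for _ in range(4))

density = monte_carlo_tplot(toy_link_load, num_samples=20000,
                            num_bins=20, max_load=2.0)
```

The accuracy of the estimate depends on the number of samples, especially in the tail, where a 99.99% guarantee would be read off.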

Link T-Plot vs. Global T-Plot
• Global T-Plot: for each traffic matrix T, measure the maximum load among all links.
• Model? Assume independent links…
[Figure: link T-Plot on l (Gaussian?) and global T-Plot (higher average); PDF vs. link load and global load]

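Link-Independent Model: Simple Example
• Assume known load distributions on links l1 and l2.
• If l1 and l2 are independent, the global load distribution follows from the two link distributions.
[Figure: load distributions of l1 and l2 and the resulting global load distribution]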
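The slide's own formulas did not survive in the transcript. As a hedged reminder, the standard identity for the maximum of two independent random variables is given here, with L_1 and L_2 denoting the loads on l1 and l2 and F_1, F_2 their CDFs (the symbol names are assumptions, not the slide's notation):

```latex
F_{\mathrm{global}}(x)
  = P\bigl(\max(L_1, L_2) \le x\bigr)
  = P(L_1 \le x)\, P(L_2 \le x)
  = F_1(x)\, F_2(x)
```

The second equality is exactly where independence is used; for correlated links the product is only an approximation.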

Independent-Gaussian Model
[Figure: CDF vs. global load]
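Under the independent-Gaussian model, the global load CDF is the product of the per-link Gaussian CDFs. A minimal sketch, assuming each link is described by a hypothetical `(mean, std)` pair; the parameter values are made up.

```python
import math

def gaussian_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def global_load_cdf(x, link_params):
    """P(max link load <= x), assuming independent Gaussian link loads."""
    probability = 1.0
    for mean, std in link_params:
        probability *= gaussian_cdf(x, mean, std)
    return probability

# Three hypothetical links with the same load distribution.
links = [(1.0, 0.1)] * 3
at_mean = global_load_cdf(1.0, links)  # each factor is exactly 0.5
```

Because every factor is at most 1, the global CDF lies below each link CDF, which matches the higher average of the global T-Plot.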

Capacity Allocation Intuition
• Idea: given a total capacity, distribute it so that the overflow probability on each link is the same.
[Figure: PDF vs. link load for Link 1 and Link 2]

Capacity Allocation Intuition
• Idea: given a total capacity, distribute it so that the overflow probability on each link is the same.
• Theorem: if the Gaussian-independent model holds, with the same standard deviation σ on all links, then this scheme is optimal.
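For Gaussian link loads, equal overflow probability means every link gets its mean load plus the same number z of standard deviations. A minimal sketch under that assumption; `allocate_capacity` and the example numbers are hypothetical, and the code also accepts unequal standard deviations, a case the theorem's optimality claim does not cover.

```python
def allocate_capacity(total_capacity, means, stds):
    """Split total_capacity so that (capacity - mean) / std is equal on all links."""
    # Solve sum(mean_l + z * std_l) = total_capacity for the common margin z.
    z = (total_capacity - sum(means)) / sum(stds)
    return [mean + z * std for mean, std in zip(means, stds)]

# Two hypothetical links with different mean loads and the same deviation.
capacities = allocate_capacity(total_capacity=2.0,
                               means=[1.0, 0.5], stds=[0.1, 0.1])
```

The common overflow probability is then 1 − Φ(z), where Φ is the standard Gaussian CDF; a negative z signals that the total capacity is below the sum of the mean loads.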

Capacity Allocation Performance
• Total needed capacity for various Capacity Allocation (CA) targets
[Figure: total needed capacity for the worst-case and statistical approaches. Annotations: "Worst-case capacity saving with different link capacities", "Statistical gain with 90% guarantee", "Worst-case approach 41%", "Statistical approach 37%"]

Average Flow Delay Model
• Assume an M/M/1 delay model.
• Average flow delay distribution over all traffic matrices, for two different Capacity Allocation (CA) schemes:
[Figure: average flow delay distribution for the two CA schemes]
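In the M/M/1 model the mean delay on a link with load λ and capacity μ is 1/(μ − λ). The sketch below applies that formula per link and sums it along a flow's path; summing per-link delays is an assumption here, since the slides do not spell out how flow delay is aggregated.

```python
import math

def mm1_delay(load, capacity):
    """Mean M/M/1 delay 1 / (mu - lambda); infinite if the link is overloaded."""
    if load >= capacity:
        return math.inf
    return 1.0 / (capacity - load)

def flow_delay(path_loads, path_capacities):
    """Delay of one flow, assumed to be the sum of its per-link M/M/1 delays."""
    return sum(mm1_delay(load, capacity)
               for load, capacity in zip(path_loads, path_capacities))

# Hypothetical two-hop path with unit capacities.
delay = flow_delay([0.5, 0.75], [1.0, 1.0])
```

Delay grows sharply as a link's load approaches its capacity, which is why the capacity allocation scheme shapes the delay distribution.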

Comparing routing algorithms
• DOR (XY) and O1TURN (½ XY, ½ YX) on a 3x4 mesh:
[Figure: global T-Plots (PDF vs. global load) with the 95% cutoff for DOR and for O1TURN]
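To see how the two routing schemes differ on one traffic matrix, the sketch below computes the maximum link load under DOR (all traffic on the XY route) and under O1TURN (half on XY, half on YX). Node coordinates, the tiny traffic matrix and all function names are illustrative assumptions.

```python
def xy_route(src, dst):
    """Directed links used when routing along x first, then along y."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    while x != dst_x:
        step = 1 if dst_x > x else -1
        links.append(((x, y), (x + step, y)))
        x += step
    while y != dst_y:
        step = 1 if dst_y > y else -1
        links.append(((x, y), (x, y + step)))
        y += step
    return links

def yx_route(src, dst):
    """Directed links used when routing along y first, then along x."""
    # Swap the coordinates, reuse xy_route, then swap back.
    swapped = xy_route(src[::-1], dst[::-1])
    return [(a[::-1], b[::-1]) for a, b in swapped]

def max_link_load(traffic, routes):
    """Maximum link load when each flow is split evenly over `routes`."""
    loads = {}
    for (src, dst), rate in traffic.items():
        for route in routes:
            for link in route(src, dst):
                loads[link] = loads.get(link, 0.0) + rate / len(routes)
    return max(loads.values(), default=0.0)

# Hypothetical traffic matrix with two flows leaving node (0, 0).
traffic = {((0, 0), (1, 1)): 1.0, ((0, 0), (1, 0)): 0.5}
dor_load = max_link_load(traffic, [xy_route])
o1turn_load = max_link_load(traffic, [xy_route, yx_route])
```

Evaluating both values over many sampled traffic matrices would yield the two global T-Plots whose 95% cutoffs the slide compares.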

NUCA network
• More complex CMP architectures show similar results.
• NUCA (Non-Uniform Cache Architecture) with sharing degree 4.
• Traffic model: each core (cache) may only send/receive traffic to/from caches (cores) in its sub-network.
[Figure: processors and caches]

NUCA network – Total capacity
• Total capacity required for various Capacity Allocation (CA) targets.
• Gain of the statistical approach: 48%

Summary
• Statistical approach to NoC design
  • Deals with several traffic matrices
  • Can optimize capacity allocation and routing
• Can be applied to any
  • Traffic matrix distribution
  • Topology
  • Oblivious routing
  • Link capacities

Thank you.