1
The Crosspoint-Queued Switch
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Politecnico di Torino, Italy)
2
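Typical Switch Architectures
IQ: Input-Queued (queues in the linecards, in front of the switch fabric)
CICQ: Combined Input- and Crosspoint-Queued (queues in the linecards and at the fabric crosspoints)
Assumes an instantaneous closed loop between the linecards and the switch fabric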
3
Single-Rack Router
An instantaneous closed loop works when the linecards and the switch fabric share a single rack
Problem: multi-rack routers
4
Current Router Architectures [Source: N. McKeown]
Is the closed loop still instantaneous?
5
Time Trends (figure; time in ns)
6
Hiding Propagation Delays
Traditional solutions:
Increase the time-slot → poor switch performance
Hide propagation delays using buffers → impractical amount of buffering
Proposed solution: closed loop → open loop
Trade-off: performance degradation vs. an instantaneous closed loop
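The sketch below shows roughly why buffering away the propagation delay becomes impractical. It computes the memory a closed-loop, crosspoint-buffered fabric would need if every crosspoint had to absorb one round trip of traffic, which is a common rule of thumb for credit-based flow control. The propagation speed, line rate, distance, and helper names are illustrative assumptions, not figures from the talk.

```python
# Back-of-the-envelope sizing: buffering needed to hide a round-trip
# propagation delay in a closed-loop (credit-based) crosspoint-buffered fabric.
# All parameters are illustrative assumptions, not values from the talk.

PROPAGATION_NS_PER_M = 5.0   # roughly 2/3 the speed of light, typical of fiber/copper
CELL_BYTES = 64              # fixed-size cell, as in the talk's examples


def rtt_ns(distance_m: float) -> float:
    """Round-trip propagation delay between linecard and fabric, in ns."""
    return 2 * distance_m * PROPAGATION_NS_PER_M


def cells_in_flight(rate_gbps: float, distance_m: float) -> float:
    """Cells sent during one round trip (bandwidth-delay product)."""
    bits = rate_gbps * rtt_ns(distance_m)  # (Gb/s) * ns = bits
    return bits / (8 * CELL_BYTES)


def crosspoint_buffer_bytes(n_ports: int, rate_gbps: float, distance_m: float) -> float:
    """Total fabric memory if each of the n*n crosspoints covers one RTT."""
    return n_ports * n_ports * cells_in_flight(rate_gbps, distance_m) * CELL_BYTES
```

Take an assumed 40 Gb/s line rate and a 50 m distance. Each crosspoint would need about 39 cells, so `crosspoint_buffer_bytes(128, 40, 50)` comes to 40,960,000 bytes (about 41 MB), and that memory sits purely in flight-covering buffers.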
7
Outline
CQ: an open-loop switch architecture
Performance evaluation: analytical results and simulations
CQ performance degradation is not significant
8
Proposed Architecture: The Crosspoint-Queued (CQ) Switch
No queues in the linecards
Buffering only inside the fabric (at the crosspoints)
Independent output schedulers
Packets are dropped when a crosspoint buffer is full
Linecards may be tens of meters away from the switch core
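Below is a minimal Python sketch of the CQ data path. It assumes fixed-size cells and one time-slot per call. Each crosspoint holds a FIFO of at most B cells, and a cell that arrives at a full crosspoint is dropped. Each output's scheduler looks only at its own column. The class name `CQSwitch` and the uniformly random scheduling choice are illustrative assumptions.

```python
import random
from collections import deque


class CQSwitch:
    """Sketch of an N x N crosspoint-queued switch (hypothetical class).

    Crosspoint (i, j) holds a FIFO of at most b cells. There are no linecard
    queues: a cell arriving at a full crosspoint is dropped. Each output j
    schedules over column j only; here it picks uniformly at random among
    non-empty crosspoints (an illustrative work-conserving choice).
    """

    def __init__(self, n: int, b: int, rng: random.Random):
        self.n, self.b, self.rng = n, b, rng
        self.xp = [[deque() for _ in range(n)] for _ in range(n)]  # xp[i][j]
        self.dropped = 0
        self.departed = 0

    def arrive(self, i: int, j: int, cell) -> None:
        """Cell from input i to output j enters crosspoint (i, j) or is dropped."""
        q = self.xp[i][j]
        if len(q) < self.b:
            q.append(cell)
        else:
            self.dropped += 1  # open loop: no backpressure to the linecard

    def serve_outputs(self) -> list:
        """Each output independently sends at most one cell this time-slot."""
        sent = []
        for j in range(self.n):
            busy = [i for i in range(self.n) if self.xp[i][j]]
            if busy:
                i = self.rng.choice(busy)
                sent.append((i, j, self.xp[i][j].popleft()))
        self.departed += len(sent)
        return sent
```

For example, with `CQSwitch(2, 1, random.Random(0))`, sending two cells from input 0 to output 1 drops the second one, because B = 1. Nothing in `serve_outputs` reports back to the inputs, which is the sense in which the loop is open.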
9
CQ Properties
Open loop → no communication overhead
No linecard queues → no linecard queue management
“Router on a chip”: buffering and switch fabric on the same chip
10
Why not 10 years ago?
No need: routers fit in a single rack
No technology: insufficient SRAM density
Moore’s law: density doubles every 2.5 years
Aggressive 128x128 CQ switch: 4 cells of 64 bytes per crosspoint then → 64 cells today
Buffer requirements here are conservative: the Stanford TCP model implies smaller buffer needs [Appenzeller, Keslassy and McKeown ’04]
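The slide's numbers can be checked with simple arithmetic. A 128x128 switch has 16,384 crosspoints, and going from 4 to 64 cells per crosspoint is a factor of 16, i.e., four density doublings. The helper names below are hypothetical.

```python
import math

CELL_BYTES = 64        # cell size quoted on the slide
DOUBLING_YEARS = 2.5   # SRAM density doubling period quoted on the slide


def total_buffer_bytes(n: int, cells_per_crosspoint: int) -> int:
    """On-chip memory for an n x n CQ switch (n * n crosspoints)."""
    return n * n * cells_per_crosspoint * CELL_BYTES


def years_to_grow(factor: float) -> float:
    """Years for density to grow by `factor` at the quoted doubling rate."""
    return math.log2(factor) * DOUBLING_YEARS
```

With these helpers, `total_buffer_bytes(128, 4)` is 4 MiB and `total_buffer_bytes(128, 64)` is 64 MiB. `years_to_grow(64 / 4)` is 10.0, which is consistent with the "10 years ago" framing.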
11
Outline
CQ: our open-loop switch architecture
Performance evaluation: analytical results and simulations
12
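100% Throughput as B → ∞
Buffer size B per crosspoint, LQF (Longest Queue First) scheduling
Throughput bounds: OQ(2B−1) ≤ CQ(B) ≤ OQ(NB)
Since the throughput of an output-queued switch with buffer size 2B−1 goes to 100% as B → ∞, so does the throughput of CQ(B)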
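This is a sketch of the LQF decision made by a single output. It assumes the output's scheduler can see the occupancy of every crosspoint in its column. The function name and the lowest-index tie-break are assumptions, not details given in the talk.

```python
from typing import Optional, Sequence


def lqf_pick(column_lengths: Sequence[int]) -> Optional[int]:
    """Longest Queue First for one output (hypothetical helper).

    Returns the input index of the longest crosspoint queue in this output's
    column, or None if every crosspoint is empty. Ties go to the lowest index
    (an assumed tie-break rule).
    """
    best = None
    for i, length in enumerate(column_lengths):
        if length > 0 and (best is None or length > column_lengths[best]):
            best = i
    return best
```

For instance, `lqf_pick([0, 3, 1, 3])` selects input 1. Because each output runs this on its own column, no coordination between outputs is needed.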
13
Uniform Traffic, B = 1
Uniform traffic model: at each time-slot, at each of the N inputs, a packet arrives with probability ρ (Bernoulli i.i.d. arrivals); each packet is destined for one of the N outputs uniformly at random
Theorem: under uniform traffic and B = 1, the performance of the switch is independent of the specific work-conserving scheduling algorithm
Intuition: symmetry
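The uniform traffic model can be sketched as a per-slot generator. Each input independently receives a cell with probability rho, and the cell's destination is drawn uniformly from the N outputs. The function name and the use of Python's `random` module are assumptions made for illustration.

```python
import random
from typing import List, Optional


def uniform_arrivals(n: int, rho: float, rng: random.Random) -> List[Optional[int]]:
    """One time-slot of uniform Bernoulli i.i.d. traffic (hypothetical helper).

    Entry i is the destination output of the cell arriving at input i,
    or None if no cell arrives there (probability 1 - rho).
    """
    return [rng.randrange(n) if rng.random() < rho else None for _ in range(n)]
```

Under this model, each crosspoint (i, j) receives a cell with probability rho/N per slot, independently of the other inputs. That is the symmetry the theorem's intuition points to.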
14
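Uniform Traffic, B = 1
Theorem: the throughput and waiting time of a CQ switch with B = 1 have closed-form expressions in terms of q = 1 − ρ/N, the probability that a given input does not send a packet to a given output in a time-slot
Proof: based on the Z-transform
The throughput goes to 100% as N → ∞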
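The theorem says that with B = 1 performance does not depend on the work-conserving scheduler, so one output's column can be modeled as a Markov chain on the number of occupied crosspoints. The sketch below solves that chain numerically by power iteration instead of the Z-transform. It assumes a particular slot ordering: arrivals first, then at most one departure. It is therefore an illustrative model, not the paper's closed-form expression, and the function name is hypothetical.

```python
from math import comb


def cq_b1_throughput(n: int, rho: float, tol: float = 1e-13, max_iter: int = 10_000) -> float:
    """Normalized throughput (delivered / offered) of one output of an n x n
    CQ switch with B = 1 under uniform Bernoulli traffic of load rho.

    Assumed slot model: arrivals land only on empty crosspoints (cells hitting
    a full one are dropped), then the output sends one cell if its column is
    non-empty. State k = occupied crosspoints in the column at slot start.
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must be in (0, 1]")
    p = rho / n  # prob. that a given input sends a cell to this output; q = 1 - p
    # Transition matrix over k = 0..n-1 (after a departure at most n-1 remain).
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        free = n - k
        for a in range(free + 1):  # a ~ Binomial(free, p) accepted arrivals
            prob = comb(free, a) * p**a * (1 - p) ** (free - a)
            nxt = k + a - 1 if k + a >= 1 else 0
            P[k][nxt] += prob
    # Stationary distribution by power iteration.
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[k] * P[k][m] for k in range(n)) for m in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    # The output idles only if the column was empty and no cell arrived: q**n.
    departure_rate = 1.0 - pi[0] * (1 - p) ** n
    return departure_rate / rho
```

Under these assumptions, `cq_b1_throughput(2, 1.0)` evaluates to about 0.833 (5/6). At a fixed load, the value increases toward 1 as N grows, in line with the slide.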
15
Models for Larger Buffers
Approximate performance analysis
Model for exhaustive round-robin scheduling, based on modifications to a polling system with zero switch-over times
Model for the random scheduling algorithm
Both models show 100% throughput as N → ∞
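This is a sketch of exhaustive round-robin at a single output, mirroring an exhaustive polling server with zero switch-over time. The pointer stays on a crosspoint until that crosspoint is empty. It then advances cyclically to the next non-empty crosspoint within the same slot. The class name is hypothetical.

```python
from typing import Optional, Sequence


class ExhaustiveRoundRobin:
    """Exhaustive round-robin scheduler for one output (hypothetical class).

    Keeps serving the current crosspoint while it is non-empty; once it
    empties, moves to the next non-empty crosspoint in cyclic order without
    losing a slot (zero switch-over time).
    """

    def __init__(self, n: int):
        self.n = n
        self.current = 0

    def pick(self, column_lengths: Sequence[int]) -> Optional[int]:
        # Step 0 re-checks the current crosspoint, which makes service exhaustive.
        for step in range(self.n):
            i = (self.current + step) % self.n
            if column_lengths[i] > 0:
                self.current = i
                return i
        return None
```

The random scheduling algorithm works differently: in each slot the output would pick uniformly among the non-empty crosspoints in its column, as in the `CQSwitch` sketch.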
16
Trace-Driven Simulation
32x32 CQ switch with different buffer sizes (in units of 64-byte packets)
Buffers of size 64 suffice to ensure 99% throughput for N = 32
17
Conclusions
CQ is open loop → allows multi-rack configurations
CQ provides easy scheduling
CQ is feasible to implement on a single chip
CQ shows good performance in simulations
18
Thank You