1
Architecture for Network Hub in 2011
David Chinnery, Ben Horowitz
2
Internet Model
Network time-of-flight latency
– Unavoidable
End point latency
– Limited by the need for a cheap solution for users
Latency of internet nodes (hubs, gateways)
– Can provide differentiated services: high priority packets and other packets
– If bandwidth is insufficient, use multiple chips and send an interval of wavelengths to each
3
Internet Visualization
San Francisco, USA to Perth, Australia
Worst case packet journey: halfway around the world
? hubs, 2 gateways, 2 end users
0.200 s tolerable latency for video conferencing
4
Maximum Nodes Packet Travels
Average number of nodes traveled = log(number of nodes in internet)
– Journey of 15.7 nodes average in 1996
Estimate one node/person in 2011
– Journey of 22.7 nodes average in 2011
39 nodes worst case in 1996 (1 in 1000)
Scaling by the ratio of averages gives 56.3 nodes worst case in 2011 (1 in 1000)
[Figure: chain of nodes numbered 1 through 56]
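The hop-count estimate can be checked with a few lines of arithmetic. The sketch below assumes the logarithm is the natural log and that "one node/person in 2011" means roughly 7 billion nodes; neither is stated on the slide, and the variable names are illustrative.

```python
import math

# Assumptions (not stated on the slide): natural logarithm, and about
# 7 billion nodes in 2011 at one node per person.
NODES_2011 = 7e9

AVG_HOPS_1996 = 15.7     # from the slide
AVG_HOPS_2011 = 22.7     # from the slide
WORST_HOPS_1996 = 39     # from the slide, 1-in-1000 journey

# Average journey length under the log model.
avg_hops_2011 = math.log(NODES_2011)
# Node count implied by the 1996 average under the same model.
implied_nodes_1996 = math.exp(AVG_HOPS_1996)
# Scale the 1996 worst case by the ratio of the two averages.
worst_hops_2011 = WORST_HOPS_1996 * AVG_HOPS_2011 / AVG_HOPS_1996

print(f"average hops in 2011: {avg_hops_2011:.1f}")
print(f"implied 1996 node count: {implied_nodes_1996:.2e}")
print(f"worst case hops in 2011: {worst_hops_2011:.1f}")
```

Under these assumptions the ratio works out to about 56.4 rather than the quoted 56.3; either way it rounds to the 56 nodes used in the rest of the deck.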
5
Time of Flight
Optic fiber delay 5 us/km
Restore signal with repeaters every 100 km
– Repeater delay 0.92 us [1999]
Worst case journey length ~20,100 km
20,100 × 5 + 201 × 0.92 ≈ 100,700 us
Time of flight delay of 0.101 s
[Figure: 100 km fiber span (500 us) followed by a repeater (0.92 us)]
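As a quick check of the time-of-flight figure, the calculation can be written out as below. The constants are the ones on the slide; the function name is illustrative.

```python
FIBER_DELAY_US_PER_KM = 5.0    # from the slide
REPEATER_DELAY_US = 0.92       # from the slide [1999]
REPEATER_SPACING_KM = 100      # from the slide


def time_of_flight_us(distance_km: float) -> float:
    """Fiber propagation delay plus one repeater delay per 100 km span."""
    repeaters = distance_km / REPEATER_SPACING_KM
    return distance_km * FIBER_DELAY_US_PER_KM + repeaters * REPEATER_DELAY_US


total_us = time_of_flight_us(20_100)
print(f"{total_us:.2f} us = {total_us / 1e6:.3f} s")
```

The fiber itself accounts for almost all of the delay; the 201 repeaters add under 0.2 ms.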
6
Internet Visualization
San Francisco, USA to Perth, Australia
Worst case packet journey: halfway around the world, 0.101 s time of flight
– 52 hubs: latency ?
– 2 gateways: latency ?
– 2 end users: latency ?
0.200 s tolerable latency for video conferencing
7
End User Model
Worst case scenario
Processing intensive application
– MPEG4 encoding for HDTV2
Limited silicon area, as must be low cost
– Sufficient for 1920×1080 HDTV2 at 30 Hz
Processing latency 1/30 s = 0.033 s
End user to end user
– Processing latency doubled
8
Internet Visualization
San Francisco, USA to Perth, Australia
Worst case packet journey: halfway around the world, 0.101 s time of flight
– 52 hubs: latency ?
– 2 gateways: latency ?
– 2 end users: 0.067 s (0.033 s each)
0.200 s tolerable latency for video conferencing
9
Node Hardware Model
Processing cores are Intel IXP1200 routers
Conservative ASIC frequency estimate
– IXP1200 speed of 166 MHz in 0.28 um
– Linearly scale to 0.18 um: speed ×1.56
– Speed ×3.00 from 0.18 um to 0.05 um [ITRS]
– IXP1200 speed of 775 MHz in 2011
Assume across chip speed of 775 MHz
– With custom macros at 10 GHz in 2011
ITRS estimate: across chip speed of 1.5 GHz
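The clock projection is a product of two scaling factors. A minimal sketch, assuming the "linear" step means scaling speed by the ratio of feature sizes (0.28/0.18), which is what reproduces the ×1.56 on the slide:

```python
BASE_MHZ = 166.0                    # IXP1200 clock in 0.28 um, from the slide
shrink_028_to_018 = 0.28 / 0.18     # assumed linear scaling with feature size
itrs_018_to_005 = 3.00              # ITRS speed-up quoted on the slide

scaled_mhz = BASE_MHZ * shrink_028_to_018 * itrs_018_to_005

print(f"linear scaling factor: {shrink_028_to_018:.2f}")
print(f"projected IXP1200 clock in 2011: {scaled_mhz:.0f} MHz")
```

Using the unrounded ratio gives about 775 MHz; multiplying by the rounded 1.56 instead would give about 777 MHz.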
10
Node Router Hardware
For gateways or hubs
– 2011 ASIC: 8 cm², 811 million transistors/cm²
– 6500 million transistors
6.5 million transistors for IXP1200
– If 2/3 of chip is memory and wires, up to 333 IXP1200s on same chip
– Estimate 300 IXP1200s
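The core-count estimate follows from the transistor budget. The sketch below uses the slide's numbers and, like the slide, rounds the die total to 6500 million before dividing; the variable names are illustrative.

```python
import math

DIE_AREA_CM2 = 8                 # from the slide
DENSITY_M_PER_CM2 = 811          # million transistors per cm^2, from the slide
IXP1200_M_TRANSISTORS = 6.5      # million transistors per core, from the slide
LOGIC_FRACTION = 1 / 3           # the other 2/3 is memory and wires

total_m = DIE_AREA_CM2 * DENSITY_M_PER_CM2    # exact product
total_m_rounded = 6500                        # figure quoted on the slide

# Cores that fit in the logic share of the rounded transistor budget.
max_cores = math.floor(total_m_rounded * LOGIC_FRACTION / IXP1200_M_TRANSISTORS)

print(f"transistors on die: {total_m} million (quoted as {total_m_rounded})")
print(f"upper bound on IXP1200 cores: {max_cores}")
```

The deck then rounds this upper bound down to a working estimate of 300 cores.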
11
Packet Processing at Nodes
Maximum onto chip bandwidth
– 927 pins chip-to-package in 2011
– 359 Gbit/s at 775 MHz, 695 Gbit/s at 1.5 GHz
Scaling IXP1200 to 2011, each can process 11 million (21 million at 1.5 GHz) packets/second
– 300 IXP1200s can process 3.3 billion (6.3 billion) packets/s
Smallest IP packet is 20 bytes (header size)
– Maximum required processing of 2.2 billion (4.3 billion) packets/s
Spare processing power available
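The bandwidth and packet-rate figures can be reproduced as below. One assumption is needed that the slide does not state: half of the 927 pins carry inbound data at one bit per pin per clock. That reading matches the quoted 359 and 695 Gbit/s, but it is an inference. The 300-core count is the estimate from the previous slide.

```python
PINS = 927                    # chip-to-package pins, from the slide
MIN_PACKET_BITS = 20 * 8      # smallest IP packet, header only
CORES = 300                   # estimated IXP1200 count
PER_CORE_MPPS = {775: 11, 1500: 21}   # million packets/s per core, from the slide


def budget(clock_mhz: int):
    """Return (inbound Gbit/s, required Gpackets/s, available Gpackets/s)."""
    # Assumption: half the pins are inbound, one bit per pin per clock.
    in_gbps = PINS / 2 * clock_mhz / 1000
    required = in_gbps / MIN_PACKET_BITS
    available = CORES * PER_CORE_MPPS[clock_mhz] / 1000
    return in_gbps, required, available


for mhz in (775, 1500):
    in_gbps, required, available = budget(mhz)
    print(f"{mhz} MHz: {in_gbps:.0f} Gbit/s in, "
          f"{required:.1f} G packets/s required, {available:.1f} available")
```

At both clock speeds the available packet rate exceeds what a stream of minimum-size packets would demand, which is the "spare processing power" noted on the slide.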
12
Bus and I/O Overview
[Figure: block diagram of 20 groups of 15 IXP1200s (IXP1,1 to IXP20,15), each group with its own in queue (Q1 in to Q20 in) and out queue (Q1 out to Q20 out), plus Q in control and Q out control blocks. Labeled buses: 448 bit input bus, 448 bit output bus, 48 bit header detection, 128 bit control buses, 64 bit control buses, 32 bit I/O bus.]
13
Header Detection Hardware
Custom header detection macro runs at 13 times chip speed, 10.075 GHz
– 12 cycles for comparison, 1 to send positions
Forty 48-bit comparators (80 at 1.5 GHz)
– Up to 6 bytes detection (Ethernet destination)
– Store last 47 bits from previous 448 bit word
[Figure: 1 bit shifter feeding a 48 bit comparator from the last 47 bits of word t-1 followed by the 448 bits of word t]
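A rough check of why forty comparators suffice at 775 MHz is sketched below. It assumes each comparator tests one 48-bit window position per macro cycle, which is one reading of the slide rather than something it states.

```python
WORD_BITS = 448
WINDOW_BITS = 48
CARRY_BITS = WINDOW_BITS - 1     # 47 bits kept from the previous word

CHIP_GHZ = 0.775
MACRO_MULTIPLIER = 13            # 12 compare cycles + 1 cycle to send positions
COMPARE_CYCLES = 12
COMPARATORS = 40

macro_ghz = MACRO_MULTIPLIER * CHIP_GHZ

# Window start offsets once the 47 carried bits are prepended to the new word.
positions = CARRY_BITS + WORD_BITS - WINDOW_BITS + 1

# Assumption: one window position per comparator per macro cycle.
tested_per_chip_cycle = COMPARATORS * COMPARE_CYCLES

print(f"macro clock: {macro_ghz:.3f} GHz")
print(f"positions to test per word: {positions}")
print(f"positions testable per chip cycle: {tested_per_chip_cycle}")
```

Carrying 47 bits forward means a header that straddles two bus words is still seen whole, and every one of the 448 bit offsets gets tested exactly once.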
14
48-bit Comparators
Set mask for comparison to 0, 1 or X (don't care)
Custom comparison circuit
– Signals and their negation are available from registers
– 10 transistors to implement
7 bit counter with each to set header position
About 30,000 transistors total
Possible 3 packets/448 bits
– 31 bits of bus to send positions
[Figure: comparison circuit with inputs input i, mask i and care i]
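The 0/1/X mask comparison can be illustrated in software. This is a behavioral sketch only, not the custom circuit: it encodes the pattern as a `mask` value plus a `care` bitmap (names taken from the figure labels) and slides it one bit at a time. The helper names and the short 3-bit pattern in the example are illustrative.

```python
def make_pattern(spec: str):
    """Turn a string of '0', '1' and 'X' into (mask, care) integers."""
    mask = care = 0
    for ch in spec:
        mask <<= 1
        care <<= 1
        if ch in "01":
            care |= 1          # this bit must match
            mask |= int(ch)    # value it must match
        elif ch != "X":
            raise ValueError(f"unexpected symbol {ch!r}")
    return mask, care


def matches(window: int, mask: int, care: int) -> bool:
    """True when window equals mask on every bit that care marks as significant."""
    return ((window ^ mask) & care) == 0


def find_headers(bits: str, spec: str):
    """Slide the pattern one bit at a time and return every matching offset."""
    mask, care = make_pattern(spec)
    width = len(spec)
    return [i for i in range(len(bits) - width + 1)
            if matches(int(bits[i:i + width], 2), mask, care)]


print(find_headers("0010110100", "1X1"))  # prints [2, 5]
```

The XOR flags bits that differ from the mask and the AND with `care` discards the don't-care positions, so a match is simply an all-zero result.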
15
Simulator
Other simulators cumbersome for our task
Wrote event driven simulator in Java
– Worst case simulations: can easily process at maximum bandwidth with no additional latency
16
Worst Case Scenario Results
Worst case scenario
– Minimum packet size is 20 bytes
– 448 bit input bus: 3 packets or less per cycle
– IXP1200 time to calculate next destination: 75 cycles minimum, 345 cycles average, 600 cycles maximum
At most 7 packets processed simultaneously on IXP1200
– IXP1200 has 6 micro-engines
– Load handled easily
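A back-of-envelope estimate of the per-core load, separate from the authors' simulator, can be made with Little's law (items in flight = arrival rate × time in system). The sketch assumes packets are spread evenly over the 300 IXP1200s estimated earlier in the deck; that even spread is an assumption.

```python
import math

BUS_BITS = 448
MIN_PACKET_BITS = 20 * 8
MAX_PROCESS_CYCLES = 600     # slowest next-destination lookup, from the slide
CORES = 300                  # estimated IXP1200 count (assumed evenly loaded)

# A 448 bit word can touch at most this many minimum-size packets.
packets_per_cycle = math.ceil(BUS_BITS / MIN_PACKET_BITS)

# Little's law: packets in flight = arrival rate * processing time.
in_flight = packets_per_cycle * MAX_PROCESS_CYCLES
per_core = in_flight / CORES

print(f"packets per cycle: {packets_per_cycle}")
print(f"packets in flight: {in_flight}")
print(f"average per IXP1200: {per_core:.1f}")
```

This gives about 6 packets per core when every packet takes the maximum 600 cycles, in line with the simulated bound of at most 7.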
17
Conclusions from Simulation
Latency of 605 cycles
– 0.78 us at 775 MHz, 0.40 us at 1.5 GHz
Largest possible packet that could be sent after started processing is 65,536 bytes
– Additional 1170 cycles latency
– 1.51 us at 775 MHz, 0.78 us at 1.5 GHz
Transceiver delay 0.05 us [1999]
– Additional 0.10 us/hop
Total latency/hop of 2.4 us at 775 MHz, 1.3 us at 1.5 GHz
– 0.0000024 s/hub
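The per-hop latency adds three terms: processing, waiting behind a maximum-size packet on the 448 bit output bus, and transceivers. The sketch below assumes the 0.10 us/hop comes from two 0.05 us transceivers (one in, one out), which the slide implies but does not spell out.

```python
BUS_BITS = 448
PROCESS_CYCLES = 605            # from the simulation results
MAX_PACKET_BYTES = 65_536       # largest packet that may be ahead in the queue
TRANSCEIVER_US = 0.05           # from the slide [1999]


def hop_latency_us(clock_mhz: float) -> float:
    """Latency per hop in microseconds (cycles / MHz = microseconds)."""
    drain_cycles = MAX_PACKET_BYTES * 8 // BUS_BITS   # 1170 full bus words
    processing = PROCESS_CYCLES / clock_mhz
    blocking = drain_cycles / clock_mhz
    # Assumption: one transceiver on the way in and one on the way out.
    return processing + blocking + 2 * TRANSCEIVER_US


for mhz in (775, 1500):
    print(f"{mhz} MHz: {hop_latency_us(mhz):.1f} us per hop")
```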
18
Internet Visualization
San Francisco, USA to Perth, Australia
Worst case packet journey (probability of 1 in 1000): halfway around the world, 0.101 s time of flight
– 52 hubs: < 0.001 s (0.0000024 s/hub)
– 2 gateways: < 0.001 s (0.0000024 s each)
– 2 end users: 0.067 s (0.033 s each)
Total 0.169 s: tolerable latency for video conferencing
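Putting the pieces together, the end-to-end budget can be tallied as below. The inputs are the figures from the earlier slides at 775 MHz; the variable names are illustrative.

```python
TIME_OF_FLIGHT_S = 0.101       # 20,100 km of fiber plus repeaters
END_USER_S = 1 / 30            # one frame time of processing per end user
HOP_S = 2.4e-6                 # per hub or gateway at 775 MHz
TOLERABLE_S = 0.200            # video conferencing limit from the deck

HUBS, GATEWAYS, END_USERS = 52, 2, 2

node_s = (HUBS + GATEWAYS) * HOP_S
total_s = TIME_OF_FLIGHT_S + END_USERS * END_USER_S + node_s

print(f"hubs and gateways: {node_s * 1e6:.1f} us")
print(f"total: {total_s:.3f} s (limit {TOLERABLE_S:.3f} s)")
```

The unrounded sum is about 0.168 s; the 0.169 s on the slide comes from adding the rounded terms 0.101 + 0.067 + 0.001. Node latency is well under a millisecond, so time of flight and end-user processing dominate.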
19
Conclusions
Limiting factor is maximum bandwidth
Average case simulations done
– Can easily process at maximum bandwidth with 40 IXP1200 processors (mostly longer packets)
Reduce processing power to levels sufficient for bandwidth and model
– Fewer IXP1200s on chip
– Smaller chip size reduces cost
– Reduced processing power increases congestion, and may require high priority packets for some communications
20
448 Bit Operation Cycles
448 bits onto chip
Up to 48 bit header detection on previous 47 bits, and 401 bits of current 448 bits (48 bit comparators)
– Send header positions in this 448 bit window
Send to high priority and low priority in queues
Packet priority detection (header) in queues
Incorrect priority queue drops packet, in queue controller informed
Remainder of packet sent to appropriate in queue
Process packet header, send packet body to out queue
– Process times between 70 and 600 cycles, 345 cycles avg.
Send updated packet header to out queue
Inform out queue controller packet ready to send
Send when output bus available
448 bits off chip
21
Maximum Throughput Node Hardware
For gateways or hubs
6.5 million transistors for IXP1200
0.5 million transistors for other applications such as speech codecs, V.42bis, Huffman compression, and 3DES
Up to 310 IXP1200s on the same chip
22
Packet Processing at Nodes
Maximum onto chip bandwidth
– 927 pins with I/O at clock speed
Smallest IP packet is 20 bytes (header size)
Maximum required processing power
23
Hub Cache and Main Memory
Required for IXP1200s
Assumed by Scott in IXP1200 simulations:
– 4 MB of DRAM
– 2 MB of SRAM
24
Hub Register Memory
25
Average Scenario Information
Assumed normal distribution between 80 and 600 cycles to process a packet
– Average of 340 cycles
– 80 and 600 are two standard deviations from mean
Packet sizes:
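The stated processing-time distribution can be sampled as in the sketch below. A mean of 340 with 80 and 600 at two standard deviations implies a standard deviation of 130 cycles. How the authors kept samples inside the range is not stated; redrawing out-of-range values, as done here, is an assumption.

```python
import random

LOW, HIGH = 80, 600
MEAN = (LOW + HIGH) / 2      # 340 cycles
SIGMA = (HIGH - LOW) / 4     # 130 cycles, so LOW and HIGH sit at two sigma


def sample_cycles(rng: random.Random) -> int:
    """Draw a processing time, redrawing until it falls inside [LOW, HIGH]."""
    while True:
        x = rng.gauss(MEAN, SIGMA)
        if LOW <= x <= HIGH:
            return round(x)


rng = random.Random(1)       # fixed seed so runs are repeatable
samples = [sample_cycles(rng) for _ in range(10_000)]
print(f"min {min(samples)}, max {max(samples)}, "
      f"mean {sum(samples) / len(samples):.0f}")
```

Because the cut-offs are symmetric about the mean, truncating the distribution leaves the average at about 340 cycles.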