Architecture for Network Hub in 2011. David Chinnery, Ben Horowitz.


1 Architecture for Network Hub in 2011
David Chinnery, Ben Horowitz

2 Internet Model
Network time-of-flight latency
– Unavoidable
End point latency
– Limited by cheap solution for users
Latency of internet nodes (hubs, gateways)
– Can provide differentiated services: high priority packets, other packets
– If bandwidth is insufficient, use multiple chips and send an interval of wavelengths to each

3 Internet Visualization
Worst case packet journey: halfway around the world (San Francisco, USA to Perth, Australia)
? hubs
2 gateways
2 end users
0.200 s tolerable latency for video conferencing

4 Maximum Nodes Packet Travels
Average number of nodes traveled = log(number of nodes in internet)
– Journey of 15.7 nodes average in 1996
Estimate one node/person in 2011
– Journey of 22.7 nodes average in 2011
39 nodes worst case in 1996 (1 in 1000)
→ Scaling by ratio of averages gives 56.3 nodes worst case in 2011 (1 in 1000)
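The node-count estimate on this slide can be checked with a few lines of arithmetic. The sketch below assumes the slide's "log" is the natural logarithm, which is an inference from the quoted averages rather than something the slide states; all variable names are illustrative.

```python
import math

avg_hops_1996 = 15.7     # average journey length quoted for 1996
avg_hops_2011 = 22.7     # average journey length estimated for 2011
worst_hops_1996 = 39     # 1-in-1000 worst case quoted for 1996

# Assumption: average hops = ln(number of nodes), so invert with exp().
implied_nodes_1996 = math.exp(avg_hops_1996)
implied_nodes_2011 = math.exp(avg_hops_2011)

# Scale the 1996 worst case by the ratio of the two averages.
worst_hops_2011 = worst_hops_1996 * avg_hops_2011 / avg_hops_1996

print(f"implied nodes 1996: {implied_nodes_1996:.2e}")
print(f"implied nodes 2011: {implied_nodes_2011:.2e}")
print(f"worst case hops 2011: {worst_hops_2011:.1f}")
```

Under that assumption the 2011 figure implies a few billion nodes, in line with the one node per person estimate. The scaled worst case comes out near 56.4, against 56.3 on the slide; the small gap is presumably rounding in the quoted averages.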

5 Time of Flight
Optic fiber delay 5 us/km
Restore signal with repeaters every 100 km
– Repeater delay 0.92 us [1999]
– Each 100 km segment: 500 us fiber delay plus 0.92 us repeater delay
Worst case journey length ~20,100 km
20,100 × 5 + 201 × 0.92 ≈ 100,700 us
Time of flight delay of 0.101 s
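The time-of-flight sum above can be reproduced directly. This is a minimal sketch using only the figures on the slide; the function and constant names are illustrative.

```python
FIBER_DELAY_US_PER_KM = 5.0    # optic fiber delay from the slide
REPEATER_SPACING_KM = 100      # one repeater every 100 km
REPEATER_DELAY_US = 0.92       # per-repeater delay from the slide


def time_of_flight_us(distance_km):
    """Fiber propagation delay plus the delay of every repeater on the path."""
    repeaters = distance_km // REPEATER_SPACING_KM
    return distance_km * FIBER_DELAY_US_PER_KM + repeaters * REPEATER_DELAY_US


worst_case_us = time_of_flight_us(20_100)
print(f"{worst_case_us:.2f} us = {worst_case_us / 1e6:.3f} s")
```

The repeaters contribute under 200 us of the total, so the fiber itself dominates the 0.101 s figure.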

6 Internet Visualization
Worst case packet journey: halfway around the world (San Francisco, USA to Perth, Australia)
Time of flight: 0.101 s
52 hubs: ?
2 gateways: ?
2 end users: ?
0.200 s tolerable latency for video conferencing

7 End User Model
Worst case scenario
Processing intensive application
– MPEG4 encoding for HDTV2
Limited silicon area, as must be low cost
– Sufficient for 1920×1080 HDTV2 at 30 Hz
→ Processing latency 1/30 s (0.033 s)
End user to end user
→ Processing latency doubled (0.067 s)

8 Internet Visualization
Worst case packet journey: halfway around the world (San Francisco, USA to Perth, Australia)
Time of flight: 0.101 s
52 hubs: ?
2 gateways: ?
2 end users: 0.067 s (0.033 s each)
0.200 s tolerable latency for video conferencing

9 Node Hardware Model
Processing cores are Intel IXP1200 routers
Conservative ASIC frequency estimate
– IXP1200 speed of 166 MHz in 0.28 um
– Linearly scale to 0.18 um → speed ×1.56
– Speed ×3.00 from 0.18 um to 0.05 um [ITRS]
→ IXP1200 speed of 775 MHz in 2011
→ Assume across chip speed of 775 MHz
– With custom macros at 10 GHz in 2011 ITRS estimate, across chip speed of 1.5 GHz
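The clock scaling on this slide is two multiplications, sketched here with illustrative names. The linear factor is the ratio of feature sizes and the ×3.00 factor is the one the slide attributes to ITRS.

```python
def scaled_clock_mhz(base_mhz, from_um, to_um, further_speedup):
    """Scale a clock linearly with feature size, then apply a further speedup."""
    linear_speedup = from_um / to_um          # 0.28 um -> 0.18 um gives about 1.56
    return base_mhz * linear_speedup * further_speedup


ixp1200_2011_mhz = scaled_clock_mhz(166.0, 0.28, 0.18, 3.00)
print(f"IXP1200 clock in 2011: about {ixp1200_2011_mhz:.0f} MHz")
```

This lands within a megahertz of the 775 MHz the slide uses as the conservative across-chip speed.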

10 Node Router Hardware
For gateways or hubs
– 2011 ASIC: 8 cm², 811 million transistors/cm² → 6500 million transistors
6.5 million transistors for IXP1200
– If 2/3 of chip is memory and wires → up to 333 IXP1200s on same chip
– Estimate 300 IXP1200s
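The transistor budget can be sketched as follows. The names are illustrative, and the sketch follows the slide in rounding the chip total to 6500 million before dividing.

```python
CHIP_AREA_CM2 = 8
TRANSISTORS_PER_CM2_M = 811        # millions of transistors per cm^2
IXP1200_TRANSISTORS_M = 6.5        # millions of transistors per IXP1200
MEMORY_AND_WIRE_FRACTION = 2 / 3   # share of the chip not available for cores


def max_cores(total_transistors_m):
    """Whole IXP1200 cores that fit in the share of the chip left for logic."""
    usable_m = total_transistors_m * (1 - MEMORY_AND_WIRE_FRACTION)
    return int(usable_m // IXP1200_TRANSISTORS_M)


exact_total_m = CHIP_AREA_CM2 * TRANSISTORS_PER_CM2_M   # 6488 million
cores_from_rounded_total = max_cores(6500)              # slide rounds to 6500 million
print(exact_total_m, cores_from_rounded_total)
```

Either total leaves comfortable headroom above the 300 cores the deck goes on to assume.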

11 Packet Processing at Nodes
Maximum onto chip bandwidth
– 927 pins chip-to-package in 2011 → 359 Gbit/s at 775 MHz (695 Gbit/s at 1.5 GHz)
Scaling IXP1200 to 2011, each can process 11 million (21 million) packets/second
– 300 IXP1200s can process 3.3 billion packets/s (6.3 billion)
Smallest IP packet is 20 bytes (header size)
– Maximum required processing of 2.2 billion packets/s (4.3 billion)
→ Spare processing power available
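The bandwidth and packet-rate figures can be reproduced under one assumption that the slide does not state: half of the 927 pins carry inbound data at one bit per pin per clock. That assumption is chosen because it matches the quoted 359 and 695 Gbit/s; the names below are illustrative.

```python
PINS = 927
INBOUND_FRACTION = 0.5          # assumption: half the pins are inputs
MIN_PACKET_BITS = 20 * 8        # smallest IP packet, header only
CORES = 300                     # IXP1200s per chip, from the deck


def inbound_bits_per_s(clock_hz):
    """Peak inbound bandwidth at one bit per input pin per clock."""
    return PINS * INBOUND_FRACTION * clock_hz


def required_packets_per_s(clock_hz):
    """Worst case packet rate: the input bus saturated with minimum-size packets."""
    return inbound_bits_per_s(clock_hz) / MIN_PACKET_BITS


# (clock in Hz, packets/s one scaled IXP1200 can process) for the two scenarios
for clock_hz, per_core_pps in [(775e6, 11e6), (1.5e9, 21e6)]:
    capacity = CORES * per_core_pps
    required = required_packets_per_s(clock_hz)
    print(f"{inbound_bits_per_s(clock_hz) / 1e9:.0f} Gbit/s, "
          f"need {required / 1e9:.1f} G packets/s, have {capacity / 1e9:.1f} G")
```

In both scenarios the chip's capacity exceeds the worst case demand by roughly 45 to 50 percent, which is the spare processing power the slide refers to.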

12 Bus and I/O Overview
[Diagram: 20 groups of 15 IXP1200s (IXP1,1 to IXP20,15), each group with its own in queue (Q1 in to Q20 in) and out queue (Q1 out to Q20 out), plus Q in control and Q out control blocks]
Buses shown: 448 bit input bus, 448 bit output bus, 48 bit header detection, 32 bit I/O bus, 64 bit control buses, 128 bit control buses

13 Header Detection Hardware
Custom header detection macro runs at 13 times chip speed, 10.075 GHz
– 12 cycles for comparison, 1 to send positions
Forty 48-bit comparators (80 at 1.5 GHz)
– Up to 6 bytes detection (Ethernet destination)
– Store last 47 bits from previous 448 bit word
[Diagram: 48 bit comparator fed through a 1 bit shifter from the last 47 bits of word t-1 and the 448 bits of word t]

14 48-bit Comparators
Set mask for comparison to 0, 1 or X (don't care)
Custom comparison circuit
– Signals and their negation are available from registers
– 10 transistors to implement
7 bit counter with each to set header position
→ About 30,000 transistors total
Possible 3 packets/448 bits
→ 31 bits of bus to send positions
[Circuit diagram: per-bit inputs input i, mask i and care i]
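The masked comparison and the 47 bit carry-over can be modeled in software. The sketch below is not the authors' circuit or simulator: it is a bit-level model in which the mask is split into a value word and a care word (care bit 0 meaning X), and every function and variable name is hypothetical.

```python
WORD_BITS = 448                  # bus word width from the slides
PATTERN_BITS = 48                # comparator width from the slides
CARRY_BITS = PATTERN_BITS - 1    # last 47 bits kept from the previous word
WINDOW_MASK = (1 << PATTERN_BITS) - 1


def ternary_match(window, value, care):
    """True when window equals value on every bit where care is 1 (0 means X)."""
    return ((window ^ value) & care) == 0


def find_header_positions(prev_tail, word, value, care):
    """Return start offsets, counted from the first carried bit, of every match.

    prev_tail holds the last 47 bits of the previous word and word the current
    448 bits, both as integers with the earliest bit most significant.
    """
    stream = (prev_tail << WORD_BITS) | word
    stream_bits = CARRY_BITS + WORD_BITS
    positions = []
    for start in range(stream_bits - PATTERN_BITS + 1):   # 448 window positions
        shift = stream_bits - PATTERN_BITS - start
        window = (stream >> shift) & WINDOW_MASK
        if ternary_match(window, value, care):
            positions.append(start)
    return positions


# Hypothetical example: a 48 bit address that straddles two bus words.
address = 0xAABBCCDDEEFF
prev_tail = address >> 28                                # first 20 bits end the old word
word = (address & ((1 << 28) - 1)) << (WORD_BITS - 28)   # last 28 bits start the new word
hits = find_header_positions(prev_tail, word, address, WINDOW_MASK)
print(hits)
```

Carrying 47 bits forward gives exactly 448 window start positions per bus word, so a header that begins near the end of one word is still found when the next word arrives. Clearing care bits widens the match; for instance, a care word of `0xFF << 40` tests only the first byte.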

15 Simulator
Other simulators cumbersome for our task
Wrote event driven simulator in Java
– Worst case simulations:
→ Can easily process at maximum bandwidth with no additional latency

16 Worst Case Scenario Results
Worst case scenario
– Minimum packet size is 20 bytes
– 448 bit input bus → 3 packets or less per cycle
– IXP1200 time to calculate next destination: 75 cycles minimum, 345 cycles average, 600 cycles maximum
→ At most 7 packets processed simultaneously on IXP1200
– IXP1200 has 6 micro-engines → load handled easily

17 Conclusions from Simulation
→ Latency of 605 cycles: 0.78 us at 775 MHz, 0.40 us at 1.5 GHz
Largest possible packet that could be sent after started processing is 65,536 bytes
→ Additional 1170 cycles latency: 1.51 us, 0.78 us
Transceiver delay 0.05 us [1999]
→ Additional 0.10 us/hop
Total latency/hop of 2.4 us, 1.3 us (2.4 us = 0.0000024 s/hub)
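The per-hop total can be rebuilt from its three parts. This sketch assumes the 0.10 us/hop figure means two transceivers per hop at 0.05 us each, which the slide implies but does not spell out; the names are illustrative.

```python
PROCESSING_CYCLES = 605          # latency found in the worst case simulation
MAX_PACKET_BYTES = 65_536        # largest packet that may be ahead on the output bus
BUS_BITS = 448                   # bits moved per cycle
TRANSCEIVER_DELAY_US = 0.05
TRANSCEIVERS_PER_HOP = 2         # assumption: one on the way in, one on the way out


def per_hop_latency_us(clock_hz):
    """Processing plus worst case output blocking plus transceiver delay."""
    blocking_cycles = MAX_PACKET_BYTES * 8 // BUS_BITS
    cycles = PROCESSING_CYCLES + blocking_cycles
    return cycles / clock_hz * 1e6 + TRANSCEIVERS_PER_HOP * TRANSCEIVER_DELAY_US


for clock_hz in (775e6, 1.5e9):
    print(f"{clock_hz / 1e6:.0f} MHz: {per_hop_latency_us(clock_hz):.1f} us per hop")
```

Waiting behind one maximum-size packet costs about twice as many cycles as the processing itself, so output blocking is the larger share of the per-hop latency.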

18 Internet Visualization
Worst case packet journey: halfway around the world (San Francisco, USA to Perth, Australia)
Time of flight: 0.101 s
52 hubs (probability of 1 in 1000): < 0.001 s (0.0000024 s/hub)
2 gateways: < 0.001 s (0.0000024 s)
2 end users: 0.067 s (0.033 s each)
0.169 s: tolerable latency for video conferencing
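The end-to-end budget can be added up from the component figures in the deck. This sketch assumes gateways take the same 2.4 us per node as hubs; the names are illustrative.

```python
TIME_OF_FLIGHT_S = 0.101
END_USER_LATENCY_S = 1 / 30      # one frame time at 30 Hz, per end user
PER_NODE_LATENCY_S = 2.4e-6      # per hop at 775 MHz; assumed equal for gateways
HUBS = 52
GATEWAYS = 2
END_USERS = 2
TOLERABLE_S = 0.200              # video conferencing limit from the deck

node_latency_s = (HUBS + GATEWAYS) * PER_NODE_LATENCY_S
total_s = TIME_OF_FLIGHT_S + END_USERS * END_USER_LATENCY_S + node_latency_s

print(f"nodes: {node_latency_s * 1e6:.0f} us, total: {total_s:.3f} s, "
      f"margin: {TOLERABLE_S - total_s:.3f} s")
```

The 54 intermediate nodes add only about 130 us in total, so the sum sits just under the 0.169 s on the slide, which appears to round each node contribution up to 0.001 s. Time of flight and end-user encoding account for nearly all of the budget.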

19 Conclusions
→ Limiting factor is maximum bandwidth
Average case simulations done
→ Can easily process at maximum bandwidth with 40 IXP1200 processors (mostly longer packets)
→ Reduce processing power to levels sufficient for bandwidth and model
– Fewer IXP1200s on chip
– Smaller chip size reduces cost
– Reduced processing power increases congestion, and may require high priority packets for some communications

20 448 Bit Operation Cycles
448 bits onto chip
Up to 48 bit header detection on previous 47 bits, and 401 bits of current 448 bits (48 bit comparators)
– Send header positions in this 448 bit window
Send to high priority and low priority in queues
Packet priority detection (header) in queues
Incorrect priority queue drops packet, in queue controller informed
Remainder of packet sent to appropriate in queue
Process packet header, send packet body to out queue
Process times between 70 and 600 cycles, 345 cycles avg.
Send updated packet header to out queue
Inform out queue controller packet ready to send
Send when output bus available
448 bits off chip

21 Maximum Throughput Node Hardware
For gateways or hubs
6.5 million transistors for IXP1200
0.5 million transistors for other applications such as speech codecs, V.42bis, Huffman compression, and 3DES
→ Up to 310 IXP1200s on the same chip

22 Packet Processing at Nodes
Maximum onto chip bandwidth
– 927 pins with I/O at clock speed
Smallest IP packet is 20 bytes (header size)
Maximum required processing power

23 Hub Cache and Main Memory
Required for IXP1200s
Assumed by Scott in IXP1200 simulations:
– 4 MB of DRAM
– 2 MB of SRAM

24 Hub Register Memory

25 Average Scenario Information
Assumed normal distribution between 80 and 600 cycles to process a packet
– Average of 340 cycles
– 80 and 600 are two standard deviations from mean
Packet sizes:
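The processing-time assumption for the average scenario fixes both parameters of the distribution: the mean is the midpoint of 80 and 600, and the standard deviation is a quarter of the range. The sketch below draws from that distribution and, as an assumption of its own, rejects draws outside the stated bounds; it is not the authors' Java simulator, and the names are hypothetical.

```python
import random

MIN_CYCLES, MAX_CYCLES = 80, 600
MEAN_CYCLES = (MIN_CYCLES + MAX_CYCLES) / 2     # midpoint of the range
SIGMA_CYCLES = (MAX_CYCLES - MIN_CYCLES) / 4    # bounds sit two sigma from the mean


def sample_processing_cycles(rng):
    """Draw one packet processing time, redrawing until it lies within the bounds."""
    while True:
        cycles = rng.gauss(MEAN_CYCLES, SIGMA_CYCLES)
        if MIN_CYCLES <= cycles <= MAX_CYCLES:
            return round(cycles)


rng = random.Random(2011)                       # fixed seed for repeatable runs
samples = [sample_processing_cycles(rng) for _ in range(10_000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Because the bounds are symmetric about the mean, discarding out-of-range draws leaves the average at about 340 cycles while narrowing the spread slightly.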

