Network Processors and Web Servers CS 213 LECTURE 17 From: IBM Technical Report.


1 Network Processors and Web Servers CS 213 LECTURE 17 From: IBM Technical Report

2 Intel® IXP2XXX Network Processor Architecture and Programming Prof. Laxmi Bhuyan Computer Science UC Riverside

3 IXP2400 Shared Memory Architecture
[Block diagram. Components shown: eight MEv2 microengines (MEv2 1 to MEv2 8); Intel® XScale™ core with 32K instruction cache and 32K data cache; Rbuf (64 @ 128B) and Tbuf (64 @ 128B); hash unit (48/64/128); 16KB scratch memory; QDR SRAM channels 1 and 2; DDR DRAM; PCI (64b, 66 MHz); SPI-3 or CSIX media interface (32b and 32b); CSRs (Fast_wr, UART, timers, GPIO, BootROM/Slow Port)]
– SRAM is not a cache, but stores frequently accessed data
– Packet header goes to the ME and the payload goes to DRAM
– Header and payload are combined and sent out after processing
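The header/payload split on this slide can be pictured with a short host-side sketch. It is an illustration only, not microengine code: the fixed 40-byte header length, the `dram` dictionary standing in for packet memory, and all function names are assumptions made for the example.

```python
# Illustrative model only: a dict stands in for DRAM packet memory.
HEADER_LEN = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options

dram = {}  # hypothetical packet memory: buffer handle -> payload bytes


def receive(packet: bytes, handle: int) -> bytes:
    """Park the payload in 'DRAM' and hand back only the header."""
    header, payload = packet[:HEADER_LEN], packet[HEADER_LEN:]
    dram[handle] = payload
    return header


def decrement_ttl(header: bytes) -> bytes:
    """Example of header-only processing: lower the IPv4 TTL (byte 8).

    The checksum update a real forwarder needs is omitted for brevity.
    """
    h = bytearray(header)
    h[8] = max(h[8] - 1, 0)
    return bytes(h)


def transmit(header: bytes, handle: int) -> bytes:
    """Recombine the processed header with the stored payload."""
    return header + dram.pop(handle)
```

Only the header is touched during processing; the payload is written once and read once, which is the point of keeping it in the larger, slower memory.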

4 IXP2400 Full-Duplex OC-48 System Implementation
[System diagram. Components shown: IXF6048 framer (1x OC-48 or 4x OC-12); IXP2400 ingress processor; IXP2400 egress processor; switch fabric gasket; DDR SDRAM packet memory and QDR SRAM queues and tables on each IXP2400; TCAM classification accelerator; host CPU (IOP or iA)]
Ingress processor: SAR'ing, classification, metering, policing, initial congestion management
Egress processor: traffic shaping, flexible choices (DiffServ, TM 4.0, …)

5 IXP2400 Chaining
[Diagram. Components shown: three IXP2400 processors, each with DDR packet memory and QDR SRAM queues and tables; 2.5 Gb/s CSIX-L1 and 2.5 Gb/s SPI3 links; control plane processor on PCI 64/66]
– Limited control memory per ME, so pipelining is necessary
– Research: parallel/pipeline scheduling of application task graphs

6 IXP2800
[Block diagram. Components shown: sixteen MEv2 microengines (MEv2 1 to MEv2 16); Intel® XScale™ core with 32K instruction cache and 32K data cache; Rbuf (64 @ 128B) and Tbuf (64 @ 128B); hash unit (48/64/128); 16KB scratch memory; QDR SRAM channels 1 to 4; RDRAM channels 1 to 3; PCI (64b, 66 MHz); SPI-4 or CSIX media interface (16b and 16b); CSRs (Fast_wr, UART, timers, GPIO, BootROM/SlowPort)]

7 IXP2800 and IXP2400 Comparison
Feature | IXP2400 | IXP2800
Frequency | 600/400 MHz | 1.4/1.0 GHz/650 MHz
DRAM memory | 1 channel DDR DRAM, 150 MHz; up to 2GB | 3 channels RDRAM, 800/1066 MHz; up to 2GB
SRAM memory | 2 channels QDR (or co-processor) | 4 channels QDR (or co-processor)
Media interface | Separate 32-bit Tx & Rx, configurable to SPI-3, UTOPIA 3 or CSIX_L1 | Separate 16-bit Tx & Rx, configurable to SPI-4 P2 or CSIX_L1
Number of MicroEngines | 8 (MEv2) | 16 (MEv2)
Performance | Dual chip full duplex OC48 | Dual chip full duplex OC192

8 MicroEngine v2
[Block diagram. Components shown: control store (4K/8K instructions); two banks of 128 GPRs; 128 next neighbor registers (from next neighbor, to next neighbor); local memory (640 words) with LM Addr 0 and LM Addr 1 (2 per CTX); 128 S Xfer In and 128 D Xfer In registers; 128 S Xfer Out and 128 D Xfer Out registers; S-Push, D-Push, S-Pull and D-Pull buses; 32-bit execution data path (multiply, find first bit, add, shift, logical) with A_Operand, B_Operand and ALU_Out; CAM (tags 0-15, status and LRU logic, lock 0-15); CRC unit and CRC remainder; pseudo-random number; timers and timestamp; other local CSRs]

9 Microengine v2 Features – Part 1
Clock Rates
– IXP2400: 600/400 MHz
– IXP2800: 1.4/1.0 GHz/650 MHz
Control Store
– IXP2400: 4K instruction store
– IXP2800: 8K instruction store
Configurable to 4 or 8 threads
– Each thread has its own program counter, registers, signal and wakeup events
– Generalized Thread Signaling (15 signals per thread)
Local Storage Options
– 256 GPRs
– 256 Transfer Registers
– 128 Next Neighbor Registers
– 640 32-bit words of local memory

10 Microengine v2 Features – Part 2
CAM (Content Addressable Memory)
– Performs parallel lookup on 16 32-bit entries
– Reports a 9-bit lookup result
   4 state bits (software controlled, no impact on hardware)
   Hit/miss indication: Hit – entry number that hit; Miss – LRU entry
   4-bit index of CAM entry (hit) or LRU entry (miss)
– Improves usage of multiple threads on the same data
CRC hardware
– IXP2400: provides CRC_16, CRC_32
– IXP2800: provides CRC_16, CRC_32, iSCSI, CRC_10 and CRC_5
– Accelerates CRC computation for ATM AAL/SAR, ATM OAM and storage applications
Multiply hardware
– Supports 8x24, 16x16 and 32x32
– Accelerates metering in QoS algorithms (DiffServ, MPLS)
Pseudo-random number generation
– Accelerates RED, WRED algorithms
64-bit time-stamp and 16-bit profile count
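The CAM behavior listed on this slide (16 entries, a hit or the LRU entry on a miss, a 9-bit result) can be modeled in a few lines. This is a software sketch under stated assumptions, not the hardware interface: the class and function names are hypothetical, the bit positions inside the 9-bit result are an assumed layout, and the loop stands in for what the hardware does in parallel.

```python
class Cam:
    """Software model of a 16-entry CAM with LRU tracking (illustrative)."""

    NUM_ENTRIES = 16

    def __init__(self):
        self.tags = [None] * self.NUM_ENTRIES   # 32-bit tags, None = empty
        self.state = [0] * self.NUM_ENTRIES     # 4 software-defined state bits
        self.lru_order = list(range(self.NUM_ENTRIES))  # front = least recent

    def _touch(self, entry):
        # Mark an entry as most recently used.
        self.lru_order.remove(entry)
        self.lru_order.append(entry)

    def write(self, entry, tag, state=0):
        # Load a tag and its state bits into a chosen entry.
        self.tags[entry] = tag & 0xFFFFFFFF
        self.state[entry] = state & 0xF
        self._touch(entry)

    def lookup(self, tag):
        # Hit: return state bits and the matching entry number.
        # Miss: return the LRU entry, the natural candidate for replacement.
        tag &= 0xFFFFFFFF
        for entry, stored in enumerate(self.tags):
            if stored == tag:
                self._touch(entry)
                return pack_result(self.state[entry], True, entry)
        return pack_result(0, False, self.lru_order[0])


def pack_result(state, hit, index):
    # Assumed layout: [8:5] state, [4] hit flag, [3:0] entry index.
    return ((state & 0xF) << 5) | (int(hit) << 4) | (index & 0xF)


def unpack_result(result):
    return (result >> 5) & 0xF, bool((result >> 4) & 1), result & 0xF
```

A thread that misses can load the new tag into the returned LRU entry with `write`, after which other threads looking up the same tag get a hit, which is how the CAM helps several threads share the same data.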

11 Intel® XScale™ Core Overview
High-performance, low-power, 32-bit embedded RISC processor
Clock rate
– IXP2400: 600 MHz
– IXP2800: 700/500/325 MHz
32 Kbyte instruction cache
32 Kbyte data cache
2 Kbyte mini-data cache
Write buffer
Memory management unit

12 Web Server Architecture

13 Dispatching Algorithms
Strategies to select the target server of the web cluster
Static: Fastest solution to prevent a web server bottleneck, but does not consider the current state of the servers
Dynamic: Outperforms static algorithms by making intelligent decisions, but collecting and analyzing state information causes expensive overhead
Requirements: (1) Low computational complexity (2) Full compatibility with web standards (3) State information must be readily available without much overhead
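The contrast between static and dynamic dispatching can be sketched with one representative policy of each kind. Round-robin and least-connections are chosen here only as familiar examples; the slide does not name specific algorithms, and the class names are hypothetical.

```python
import itertools


class RoundRobin:
    """Static policy: ignores server state, so selection is O(1)."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def select(self):
        return next(self._cycle)


class LeastConnections:
    """Dynamic policy: needs a per-server count of active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def select(self):
        # Pick the server with the fewest active connections
        # (ties go to the server listed first).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Must be called when a connection closes; keeping this state
        # current is the overhead a dynamic policy pays.
        self.active[server] -= 1
```

The dynamic policy only works if `release` is called reliably, which illustrates requirement (3): the state it depends on has to be cheap to obtain and keep up to date.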

14

15

16

17 Cluster based Architecture Needs a Web Switch

18 Distributed Architecture

19 Two Approaches
The approach depends on the OSI protocol layer at which the web switch routes inbound packets
Layer-4 switch – Determines the target server when the TCP SYN packet is received. Also called content-blind routing because the server selection policy is not based on HTTP contents at the application level
Layer-7 switch (Web Switch) – The switch first establishes a complete TCP connection with the client, examines the HTTP request at the application level, and then selects a server. Can support sophisticated dispatching policies, but incurs large latency for moving to the application level
– Also called content-aware switches, or Layer 5 switches in the TCP/IP protocol stack
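A layer-4 switch's content-blind decision can be sketched as follows: the server is chosen when the SYN arrives, using only addressing information, and later packets of the same connection follow the stored binding. The class name, the CRC-based hash, and the binding table are assumptions for illustration, not a description of any particular product.

```python
import zlib


class Layer4Switch:
    """Content-blind switch model: binds a connection to a server at SYN time."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.bindings = {}  # (client_ip, client_port) -> chosen server

    def on_packet(self, client_ip, client_port, syn=False):
        key = (client_ip, client_port)
        if syn and key not in self.bindings:
            # Only the address and port are hashed; no HTTP data exists yet.
            digest = zlib.crc32(f"{client_ip}:{client_port}".encode())
            self.bindings[key] = self.servers[digest % len(self.servers)]
        # Non-SYN packets of an unknown connection have no binding.
        return self.bindings.get(key)
```

Because the choice is made before any request bytes arrive, the policy cannot depend on the URL, which is what separates this from a layer-7 switch.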

20

21 Web Switch or Layer 5/7 Switch or Content Aware Switch
Layer 4 switch
– Content blind
– Storage overhead
– Difficult to administer
Content-aware (Layer 5/7) switch
– Partition the server's database over different nodes
– Increase the performance due to improved hit rate
– Server can be specialized for certain types of request
[Diagram. Components shown: a request for www.yahoo.com arriving from the Internet (GET /cgi-bin/form HTTP/1.1, Host: www.yahoo.com…), carried as application data over TCP and IP; the switch; an image server, an application server and an HTML server behind it]
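Content-aware selection among the three specialized servers in the diagram can be sketched as a function of the HTTP request line. The routing rules below (CGI paths to the application server, image file extensions to the image server, everything else to the HTML server) and the returned server names are assumptions for illustration; the slide does not specify them.

```python
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed image types


def select_server(http_request: str) -> str:
    """Pick a back-end server from the request line of an HTTP request."""
    request_line = http_request.split("\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {request_line!r}")
    _method, target, _version = parts

    # Ignore any query string and compare case-insensitively.
    path = target.split("?", 1)[0].lower()
    if path.startswith("/cgi-bin/"):
        return "application-server"
    if path.endswith(IMAGE_SUFFIXES):
        return "image-server"
    return "html-server"
```

For the request shown in the diagram, `select_server("GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n")` returns `"application-server"`. The switch can only run this after the TCP connection is established and the request has arrived, which is the source of the extra latency noted for layer-7 switching.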

22 Latency

23 Throughput

