Network Processors and Web Servers CS 213 LECTURE 17 From: IBM Technical Report.



Intel® IXP2XXX Network Processor Architecture and Programming Prof. Laxmi Bhuyan Computer Science UC Riverside

IXP2400 Shared Memory Architecture
[Block diagram: IXP2400 with eight MEv2 microengines, an Intel® XScale™ core (32K instruction cache, 32K data cache), Rbuf and Tbuf, hash unit (48/64/128), 16 KB scratch memory, two QDR SRAM channels, DDR DRAM, 64-bit/66 MHz PCI, an SPI-3 or CSIX media interface, and CSRs (Fast_wr, UART, timers, GPIO, BootROM/Slow Port)]
– SRAM is not a cache, but stores frequently accessed data
– The packet header goes to an ME and the payload goes to DRAM
– Header and payload are combined and sent out after processing
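The header/payload split described on this slide can be sketched in a few lines. This is only an illustration of the data flow, not IXP2400 microcode: the `PacketMemory` class, the fixed 40-byte header length, and the TTL decrement used as the "processing" step are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed for this sketch: 20-byte IPv4 header + 20-byte TCP header, no options.
HEADER_LEN = 40


class PacketMemory:
    """Hypothetical stand-in for DRAM packet memory: an append-only byte buffer."""

    def __init__(self):
        self.buf = bytearray()

    def store(self, data: bytes) -> int:
        offset = len(self.buf)
        self.buf += data
        return offset

    def load(self, offset: int, length: int) -> bytes:
        return bytes(self.buf[offset:offset + length])


@dataclass
class PacketHandle:
    header: bytearray   # the small part a microengine works on
    dram_offset: int    # where the payload was parked in packet memory
    payload_len: int


def receive(packet: bytes, dram: PacketMemory) -> PacketHandle:
    """Split an incoming packet: header stays close, payload goes to DRAM."""
    payload = packet[HEADER_LEN:]
    return PacketHandle(bytearray(packet[:HEADER_LEN]), dram.store(payload), len(payload))


def decrement_ttl(handle: PacketHandle) -> None:
    """Example header-only processing step (IPv4 TTL is byte 8).

    Checksum update is omitted to keep the sketch short.
    """
    handle.header[8] -= 1


def transmit(handle: PacketHandle, dram: PacketMemory) -> bytes:
    """Recombine the processed header with the untouched payload."""
    return bytes(handle.header) + dram.load(handle.dram_offset, handle.payload_len)
```

The point of the split is that processing touches only the 40 header bytes; the payload is written once and read once.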

IXP2400 Full-Duplex OC-48 System Implementation
[System diagram. Components shown: IXF6048 framer (1x OC-48 or 4x OC-12), IXP2400 ingress processor, IXP2400 egress processor, switch fabric gasket, DDR SDRAM packet memory and QDR SRAM queues & tables for each processor, TCAM classification accelerators, and a host CPU (IOP or iA)]
Ingress Processor: SAR'ing, classification, metering, policing, initial congestion management
Egress Processor: traffic shaping, flexible choices (diff serve, TM 4.0, …)

IXP2400 Chaining
[System diagram: chained IXP2400 processors, each with DDR packet memory and QDR SRAM queues & tables, linked by 2.5 Gb/s CSIX-L1 and SPI-3 interfaces, with a control plane processor attached over PCI 64/66]
– Control memory per ME is limited, so pipelining is necessary
– Research: parallel/pipeline scheduling of application task graphs
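To see why limited control memory forces pipelining, consider a chain of packet-processing tasks whose combined code does not fit in one microengine. The sketch below cuts such a chain into consecutive pipeline stages with a simple greedy rule. It is not the scheduling research the slide refers to; the task names and instruction counts are invented, and only the 4K-instruction control store size comes from the slides.

```python
# 4K instructions per microengine control store on the IXP2400.
CONTROL_STORE = 4096


def partition_pipeline(tasks, capacity=CONTROL_STORE):
    """Greedily cut a linear task chain into stages that each fit one ME.

    tasks: list of (name, instruction_count) in processing order.
    Returns a list of stages, each a list of task names.
    """
    stages, current, used = [], [], 0
    for name, size in tasks:
        if size > capacity:
            raise ValueError(f"task {name!r} alone exceeds the control store")
        if used + size > capacity:
            # Current ME is full: close this stage and start the next one.
            stages.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        stages.append(current)
    return stages


# Hypothetical task chain with made-up code sizes.
chain = [("rx", 1500), ("classify", 3000), ("meter", 1000),
         ("queue", 2500), ("tx", 1200)]
stages = partition_pipeline(chain)
```

With these made-up sizes the chain needs three microengines, so a packet passes through three pipeline stages. A real scheduler would also balance per-stage latency, which this greedy rule ignores.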

[Block diagram: IXP2800 with sixteen MEv2 microengines, an Intel® XScale™ core (32K instruction cache, 32K data cache), Rbuf and Tbuf, hash unit (48/64/128), 16 KB scratch memory, four QDR SRAM channels, three RDRAM channels, 64-bit/66 MHz PCI, an SPI-4 or CSIX media interface, and CSRs (Fast_wr, UART, timers, GPIO, BootROM/SlowPort)]

IXP2800 and IXP2400 Comparison
Frequency
– IXP2400: 600/400 MHz
– IXP2800: 1.4/1.0 GHz/650 MHz
DRAM Memory
– IXP2400: 1 channel DDR DRAM, 150 MHz; up to 2 GB
– IXP2800: 3 channels RDRAM, 800/1066 MHz; up to 2 GB
SRAM Memory
– IXP2400: 2 channels QDR (or co-processor)
– IXP2800: 4 channels QDR (or co-processor)
Media Interface
– IXP2400: separate 32-bit Tx & Rx, configurable to SPI-3, UTOPIA 3 or CSIX_L1
– IXP2800: separate 16-bit Tx & Rx, configurable to SPI-4 P2 or CSIX_L1
Number of MicroEngines
– IXP2400: 8 (MEv2)
– IXP2800: 16 (MEv2)
Performance
– IXP2400: dual chip, full duplex OC48
– IXP2800: dual chip, full duplex OC192

MicroEngine v2
[Block diagram. Elements shown: control store (4K/8K instructions); two banks of 128 GPRs; local memory of 640 words with LM Addr 0 and LM Addr 1; 128 next neighbor registers (from/to next neighbor); 128 S and 128 D transfer-in registers; 128 S and 128 D transfer-out registers; S-Push, D-Push, S-Pull and D-Pull buses; a 32-bit execution data path (add, shift, logical, multiply, find first bit) with A and B operands; CAM with tags 0-15, status and LRU logic; CRC unit; pseudo-random number generator; timers and timestamp; other local CSRs]

Microengine v2 Features – Part 1
Clock Rates
– IXP2400 – 600/400 MHz
– IXP2800 – 1.4/1.0 GHz/650 MHz
Control Store
– IXP2400 – 4K instruction store
– IXP2800 – 8K instruction store
Configurable to 4 or 8 threads
– Each thread has its own program counter, registers, signal and wakeup events
– Generalized Thread Signaling (15 signals per thread)
Local Storage Options
– 256 GPRs
– 256 Transfer Registers
– 128 Next Neighbor Registers
– 640 32-bit words of local memory

Microengine v2 Features – Part 2
CAM (Content Addressable Memory)
– Performs parallel lookup on 16 32-bit entries
– Reports a 9-bit lookup result
  – 4 state bits (software controlled, no impact on hardware)
  – Hit: entry number that hit; Miss: LRU entry
  – 4-bit index of the CAM entry (hit) or the LRU entry (miss)
– Improves usage of multiple threads on the same data
CRC hardware
– IXP2400 – provides CRC_16, CRC_32
– IXP2800 – provides CRC_16, CRC_32, iSCSI, CRC_10 and CRC_5
– Accelerates CRC computation for ATM AAL/SAR, ATM OAM and storage applications
Multiply hardware
– Supports 8x24, 16x16 and 32x32
– Accelerates metering in QoS algorithms (DiffServ, MPLS)
Pseudo-random number generation
– Accelerates RED, WRED algorithms
64-bit time-stamp and 16-bit profile count
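The CAM behaviour listed above (16 entries, a hit/miss indication, a 4-bit entry index, 4 software-controlled state bits, LRU replacement) can be modelled in software as follows. This is a behavioural sketch only: the class and method names are invented, the lookup is sequential rather than parallel, and the bit positions used to pack the 9-bit result are an assumption for illustration, not the hardware's documented layout.

```python
_EMPTY = object()  # sentinel for an unused CAM entry


class MicroengineCam:
    """Behavioural model of a 16-entry CAM with LRU tracking (names are hypothetical)."""

    ENTRIES = 16

    def __init__(self):
        self.tags = [_EMPTY] * self.ENTRIES
        self.state = [0] * self.ENTRIES
        self.lru = list(range(self.ENTRIES))  # front of list = least recently used

    def _touch(self, entry):
        # Mark an entry as most recently used.
        self.lru.remove(entry)
        self.lru.append(entry)

    def lookup(self, tag):
        """Return a 9-bit result: state(4) | hit(1) | index(4), assumed packing."""
        if tag in self.tags:
            entry = self.tags.index(tag)
            self._touch(entry)
            return (self.state[entry] << 5) | (1 << 4) | entry
        # Miss: report the LRU entry so software knows which slot to replace.
        # State bits are reported as zero on a miss in this sketch.
        return self.lru[0]

    def write(self, entry, tag, state=0):
        """Load a tag and its software-defined state bits into an entry."""
        self.tags[entry] = tag
        self.state[entry] = state & 0xF
        self._touch(entry)


def unpack(result):
    """Split the assumed 9-bit result into (state, hit, index)."""
    return (result >> 5) & 0xF, bool(result & 0x10), result & 0xF
```

A typical use is a software-managed cache shared by the threads of one microengine: a thread looks up a flow identifier, and on a miss it loads the flow's data into the slot named by the returned LRU index, then calls `write` so later threads hit.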

Intel® XScale™ Core Overview
High-performance, low-power, 32-bit embedded RISC processor
Clock rate
– IXP2400 – 600/400 MHz
– IXP2800 – 700/500/325 MHz
32 Kbyte instruction cache
32 Kbyte data cache
2 Kbyte mini-data cache
Write buffer
Memory management unit

Web Server Architecture

Dispatching Algorithms
Strategies to select the target server of the web cluster
– Static: the fastest solution to prevent a web server bottleneck, but does not consider the current state of the servers
– Dynamic: outperforms static algorithms by making intelligent decisions, but collecting and analyzing state information causes expensive overhead
Requirements:
(1) Low computational complexity
(2) Full compatibility with web standards
(3) State information must be readily available without much overhead
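The static/dynamic distinction can be made concrete with two small dispatchers. Round-robin stands in for the static family and least-connections for the dynamic family; both are common textbook policies chosen here for illustration, and the class names and server names are hypothetical.

```python
import itertools


class RoundRobinDispatcher:
    """Static policy: ignores server state, so selection is O(1) and stateless."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def select(self):
        return next(self._cycle)


class LeastConnectionsDispatcher:
    """Dynamic policy: uses state the switch already has (its own open connections)."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def select(self):
        # Pick the server with the fewest connections currently assigned.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to this server closes.
        self.active[server] -= 1


servers = ["s1", "s2", "s3"]
rr = RoundRobinDispatcher(servers)
lc = LeastConnectionsDispatcher(servers)
```

Least-connections fits requirement (3) reasonably well because the connection counts are kept by the switch itself; policies that need CPU or disk load from the servers would have to pay for collecting it.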

Cluster based Architecture Needs a Web Switch

Distributed Architecture

Two Approaches
The approaches differ in the OSI protocol layer at which the web switch routes inbound packets.
Layer-4 switch
– Determines the target server when the TCP SYN packet is received
– Also called content-blind routing, because the server selection policy is not based on HTTP contents at the application level
Layer-7 switch (Web Switch)
– The switch first establishes a complete TCP connection with the client, examines the HTTP request at the application level, and then selects a server
– Can support sophisticated dispatching policies, but incurs large latency for moving up to the application level
– Also called content-aware switches, or Layer 5 switches in the TCP/IP protocol stack
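A layer-4 switch can be sketched as a connection table that is filled in when the SYN arrives and consulted for every later packet of the same connection. The sketch below is a simplification under stated assumptions: packets are reduced to a source address, a source port and a set of flag names, the hash-based selection policy is just one possible content-blind policy, and the class name is hypothetical.

```python
import zlib


class Layer4Switch:
    """Content-blind switch: binds a connection to a server on the TCP SYN."""

    def __init__(self, servers):
        self.servers = servers
        self.connections = {}  # (src_ip, src_port) -> chosen server

    def route(self, src_ip, src_port, flags):
        key = (src_ip, src_port)
        if "SYN" in flags and key not in self.connections:
            # Decision is made here, before any HTTP data has been seen.
            digest = zlib.crc32(f"{src_ip}:{src_port}".encode())
            self.connections[key] = self.servers[digest % len(self.servers)]
        server = self.connections.get(key)
        if "FIN" in flags or "RST" in flags:
            # Connection is ending: drop the binding after routing this packet.
            self.connections.pop(key, None)
        return server


switch = Layer4Switch(["s1", "s2"])
first = switch.route("10.0.0.1", 5000, {"SYN"})
```

Because the binding happens on the SYN, the switch never terminates the TCP connection itself, which is where the latency advantage over a layer-7 switch comes from.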

Web Switch or Layer 5/7 Switch or Content Aware Switch
Layer 4 switch
– Content blind
– Storage overhead
– Difficult to administer
Content-aware (Layer 5/7) switch
– Partition the server's database over different nodes
– Increase the performance due to improved hit rate
– Server can be specialized for certain types of request
[Diagram: requests from the Internet pass through the switch to an image server, an application server, and an HTML server. The example request "GET /cgi-bin/form HTTP/1.1" with a Host header is shown as the application data of a TCP/IP packet]
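Content-aware selection amounts to reading the HTTP request line and mapping the URL to a specialized server. The sketch below assumes the switch already holds the first bytes of the request; the routing table, the `/images/` prefix and the server names are illustrative assumptions, while the `/cgi-bin/form` request is the one shown on the slide.

```python
# Hypothetical prefix table mapping URL prefixes to specialized servers.
ROUTES = [
    ("/cgi-bin/", "application-server"),
    ("/images/", "image-server"),
]
DEFAULT_SERVER = "html-server"


def select_server(request: bytes) -> str:
    """Pick a server from the URL in the HTTP request line."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = request_line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {request_line!r}")
    _method, path, _version = parts
    for prefix, server in ROUTES:
        if path.startswith(prefix):
            return server
    return DEFAULT_SERVER


request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: example\r\n\r\n"
target = select_server(request)
```

Since each server only sees its own class of requests, its cache holds a smaller working set, which is the improved hit rate the slide mentions.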

Latency

Throughput