Intel® IXP2XXX Network Processor Architecture and Programming
Prof. Laxmi Bhuyan, Computer Science, UC Riverside
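[Figure: IXP2400 block diagram. Components shown: Intel® XScale™ core (32K instruction cache, 32K data cache); eight MEv2 microengines (MEv2 1 to MEv2 8); Rbuf 128B and Tbuf 128B; hash unit (64/48/128); 16 KB scratchpad; QDR SRAM 1 and QDR SRAM 2 with enqueue/dequeue (E/D Q) logic; DDR DRAM; PCI (64b, 66 MHz) via gasket; 32b receive and 32b transmit media interface (SPI-3 or CSIX); CSRs (Fast_wr, UART, timers, GPIO, BootROM/Slow Port).]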
IXP2400 Bandwidths
- 600 MHz operation, 4.8+ GOPs
- 2.5 Gb/s full-duplex media interface: POS-PHY, Utopia, CSIX-L1
- 2.4 GB/s DDR memory bandwidth at 300 MT/s
- 1.6 GB/s QDR memory bandwidth with 200 MHz QDRII devices
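The headline figures above can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not a statement of how Intel derived the numbers. It assumes one operation per microengine per clock, a 64-bit DDR data bus, and a QDR channel with separate 16-bit read and write ports that each transfer twice per clock; the function names are hypothetical.

```python
def peak_ops_per_s(num_engines, clock_hz, ops_per_cycle=1):
    """Aggregate peak operation rate across all microengines."""
    return num_engines * clock_hz * ops_per_cycle


def bus_bandwidth_bytes_per_s(transfers_per_s, bus_width_bits, ports=1):
    """Peak bandwidth of a bus: transfers/s times bytes per transfer times ports."""
    return transfers_per_s * (bus_width_bits // 8) * ports


# Assumption: 8 microengines at 600 MHz, one operation per cycle each.
gops = peak_ops_per_s(8, 600_000_000) / 1e9

# Assumption: 64-bit DDR data bus running at 300 MT/s.
ddr_gb_per_s = bus_bandwidth_bytes_per_s(300_000_000, 64) / 1e9

# Assumption: 200 MHz QDRII, 2 transfers per clock, 16 data bits per port,
# with independent read and write ports (hence ports=2).
qdr_gb_per_s = bus_bandwidth_bytes_per_s(2 * 200_000_000, 16, ports=2) / 1e9

print(f"{gops:.1f} GOPs, {ddr_gb_per_s:.1f} GB/s DDR, {qdr_gb_per_s:.1f} GB/s QDR")
```

Under these assumptions the results line up with the 4.8 GOPs, 2.4 GB/s, and 1.6 GB/s quoted on the slide.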
IXP2400 Resources Summary
Half-duplex OC-48 / 2.5 Gb/s network processor
- (8) multi-threaded microengines
- Intel® XScale™ core
- Media / switch fabric interface
- PCI interface
- 2 QDR SRAM interface controllers
- 1 DDR SDRAM interface controller
- 8-bit asynchronous port: Flash and CPU bus
Additional integrated features
- Hardware hash unit
- 16 KByte scratchpad memory
- Serial UART port
- 8 general-purpose I/O pins
- Four 32-bit timers
- JTAG support
IXP2400 Full-Duplex OC-48 System Implementation
[Figure: an IXF6048 framer (1x OC-48 or 4x OC-12) feeds an IXP2400 ingress processor and is fed by an IXP2400 egress processor; both connect to the switch fabric through a gasket. Each processor has DDR SDRAM packet memory and QDR SRAM for queues and tables. A TCAM classification accelerator and a host CPU (IOP or iA) are also shown.]
Ingress processor
- SAR'ing
- Classification
- Metering
- Policing
- Initial congestion management
Egress processor
- Traffic shaping
- Flexible choices: DiffServ, TM 4.0, …
IXP2400 Chaining
[Figure: three IXP2400 processors chained together, each with its own DDR packet memory and QDR SRAM for queues and tables. Links are labeled 2.5 Gb/s CSIX-L1 and 2.5 Gb/s SPI3; a control plane processor attaches over PCI 64/66.]
- Glueless interface between IXP2400 devices using CSIX-L1
[Figure: IXP2800 block diagram. Components shown: Intel® XScale™ core (32K instruction cache, 32K data cache); sixteen MEv2 microengines (MEv2 1 to MEv2 16); Rbuf 128B and Tbuf 128B; hash unit (48/64/128); 16 KB scratchpad; QDR SRAM 1 to QDR SRAM 4 with enqueue/dequeue (E/D Q) logic and 18-bit ports; RDRAM 1 to RDRAM 3 (striped); PCI (64b, 66 MHz) via gasket; 16b receive and transmit media interface (SPI-4 or CSIX); CSRs (Fast_wr, UART, timers, GPIO, BootROM/Slow Port).]
IXP2800 Bandwidths
- 1.4 GHz operation, 20+ GOPs
- 10 Gb/s full-duplex media interface: SPI-4.2, CSIX-L1
- 1.9 GB/s QDR SRAM memory bandwidth per channel
- 2.1 GB/s RDRAM memory bandwidth per channel
IXP2800 Resources Summary
Half-duplex OC-192 / 10 Gb/s network processor
- (16) multi-threaded microengines
- Intel® XScale™ core
- Media / switch fabric interface
- PCI interface
- 4 QDR SRAM interface controllers
- 3 Rambus* DRAM interface controllers
- 8-bit asynchronous port: Flash and CPU bus
Additional integrated features
- Hardware hash unit for generating 48-, 64-, or 128-bit adaptive polynomial hash keys
- 16 KByte scratchpad memory
- Serial UART port for debug
- 8 general-purpose I/O pins
- Four 32-bit timers
- JTAG support
IXP2800 and IXP2400 Comparison

Feature                | IXP2400                                                              | IXP2800
Frequency              | 600/400 MHz                                                          | 1.4/1.0 GHz/650 MHz
DRAM memory            | 1 channel DDR DRAM, 150 MHz; up to 2 GB                              | 3 channels RDRAM, 800/1066 MHz; up to 2 GB
SRAM memory            | 2 channels QDR (or co-processor)                                     | 4 channels QDR (or co-processor)
Media interface        | Separate 32-bit Tx & Rx, configurable to SPI-3, UTOPIA 3 or CSIX_L1  | Separate 16-bit Tx & Rx, configurable to SPI-4 P2 or CSIX_L1
Number of microengines | 8 (MEv2)                                                             | 16 (MEv2)
Performance            | Dual-chip full-duplex OC-48                                          | Dual-chip full-duplex OC-192
MicroEngine v2
[Figure: MEv2 block diagram. Components shown: control store (4K/8K instructions); two banks of 128 GPRs; 128 next-neighbor registers (fed from the previous neighbor, feeding the next neighbor); 128 S transfer-in and 128 D transfer-in registers (S-Push and D-Push buses); 128 S transfer-out and 128 D transfer-out registers (S-Pull and D-Pull buses); local memory of 640 words with LM Addr 0 and LM Addr 1 (2 per context); 32-bit execution data path with A and B operands (add, shift, logical; multiply; find first bit); CAM with tags 0-15, state, and 6-bit status and LRU logic, returning status and entry number; CRC unit with CRC remainder; pseudo-random number generator; timers and timestamp; other local CSRs.]
Microengine v2 Features: Part 1
- Clock rates
  - IXP2400: 600/400 MHz
  - IXP2800: 1.4/1.0 GHz/650 MHz
- Control store
  - IXP2400: 4K instruction store
  - IXP2800: 8K instruction store
- Configurable to 4 or 8 threads
  - Each thread has its own program counter, registers, signal and wakeup events
  - Generalized thread signaling (15 signals per thread)
- Local storage options
  - 256 GPRs
  - 256 transfer registers
  - 128 next-neighbor registers
  - 640 32-bit words of local memory
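To make the 4-versus-8 thread trade-off concrete, the sketch below works out how many context-relative registers each thread would own. It assumes the register file is split evenly among the active contexts; the function name is hypothetical and the even split is an assumption of this illustration, not something the slide states.

```python
def registers_per_context(total_registers, num_contexts):
    """Evenly partition a register file among the active thread contexts."""
    if num_contexts not in (4, 8):
        raise ValueError("a microengine is configurable to 4 or 8 threads")
    return total_registers // num_contexts


# 256 GPRs and 128 next-neighbor registers, taken from the slide.
for contexts in (8, 4):
    gprs = registers_per_context(256, contexts)
    nn = registers_per_context(128, contexts)
    print(f"{contexts} threads: {gprs} GPRs and {nn} next-neighbor registers per thread")
```

Running fewer threads gives each one a larger private register set, at the cost of fewer contexts available to hide memory latency.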
Microengine v2 Features: Part 2
- CAM (Content Addressable Memory)
  - Performs parallel lookup on 16 32-bit entries
  - Reports a 9-bit lookup result
    - 4 state bits (software controlled, no impact on hardware)
    - Hit: entry number that hit; Miss: LRU entry
    - 4-bit index of the CAM entry (hit) or the LRU entry (miss)
  - Improves usage of multiple threads on the same data
- CRC hardware
  - IXP2400: provides CRC_16, CRC_32
  - IXP2800: provides CRC_16, CRC_32, iSCSI, CRC_10 and CRC_5
  - Accelerates CRC computation for ATM AAL/SAR, ATM OAM and storage applications
- Multiply hardware
  - Supports 8x24, 16x16 and 32x32
  - Accelerates metering in QoS algorithms (DiffServ, MPLS)
- Pseudo-random number generation
  - Accelerates RED, WRED algorithms
- 64-bit timestamp and 16-bit profile count
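The CAM's hit/miss behavior is easier to follow as a small software model. The sketch below is an illustrative model only, not Intel's API: the class and method names are hypothetical, and it assumes that a hit and a write both mark the entry as most recently used while a miss leaves the LRU order unchanged.

```python
from dataclasses import dataclass


@dataclass
class LookupResult:
    hit: bool    # True on a tag match
    entry: int   # 4-bit index: matching entry on a hit, LRU entry on a miss
    state: int   # 4 software-defined state bits (0 on a miss)


class MicroengineCam:
    """Software model of a 16-entry CAM with LRU tracking (hypothetical)."""

    def __init__(self, num_entries=16):
        self.tags = [None] * num_entries
        self.state = [0] * num_entries
        # Front of the list is the least recently used entry.
        self.lru_order = list(range(num_entries))

    def _touch(self, entry):
        # Move the entry to the most-recently-used position.
        self.lru_order.remove(entry)
        self.lru_order.append(entry)

    def lookup(self, tag):
        tag &= 0xFFFFFFFF  # tags are 32 bits wide
        for entry, stored in enumerate(self.tags):
            if stored == tag:
                self._touch(entry)
                return LookupResult(True, entry, self.state[entry])
        # Miss: report the LRU entry as the suggested replacement victim.
        return LookupResult(False, self.lru_order[0], 0)

    def write(self, entry, tag, state=0):
        self.tags[entry] = tag & 0xFFFFFFFF
        self.state[entry] = state & 0xF
        self._touch(entry)


cam = MicroengineCam()
first = cam.lookup(0xABCD)              # miss: entry 0 is the LRU victim
cam.write(first.entry, 0xABCD, state=3) # cache the flow in the victim entry
second = cam.lookup(0xABCD)             # hit on entry 0 with state 3
```

A typical use is caching per-flow state: a thread looks up a flow identifier, and on a miss it loads that flow's state into the reported LRU entry so that other threads handling the same flow find it on their next lookup.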
Intel® XScale™ Core Overview
- High-performance, low-power, 32-bit embedded RISC processor
- Clock rate
  - IXP2400: 600/400 MHz
  - IXP2800: 700/500/325 MHz
- 32 KByte instruction cache
- 32 KByte data cache
- 2 KByte mini-data cache
- Write buffer
- Memory management unit
IXA Software Framework
[Figure: software stack. XScale™ core side: control plane protocol stacks, control plane PDK, core components, core component infrastructure library, resource manager library, OSSL, with a link to external processors. Microengine side: microblocks, microblock infrastructure library, protocol library, utility library, hardware abstraction library.]
IXA Software Framework: Goals
- Accelerate software development for the IXP family of network processors
- Provide a simple and consistent infrastructure for writing networking applications
- Enable reuse of components across applications
- Improve portability of code across the IXP family
Microengine Programming Model
[Figure: a microblock group is a set of microblocks bounded by a dispatch loop, with a source at one end and a sink at the other. The microblocks sit on top of the microblock infrastructure library, the protocol library, the utility library, and the hardware abstraction library.]
Microblock Programming Model
- Data plane libraries
  - Libraries for commonly used functions
- Microblock infrastructure library
  - Used by the microblocks and the DL to manage packet metadata and DL variables
- Microblocks
  - Enable development of modular code building blocks
  - Define the data flow model, common data structures, state sharing between code blocks, etc.
  - Ensure consistency and improve reuse across the different reference applications
- Dispatch loop (DL)
  - The glue code that binds microblocks together to form a microblock group
Microblocks
- A combined set of macros/functions that performs a data plane network processing function
- Each microblock performs a major function on a packet
  - 5-tuple classification, IPv4 forwarding, NAT
- Written independently of each other
  - Reusable across applications
- Use the infrastructure library
  - Access and modify packet metadata and DL variables
- Use data plane libraries
  - Hardware abstraction and code reusability
Microblock Architecture
[Figure: a microblock group bounded by a DL. Packets flow from the source (Rx driver microblock) through the packet processing microblocks (Class, Encap, IPv4) to the sink (Tx driver microblock). Packet metadata, the IP header, and DL variables are held in local memory (LM) and GPRs and are shared along the path. The group sits on the microblock infrastructure library, protocol library, utility library, and hardware abstraction library.]
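The relationship between microblocks and the dispatch loop can be sketched in a few lines. This is a conceptual model in Python rather than microengine code, and every name in it (`classify_block`, `ipv4_block`, `encap_block`, `dispatch_loop`, the `next_block` variable, and the metadata fields) is hypothetical. It assumes each microblock reads and updates the packet metadata and sets a DL variable naming the block that should run next.

```python
from collections import deque


def classify_block(meta, dl_vars):
    # Route IPv4 packets (EtherType 0x0800) to forwarding; drop the rest.
    dl_vars["next_block"] = "ipv4" if meta["ethertype"] == 0x0800 else "drop"


def ipv4_block(meta, dl_vars):
    # Decrement the TTL and drop the packet once it expires.
    meta["ttl"] -= 1
    dl_vars["next_block"] = "encap" if meta["ttl"] > 0 else "drop"


def encap_block(meta, dl_vars):
    # Record the layer 2 encapsulation and hand the packet to the sink.
    meta["l2_header"] = "ethernet"
    dl_vars["next_block"] = "sink"


MICROBLOCKS = {"classify": classify_block, "ipv4": ipv4_block, "encap": encap_block}


def dispatch_loop(rx_queue, tx_queue):
    """Glue code: source -> microblocks -> sink, driven by a DL variable."""
    while rx_queue:
        meta = rx_queue.popleft()              # source: fetch packet metadata
        dl_vars = {"next_block": "classify"}   # per-packet DL variables
        while dl_vars["next_block"] in MICROBLOCKS:
            MICROBLOCKS[dl_vars["next_block"]](meta, dl_vars)
        if dl_vars["next_block"] == "sink":    # sink: queue for transmit
            tx_queue.append(meta)
        # Any other value (for example "drop") discards the packet.


rx = deque([
    {"ethertype": 0x0800, "ttl": 64},   # forwarded
    {"ethertype": 0x0800, "ttl": 1},    # TTL expires, dropped
    {"ethertype": 0x0806, "ttl": 0},    # not IPv4, dropped
])
tx = deque()
dispatch_loop(rx, tx)
```

Because the microblocks communicate only through the packet metadata and DL variables, each one can be replaced or reused in a different group without changing the others, which is the reuse property the slides emphasize.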