CSE 58x: Networking Practicum
Instructor: Wu-chang Feng
TA: Francis Chang

About the course
● Prerequisite: CSE 524 or the equivalent
● Implementation-focused course
  – Intel's IXA network processor platform
● Contents
  – Brief lecture material on network processors and the IXP
  – 5 weeks of designed laboratories
  – 3 weeks of final projects

Modern router architectures
● Split into a fast path and a slow path
● Control plane (slow path)
  – High-complexity functions
  – Route table management
  – Network control and configuration
  – Exception handling
● Data plane (fast path)
  – Low-complexity functions
  – Fast-path forwarding
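
As an illustration of the split, the data plane typically forwards the common case itself and punts exceptions to the control plane. The following is a minimal sketch, assuming a simplified header struct; the exception criteria (IP options present, TTL of 1 or less, packet addressed to the router itself) are illustrative rather than exhaustive, and the names `ipv4_hdr` and `classify_path` are hypothetical.

```c
#include <stdint.h>

/* Simplified view of the IPv4 fields this check needs (hypothetical layout,
 * host byte order). */
struct ipv4_hdr {
    uint8_t  version_ihl;   /* version in high 4 bits, IHL (32-bit words) in low 4 */
    uint8_t  ttl;
    uint32_t dst_addr;
};

enum path { FAST_PATH, SLOW_PATH };

/* Keep the packet on the data plane (fast path) unless it needs exception
 * handling by the control plane (slow path). */
enum path classify_path(const struct ipv4_hdr *h, uint32_t router_addr)
{
    uint8_t version = h->version_ihl >> 4;
    uint8_t ihl     = h->version_ihl & 0x0F;

    if (version != 4)               return SLOW_PATH; /* not IPv4 */
    if (ihl != 5)                   return SLOW_PATH; /* options (IHL > 5) or malformed */
    if (h->ttl <= 1)                return SLOW_PATH; /* expiring: needs ICMP reply */
    if (h->dst_addr == router_addr) return SLOW_PATH; /* addressed to the router */
    return FAST_PATH;
}
```

The fast-path test is a handful of compares, which is the kind of low-complexity work the data plane is meant to do at line rate.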

Router functions
● RFC 1812 (Requirements for IP Version 4 Routers), plus:
  – Error detection and correction
  – Traffic measurement and policing
  – Frame and protocol demultiplexing
  – Address lookup and packet forwarding
  – Segmentation, fragmentation, reassembly
  – Packet classification
  – Traffic shaping
  – Timing and scheduling
  – Queuing
  – Security
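
Address lookup in IP forwarding is a longest-prefix match: among all routes whose prefix covers the destination, the most specific one wins. The sketch below uses a plain linear scan purely to show the matching rule; real routers use tries, hashing, or CAMs for speed. The `route` struct and `lpm_lookup` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One forwarding-table entry: prefix/len -> output port (hypothetical layout). */
struct route {
    uint32_t prefix;   /* network address, host byte order */
    uint8_t  len;      /* prefix length in bits, 0..32 */
    int      port;     /* output interface */
};

/* Netmask for a prefix length. len == 0 is special-cased because shifting a
 * 32-bit value by 32 is undefined in C. */
static uint32_t mask_for(uint8_t len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Longest-prefix match by linear scan; returns the port, or -1 if no route. */
int lpm_lookup(const struct route *table, size_t n, uint32_t dst)
{
    int best_port = -1;
    int best_len  = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask_for(table[i].len);
        if ((dst & m) == (table[i].prefix & m) && table[i].len > best_len) {
            best_len  = table[i].len;
            best_port = table[i].port;
        }
    }
    return best_port;
}
```

With a default route (0.0.0.0/0), 10.0.0.0/8, and 10.1.0.0/16 installed, 10.1.2.3 matches the /16, 10.2.3.4 the /8, and anything else the default.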

Design choices for network products
● General purpose processors
● Embedded RISC processors
● Network processors
● Field-programmable gate arrays (FPGAs)
● Application-specific integrated circuits (ASICs)

General purpose processors (GPP)
● Programmable
● Mature development environment
● Typically used to implement the control plane
● Too slow to run the data plane effectively
  – Sequential execution
  – CPU and network speeds increased ~50x over the last decade
  – Memory latency decreased only ~2x over the last decade
● Gigabit Ethernet: 333-nanosecond per-packet budget
● Cache miss: ~ nanoseconds
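
The per-packet budget is the packet's size on the wire divided by the link rate. The figure depends heavily on the packet size and framing overhead assumed: a minimum 64-byte Ethernet frame plus 8 bytes of preamble and 12 bytes of inter-frame gap occupies 84 bytes on the wire (672 ns at 1 Gb/s), while a 40-byte packet with no framing overhead gives 320 ns. A minimal sketch of the arithmetic (the function name is hypothetical):

```c
/* Time available per packet, in nanoseconds, when back-to-back packets of
 * `wire_bytes` each arrive on a link of `gbps` gigabits per second.
 * At 1 Gb/s one bit takes 1 ns, so the budget is wire_bits / gbps. */
double per_packet_budget_ns(double wire_bytes, double gbps)
{
    return wire_bytes * 8.0 / gbps;
}
```

For example, `per_packet_budget_ns(84, 1.0)` gives 672 ns and a maximum-size 1538-byte frame (including preamble and gap) gives 12304 ns; the worst case for the processor is always the smallest packet.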

Embedded RISC processors (ERP)
● Same as GPP, but
  – Slower
  – Cheaper
  – Smaller (requires less board space)
  – Designed specifically for network applications
● Typically used for control plane functions

Application-specific integrated circuits (ASIC)
● Custom hardware
● Long time to market
● Expensive
● Difficult to develop and simulate
● Not programmable
● Not reusable
● But, the fastest of the bunch
● Suitable for data plane

Field Programmable Gate Arrays (FPGA)
● Flexible re-programmable hardware
● Less dense and slower than ASICs
● Cheaper than ASICs
● Good for providing fast custom functionality
● Suitable for data plane

Network processors
● The speed of ASICs/FPGAs
● The programmability and cost of GPPs/ERPs
● Flexible
● Re-usable components
● Lower cost
● Suitable for data plane

Network processors
● Common features
  – Small, fast, on-chip instruction stores (no caching)
  – Custom network-specific instruction set programmed at assembler level
    ● What instructions are needed for NPs? An open question
    ● Minimality, generality
  – Multiple processing elements
  – Multiple thread contexts per element
  – Multiple memory interfaces to mask latency
  – Fast on-chip memory (headers) and slow off-chip memory (payloads)
  – No OS; hardware-based scheduling and thread switching

Why network processors?
● The propaganda
● Take the current vertical network device market
● Commoditize horizontal slices of it
● PC market
  – Initially, an IBM custom vertical
  – Now, a commodity market with Intel providing the chipset
● Network device market
  – Draw your own conclusions

Network processing approaches
[Figure: ASIC, network processor, FPGA, GPP, and embedded RISC processor compared on two axes, programming/development ease and speed]

Network processor architectures
● Packet path
  – Store and forward
    ● Packet payload completely stored in and forwarded from off-chip memory
    ● Allows for large packet buffers
    ● Re-ordering problems with multiple processing elements
    ● Examples: Intel IXP, Motorola C5
  – Cut-through
    ● Packet held in an on-chip FIFO and forwarded through directly
    ● Small packet buffers
    ● Built-in packet ordering
    ● Example: AMCC

Network processor architectures
● Processing architecture
  – Parallel
    ● Each element independently performs the entire processing function
    ● Packet re-ordering problems
    ● Larger instruction store needed per element
  – Pipelined
    ● Each element performs one part of a larger processing function
    ● Communicates its result to the next processing element in the pipeline
    ● Smaller code space
    ● Packet ordering retained
    ● Deterministic behavior (no memory thrashing)
  – Hybrid
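
One common way to recover ordering in the parallel model is to stamp each packet with a sequence number at ingress and release packets at egress only in sequence order. A minimal sketch of such a reorder buffer, assuming fewer than `ROB_SLOTS` packets are ever in flight; all names are hypothetical and an `int` stands in for a packet handle.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROB_SLOTS 8   /* must exceed the number of packets in flight */

/* Reorder buffer indexed by sequence number modulo ROB_SLOTS. */
struct rob {
    bool     done[ROB_SLOTS];
    int      pkt[ROB_SLOTS];   /* stand-in for a packet handle */
    uint32_t next_seq;         /* next sequence number allowed to leave */
};

/* A processing element reports that packet `seq` has finished. */
void rob_complete(struct rob *r, uint32_t seq, int pkt)
{
    r->done[seq % ROB_SLOTS] = true;
    r->pkt[seq % ROB_SLOTS]  = pkt;
}

/* Release every packet that is now in order into `out` (room for ROB_SLOTS
 * entries); returns how many were released. */
int rob_drain(struct rob *r, int *out)
{
    int n = 0;
    while (r->done[r->next_seq % ROB_SLOTS]) {
        r->done[r->next_seq % ROB_SLOTS] = false;
        out[n++] = r->pkt[r->next_seq % ROB_SLOTS];
        r->next_seq++;
    }
    return n;
}
```

If packets 1 and 2 finish before packet 0, nothing is released until packet 0 completes, after which all three leave in order. A pipelined design avoids this bookkeeping because packets never overtake each other.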

Network processor architectures
● Processing hierarchy
  – ASICs
  – Embedded RISC processors
  – Specialized co-processors
  – See Figure 13.7 in the book

Network processor architectures
● Memory hierarchy
  – Small on-chip memory
    ● Control/instruction store
    ● Registers
    ● Cache
    ● RAM
  – Large off-chip memory
    ● Cache
    ● Static RAM
    ● Dynamic RAM

Network processor architectures
● Internal interconnect
  – Bus
  – Cross-bar
  – FIFO
  – Transfer registers

Network processor architectures
● Concurrency
  – Hardware support for multiple thread contexts
  – Operating system support for multiple thread contexts
  – Pre-emptiveness
  – Migration support

Increasing network processor performance
● Processing hierarchy
  – Increase clock speed
  – Increase elements
● Memory hierarchy
  – Increase size
  – Decrease latency
  – Pipelining
  – Add hierarchies
  – Add memory bandwidth (parallel stores)
  – Add functional memory (CAMs)

Focus of this class...
● Network processors
  – Intel IXA

IXP1200 features
● One embedded RISC processor (StrongARM)
  – Runs the control plane (Linux)
● 6 programmable packet processors (μ-engines)
  – Run the data plane (μ-engine assembler or μ-engine C)
● Central hash unit
● Multiple bus interconnects
  – IXBus (4.4 Gbps) to overcome PCI's 2.2 Gbps limit
● Small on-board memory
● Serial interface for control
● External interfaces for memory

IXP12xx  -engine

IXP2xxx  -engine

μ-engine functions
● Packet ingress from physical layer interface
● Checksum verification
● Header processing and classification
● Packet buffering in memory
● Table lookup and forwarding
● Header modification
● Checksum computation
● Packet egress to physical layer interface
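
Checksum verification and computation both use the Internet checksum defined in RFC 1071: the one's-complement sum of the header's 16-bit words, with carries folded back in, then inverted. A minimal sketch in C (function names are hypothetical), operating on a header held as bytes in network order:

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over `len` bytes in network byte order. */
uint16_t inet_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);   /* add 16-bit words */
    if (len & 1)                                          /* odd length: pad with zero */
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)                                     /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Verification: summing a header that already contains its checksum gives
 * 0xFFFF, so the complemented result is 0 for an intact header. */
int ipv4_checksum_ok(const uint8_t *hdr, size_t len)
{
    return inet_checksum(hdr, len) == 0;
}
```

To compute a checksum, zero the checksum field (bytes 10 and 11 of the IPv4 header) and call `inet_checksum`; to verify one, call `ipv4_checksum_ok` on the header as received.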

μ-engine characteristics
● Programmable microcontroller
  – Custom RISC instruction set
  – Private 2048-instruction store per μ-engine (loaded by the StrongARM)
  – 5-stage execution pipeline
● Hardware support for 4 threads and context switching
  – Each μ-engine has 4 hardware contexts (to mask memory latency)
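
Why 4 contexts help can be seen with a toy model: each thread computes for a fixed number of cycles, then issues a memory reference and swaps itself out until the data returns, while another ready context runs. The sketch below assumes context switches are free and that compute and latency are fixed; those numbers, and the function name, are hypothetical, only the idea of hiding latency with hardware contexts comes from the slide.

```c
#define MAX_CONTEXTS 16

/* Fraction of cycles one engine spends executing when `threads` contexts each
 * run `compute` cycles and then block for a `latency`-cycle memory reference.
 * Round-robin arbitration, zero-cost switches, simulated for `horizon` cycles. */
double engine_utilization(int threads, int compute, int latency, long horizon)
{
    long ready_at[MAX_CONTEXTS] = {0};   /* cycle at which each context may run */
    long now = 0, busy = 0;
    int next = 0;

    if (threads < 1 || compute < 1 || horizon < 1) return 0.0;
    if (threads > MAX_CONTEXTS) threads = MAX_CONTEXTS;

    while (now < horizon) {
        int pick = -1;
        for (int k = 0; k < threads; k++) {      /* first ready context, round-robin */
            int c = (next + k) % threads;
            if (ready_at[c] <= now) { pick = c; break; }
        }
        if (pick < 0) { now++; continue; }       /* every context waiting: idle cycle */
        now  += compute;                         /* run until the memory reference */
        busy += compute;
        ready_at[pick] = now + latency;          /* swapped out until data returns */
        next = (pick + 1) % threads;
    }
    return (double)busy / (double)now;
}
```

With 10 cycles of compute and a 30-cycle memory latency, one context keeps the engine busy 25% of the time, two contexts 50%, and four contexts 100%; in this model utilization follows min(1, T·C/(C+L)) for T threads.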

μ-engine characteristics
● 128 general-purpose registers
  – Can be partitioned or shared
  – Absolute or context-relative addressing
● 128 transfer registers
  – Staging registers for memory transfers
  – 4 blocks of 32 registers
    ● SDRAM or SRAM
    ● Read or write
● Local Control and Status Registers (CSRs)
  – USTORE instructions, CTX, etc. (p. 315)
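
The difference between absolute and context-relative register addressing can be sketched with a simplified flat model: in context-relative mode the 128 registers are partitioned evenly so each of the 4 contexts gets 32 private registers, while an absolute register number names the same physical register for every context (which is how threads share values). This model is an assumption for illustration; it ignores how the real register file is split into banks, and the names are hypothetical.

```c
#define NUM_GPRS      128
#define NUM_CONTEXTS  4
#define REGS_PER_CTX  (NUM_GPRS / NUM_CONTEXTS)   /* 32 in this flat model */

/* Map a (context, register) operand to a physical register index.
 * Returns -1 for out-of-range operands. */
int gpr_index(int ctx, int reg, int context_relative)
{
    if (ctx < 0 || ctx >= NUM_CONTEXTS) return -1;
    if (context_relative) {
        /* private to the context: same name, different physical register */
        if (reg < 0 || reg >= REGS_PER_CTX) return -1;
        return ctx * REGS_PER_CTX + reg;
    }
    /* absolute: every context sees the same physical register */
    if (reg < 0 || reg >= NUM_GPRS) return -1;
    return reg;
}
```

So register 5 in context-relative mode is physical register 5 for context 0 but 69 for context 2, while absolute register 100 is register 100 for all contexts.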

μ-engine characteristics
● FBI (FIFO Bus Interface) unit
  – Scratchpad memory
  – Hash unit
  – FBI CSRs
  – IXBus control
  – IXBus FIFOs
    ● Transmit and Receive FIFOs to external line cards

μ-engine opcodes
● ALU instructions
  – ALU, ALU_SHF, DBL_SHIFT
● Branch/Jump instructions
  – BR, BR=0, BR!=0, BR_BSET, BR=BYTE, BR=CTX, BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc.
● Reference instructions
  – CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA, SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc.
● Local register instructions
  – FIND_BST, IMMED, LD_FIELD, LOAD_ADDR, LOAD_BSET_RESULT1, etc.

μ-engine functions
● Miscellaneous
  – CTX_ARB
  – NOP
  – HASH1_48, HASH1_64, etc.

1. Packet received on the physical interface (MAC)
2. Ready-bus sequencer polls the MAC for an mpacket; updates receive-ready upon a full mpacket
3. μ-engine polls for receive-ready
4. μ-engine instructs the FBI to move the mpacket from the MAC to the RFIFO
5. μ-engine moves the mpacket directly from the RFIFO to SDRAM
6. Repeat steps 1-5 until the full packet is received
7. μ-engine or StrongARM processing
8. Packet header read from SDRAM or the RFIFO into a μ-engine and classified (via SRAM tables)
9. Packet headers modified
10. mpackets sent to the interface
11. Poll for space on the MAC; update transmit-ready if there is room for an mpacket
12. mpackets transferred to the MAC
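
Steps 5 and 6 amount to reassembling a packet in memory from fixed-size pieces. The sketch below models SDRAM as a plain byte array and assumes a 64-byte mpacket; both the size constant and the function names are assumptions for illustration, not IXP library calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPACKET_BYTES 64   /* assumed mpacket size */

/* Number of mpackets needed to carry a packet of `len` bytes. */
size_t mpacket_count(size_t len)
{
    return (len + MPACKET_BYTES - 1) / MPACKET_BYTES;
}

/* Copy one mpacket (from the receive FIFO) into the packet buffer standing in
 * for SDRAM at `offset`. Returns the new offset, or (size_t)-1 if the mpacket
 * is oversized or the buffer would overflow. */
size_t store_mpacket(uint8_t *buf, size_t cap, size_t offset,
                     const uint8_t *mpkt, size_t mlen)
{
    if (mlen > MPACKET_BYTES || offset + mlen > cap) return (size_t)-1;
    memcpy(buf + offset, mpkt, mlen);
    return offset + mlen;
}
```

A 150-byte packet, for example, arrives as three mpackets of 64, 64, and 22 bytes, and the loop of steps 1 to 5 runs three times before processing can begin.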

Programming the IXP
● This course focuses on steps 7, 8, and 9
● 2 programming frameworks
  – Command-line IXA Active Computing Engine (ACE) framework
  – Graphical microengine C development environment
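
A typical step 9 header modification is decrementing the TTL. Rather than recomputing the checksum over the whole header, a fast path can patch it incrementally using the rule from RFC 1624, HC' = ~(~HC + ~m + m'), where m and m' are the old and new values of the 16-bit word that changed. A minimal sketch, assuming a 20-byte IPv4 header held as network-order bytes; the function names are hypothetical.

```c
#include <stdint.h>

/* Incremental checksum update (RFC 1624): HC' = ~(~HC + ~m + m'). */
uint16_t csum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc + (uint16_t)~old_word + (uint32_t)new_word;
    while (sum >> 16)                       /* one's-complement carry folding */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and patch the checksum (bytes 10-11).
 * Returns 0 on success, or -1 if TTL <= 1 (an exception for the slow path). */
int ipv4_decrement_ttl(uint8_t *hdr)
{
    if (hdr[8] <= 1) return -1;
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);   /* TTL | protocol */
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    hc = csum_update16(hc, old_word, new_word);
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xFF);
    return 0;
}
```

For a header with TTL 64 and checksum 0xB861, decrementing the TTL to 63 yields checksum 0xB961, the same value a full recomputation gives, at the cost of a few additions instead of a pass over all ten header words.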

Programming the IXP
● Command-line IXA Active Computing Engine (ACE) framework
  – Re-usable function blocks chained together to build an application (Chapters 22-24)
  – New functions implemented as new blocks in the chain
● Core ACEs (StrongARM)
  – Written in C
● Microblock ACEs (microengines)
  – Written in assembler
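
The idea of chaining re-usable blocks can be illustrated with function pointers: each block inspects or modifies the packet and either passes it on or stops the chain, and adding a function means adding a block. This is a hypothetical sketch of the pattern only; it does not reproduce the ACE API, and the packet struct, verdicts, and block names are all invented for illustration.

```c
#include <stddef.h>

/* Hypothetical packet descriptor and verdicts (not the IXA ACE API). */
struct pkt { int ttl; int port; };

enum verdict { CONTINUE, DROP, TO_CORE };   /* TO_CORE: hand off to StrongARM code */

typedef enum verdict (*block_fn)(struct pkt *);

/* Three example blocks. */
static enum verdict check_ttl(struct pkt *p) { return p->ttl <= 1 ? TO_CORE : CONTINUE; }
static enum verdict dec_ttl(struct pkt *p)   { p->ttl--; return CONTINUE; }
static enum verdict pick_port(struct pkt *p) { p->port = 2; return CONTINUE; } /* placeholder for a real lookup */

/* Run the packet through each block in order until one stops it. */
enum verdict run_chain(block_fn *chain, size_t n, struct pkt *p)
{
    for (size_t i = 0; i < n; i++) {
        enum verdict v = chain[i](p);
        if (v != CONTINUE) return v;
    }
    return CONTINUE;
}
```

Building an application is then a matter of listing blocks, e.g. `block_fn chain[] = { check_ttl, dec_ttl, pick_port };`; a packet with TTL 1 stops at the first block and is handed to the core, untouched by the later ones.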

Programming the IXP
● Graphical microengine C development environment
  – Monolithic microengine C code (cannot be used on IXP1200 hardware)
  – Demos forthcoming