Design of a Diversified Router: Project Management

Design of a Diversified Router: Project Management
John DeHart
jdd@arl.wustl.edu
http://www.arl.wustl.edu/arl

Revision History
5/xx/06 (JDD): Created
6/03/06 (JDD): Added information about packet/buffer dropping

What Needs to be Defined?
- SDK version? 4.0 vs. 4.2
- System-wide project file for the IXA SDK Developers Workbench
- Source code file headers: ARL-specific copyright; File, Author, Email address, Organization, Creation date, Modification history, etc. (a sketch follows this slide)
- Microengine assignments
- Scratch and Next Neighbor ring usage
- dl_system.h stuff: SRAM channel definitions, scratch rings, buffer sizes, block IDs
- Source code control: cvs; using local disks vs. server disks; backups!!!; directory structure
- Interactions between Control Plane and Data Plane: initialization data needed by each module; modifications while running
- Where do "slow path" packets go?
- How are packets dropped by different modules?
- Stubs for each module (except Rx and Tx): pass the packet along with default values for any data needed by the next module; tests a lot of system-level things; builds a system-level testbench that each module could use for a first level of integration
- Testbenches
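For reference, a header along the lines listed above might look like the sketch below. The field names come from this slide; the copyright wording, file name, author, email, and dates are placeholders, not an agreed ARL template.

    /*
     * <ARL-specific copyright notice goes here>
     *
     * File:         example_block.uc          (placeholder name)
     * Author:       Jane Doe                  (placeholder)
     * Email:        jdoe@arl.wustl.edu        (placeholder)
     * Organization: Applied Research Laboratory, Washington University in St. Louis
     * Created:      6/03/06                   (placeholder date)
     *
     * Modification History:
     *   6/03/06 (JD): Created.
     */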

Microengine Usage: LC Ingress
[Pipeline diagram: Port Splitter, Phy Int Rx1/Rx2, Key Extract (Common and Specific), Lookup (TCAM and Memory), Hdr Format, QM/Schd, Tx 1-5 and Tx 6-10, assigned to MEs 0:2-0:7 and 1:0-1:5.]
12 Microengines used.
Two scratch rings needed: Port Splitter -> QM/Sched (one for each).

Microengine Usage: LC Egress
[Pipeline diagram: Port Splitter, Switch Rx1/Rx2, Key Extract, Lookup (TCAM and Memory), Hdr Format, QM/Schd, Tx 1-5 and Tx 6-10, assigned to MEs 0:2, 0:3, 0:5-0:7 and 1:0-1:5.]
11 Microengines used.
Two scratch rings needed: Port Splitter -> QM/Sched (one for each).

Microengine Usage: IPv4 MR
[Pipeline diagram: Port Splitter, Phy Int Rx1/Rx2, Demux, Parse, Lookup (TCAM and Memory), Hdr Format, QM/Schd, Tx 1-5 and Tx 6-10, assigned to MEs 0:2-0:7 and 1:0-1:5.]
12 Microengines used. Parse and Hdr Format are still being sized but probably fit in one ME each.
Two scratch rings needed: Port Splitter -> QM/Sched (one for each).

Directory Structure
- IXA_SDK_4.0/src/: include, library, applications, building_blocks
- techX/Diversified_Router/src/: IDT_src/, LC_ingress (Build with <workbench project files>; src with key_extractor, lookup, hdr_format), LC_egress (build), IPv4_MR (parse, demux), packet_rx_10port, packet_tx_5port, qm_sched_5port, port_splitter
- If we are going to use any file from the IXA_SDK src tree, either unmodified or modified, we first copy it into the corresponding place in our src/IXA_SDK_4.0 tree and check it into our cvs repository. If we modify any of these files, subsequent cvs commits will include our changes. This also gives us a cvs record of our changes to Intel files.
- Our build and include paths will not include the standard IXA_SDK paths. This forces us to really understand what we are using from Intel and gives us a self-contained directory tree of the files for our project.
- Each individual module will probably have a directory structure something like this: Src, Build, Testbench, Stub.

Dropping Packets
In the library code, there appear to be two methods for dropping buffers:
- Using a Freelist_Manager: any block that wants to drop a buffer puts it on a scratch ring, and the Freelist_Manager pulls buffers off that ring and frees them.
- Using a direct call to dl_buf_free: any block that wants to drop a buffer calls dl_buf_drop, which calls dl_buf_free.
The sample app does not appear to #define FREELIST_MANAGER, which implies that it takes the direct-call dl_buf_free method of dropping buffers.
In the sample app, packet dropping is initiated in two places:
- dl_qm_sink: this is the dl_sink for the packet processing path (dl_sink to the QM); it makes a call to dl_buf_drop or dl_buf_drop_chain.
- Queue_manager: puts the packet to be dropped in a DROP_QUEUE; the Scheduler then dequeues the packet to be dropped and calls dl_buf_drop.
A sketch of the two drop paths follows.
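The following is a minimal host-side C model (not IXP microcode) of the two drop paths above. dl_buf_drop and dl_buf_free are stand-ins for the SDK macros, and the ring and Freelist_Manager loop are simplified assumptions, included only to show the difference in control flow.

    #include <stdio.h>

    #define RING_SIZE 16

    /* Models the scratch ring between dropping blocks and the Freelist_Manager. */
    static int drop_ring[RING_SIZE];
    static int ring_head, ring_tail;

    /* Stand-in for dl_buf_free: return the buffer to the freelist. */
    static void dl_buf_free(int buf_handle)
    {
        printf("buffer %d returned to freelist\n", buf_handle);
    }

    /* Method 2: direct call -- dl_buf_drop simply calls dl_buf_free. */
    static void dl_buf_drop(int buf_handle)
    {
        dl_buf_free(buf_handle);
    }

    /* Method 1: put the buffer on a scratch ring for the Freelist_Manager. */
    static void drop_via_freelist_manager(int buf_handle)
    {
        drop_ring[ring_tail++ % RING_SIZE] = buf_handle;
    }

    /* Freelist_Manager loop: pull handles off the ring and free them. */
    static void freelist_manager_poll(void)
    {
        while (ring_head < ring_tail)
            dl_buf_free(drop_ring[ring_head++ % RING_SIZE]);
    }

    int main(void)
    {
        dl_buf_drop(1);                /* direct-call path (what the sample app appears to use) */
        drop_via_freelist_manager(2);  /* FREELIST_MANAGER path */
        freelist_manager_poll();
        return 0;
    }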

Dropping Packets (continued)
Why drop packets/buffers in dl_qm_sink?
- All the context-ordering mechanisms are implemented in dl_source and dl_qm_sink. If a block drops a packet/buffer and does not call dl_qm_sink, it gets out of synchronization with the context-ordering mechanisms.
What causes a packet/buffer to be dropped in dl_qm_sink?
- If dl_next_block is set to IX_DROP, then dl_sink will drop the packet, and everything stays in the correct order (see the sketch below).
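The sketch below, again host-side C rather than microcode, models why this works: a block that wants to drop marks the packet with IX_DROP instead of freeing it itself, so the packet still flows through dl_qm_sink and the ordering bookkeeping never skips a packet. Only the names dl_next_block, IX_DROP, dl_qm_sink, and dl_buf_drop come from these slides; the struct layout and function bodies are assumptions for illustration.

    #include <stdio.h>

    enum next_block { IX_QM = 0, IX_DROP = 1 };

    struct pkt {
        int buf_handle;
        enum next_block dl_next_block;   /* set by upstream blocks */
    };

    /* Stand-in for the SDK drop macro. */
    static void dl_buf_drop(int buf_handle)
    {
        printf("buffer %d freed\n", buf_handle);
    }

    /* Model of dl_qm_sink: every packet passes through here, in context order,
     * so dropping at this point cannot desynchronize the ordering mechanisms. */
    static void dl_qm_sink(struct pkt *p)
    {
        if (p->dl_next_block == IX_DROP)
            dl_buf_drop(p->buf_handle);              /* drop here, ordering preserved */
        else
            printf("buffer %d enqueued to QM\n", p->buf_handle);
    }

    int main(void)
    {
        struct pkt keep = { 1, IX_QM };
        struct pkt drop = { 2, IX_DROP };            /* a block decided to drop: mark, don't free */
        dl_qm_sink(&keep);
        dl_qm_sink(&drop);
        return 0;
    }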

Dropping Packets (continued)
Macros involved in packet/buffer dropping:
- dl_buf_drop calls dl_buf_free (dl_buf_drop is located in src/library/microblocks_library/microcode/dl_buf.uc)
- dl_buf_free calls buf_free (dl_buf_free is located in src/library/microblocks_library/microcode/dl_buf.uc)
- buf_free puts the buffer back on the Freelist (buf_free is located in src/library/dataplane_library/microcode/buf.uc)
The Freelist is implemented as an SRAM queue:
- The Freelist SRAM queue is created at initialization time, loaded into the Q-Array, and never unloaded.
- This should not interfere with the Q-Array operations of the QM, since the QM uses its 16 CAM entries to manage 16 of the 64 Q-Array entries; the other 48 Q-Array entries would never be touched by the QM.
- I don't see any reason why we can't use the same scheme.

Stubs
How many kinds would we need?
- Operational Rx and Tx.
- Everything except QM and Port Splitter? One ME, 8 parallel threads, In NN, Out NN.
- QM: one ME, 8 parallel threads, In Scratch Ring, Out NN.
- Port Splitter only: one ME, 8 parallel threads, In NN, Out 2 Scratch Rings. The stub for this is probably VERY close to the finished block! Probably only needs 1 thread.
- Two MEs, 16 parallel threads, In Scratch Ring, Out Scratch Ring: not needed, yet. We currently don't have any blocks that require two parallel MEs. The two-ME blocks we have either run two MEs in series, each running different code (Rx, Lookup, Key Extract), or run two MEs in parallel with separate input and output rings (Tx and QM).
These may not be exactly how each block needs to be implemented, but it should give a starting point for most blocks. For example, the QM will not operate as 8 parallel threads.

(In NN, Out NN) Stub
[Diagram: eight contexts (CTX-0 through CTX-7) in one ME, each reading a KEY from the In NN Ring and writing a Result to the Out NN Ring in context order.]

(In NN, Out NN) Stub
- In NN !Empty: the input NN ring is not empty; there is something for us to read.
- Out NN !Full: the output NN ring is not full; there is space for us to write to it.
- Next_Ctx Start: our turn to read from the In NN Ring.
- Next_Ctx Done: our turn to write to the Out NN Ring.
Need: dl_source_NN_#words (one for each number of words?), dl_source_NN(, dl_sink_NN_#words
[Diagram: per-context (CTX-x) signal flow showing In NN !Empty, Out NN !Full, Next_Ctx Start, and Next_Ctx Done.]

Pseudocode for (In NN, Out NN) Stub
Initialization Phase:
- Initialize registers for holding data from In NN
- Initialize registers for sending data out to Out NN
Start:
- Wait on ((Next_Ctx Start signal) and (In NN Ring !Empty signal))
Phase 1:
- Assert Next_Ctx Start signal
- Read In NN Ring into registers
- Set registers for sending data out to Out NN
- Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 2:
- Assert Next_Ctx Done signal
- Write to Out NN Ring
- Go to Phase 1
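To make the two-phase structure concrete, here is a minimal host-side C/pthreads model of this loop; with the input ring relabeled it covers the (In Scratch, Out NN) stub on the later slides as well. It is not microengine code: the Next_Ctx Start/Done signals are modeled as turn counters guarded by a condition variable, both rings are modeled as simple counters, and the output ring is assumed never to be full. All names and sizes are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_CTX   8      /* eight threads per ME in the real design */
    #define NUM_PKTS  32     /* packets to push through the model       */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static int start_turn;       /* models the Next_Ctx Start signal chain   */
    static int done_turn;        /* models the Next_Ctx Done signal chain    */
    static int in_ring;          /* next value to "read" from the In NN ring */
    static int out_ring_count;   /* values "written" to the Out NN ring      */

    static void *ctx_main(void *arg)
    {
        int ctx = (int)(long)arg;

        for (;;) {
            int key;

            /* Start: wait for our Next_Ctx Start turn (In ring modeled as non-empty
             * until the packet count is exhausted). */
            pthread_mutex_lock(&lock);
            while (start_turn != ctx)
                pthread_cond_wait(&cond, &lock);
            if (in_ring >= NUM_PKTS) {               /* input exhausted: pass the turn on and exit */
                start_turn = (start_turn + 1) % NUM_CTX;
                pthread_cond_broadcast(&cond);
                pthread_mutex_unlock(&lock);
                return NULL;
            }

            /* Phase 1: read the In ring, then assert Next_Ctx Start for the next context. */
            key = in_ring++;
            start_turn = (start_turn + 1) % NUM_CTX;
            pthread_cond_broadcast(&cond);

            /* Wait for our Next_Ctx Done turn (Out ring assumed never full here). */
            while (done_turn != ctx)
                pthread_cond_wait(&cond, &lock);

            /* Phase 2: write the Out ring in context order, then assert Next_Ctx Done. */
            out_ring_count++;
            printf("ctx %d forwarded key %d\n", ctx, key);
            done_turn = (done_turn + 1) % NUM_CTX;
            pthread_cond_broadcast(&cond);
            pthread_mutex_unlock(&lock);
        }
    }

    int main(void)
    {
        pthread_t t[NUM_CTX];
        for (long i = 0; i < NUM_CTX; i++)
            pthread_create(&t[i], NULL, ctx_main, (void *)i);
        for (int i = 0; i < NUM_CTX; i++)
            pthread_join(t[i], NULL);
        printf("forwarded %d of %d packets\n", out_ring_count, NUM_PKTS);
        return 0;
    }

Reads stay in context order because the Start turn is handed on only after the read, and writes stay in context order because the Done turn is handed on only after the write, which is exactly the ordering the pseudocode above relies on.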

(In Scratch, Out NN) Stub
[Diagram: eight contexts (CTX-0 through CTX-7) in one ME, each reading Data from the In Scratch Ring and writing to the Out NN Ring in context order.]

(In Scratch, Out NN) Stub
- In Scratch !Empty: the input scratch ring is not empty; there is something for us to read.
- Out NN !Full: the output NN ring is not full; there is space for us to write to it.
- Next_Ctx Start: our turn to read from the In Scratch Ring.
- Next_Ctx Done: our turn to write to the Out NN Ring.
[Diagram: per-context (CTX-x) signal flow showing In Scratch !Empty, Out NN !Full, Next_Ctx Start, and Next_Ctx Done.]

Pseudocode for (In Scratch, Out NN) Stub
Initialization Phase:
- Initialize registers for holding data from In Scratch
- Initialize registers for sending data out to Out NN
Start:
- Wait on ((Next_Ctx Start signal) and (In Scratch Ring !Empty signal))
Phase 1:
- Assert Next_Ctx Start signal
- Read In Scratch Ring into registers
- Set registers for sending data out to Out NN
- Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 2:
- Assert Next_Ctx Done signal
- Write to Out NN Ring
- Go to Phase 1

Extra
The next set of slides is for templates or extra information if needed.

Text Slide Template

Image Slide Template