Design of a Diversified Router: Project Management


1 Design of a Diversified Router: Project Management
John DeHart

2 Revision History
5/xx/06 (JDD): Created
6/03/06 (JDD): Added information about packet/buffer dropping

3 What Needs to be Defined?
SDK Version? 4.0 vs. 4.2
System-wide project file for IXA SDK Developers Workbench
Source code file headers:
  ARL specific copyright
  File, Author, address, Organization, Creation date, Modification history, etc.
Microengine assignments
Scratch and Next Neighbor Ring usage
dl_system.h stuff:
  SRAM Channel definitions
  Scratch rings
  Buffer sizes
  Block IDs
Source Code Control:
  cvs
  Using local disks vs. Server disks
  Backups!!!
Directory structure
Interactions between Control Plane and Data Plane:
  Initialization data needed by each Module
  Modifications while running
Where do "slow path" packets go?
How are packets dropped by different modules?
Stubs for each module (except Rx and Tx):
  Pass the pkt along with default values for any data needed by the next module.
  Tests a lot of system level things
  Builds a system level testbench that each module could use for a first level of integration.
Testbenches

4 Microengine Usage: LC Ingress
[Figure: LC Ingress pipeline. Blocks: Phy Int Rx1, Phy Int Rx2, Key Extract Common, Key Extract Specific, Lookup TCAM, Lookup Memory, Hdr Format, Port Splitter, QM/Schd (two instances), Tx 1-5, Tx 6-10. Microengines: ME 0:2 through ME 0:7 and ME 1:0 through ME 1:5.]
12 Microengines used.
Two scratch rings needed: Port Splitter -> QM/Sched (one for each)

5 Microengine Usage: LC Egress
[Figure: LC Egress pipeline. Blocks: Switch Rx1, Switch Rx2, Key Extract, Lookup TCAM, Lookup Memory, Hdr Format, Port Splitter, QM/Schd (two instances), Tx 1-5, Tx 6-10. Microengines: ME 0:2, ME 0:3, ME 0:5, ME 0:6, ME 0:7 and ME 1:0 through ME 1:5.]
11 Microengines used.
Two scratch rings needed: Port Splitter -> QM/Sched (one for each)

6 Microengine Usage: IPv4 MR
[Figure: IPv4 MR pipeline. Blocks: Phy Int Rx1, Phy Int Rx2, Demux, Parse, Lookup TCAM, Lookup Memory, Hdr Format, Port Splitter, QM/Schd (two instances), Tx 1-5, Tx 6-10. Microengines: ME 0:2 through ME 0:7 and ME 1:0 through ME 1:5.]
12 Microengines used.
Parse and Hdr Format still being sized but probably fit in one ME each.
Two scratch rings needed: Port Splitter -> QM/Sched (one for each)

7 Directory Structure
[Directory tree diagram, entries in slide order: IXA_SDK_4.0/src/, include, library, applications, building_blocks, techX/Diversified_Router/src, IDT_src/, LC_ingress, Build, <workbench project files>, src, key_extractor, lookup, hdr_format, LC_egress, build, IPv4_MR, parse, demux, packet_rx_10port, packet_tx_5port, qm_sched_5port, port_splitter]
If we are going to use any file from the IXA_SDK src tree, either unmodified or modified, we first copy it into the similar place in our src/IXA_SDK_4.0 tree and check it into our cvs repository.
  If we modify any of these files, subsequent cvs commits will include our changes.
  This also gives us a cvs record of our changes to Intel files.
Our build and include paths will not include the standard IXA_SDK paths.
  Forces us to really understand what we are using from Intel.
  Gives us a self-contained directory tree of the files for our project.
Each individual module will probably have a directory structure something like this:
  Src
  Build
  Testbench
  Stub
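The copy-then-check-in rule above could be supported by a small helper. The following is only a sketch under stated assumptions: the function names and the example paths are hypothetical, and the cvs step is left as a comment rather than executed.

```python
from pathlib import Path
import shutil


def mirrored_destination(sdk_src: Path, project_sdk_src: Path, sdk_file: Path) -> Path:
    """Return the path in our tree that mirrors sdk_file's place in the SDK src tree."""
    # relative_to raises ValueError if sdk_file does not live under sdk_src.
    relative = sdk_file.relative_to(sdk_src)
    return project_sdk_src / relative


def import_sdk_file(sdk_src: Path, project_sdk_src: Path, sdk_file: Path) -> Path:
    """Copy one SDK file into the mirrored location, never overwriting an existing copy."""
    dest = mirrored_destination(sdk_src, project_sdk_src, sdk_file)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # Copy the pristine Intel file first, so the first cvs revision is unmodified.
        shutil.copy2(sdk_file, dest)
    # Next step (manual, not done here): add and commit dest in cvs before editing it.
    return dest
```

Checking in the unmodified copy before editing is what makes later cvs diffs show exactly what was changed relative to the Intel original.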

8 Dropping Packets
In the library code, there appear to be two methods for dropping buffers:
  Using a Freelist_Manager
    Any block that wants to drop a buffer puts it on a scratch ring, and the Freelist_Manager pulls buffers off the ring and frees them.
  Using a direct call to dl_buf_free
    Any block that wants to drop a buffer calls dl_buf_drop, which calls dl_buf_free.
The sample app does not appear to #define FREELIST_MANAGER, which implies that it uses the direct-call method (dl_buf_free) of dropping buffers.
In the sample app, packet dropping is initiated in two places:
  dl_qm_sink
    This is the dl_sink for the packet processing (dl_sink to qm).
    Makes a call to dl_buf_drop or dl_buf_drop_chain.
  Queue_manager
    Puts the packet to be dropped in a DROP_QUEUE.
    Scheduler then dequeues the packet to be dropped and calls dl_buf_drop.
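The difference between the two drop methods can be illustrated with a toy model. This is a sketch only, not the SDK code: BufferPool and its method names are hypothetical, and the scratch ring and freelist are modeled as plain Python deques.

```python
from collections import deque


class BufferPool:
    """Toy model of a buffer freelist with both drop methods."""

    def __init__(self, num_buffers: int):
        self.freelist = deque(range(num_buffers))  # free buffer handles
        self.drop_ring = deque()  # scratch ring, used only by the manager method

    def alloc(self) -> int:
        return self.freelist.popleft()

    def buf_free(self, handle: int) -> None:
        self.freelist.append(handle)

    def drop_direct(self, handle: int) -> None:
        # Direct method: the dropping block returns the buffer itself.
        self.buf_free(handle)

    def drop_via_manager(self, handle: int) -> None:
        # Manager method: the dropping block only queues the handle.
        self.drop_ring.append(handle)

    def freelist_manager_poll(self) -> int:
        # The manager drains the ring and frees each buffer; returns the count freed.
        freed = 0
        while self.drop_ring:
            self.buf_free(self.drop_ring.popleft())
            freed += 1
        return freed
```

In the manager variant a dropped buffer is not reusable until the manager has run, which is the main behavioral difference this model shows.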

9 Dropping Packets (continued)
Why drop packets/buffers in dl_qm_sink?
  All the context ordering mechanisms are implemented in dl_source and dl_qm_sink.
  If a block drops a packet/buffer and does not call dl_qm_sink, then it gets out of synchronization with the context ordering mechanisms.
What causes a packet/buffer to be dropped in dl_qm_sink?
  If dl_next_block is set to IX_DROP, then dl_sink will drop the packet, and everything stays in the correct order.
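The ordering argument can be made concrete with a small model. This is a sketch under assumptions: OrderedSink is a hypothetical stand-in for dl_qm_sink, ordering is modeled with a sequence number rather than inter-thread signals, and IX_DROP is a symbolic placeholder rather than the SDK constant.

```python
IX_DROP = "IX_DROP"  # symbolic placeholder; the real value is defined by the SDK


class OrderedSink:
    """Toy sink that accepts packets strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0
        self.forwarded = []
        self.dropped = []

    def sink(self, seq: int, next_block: str) -> bool:
        if seq != self.next_seq:
            return False  # not this packet's turn; the caller would keep waiting
        if next_block == IX_DROP:
            self.dropped.append(seq)  # drop happens inside the ordered section
        else:
            self.forwarded.append(seq)
        self.next_seq += 1  # hand the turn to the next packet either way
        return True
```

If packet 1 is dropped by marking it IX_DROP and still passing it to the sink, packet 2 proceeds normally. If packet 1 never reaches the sink, packet 2 is refused indefinitely, which is the loss of synchronization described above.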

10 Dropping Packets (continued)
Macros involved in packet/buffer dropping:
  dl_buf_drop calls dl_buf_free
    dl_buf_drop: located in src/library/microblocks_library/microcode/dl_buf.uc
  dl_buf_free calls buf_free
    dl_buf_free: located in src/library/microblocks_library/microcode/dl_buf.uc
  buf_free puts buffer back on Freelist
    buf_free: located in src/library/dataplane_library/microcode/buf.uc
Freelist is implemented as an SRAM Queue
  Freelist SRAM queue is created at initialization time, loaded into the Q-Array, and never unloaded.
  Should not interfere with Q-Array operations of QM, since the QM uses the 16 CAM entries to manage 16 of the 64 Q-Array entries.
  The other 48 Q-Array entries would never be touched by the QM.
I don't see any reason why we can't use the same scheme.
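The call chain and the Q-Array partitioning can be sketched as follows. This is an illustrative model only: the function names mirror the macros named above, but which 16 entries the QM manages (0 to 15 here) and which entry holds the freelist (63 here) are assumptions chosen for the example.

```python
from collections import deque

Q_ARRAY_ENTRIES = 64
QM_CAM_ENTRIES = 16
QM_ENTRIES = range(QM_CAM_ENTRIES)  # assumption: QM manages entries 0..15
FREELIST_ENTRY = 63  # assumption: freelist pinned in an entry outside the QM's range

# Each Q-Array entry is modeled as a simple FIFO queue.
q_array = [deque() for _ in range(Q_ARRAY_ENTRIES)]


def buf_free(handle: int) -> None:
    """Put the buffer back on the freelist queue."""
    q_array[FREELIST_ENTRY].append(handle)


def dl_buf_free(handle: int) -> None:
    buf_free(handle)


def dl_buf_drop(handle: int) -> None:
    dl_buf_free(handle)
```

Because the freelist entry lies among the 48 entries the QM never touches, freeing a buffer leaves the QM-managed entries unchanged in this model.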

11 Stubs
How many kinds would we need:
  Operational Rx and Tx
  One ME, 8 parallel threads, In NN, Out NN
    Everything except QM and Port Splitter?
  One ME, 8 parallel threads, In Scratch Ring, Out NN
    QM
  One ME, 8 parallel threads, In NN, Out 2 Scratch Rings
    Port Splitter only
    Stub for this is probably VERY close to finished block!
    Probably only needs 1 thread.
  Two MEs, 16 parallel threads, In Scratch Ring, Out Scratch Ring
    Not needed, yet.
    We currently don't have any blocks that require two parallel MEs.
    The two-ME blocks we have either:
      run two MEs in series, each running different code (Rx, Lookup, Key Extract), OR
      run two MEs in parallel, but their input and output rings are separate (Tx and QM)
These may not be exactly how each block needs to be implemented, but they should give a starting point for most blocks.
  For example, QM will not operate as 8 parallel threads.

12 (In NN, Out NN) Stub
[Figure: contexts CTX-0 through CTX-7 read KEY entries from the In NN Ring and write Result entries to the Out NN Ring.]

13 (In NN, Out NN) Stub
[Figure: context CTX-x and its signals: In NN !Empty, Out NN !Full, Next_Ctx Start, Next_Ctx Done]
In NN !Empty: Input NN Ring is not empty, something for us to read.
Out NN !Full: Output NN Ring is not full, space for us to write to it.
Next_Ctx Start: Our turn to read from the In NN Ring.
Next_Ctx Done: Our turn to write to the Out NN Ring.
Need:
  dl_source_NN_#words
    One for each number of words?
    dl_source_NN(
  dl_sink_NN_#words

14 Pseudocode for (In NN, Out NN) Stub
Initialization Phase
  Initialize registers for holding data from In NN
  Initialize registers for sending data out to Out NN
Start
  Wait on ((Next_Ctx Start signal) and (In NN Ring !Empty signal))
Phase 1
  Assert Next_Ctx Start signal
  Read In NN Ring into registers
  Set registers for sending data out to Out NN
  Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 2
  Assert Next_Ctx Done signal
  Write to Out NN Ring
  GoTo Phase 1
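The two-phase handoff in the pseudocode above can be simulated in software to check that packet order is preserved. This is a sketch under assumptions, not microcode: contexts are modeled as Python generators, signals as boolean flags, the ring capacity of 128 is arbitrary, and each context is assumed to return to the Start wait after Phase 2.

```python
from collections import deque

NUM_CTX = 8
RING_CAPACITY = 128  # assumption: arbitrary capacity for the model


def context(ctx, in_ring, out_ring, start, done):
    """One stub context: read in turn, pass data through, write in turn."""
    nxt = (ctx + 1) % NUM_CTX
    while True:
        # Start: wait for our read turn and for input to be available.
        while not (start[ctx] and in_ring):
            yield
        # Phase 1: pass the read turn on, then read.
        start[ctx] = False
        start[nxt] = True
        data = in_ring.popleft()
        result = data  # a stub passes the data along unchanged
        while not (done[ctx] and len(out_ring) < RING_CAPACITY):
            yield
        # Phase 2: pass the write turn on, then write.
        done[ctx] = False
        done[nxt] = True
        out_ring.append(result)


def run_stub(inputs):
    """Run all contexts round-robin until the input has been consumed."""
    items = list(inputs)
    in_ring, out_ring = deque(items), deque()
    start = [False] * NUM_CTX
    done = [False] * NUM_CTX
    start[0] = done[0] = True  # CTX-0 holds both turns initially
    ctxs = [context(i, in_ring, out_ring, start, done) for i in range(NUM_CTX)]
    for _ in range(len(items) + 1):
        for c in ctxs:
            next(c)
    return list(out_ring)
```

For example, run_stub(range(20)) returns the 20 inputs in their original order, because both the read turn and the write turn move from context to context in the same round-robin sequence.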

15 (In Scratch, Out NN) Stub
[Figure: contexts CTX-0 through CTX-7 read Data entries from the In Scratch Ring and write Data entries to the Out NN Ring.]

16 (In Scratch, Out NN) Stub
[Figure: context CTX-x and its signals: In Scratch !Empty, Out NN !Full, Next_Ctx Start, Next_Ctx Done]
In Scratch !Empty: Input Scratch Ring is not empty, something for us to read.
Out NN !Full: Output NN Ring is not full, space for us to write to it.
Next_Ctx Start: Our turn to read from the In Scratch Ring.
Next_Ctx Done: Our turn to write to the Out NN Ring.

17 Pseudocode for (In Scratch, Out NN) Stub
Initialization Phase
  Initialize registers for holding data from In Scratch
  Initialize registers for sending data out to Out NN
Start
  Wait on ((Next_Ctx Start signal) and (In Scratch Ring !Empty signal))
Phase 1
  Assert Next_Ctx Start signal
  Read In Scratch Ring into registers
  Set registers for sending data out to Out NN
  Wait for ((Next_Ctx Done signal) and (Out NN Ring !Full signal))
Phase 2
  Assert Next_Ctx Done signal
  Write to Out NN Ring
  GoTo Phase 1

18 Extra
The next set of slides is for templates or extra information if needed.

19 Text Slide Template

20 Image Slide Template

