Queue Manager and Scheduler on Intel IXP
John DeHart, Amy Freestone, Fred Kuhns, Sailesh Kumar

Slide 2: Overview

- Both the QM and the scheduler run on a single ME.
- The packet discard policy also runs here.
  - There is a separate interface for discarded packets.
- Deficit round robin (DRR) scheduling policy.
- Uses the Q-array hardware and exploits its LRU-based eviction policy.
- Aggregated scheduling architecture:
  - Runs on a single thread.
  - Designed to operate in batch mode to hide memory latency.
  - Issues a batch of up to 8 memory requests at a time.
  - Data structures are also designed to support batch mode.

Slide 3: Overall Queuing Subsystem

- A set of parallel QM+SCH modules.
- Each set handles a different set of meta-links.

Slide 4: Queue Data Structure

(Figure only; not reproduced in the transcript.)
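The figure is missing, but the per-queue state described on the following slides can be summarized in a sketch. This is a hypothetical C model, not the actual IXP layout; the field names are assumptions drawn from the text (head/tail/count in the SRAM queue descriptor; length, maximum length, credit, and weight cached in local memory).

    #include <stdint.h>

    /* Hypothetical model of the per-queue state described in these
     * slides; field names and widths are assumptions, not the real
     * IXP layout. */
    typedef struct {
        uint32_t head;     /* SRAM address of the first buffer in the queue */
        uint32_t tail;     /* SRAM address of the last buffer in the queue  */
        uint32_t count;    /* number of packets queued (0 => inactive)      */
    } qdesc_t;             /* lives in SRAM; cached in the Q-array          */

    typedef struct {
        uint32_t qlen;     /* current queue length                          */
        uint32_t max_qlen; /* maximum length, used by the discard policy    */
        uint32_t credit;   /* DRR deficit counter                           */
        uint32_t weight;   /* DRR quantum added each round                  */
    } qlocal_t;            /* lives in ME local memory at the same index as
                              the queue's Q-array entry                     */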

Slide 5: Queue Caching Structure

The Q-array and the local memory data shown in the figure are parallel data structures sharing the same index. (Figure: SRAM and local memory views; not reproduced in the transcript.)
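As a sketch of this caching scheme, the CAM can be modeled as a tag array mapping queue ids to Q-array indices, with per-entry flags for the "tail valid", "head valid", and "being enqueued/dequeued" states the later slides refer to. The entry count and flag names are assumptions.

    #include <stdint.h>

    #define QARRAY_ENTRIES 16          /* assumed Q-array size per ME  */

    enum {
        TAIL_VALID = 1 << 0,           /* tail cached (enqueue side)   */
        HEAD_VALID = 1 << 1,           /* head cached (dequeue side)   */
        IN_DEQUEUE = 1 << 2,           /* dequeue thread uses entry    */
        IN_ENQUEUE = 1 << 3,           /* enqueue thread uses entry    */
    };

    typedef struct {
        uint32_t qid;                  /* CAM tag: which queue is here */
        uint8_t  flags;
    } cam_entry_t;

    cam_entry_t cam[QARRAY_ENTRIES];   /* the ME CAM in hardware       */
    /* qlocal[QARRAY_ENTRIES] (previous sketch) and the Q-array itself
     * are parallel arrays indexed by the same entry number.           */

    /* Return the Q-array index holding qid, or -1 on a CAM miss. */
    static int cam_lookup(uint32_t qid) {
        for (int i = 0; i < QARRAY_ENTRIES; i++)
            if (cam[i].flags && cam[i].qid == qid)
                return i;
        return -1;
    }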

Slide 6: Scheduling Data Structure

(Figure only; not reproduced in the transcript.)
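The figure is omitted, but the "life of a queue" slides later in the deck suggest a segmented structure: the active list is a linked list of fixed-size segments, each holding up to 8 per-queue scheduling entries (packet count, weight, credit, queue id), with emptied segments recycled onto a free list. The sketch below models that; the names and the 8-slot size are assumptions.

    #include <stdint.h>

    #define SEG_SLOTS 8                /* assumed: one segment = one batch */

    typedef struct {
        uint32_t count;                /* packets available to send        */
        uint32_t weight;               /* DRR quantum                      */
        int32_t  credit;               /* remaining credit this round      */
        uint32_t qid;                  /* queue id                         */
    } sched_entry_t;                   /* matches the "8, w, c, x" entries
                                          shown on slide 18                */

    typedef struct segment {
        sched_entry_t  entry[SEG_SLOTS];
        uint32_t       used;           /* occupied slots                   */
        struct segment *next;          /* next segment in this list        */
    } segment_t;

    segment_t *active_head, *active_tail;   /* queues eligible to send    */
    segment_t *free_head,   *free_tail;     /* recycled empty segments    */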

Slide 7: Enqueue and Dequeue Threads

- Enqueue and dequeue run on two separate threads.
- The two threads are synchronized using signals.
- All data structures (CAM, Q-array, local memory) are shared between the enqueue and dequeue threads.
- The enqueue and dequeue processes consist of multiple phases.
  - At the end of each phase, a batch of up to 8 commands is dispatched to SRAM/the Q-array.
- The expectation is that generating a batch of 8 commands takes about as long as the SRAM read latency.
  - Otherwise the MEs will have idle cycles.
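A minimal sketch of the two-thread interleaving, assuming hypothetical send_signal()/wait_for_signal() wrappers around the ME signal hardware: each thread parks itself after dispatching a batch so the other thread runs while the SRAM operations complete.

    /* Hypothetical signal wrappers; the real code uses ME hardware
     * signals and context swaps. */
    extern void send_signal(int thread);
    extern void wait_for_signal(int sig);
    extern void enqueue_phase(int phase);

    enum { ENQ = 0, DEQ = 1 };

    void enqueue_thread(void) {
        for (;;) {
            enqueue_phase(1);        /* dispatch a batch of SRAM reads  */
            send_signal(DEQ);        /* let dequeue run while they land */
            wait_for_signal(ENQ);
            enqueue_phase(2);        /* reads done: issue the enqueues  */
            send_signal(DEQ);
            wait_for_signal(ENQ);
        }
    }

The dequeue thread mirrors this structure with its four phases, signaling ENQ at each phase boundary.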

Slide 8: Enqueue Process, Phase 1

1. Grab up to 8 requests from the enqueue command FIFO.
2. Filter out the queues whose tail is already present in the Q-array.
3. For queues whose head is already cached, send a rd_qdesc_other command at the same Q-array entry.
4. For the remaining queues, evict the LRU entry from the Q-array; make sure to pick an entry that is not currently being dequeued.
5. While doing the eviction, write the queue length back from local memory to SRAM, and update the CAM bits.
6. Send a rd_qdesc_tail command at this entry.
7. Read the queue length and maximum length from SRAM into local memory at index = Q-array entry.
8. Update the CAM (set the tail-valid bit for this Q-array entry).
9. Switch to the dequeue process until the Q-array and the queue length in local memory are loaded. (See the sketch below.)
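A hedged sketch of this phase, reusing the hypothetical cam/qlocal structures from the earlier sketches; rd_qdesc_tail(), rd_qdesc_other(), and the helper functions are stand-ins for the real Q-array commands, not actual IXP APIs.

    #include <stdint.h>

    typedef struct { uint32_t qid; uint32_t pkt; } enq_req_t;

    extern int  enq_fifo_get(enq_req_t *dst, int max);
    extern int  cam_lookup(uint32_t qid);
    extern int  cam_tail_valid(int entry);
    extern int  evict_lru_not_in_dequeue(void);
    extern void writeback_qlen(int entry);              /* local -> SRAM */
    extern void rd_qdesc_other(int entry, uint32_t qid);
    extern void rd_qdesc_tail(int entry, uint32_t qid);
    extern void read_qlen(int entry, uint32_t qid);     /* SRAM -> local */
    extern void cam_set_tail_valid(int entry, uint32_t qid);

    void enqueue_phase1(void) {
        enq_req_t req[8];
        int n = enq_fifo_get(req, 8);                /* step 1 */
        for (int i = 0; i < n; i++) {
            int e = cam_lookup(req[i].qid);
            if (e >= 0 && cam_tail_valid(e))
                continue;                            /* step 2: cached   */
            if (e >= 0) {                            /* head cached only */
                rd_qdesc_other(e, req[i].qid);       /* step 3 */
            } else {
                e = evict_lru_not_in_dequeue();      /* step 4 */
                writeback_qlen(e);                   /* step 5 */
                rd_qdesc_tail(e, req[i].qid);        /* step 6 */
                read_qlen(e, req[i].qid);            /* step 7 */
            }
            cam_set_tail_valid(e, req[i].qid);       /* step 8 */
        }
        /* step 9: switch to the dequeue thread until the reads land */
    }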

Slide 9: Enqueue Process, Phase 2

10. Check whether the packet is to be discarded.
11. If it is admitted, send an enqueue command to the Q-array for every queue for which an enqueue command was received.
12. Subsequent actions depend on four situations:
    a. The queue is inactive (count = 0).
    b. The queue is active and presently being dequeued by the dequeue process (count > 0 and the dequeue flag set in the CAM).
    c. The queue is active but is not being dequeued.
    d. The queue is active and the packet is discarded.

Slide 10: Case I – Queue is inactive

Note that the next pointer of the current tail segment is set to the free segment just allocated. The newly allocated tail segment is always kept in local memory. Here we assume that no free segments are available in local memory. (Figure: the free-list allocation involves an SRAM read and an SRAM write; not reproduced in the transcript.)
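As a sketch (continuing the assumed segment structures from the scheduling-data-structure sketch), activating an inactive queue appends its scheduling entry to the tail segment of the active list, allocating a fresh segment from the free list when the tail is full and linking it in through the old tail's next pointer.

    /* Sketch only; segment_t, sched_entry_t, SEG_SLOTS, active_head and
     * active_tail are the hypothetical structures sketched earlier. */
    extern segment_t *alloc_free_segment(void);   /* may touch SRAM */

    void activate_queue(sched_entry_t e) {
        if (active_tail == NULL || active_tail->used == SEG_SLOTS) {
            segment_t *seg = alloc_free_segment();
            seg->used = 0;
            seg->next = NULL;
            if (active_tail)
                active_tail->next = seg;  /* link via next pointer      */
            else
                active_head = seg;
            active_tail = seg;            /* new tail kept in local mem */
        }
        active_tail->entry[active_tail->used++] = e;
    }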

Slide 11: Case II – Queue is active but not being dequeued

Do nothing (beyond the enqueue command already sent in step 11).

Slide 12: Case III – Queue is active and being dequeued

Update the queue length stored in local memory (indexed by the Q-array index) by adding the current packet's length to it.
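In terms of the hypothetical qlocal array from the earlier caching-structure sketch, this case is a single local-memory update:

    qlocal[e].qlen += pkt_len;   /* e = Q-array index of this queue;
                                    pkt_len = length of the packet just
                                    enqueued */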

Slide 13: Enqueue Process, Phase 2 (continued)

13. Update the CAM bits accordingly.
14. Switch to the dequeue process.
15. Start over again.

Slide 14: Dequeue Process, Phase 1

1. Begin with the head segment of the active list. If the head segment contains some queues, proceed as follows.
2. Skip queues whose head descriptor is already cached in the Q-array.
3a. For queues whose tail is already cached, send a rd_qdesc_other command at the same Q-array entry.
3b. For the remaining queues, evict the LRU entry from the Q-array; make sure to pick an entry that is not currently being enqueued.
3c. While doing the eviction, write the queue length back from local memory to SRAM, and update the CAM bits.
4a. Send a rd_qdesc_head command at this entry. This supplies the queue's entry in the active list with its credit and weight.
4b. Read the queue length and maximum length from SRAM into local memory at index = Q-array entry.
5. Update the CAM (set the head-valid bit for this Q-array entry).
6. Switch to the enqueue process until the Q-array is loaded.

Slide 15: Dequeue Process, Phases 2 and 3

Phase 2:
8. Send up to 8 dequeue requests for the queues in the head segment.
9. Switch to the enqueue process.

Phase 3:
10. After the dequeues are complete, send SRAM reads to fetch the packet lengths.
11. Switch to the enqueue process.

Slide 16: Dequeue Process, Phase 4

12. Once the lengths of the dequeued packets are known, update the credits. (See the sketch below.)
13. For queues that become inactive, send a wr_qdesc_count command and evict them from the Q-array if they are not being enqueued.
14. Queues whose credit is exhausted are moved from the head segment to the tail segment. If the tail segment has no space left, allocate a new segment from the free list, as described in the enqueue process.
15. Update the CAM bits.
16. If the head segment becomes empty:
    a. Add it to the tail of the free segment list.
    b. Read the next head of the active list into local memory.
    c. Switch to the enqueue process.
    d. Go to step 1.
17. Else:
    a. Switch to the enqueue process.
    b. Go to step 8.
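A hedged sketch of the DRR bookkeeping in steps 12-14, using the hypothetical sched_entry_t from the scheduling-data-structure sketch: each dequeued packet's length is charged against the queue's credit, and a queue that runs out of credit is topped up with its weight and moved to the tail for the next round. This is the standard deficit-round-robin pattern; the slides do not show the exact arithmetic, so treat the details as assumptions.

    #include <stdint.h>

    /* What the dequeue process should do next with this queue. */
    typedef enum { KEEP, MOVE_TO_TAIL, DEACTIVATE } drr_action_t;

    /* Charge one dequeued packet against scheduling entry e. */
    drr_action_t drr_charge(sched_entry_t *e, uint32_t pkt_len) {
        e->credit -= (int32_t)pkt_len;  /* step 12: update the credit  */
        e->count--;                     /* one fewer packet queued     */
        if (e->count == 0)
            return DEACTIVATE;          /* step 13: wr_qdesc_count,
                                           evict from Q-array          */
        if (e->credit <= 0) {
            e->credit += e->weight;     /* refill for the next round   */
            return MOVE_TO_TAIL;        /* step 14: head -> tail seg   */
        }
        return KEEP;                    /* still eligible this round   */
    }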

Slide 17: Enqueue/Dequeue Synchronization

Enqueue:
- Phase I: Send up to 8 SRAM reads to bring queue descriptors (tails) into the Q-array. Write back the LRU entries from the Q-array along with the associated queue lengths. Give special treatment to queues whose head is already cached. Update the CAM entries.
- Phase II: Send enqueue commands to the Q-array. Update the queue lengths in local memory indexed by the Q-array index. Update the scheduling data structure (may involve one SRAM read and one SRAM write).

Dequeue:
- Phase I: Send up to 8 SRAM reads for the queue descriptors of the queues at the head of the active list. Write back LRU entries and queue lengths to SRAM. Update the CAM entries.
- Phase II: Send up to 8 dequeue requests for the cached queues.
- Phase III: Read the lengths of the dequeued packets.
- Phase IV: Update the queue credits and the scheduling data structure (may involve one SRAM read and one SRAM write).

Slide 18: Life of a Single Active Queue

Let's say 8 enqueue requests arrive for an inactive queue x with weight w. (Figure: SRAM and local-memory views of the free list and active list; the enqueue process places the entry "8, w, c, x" into a segment of the active list.)

Slide 19: Life of a Single Active Queue (continued)

After the enqueue, the dequeue process sends 5 packets and the queue's credit is exhausted, so a free segment is allocated. (Figure: the active-list entry becomes "3, w, c, x"; SRAM and local-memory views of the free list and active list.)

Slide 20: Life of a Single Active Queue (continued)

- Put the queue in the newly allocated tail segment.
- Make the head of the active list the next head segment.
- Move the now-empty head segment to the free segment pool.
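A sketch of this head-segment retirement (and of step 16 in the dequeue process), again using the hypothetical segment lists from earlier: the emptied head segment is appended to the free pool and the next segment becomes the head.

    /* Recycle the (now empty) head segment of the active list. */
    void retire_head_segment(void) {
        segment_t *old = active_head;
        active_head = old->next;          /* next segment becomes head */
        if (active_head == NULL)
            active_tail = NULL;           /* active list is now empty  */
        old->used = 0;
        old->next = NULL;
        if (free_tail)                    /* append to free pool tail  */
            free_tail->next = old;
        else
            free_head = old;
        free_tail = old;
    }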

Slide 21: Life of Multiple Active Queues

Let's say the credits of all queues in the head segment are exhausted. (Figure: the queues x, y, z, w are moved from the head segment to the tail segment of the active list, and the emptied head segment joins the free segment pool.)