Honnappa Nagarahalli Principal Software Engineer Arm

Presentation transcript:

A case for Queue APIs
Honnappa Nagarahalli, Principal Software Engineer, Arm

Agenda
Motivation. Queue APIs/driver design. APIs.
The problem we are trying to solve: why do we need Queue APIs and a driver framework? How do the Queue APIs and the driver framework solve that problem? If time permits, we will discuss the RFC. The RFC will be on the mailing list for further review. This discussion touches on the thinking behind the APIs and a few important features/fields.

Motivation
DPDK supports the pipeline model. It uses rte_ring to exchange packets between cores. The rte_event_ring APIs are backed by a software implementation. Even on a platform that supports HW queues, the SW implementation is used. Separation of the APIs and the implementation is required.
DPDK supports the pipeline model of packet processing. In this model, different parts of the packet processing logic run on dedicated cores. Each core, after processing the packets, hands them over to the next core/processing block in the pipeline. Many applications today make use of this model, and the DPDK documentation describes some of its advantages. The model uses rte_ring to exchange packets between the cores. rte_ring is a software implementation of a queue. DPDK also has the rte_event_ring APIs as a means to exchange events between cores. However, these APIs are again tied to the software implementation (rte_ring). Essentially, rte_ring is currently the only way to do core-to-core communication in the pipeline model. Even when these applications run on SoCs with queues implemented in hardware, they continue to use rte_ring instead of the hardware implementation. Hence the APIs and the implementation need to be separated.

Queue APIs and Driver Framework
Follow the already established API and driver model. This allows for separation of APIs and devices/drivers, and for choosing the device depending on the platform. The device is represented as a 'Queue Manager' for allocation/freeing of queues. After queue allocation, enqueue and dequeue operations can happen on the queue.
Separate the APIs and the queue implementation using the well-established API and driver model used by other devices such as crypto. This allows multiple queue drivers to be implemented and enabled depending on the underlying platform. The device itself is represented as a 'Queue Manager', which can allocate and free queues. Once a queue is allocated, its handle can be used to enqueue and dequeue objects.
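The API/driver split described above can be sketched as a table of function pointers that the API layer calls through. This is only an illustration, not the RFC's implementation: the talk names struct rte_queue_ops but does not show its members, so the create/destroy members, the demo_ wrappers (with simplified argument lists), and the toy sw_ software driver are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Queue handle, laid out as on the "APIs" slide. */
struct rte_queue {
    union {
        void *private_data;     /* software drivers: pointer to their state */
        uintptr_t queue_handle; /* hardware drivers: device queue handle */
    };
};

struct rte_queue_ctx;

/* Hypothetical ops table: these members are assumptions for illustration. */
struct rte_queue_ops {
    struct rte_queue *(*create)(struct rte_queue_ctx *ctx, unsigned int count);
    void (*destroy)(struct rte_queue_ctx *ctx, struct rte_queue *q);
};

/* Queue Manager instance, laid out as on the "APIs" slide. */
struct rte_queue_ctx {
    void *device;                    /* queue device attached */
    const struct rte_queue_ops *ops; /* queue ops for the device */
};

/* Toy software driver: the "device" only counts live queues. */
struct sw_dev {
    unsigned int live_queues;
};

static struct rte_queue *sw_create(struct rte_queue_ctx *ctx, unsigned int count)
{
    struct sw_dev *dev = ctx->device;
    struct rte_queue *q = malloc(sizeof(*q));

    if (q == NULL)
        return NULL;
    q->private_data = calloc(count, sizeof(void *)); /* object slots */
    if (q->private_data == NULL) {
        free(q);
        return NULL;
    }
    dev->live_queues++;
    return q;
}

static void sw_destroy(struct rte_queue_ctx *ctx, struct rte_queue *q)
{
    struct sw_dev *dev = ctx->device;

    free(q->private_data);
    free(q);
    dev->live_queues--;
}

static const struct rte_queue_ops sw_ops = {
    .create = sw_create,
    .destroy = sw_destroy,
};

/* API layer: knows nothing about the driver, only dispatches through ops. */
static struct rte_queue *demo_queue_create(struct rte_queue_ctx *ctx, unsigned int count)
{
    assert(ctx != NULL && ctx->ops != NULL);
    return ctx->ops->create(ctx, count);
}

static void demo_queue_destroy(struct rte_queue_ctx *ctx, struct rte_queue *q)
{
    assert(ctx != NULL && ctx->ops != NULL);
    ctx->ops->destroy(ctx, q);
}
```

An application would fill in a struct rte_queue_ctx with a device and its ops table once, then call only the API-layer functions. Swapping the driver means swapping the ops pointer, with no change to the calling code.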

APIs
Should support application portability: a DPDK application should be portable across different platforms. Should address the needs of both software and hardware implementations.
    struct rte_queue *rte_queue_create(struct rte_queue_ctx *instance, const char *name,
                                       unsigned int count, int socket_id, unsigned int flags);
    void rte_queue_free(struct rte_queue_ctx *instance, struct rte_queue *q);
    struct rte_queue {
        union {
            void *private_data;     /**< Queue implementation pvt data */
            uintptr_t queue_handle; /**< Queue handle */
        };
    };
Key to making the applications portable.
    RTE_QUEUE_SP_ENQ        /**< single-producer */
    RTE_QUEUE_SC_DEQ        /**< single-consumer */
    RTE_QUEUE_NON_BLOCKING  /**< non-blocking */
    struct rte_queue_ctx {
        void *device;                    /**< Queue device attached */
        const struct rte_queue_ops *ops; /**< Queue ops for the device */
    };
Application portability is important. Customers would like a single application that can run across different platforms, and those platforms may use software algorithms or hardware implementations. To guarantee seamless portability, it is important, where possible, that the different implementations behave the same. Where they cannot, the characteristics of each implementation should be brought out clearly in the APIs. This makes the application writer aware of the differing characteristics and helps them choose the ones they require. The Queue APIs address the needs of both software and hardware implementations. Currently, I have added just the basic APIs.
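One way to read the union in struct rte_queue is that a software driver keeps a pointer to its own state in private_data, while a hardware driver keeps a device queue identifier in queue_handle, and the application never looks inside either. The sketch below illustrates that reading under several assumptions: the flag bit values, the members of struct rte_queue_ops, the out-parameter form of create, and both toy backends (sw_ and a mock hw_) are hypothetical and not taken from the RFC.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Flag names are from the slide; the bit values are assumptions. */
#define RTE_QUEUE_SP_ENQ       0x1u /* single-producer */
#define RTE_QUEUE_SC_DEQ       0x2u /* single-consumer */
#define RTE_QUEUE_NON_BLOCKING 0x4u /* non-blocking */

struct rte_queue {
    union {
        void *private_data;
        uintptr_t queue_handle;
    };
};

struct rte_queue_ctx;

/* Hypothetical ops table (members are not shown in the talk). */
struct rte_queue_ops {
    int (*create)(struct rte_queue_ctx *ctx, struct rte_queue *q,
                  unsigned int count, unsigned int flags);
    unsigned int (*capacity)(struct rte_queue_ctx *ctx, const struct rte_queue *q);
};

struct rte_queue_ctx {
    void *device;
    const struct rte_queue_ops *ops;
};

/* Software backend: private_data points at heap-allocated state. */
struct sw_state {
    unsigned int capacity;
    unsigned int flags;
};

static int sw_create(struct rte_queue_ctx *ctx, struct rte_queue *q,
                     unsigned int count, unsigned int flags)
{
    struct sw_state *s = malloc(sizeof(*s));

    (void)ctx;
    if (s == NULL)
        return -1;
    s->capacity = count;
    s->flags = flags;
    q->private_data = s;
    return 0;
}

static unsigned int sw_capacity(struct rte_queue_ctx *ctx, const struct rte_queue *q)
{
    const struct sw_state *s = q->private_data;

    (void)ctx;
    return s->capacity;
}

/* Mock hardware backend: queue_handle is an index into a device table. */
#define HW_NUM_QUEUES 4u

struct hw_dev {
    unsigned int depth[HW_NUM_QUEUES];
    unsigned int next_free;
};

static int hw_create(struct rte_queue_ctx *ctx, struct rte_queue *q,
                     unsigned int count, unsigned int flags)
{
    struct hw_dev *dev = ctx->device;

    (void)flags; /* a real device would program its sync mode here */
    if (dev->next_free == HW_NUM_QUEUES)
        return -1;
    dev->depth[dev->next_free] = count;
    q->queue_handle = dev->next_free++;
    return 0;
}

static unsigned int hw_capacity(struct rte_queue_ctx *ctx, const struct rte_queue *q)
{
    const struct hw_dev *dev = ctx->device;

    assert(q->queue_handle < HW_NUM_QUEUES);
    return dev->depth[q->queue_handle];
}

static const struct rte_queue_ops sw_ops = { .create = sw_create, .capacity = sw_capacity };
static const struct rte_queue_ops hw_ops = { .create = hw_create, .capacity = hw_capacity };

/* Application code: identical regardless of which backend sits behind ctx. */
static unsigned int app_setup(struct rte_queue_ctx *ctx, struct rte_queue *q)
{
    if (ctx->ops->create(ctx, q, 64, RTE_QUEUE_SP_ENQ | RTE_QUEUE_SC_DEQ) != 0)
        return 0;
    return ctx->ops->capacity(ctx, q);
}
```

Calling app_setup with a context whose ops point at sw_ops or at hw_ops runs the same application path; only the context differs. The software state is heap-allocated here, so a caller is expected to release q->private_data when done.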

APIs
    unsigned int rte_queue_enqueue_burst(struct rte_queue_ctx *instance, struct rte_queue *q,
                                         void * const *obj_table, unsigned int n);
    unsigned int rte_queue_dequeue_burst(struct rte_queue_ctx *instance, struct rte_queue *q,
                                         void * const *obj_table, unsigned int n);
    unsigned int rte_queue_get_size(struct rte_queue_ctx *instance, const struct rte_queue *q);
    unsigned int rte_queue_get_capacity(struct rte_queue_ctx *instance, const struct rte_queue *q);
Compared with the rte_ring burst calls, the 'free_space' output parameter (enqueue) and the 'available' output parameter (dequeue) are removed.
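The burst semantics implied by these signatures (move up to n objects, return how many were actually moved) can be sketched with a small single-threaded ring. This is an assumption-laden illustration, not the proposed implementation: the demo_ names, the fixed capacity, and the omission of the instance argument are all hypothetical, and the dequeue table is declared as void ** here so the function can write the dequeued pointers into it.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_CAPACITY 4u /* power of two, so index wraparound stays consistent */

/* Toy queue: not thread-safe, for illustrating burst return values only. */
struct demo_queue {
    void *slots[DEMO_QUEUE_CAPACITY];
    unsigned int head; /* running count of objects dequeued */
    unsigned int tail; /* running count of objects enqueued */
};

/* Number of objects currently in the queue. */
static unsigned int demo_queue_get_size(const struct demo_queue *q)
{
    assert(q != NULL);
    return q->tail - q->head;
}

/* Maximum number of objects the queue can hold. */
static unsigned int demo_queue_get_capacity(const struct demo_queue *q)
{
    assert(q != NULL);
    return DEMO_QUEUE_CAPACITY;
}

/* Enqueue up to n objects; returns how many were actually enqueued. */
static unsigned int demo_queue_enqueue_burst(struct demo_queue *q,
                                             void * const *obj_table, unsigned int n)
{
    unsigned int free_slots = DEMO_QUEUE_CAPACITY - demo_queue_get_size(q);
    unsigned int i;

    if (n > free_slots)
        n = free_slots; /* burst semantics: take what fits, drop nothing */
    for (i = 0; i < n; i++)
        q->slots[(q->tail + i) % DEMO_QUEUE_CAPACITY] = obj_table[i];
    q->tail += n;
    return n;
}

/* Dequeue up to n objects into obj_table; returns how many were dequeued. */
static unsigned int demo_queue_dequeue_burst(struct demo_queue *q,
                                             void **obj_table, unsigned int n)
{
    unsigned int used = demo_queue_get_size(q);
    unsigned int i;

    if (n > used)
        n = used;
    for (i = 0; i < n; i++)
        obj_table[i] = q->slots[(q->head + i) % DEMO_QUEUE_CAPACITY];
    q->head += n;
    return n;
}
```

Because the burst calls no longer report free space or remaining entries, a caller that needs that information would derive it from the get_size and get_capacity calls instead, for example capacity minus size for the free space.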

Q & A