Report from Netconf 2009
Bifrost Workshop 2010, d.27/1-2010
by Jesper Dangaard Brouer <hawk@comx.dk>
Master of Computer Science, Linux Kernel Developer
ComX Networks A/S
What is netconf
- Summit of the Linux network developers
- Invitation-only: the main maintainers and developers of the Linux networking subsystem
- In 2009 held in: Troutdale, Oregon, USA
- Dates: 18/9 to 20/9-2009
How to get invited
- Know: David Stephen Miller
  - Basics: Top1 committer, Top3 sign-off'er
  - Maintainer of netdev, sparc, IDE; Postmaster
- Don't piss him off
  - NHL: San Jose Sharks; don't send patches if they lost ;-)
- http://www.linuxfoundation.org/publications/whowriteslinux.pdf
Homepage
- http://vger.kernel.org/netconf2009.html
- First day: played golf (Linus didn't show up)
- Paul E. McKenney (RCU inventor) made really good summaries:
  - http://vger.kernel.org/netconf2009_notes_1.html
  - http://vger.kernel.org/netconf2009_day2.html
It's all Robert's fault!
- Paul E. McKenney's presentation on RCU
- Funny slide: "it's all Robert Olsson's fault"
  - why we have the rcu_bh() variant (show his slide)
Intel and 10GbE
- Intel: three optimizations, all needed, for the performance boost (see the sketch after this slide):
  - NUMA-aware buffer allocation
  - buffer alignment (to cache lines)
  - removing all references to shared vars in the driver
- Performance peak of 5.7 Mpps (packets per sec)
  - Nehalem CPUs: 2x 2.53 GHz, DDR3-1066MHz
  - 1x dual-port Niantic/82599, 8 queue pairs per port
- Stock NUMA system (no optimization): peak at 1.5 Mpps
  - NUMA hurt performance without NUMA-aware allocs
- My 10GbE test: peak of 3.8 Mpps on a single-CPU Core i7-920
  (DDR3-1600MHz, Patriot part number PVT36G1600LLK)
- Slides:
  - http://vger.kernel.org/netconf2009_slides/netconf2009_numa_discussion.odp
  - http://vger.kernel.org/netconf2009_slides/update.pdf
- Intel's 5.7 Mpps test setup:
  - CPUs: 2x 2.53 GHz, DDR3-1066MHz, 2 x 3 x 2GB DIMMs = 12 GB
  - RHEL 5.1, Linux 2.6.30.1 SMP+NUMA+(SLUB)
  - ixgbe driver version: 2.0.34.3 + ECG optimizations
  - 1x dual-port Niantic, 8 queue pairs per port enabled
  - L3 forwarding with 8 flows (1 flow per queue and per core)
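To make the first two optimizations concrete, here is a minimal userspace C sketch using libnuma (numa_node_of_cpu(), numa_alloc_onnode() and numa_free() are real libnuma calls). The real Intel work was inside the ixgbe driver, so the ring size, buffer size and function names below are illustrative assumptions, not the actual patch; the third trick (removing references to shared variables) is only hinted at in the comments.

```c
/* Userspace illustration (not the ixgbe patch itself) of two of Intel's
 * three tricks: allocate packet buffers on the NUMA node that owns the
 * RX queue's CPU, and keep every buffer cache-line aligned.
 * Build with:  gcc -o numa_ring numa_ring.c -lnuma                        */
#include <numa.h>        /* libnuma: numa_available(), numa_alloc_onnode() */
#include <stdio.h>

#define CACHE_LINE   64
#define BUF_SIZE     2048      /* assumed per-packet buffer size           */
#define RING_SIZE    512       /* assumed descriptors per RX queue         */

_Static_assert(BUF_SIZE % CACHE_LINE == 0,
               "buffers must not share a cache line with their neighbour");

/* Allocate one RX ring's worth of buffers on the node of the given CPU.
 * Keeping all per-ring state local to that node/CPU is also how the
 * "no shared variables" rule is honoured.                                 */
static void *alloc_ring_buffers(int cpu)
{
    int node = numa_node_of_cpu(cpu);
    size_t len = (size_t)RING_SIZE * BUF_SIZE;

    if (node < 0)
        node = 0;   /* fall back if the CPU->node lookup fails */

    /* numa_alloc_onnode() hands back page-aligned memory from 'node',
     * so every BUF_SIZE-sized slot stays cache-line aligned too.          */
    void *buf = numa_alloc_onnode(len, node);
    if (buf)
        printf("cpu %d -> node %d: %zu bytes for the RX ring\n",
               cpu, node, len);
    return buf;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this box\n");
        return 1;
    }
    void *ring = alloc_ring_buffers(0);
    if (ring)
        numa_free(ring, (size_t)RING_SIZE * BUF_SIZE);
    return 0;
}
```

The point of the sketch is simply that the allocation decision is made per CPU/queue, so no RX ring ever pulls its buffers across the QPI link; that is the difference between the 1.5 Mpps stock result and the 5.7 Mpps peak on the same hardware.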
Multiqueue Linux Network stack scales with number of CPUs Only for NIC with multi-hardware queues Almost all 10G NIC For 1GbE recommend Intel 82576 SW: Avoid locking and cache misses across CPUs TX path: DaveM's multiqueue TX qdisc hack now the default queue Each hardware queue (both RX and TX) has their own IRQ. multiqueue NIC, lot of IRQs. Try looking in /proc/interrupts
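A tiny, hypothetical C helper (not from the talk) for the "look in /proc/interrupts" step: it prints every interrupt line whose name column contains the interface name given on the command line, so you can count the per-queue vectors. The default name "eth0" is just an assumption.

```c
/* Print the /proc/interrupts lines belonging to one NIC, e.g. "eth0".
 * Drivers typically name per-queue vectors "<ifname>-<something>-<n>",
 * so a plain substring match is enough for a quick look.                */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0";
    FILE *f = fopen("/proc/interrupts", "r");
    char line[1024];

    if (!f) {
        perror("/proc/interrupts");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, ifname))
            fputs(line, stdout);   /* one line per IRQ/queue pair */
    }
    fclose(f);
    return 0;
}
```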
Traffic Control vs. Multiqueue
- Problem: advanced traffic control kills multiqueue scalability
  - the TX path stops scaling and returns to single-CPU scaling
- Possible solution by Paul E. McKenney:
  - keep "pools" per CPU (of e.g. tokens) and only contact other CPUs when the local pool is empty
  - idea taken from how the NUMA memory manager works (a rough sketch of the idea follows below)
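A minimal sketch of my reading of McKenney's suggestion, in userspace C with C11 atomics: each CPU drains a private token stash and only hits the shared pool once per batch, so the cross-CPU cache-line bouncing that kills scalability happens rarely. NR_CPUS, LOCAL_BATCH and the token counts are made-up numbers, and a real implementation would live in a qdisc and use kernel per-CPU variables instead of a plain array.

```c
/* Per-CPU token pools with a rarely-touched shared pool (illustrative only). */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS     8
#define LOCAL_BATCH 64                 /* tokens grabbed from the shared pool at a time */

static atomic_long global_tokens = 100000;   /* shared pool: contended only on refill   */
static long local_tokens[NR_CPUS];           /* per-CPU stash: no cross-CPU sharing     */
                                             /* (would want cache-line padding in real  */
                                             /*  code to avoid false sharing)           */

/* Consume one token on behalf of 'cpu'; returns false when the rate limit is hit. */
static bool consume_token(int cpu)
{
    if (local_tokens[cpu] == 0) {
        /* Local stash empty: refill in one batch, so the expensive shared
         * access happens once per LOCAL_BATCH packets instead of per packet. */
        long before = atomic_fetch_sub(&global_tokens, LOCAL_BATCH);
        if (before < LOCAL_BATCH) {
            atomic_fetch_add(&global_tokens, LOCAL_BATCH);  /* undo, pool is dry */
            return false;
        }
        local_tokens[cpu] = LOCAL_BATCH;
    }
    local_tokens[cpu]--;
    return true;
}

int main(void)
{
    long sent = 0;
    while (consume_token(0))           /* a single "CPU" just to exercise the code */
        sent++;
    printf("CPU 0 sent %ld packets before the pool ran dry\n", sent);
    return 0;
}
```

The design choice is the same one the NUMA memory allocator makes: accept a small inaccuracy in the global budget (up to LOCAL_BATCH tokens stranded per CPU) in exchange for keeping the TX fast path entirely CPU-local.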
The End
- Better stop here; I have not covered everything
- Look at the website: http://vger.kernel.org/netconf2009.html
- Thanks for listening, even though I used too much time...