Report from Netconf 2009 Jesper Dangaard Brouer


Report from Netconf 2009
Bifrost Workshop 2010, d.27/1-2010
by Jesper Dangaard Brouer <hawk@comx.dk>
Master of Computer Science, Linux Kernel Developer
ComX Networks A/S

What is Netconf?
Summit of the Linux network developers
- Invitation-only: the main maintainers and developers of the Linux networking subsystem
In 2009:
- Held in: Troutdale, Oregon, USA
- Dates: 18/9 to 20/9-2009

How to get invited
Know: David Stephen Miller
Basics:
- Top-1 committer, Top-3 sign-off'er
- Maintainer of: netdev, sparc, IDE; postmaster
Don't piss him off
- NHL: San Jose Sharks; don't send patches if they lost ;-)
http://www.linuxfoundation.org/publications/whowriteslinux.pdf

Homepage
http://vger.kernel.org/netconf2009.html
First day: played golf (Linus didn't show up)
Paul E. McKenney (RCU inventor) made really good summaries:
http://vger.kernel.org/netconf2009_notes_1.html
http://vger.kernel.org/netconf2009_day2.html

It's all Robert's fault!
Paul E. McKenney's presentation on RCU
Funny slide: "it's all Robert Olsson's fault", explaining why we have the rcu_bh flavor (showed his slide)

Intel and 10GbE
Intel: three optimizations, all needed, for the perf boost:
- NUMA-aware buffer allocation
- buffer alignment (to cache lines)
- removal of all references to shared variables in the driver
Performance peak of 5.7 Mpps (packets per sec):
- 2x Nehalem CPUs, 2.53 GHz; DDR3-1066MHz, 2 x 3 x 2GB DIMMs = 12 GB
- 1x dual-port Niantic/82599, 8 queue pairs per port enabled
- RHEL 5.1, Linux 2.6.30.1 SMP+NUMA+(SLUB)
- ixgbe driver version 2.0.34.3 + ECG optimizations
- L3 forwarding with 8 flows (1 flow per queue and per core)
Stock NUMA system (no optimizations): peak at 1.5 Mpps
- NUMA hurt performance without the NUMA-aware allocs
My 10GbE test: peak of 3.8 Mpps on a single CPU Core i7-920 (DDR3-1666MHz)
Slides:
http://vger.kernel.org/netconf2009_slides/netconf2009_numa_discussion.odp
http://vger.kernel.org/netconf2009_slides/update.pdf

Multiqueue
Linux network stack scales with the number of CPUs
- Only for NICs with multiple hardware queues
- Almost all 10G NICs have them; for 1GbE the Intel 82576 is recommended
SW: avoid locking and cache misses across CPUs
TX path: DaveM's multiqueue TX qdisc hack is now the default qdisc
Each hardware queue (both RX and TX) has its own IRQ
- A multiqueue NIC means a lot of IRQs; try looking in /proc/interrupts
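A quick way to see those per-queue IRQs is to grep the interrupt table. The interface name eth0 is just an example; multiqueue drivers typically register one vector per RX/TX queue pair:

```shell
# List the IRQ lines for a multiqueue NIC ("eth0" is an example name;
# expect entries like eth0-rx-0, eth0-tx-0, one per hardware queue)
grep eth0 /proc/interrupts

# Having one IRQ per queue also lets you pin queues to CPUs:
#   echo <cpu-mask> > /proc/irq/<irq-number>/smp_affinity
```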

Traffic Control vs. Multiqueue
Problem: advanced traffic control kills multiqueue scalability
- TX path stops scaling and returns to single-CPU scaling
Possible solution by Paul E. McKenney:
- Keep "pools" per CPU (of e.g. tokens); only contact other CPUs when the local pool is empty (idea taken from how the NUMA memory manager works)

The End
Better stop here; I have not covered everything
Look at the website: http://vger.kernel.org/netconf2009.html
Thanks for listening, even though I used too much time...