RHEL6 tuning guide for Mellanox Ethernet cards.

RHEL6 tuning guide for Mellanox Ethernet cards. October 2012, 유니원아이앤씨주식회사

IBM X3650 M4 Performance Tuning 1.1 X3650 BIOS Settings

Category  | Item                           | Setting
General   | Power Profile/Operating Modes  | Max Performance
Processor | C-States                       | Disabled
Processor | Turbo Mode                     | Enabled / Performance Optimized
Processor | Hyper-Threading                | (not specified)
Processor | CPU frequency select           | Max performance

C-states limit: power consumption is controlled according to the C value. At C0 the CPU consumes the most power and stays continuously in the working state. The higher the C value, the lower the power consumption; the CPU drops into a sleep state and takes longer to return to the working state. When C-states are disabled, power consumption is not throttled.

IBM X3650 M4 Performance Tuning 1.2 X3650 BIOS Settings (Memory)

Category | Item                      | Setting
Memory   | Memory speed              | Max performance
Memory   | Memory channel mode       | Independent
Memory   | Socket Interleaving       | NUMA / Disabled
Memory   | Memory Node Interleaving  | OFF
Memory   | Patrol Scrubbing          | Disabled
Memory   | Demand Scrubbing          | Enabled
Memory   | Thermal Mode              | Performance

IBM X3650 M4 Performance Tuning 1.3 RHEL6 OS Tuning (Networking)

Disable TCP timestamps:
# sysctl -w net.ipv4.tcp_timestamps=0
Disable TCP selective acks:
# sysctl -w net.ipv4.tcp_sack=0
Increase the length of the processor input queue:
# sysctl -w net.core.netdev_max_backlog=250000
Increase the maximum TCP buffer sizes that applications can request via setsockopt():
# sysctl -w net.core.rmem_max=16777216
# sysctl -w net.core.wmem_max=16777216
# sysctl -w net.core.rmem_default=16777216
# sysctl -w net.core.wmem_default=16777216
# sysctl -w net.core.optmem_max=16777216
Increase memory thresholds to prevent packet dropping:
# sysctl -w net.ipv4.tcp_mem="16777216 16777216 16777216"
Enable auto-tuning of TCP buffer limits:
# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
# sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
Low latency mode for TCP (see the note below).
To keep these values across reboots, add the corresponding entries to /etc/sysctl.conf, e.g. add the line net.ipv4.tcp_timestamps = 0.
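For reference, a persistent version of the whole set in /etc/sysctl.conf might look like the sketch below. The last line is an assumption: the slide names low-latency mode for TCP but omits the command, and on RHEL6-era kernels the usual knob for it is net.ipv4.tcp_low_latency.

# /etc/sysctl.conf additions (sketch)
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# assumption: command omitted on the slide; RHEL6-era low-latency knob
net.ipv4.tcp_low_latency = 1

Run sysctl -p to load the file without rebooting.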

IBM X3650 M4 Performance Tuning 1.4 RHEL6 OS Tuning (Power Management)

Check that the CPU frequency of each core is equal to the maximum supported and that all core frequencies are consistent.
Check the maximum supported CPU frequency:
# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
Check that core frequencies are consistent:
# cat /proc/cpuinfo | grep "cpu MHz"
Check that the reported frequencies are the same as the maximum supported. If the CPU frequency is not at the maximum, review the BIOS settings against the table in Recommended BIOS Settings (section 1.1 above).
Check the current CPU frequency, for example to verify that setting the governor to performance has taken effect:
# cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
Compare the current CPU frequency with the maximum value in the OS. If the current value differs from the maximum, change the CPU frequency setting in the BIOS.
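If the frequencies are low because the OS governor, rather than the BIOS, is throttling the cores, one quick way to force the performance governor on every core is the loop below. This is a sketch assuming the standard cpufreq sysfs interface is present:

# for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done

Note that this does not persist across reboots; on RHEL6 the cpuspeed service normally manages the governor.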

IBM X3650 M4 Performance Tuning 1.5 RHEL6 OS Tuning (Kernel Idle Loop Tuning)

The mlx4_en kernel module has an optional parameter that can tune the kernel idle loop for better latency. This improves CPU wakeup time but may result in higher power consumption. To tune the kernel idle loop, set the following option in /etc/modprobe.d/mlx4.conf:
options mlx4_en enable_sys_tune=1
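The option takes effect the next time the driver loads. A way to verify it, assuming the driver exposes the parameter through the usual /sys/module path for module parameters (reloading the driver briefly takes its interfaces down, so do this in a maintenance window):

# modprobe -r mlx4_en; modprobe mlx4_en
# cat /sys/module/mlx4_en/parameters/enable_sys_tune
1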

IBM X3650 M4 Performance Tuning 1.6 RHEL6 OS Tuning (OS-Controlled Power Management)

Some operating systems override the BIOS power management configuration and enable C-states by default, which results in higher latency. To resolve the high-latency issue, follow the instructions below:
1. Edit /boot/grub/grub.conf (or the configuration file of whichever bootloader is in use).
2. Add the following kernel parameters to the kernel command line: intel_idle.max_cstate=0 processor.max_cstate=1
3. Reboot the system.
Example:
title RH6.2x64
root (hd0,0)
kernel /vmlinuz-RH6.2x64-2.6.32-220.el6.x86_64 root=UUID=817c207b-c0e8-4ed9-9c33-c589c0bb566f console=tty0 console=ttyS0,115200n8 rhgb intel_idle.max_cstate=0 processor.max_cstate=1
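After the reboot, the effective C-state limits can be sanity-checked through the module parameters, assuming the intel_idle and ACPI processor drivers are the ones in use on the system:

# cat /sys/module/intel_idle/parameters/max_cstate
0
# cat /sys/module/processor/parameters/max_cstate
1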

IBM X3650 M4 Performance Tuning 1.7 RHEL6 OS Tuning (Interrupt Moderation)

Interrupt moderation is used to decrease the frequency of network adapter interrupts to the CPU. Mellanox network adapters use an adaptive interrupt moderation algorithm by default. The algorithm checks the transmission (Tx) and receive (Rx) packet rates and modifies the Rx interrupt moderation settings accordingly.
To set Tx and/or Rx interrupt moderation manually, use the ethtool utility. For example, the following commands first show the current (default) interrupt moderation settings on the interface eth1, then turn off Rx interrupt moderation, and finally show the new settings (continued on the next slide):
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: on TX: off
...
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 16
rx-frames: 88
rx-usecs-irq: 0
rx-frames-irq: 0

IBM X3650 M4 Performance Tuning 1.7.1 RHEL6 OS Tuning (Interrupt Moderation, continued)

The following command turns off adaptive Rx interrupt moderation and sets the Rx coalescing parameters to zero; the second command shows the new settings:
# ethtool -C eth1 adaptive-rx off rx-usecs 0 rx-frames 0
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off TX: off
...
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
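Disabling moderation trades CPU load for latency: with rx-usecs and rx-frames at zero, every received packet raises an interrupt. To return to the adaptive default, the same ethtool knob can be flipped the other way:

# ethtool -C eth1 adaptive-rx on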

IBM X3650 M4 Performance Tuning 1.8 RHEL6 OS Tuning (Tuning for NUMA Architecture)

Tuning for the Intel microarchitecture code-named Sandy Bridge: the Intel Sandy Bridge processor has an integrated PCI Express controller, so every PCIe adapter is connected directly to a NUMA node. On a system with more than one NUMA node, performance is better when the workload uses the local NUMA node to which the PCIe adapter is connected. For the OS to identify which NUMA node is the adapter's node, the system BIOS must support the proper ACPI feature. To see whether your system supports PCIe adapter NUMA node detection, run one of the following commands:
# cat /sys/devices/[PCI root]/[PCIe function]/numa_node
or
# cat /sys/class/net/[interface]/device/numa_node
Example:
# cat /sys/devices/pci0000\:00/0000\:00\:05.0/numa_node
On a supported system the command prints the adapter's NUMA node number; on an unsupported system it prints -1.
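Once the node is known, one way to exploit the locality is to pin the application to that node with numactl. A sketch, assuming the adapter reported node 0 and using an iperf server purely as an example workload:

# numactl --cpunodebind=0 --membind=0 iperf -s

This keeps both the threads and the memory allocations of the process on the node closest to the adapter, avoiding cross-node QPI traffic.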

IBM X3650 M4 Performance Tuning 1.9 RHEL6 OS Tuning (IRQ Affinity)

The affinity of an interrupt is the set of processor cores that service that interrupt. To improve application scalability and latency, it is recommended to distribute interrupt requests (IRQs) across the available processor cores. To prevent the Linux IRQ balancer from interfering with the interrupt affinity scheme, the IRQ balancer must be turned off:
# /etc/init.d/irqbalance stop
The following command assigns the affinity of a single interrupt vector:
# echo <hexadecimal bit mask> > /proc/irq/<irq vector>/smp_affinity
where bit i in <hexadecimal bit mask> indicates whether processor core i is in <irq vector>'s affinity or not.
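For instance, to steer one of the adapter's interrupt vectors to core 0 only. The interface name eth1 and vector number 53 below are hypothetical; the real vectors for an interface can be read from /proc/interrupts, and mask 0x1 selects core 0:

# grep eth1 /proc/interrupts
# echo 1 > /proc/irq/53/smp_affinity
# cat /proc/irq/53/smp_affinity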

IBM X3650 M4 Performance Tuning 1.9.1 RHEL6 OS Tuning (IRQ Affinity Configuration)

For Intel Sandy Bridge systems, set the IRQ affinity to the adapter's NUMA node:
To optimize single-port traffic, run:
# set_irq_affinity_bynode.sh <numa node> <interface>
To optimize dual-port traffic, run:
# set_irq_affinity_bynode.sh <numa node> <interface1> <interface2>
To show the current IRQ affinity settings, run:
# show_irq_affinity.sh <interface>
These scripts can be downloaded from the Mellanox website (www.mellanox.com).
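Putting the last two sections together, a typical invocation on a system whose adapter sits on NUMA node 0 might look like the sketch below; the interface name eth1 is an assumption, so substitute the real one:

# /etc/init.d/irqbalance stop
# set_irq_affinity_bynode.sh 0 eth1
# show_irq_affinity.sh eth1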