OS Tuning - Database Tuning, Spring 2015

Operating System
An operating system is software that provides services to applications and abstracts hardware resources:
– Processing
– Memory
– Communication
Micro-kernel vs. monolithic kernel – e.g., L4 vs. Linux
"The real issue, and it's really fundamental, is the issue of sharing address spaces. Nothing else really matters. Everything else ends up flowing from that fundamental question: do you share the address space with the caller, or put in slightly different terms: can the callee look at and change the callers state as if it were its own (and the other way around)?" – Linus Torvalds

Virtual Machine
A hypervisor (or virtual machine monitor) presents each guest OS with interfaces that abstract the underlying HW:
– The same HW resources can be shared by several guest OS.
– Native (bare-metal) hypervisors vs. hosted hypervisors.
Processor virtualization techniques:
– Paravirtualization: the virtual interfaces are similar, but not identical, to the underlying HW interfaces; the guest OS is ported to the paravirtualized interface. Ex: Xen, L4.
– Full virtualization: the virtual interfaces are identical to the underlying HW interfaces; the guest OS runs unchanged, with privileged requests handled through binary translation. Ex: VMware Fusion, VirtualBox, Parallels, Hyper-V.
– Hardware-assisted virtualization: full virtualization that relies on processor features to support hypervisor operations (and minimize CPU overhead).
IO virtualization techniques:
– Routing of IO requests/responses.
– IOMMU virtualization: AMD-Vi, Intel VT-d.
– SR-IOV: support for PCIe device virtualization.
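A minimal C sketch of how one might check, on Linux, whether the CPU advertises hardware-assisted virtualization (Intel VT-x shows up as the "vmx" flag, AMD-V as "svm" in /proc/cpuinfo). This is only an illustration of the check, not part of the original slides.

    /* Scan /proc/cpuinfo for the vmx/svm CPU flags (Linux). */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[4096];
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f) { perror("fopen"); return 1; }
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "flags", 5) == 0) {
                if (strstr(line, " vmx") || strstr(line, " svm"))
                    puts("hardware-assisted virtualization supported");
                else
                    puts("no vmx/svm flag found");
                break;
            }
        }
        fclose(f);
        return 0;
    }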

Tuning the Guts
– Virtual storage
– RAID levels
– Controller cache
– Partitioning
– Priorities
– Number of DBMS threads
– Processor/interrupt affinity
– Throwing hardware at a problem
Dennis Shasha and Philippe Bonnet, 2013

Virtual Storage
No host caching:
– The database system expects that the IOs it submits are transferred to disk as directly as possible. It is thus critical to disable file system caching on the host for the IOs submitted by the virtual machine.
Hardware-supported IO virtualization:
– A range of modern CPUs incorporate components that speed up access to IO devices through direct memory access and interrupt remapping (i.e., Intel VT-d and AMD-Vi). This should be activated.
Statically allocated disk space:
– Static allocation avoids allocation overhead when the database grows and favors contiguity in the logical address space.
Dennis Shasha and Philippe Bonnet, 2013
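The same caching concern appears inside the guest: database systems typically bypass the OS page cache for their data files. A minimal C sketch, assuming Linux and a hypothetical file name "datafile", of a direct (uncached) write using O_DIRECT; the buffer, offset and size must be aligned (typically to 512 B or 4 KB).

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t block = 4096;            /* assumed logical block size */
        void *buf;
        if (posix_memalign(&buf, block, block) != 0) return 1;
        memset(buf, 0, block);

        /* O_DIRECT: IOs bypass the OS page cache */
        int fd = open("datafile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (pwrite(fd, buf, block, 0) != (ssize_t)block)
            perror("pwrite");

        fsync(fd);   /* still needed to flush device/controller caches */
        close(fd);
        free(buf);
        return 0;
    }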

RAID Levels
Log file:
– RAID 1 is appropriate: fault tolerance with high write throughput. Writes are synchronous and sequential, so there are no benefits in striping.
Temporary files:
– RAID 0 is appropriate: no fault tolerance, high throughput.
Data and index files:
– RAID 5 is best suited for read-intensive apps.
– RAID 10 is best suited for write-intensive apps.

Controller Cache
Read-ahead:
– Prefetching at the disk controller level.
– The controller has no information on the access pattern.
– Not recommended.
Write-back vs. write-through:
– Write-back: the transfer is terminated as soon as the data is written to the controller cache. Batteries are required to guarantee that cached writes survive a power failure; fast cache flushing is a priority.
– Write-through: the transfer is terminated as soon as the data is written to disk.

Partitioning
There is parallelism (i) across servers and (ii) within a server, both at the CPU level and throughout the IO stack. To leverage this parallelism:
– Rely on multiple instances / multiple partitions per instance: a single database is split across several instances, and different partitions can be allocated to different CPUs (partition servers) / disks (partitions).
Problem #1: How to control overall resource usage across instances/partitions?
– Fix: static mapping vs. instance caging; control the number/priority of threads spawned by a DBMS instance.
Problem #2: How to manage priorities?
Problem #3: How to map threads onto the available cores?
– Fix: processor/interrupt affinity.
Dennis Shasha and Philippe Bonnet, 2013

Instance Caging
Allocating a number of CPUs (cores), or a percentage of the available IO bandwidth, to a given DBMS instance.
Two policies:
– Partitioning: the total number of CPUs is partitioned across all instances.
– Over-provisioning: more than the total number of CPUs is allocated to all instances.
[Figure: example with a machine maximum of 6 cores. Under partitioning, instances A-D are allocated 2, 2, 1 and 1 CPUs, summing to the core count; under over-provisioning they are allocated 2, 3, 2 and 1 CPUs, exceeding it.]
Dennis Shasha and Philippe Bonnet, 2013
LOOK UP: Instance Caging

Number of DBMS Threads
Given the DBMS process architecture, how many threads should be defined for:
– Query agents (max per query, max per instance) – multiprogramming level (see tuning the writes and index tuning)
– Log flusher (see tuning the writes)
– Page cleaners (see tuning the writes)
– Prefetcher (see index tuning)
– Deadlock detection (see lock tuning)
Fix the number of DBMS threads based on the number of cores available at the HW/VM level (see the sketch below):
– Partitioning vs. over-provisioning
– Provisioning for monitoring, back-up, expensive stored procedures
Dennis Shasha and Philippe Bonnet, 2013
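A minimal C sketch of deriving thread counts from the cores visible to the OS/VM rather than hard-coding them. The split between query agents and background threads is a purely illustrative policy, not one prescribed by the slides.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* cores visible to this VM */
        if (cores < 1) cores = 1;

        /* illustrative policy: one query agent per core, plus a few
         * background threads (log flusher, page cleaners, monitoring) */
        long query_agents = cores;
        long background   = cores / 4 + 1;

        printf("cores=%ld query_agents=%ld background=%ld\n",
               cores, query_agents, background);
        return 0;
    }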

Priorities
Mainframe OSs have long allowed thread priority as well as IO priority to be configured. It is now possible to set IO priorities on Linux as well:
– Threads associated with synchronous IOs (writes to the log, page cleaning under memory pressure, query agent reads) should have higher priorities than threads associated with asynchronous IOs (prefetching, page cleaning with no memory pressure) – see tuning the writes and index tuning.
– Synchronous IOs should have higher priority than asynchronous IOs.
Dennis Shasha and Philippe Bonnet, 2013
LOOK UP: Getting Priorities Straight, Linux IO priorities
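A minimal C sketch of setting a Linux IO priority with the ioprio_set system call (the mechanism behind the ionice(1) tool). The constants are spelled out because glibc provides no wrapper; the mapping of priority levels to DBMS threads in the comment is an assumption for illustration.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define IOPRIO_CLASS_SHIFT  13
    #define IOPRIO_CLASS_RT     1   /* real-time */
    #define IOPRIO_CLASS_BE     2   /* best-effort (default class) */
    #define IOPRIO_WHO_PROCESS  1

    static long set_ioprio(int klass, int level)
    {
        int prio = (klass << IOPRIO_CLASS_SHIFT) | level;
        return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0 /* self */, prio);
    }

    int main(void)
    {
        /* e.g., a thread issuing synchronous IOs (log writes): best-effort,
         * highest level (0); asynchronous prefetchers could use level 7. */
        if (set_ioprio(IOPRIO_CLASS_BE, 0) != 0)
            perror("ioprio_set");
        return 0;
    }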

The Priority Inversion Problem
Three transactions T1, T2, T3, in priority order (high to low):
– T3 obtains a lock on x and is preempted.
– T1 blocks on the lock on x, so it is descheduled.
– T2 does not access x and runs for a long time.
Net effect: T1 waits for T2.
Solutions:
– No thread priorities.
– Priority inheritance.
[Figure: transaction states – T1 (priority #1) waits on a lock request for x; T3 (priority #3) holds the lock on x but is not running; T2 (priority #2) is running.]
Dennis Shasha and Philippe Bonnet, 2013
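On POSIX systems the priority-inheritance fix corresponds to a mutex created with the PTHREAD_PRIO_INHERIT protocol: a low-priority holder (T3) is temporarily boosted to the priority of a blocked high-priority waiter (T1). A minimal sketch; in practice the threads would also need real-time scheduling policies for priorities to take effect.

    #define _GNU_SOURCE
    #include <pthread.h>

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutex_t lock;

        pthread_mutexattr_init(&attr);
        /* holder inherits the priority of the highest-priority waiter */
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &attr);

        pthread_mutex_lock(&lock);
        /* ... critical section on x ... */
        pthread_mutex_unlock(&lock);

        pthread_mutex_destroy(&lock);
        pthread_mutexattr_destroy(&attr);
        return 0;
    }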

Processor/Interrupt Affinity
Mapping of a thread context or interrupt to a given core:
– Allows cache-line sharing between application threads, or between an application thread and an interrupt handler (or even RAM sharing in NUMA).
– Avoid dispatching all IO interrupts to core 0 (which then dispatches software interrupts to the other cores).
– Should be combined with VM options.
– Especially important in a NUMA context.
– The affinity policy is set at the OS level or at the DBMS level.
Dennis Shasha and Philippe Bonnet, 2013
LOOK UP: Linux CPU tuning
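A minimal C sketch, assuming Linux, of pinning the calling process (e.g., a DBMS worker) to cores 0 and 1 with sched_setaffinity so that its cache lines stay on one socket/NUMA node. The choice of cores is illustrative.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);   /* core 0 */
        CPU_SET(1, &set);   /* core 1 */

        /* 0 = the calling process */
        if (sched_setaffinity(0, sizeof set, &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        return 0;
    }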

Processor/Interrupt Affinity
IOs should complete on the core that issued them:
– I/O affinity in SQL Server.
– Log writers distributed across all NUMA nodes.
Minimize locking of shared data structures across cores, and especially across NUMA nodes:
– Avoid multiprogramming for query agents that modify data.
– Query agents should be on the same NUMA node.
DBMSs have pre-set NUMA affinity.
Dennis Shasha and Philippe Bonnet, 2013
LOOK UP: Oracle and NUMA, SQL Server and NUMA

Transferring large files (with TCP)
With the advent of compute clouds, it is often necessary to transfer large files over the network when loading a database. To speed up the transfer of large files:
– Increase the size of the TCP buffers.
– Increase the socket buffer size (Linux).
– Set up TCP large windows (and timestamps).
– Rely on selective acknowledgements (SACK).
Dennis Shasha and Philippe Bonnet, 2013
LOOK UP: TCP Tuning
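A minimal C sketch of enlarging the per-socket send/receive buffers before a bulk transfer; the 4 MB size is illustrative. Window scaling and SACK are system-wide settings (sysctl net.ipv4.tcp_window_scaling and net.ipv4.tcp_sack on Linux), not per-socket options.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        int bufsize = 4 * 1024 * 1024;   /* illustrative 4 MB buffers */
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof bufsize) != 0)
            perror("SO_SNDBUF");
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof bufsize) != 0)
            perror("SO_RCVBUF");

        /* ... connect() and stream the file ... */
        return 0;
    }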

Throwing hardware at a problem
More memory:
– Increase buffer size without increasing paging.
More disks:
– Log on a separate disk.
– Mirror frequently read files.
– Partition large files.
More processors:
– Off-load non-database applications onto other CPUs.
– Off-load data mining applications to an old database copy.
– Increase throughput to shared data (shared-memory or shared-disk architectures).
Dennis Shasha and Philippe Bonnet, 2013

Virtual Storage
Device: Intel 710 SSD (100 GB, 2.5in SATA 3 Gb/s, 25 nm, MLC):
– Sequential writes: 170 MB/s, 85 usec write latency.
– Random reads (full range, iodepth 32): 75 usec read latency.
Host: Core i5 CPU 2.67 GHz, Ubuntu, noop scheduler.
VM: VirtualBox 4.2 (nice -5), all accelerations enabled, 4 CPUs, 8 GB fixed-size VDI disk, SATA controller.
Guest: Ubuntu, noop scheduler, 4 cores.
Experiments with the flexible I/O tester (fio): sequential writes (seqWrites.fio) and random reads (randReads.fio).

    [seqWrites]
    ioengine=libaio
    iodepth=32
    rw=write
    bs=4k,4k
    direct=1
    numjobs=1
    size=50m
    directory=/tmp

    [randReads]
    ioengine=libaio
    iodepth=32
    rw=randread
    bs=4k,4k
    direct=1
    numjobs=1
    size=50m
    directory=/tmp

Dennis Shasha and Philippe Bonnet, 2013

Virtual Storage - Results
[Figure: fio performance measured on the host vs. on the guest. Default page size is 4k, default iodepth is 32.]
Dennis Shasha and Philippe Bonnet, 2013

Virtual Storage - Results (continued)
[Figure: page size is 4k, iodepth is 32.]
Dennis Shasha and Philippe Bonnet, 2013

Virtual Storage - Results (continued)
[Figure: page size is 4k; experiments with 32k show a negligible difference.]
Dennis Shasha and Philippe Bonnet, 2013