Some less known facts about log file sync and other LGWR-related waits

Presentation transcript:

Some less known facts about log file sync and other LGWR-related waits
Nikolay Savvinov, senior database performance specialist, Deutsche Bank TechCentre

A few words about me
- 10 years with Oracle databases
- was doing particle physics before
- last 5 years: focus on performance optimization
- Twitter: oradiag
- Blog: savvinov.com
10 June 2015, HARMONY – 2015, TALLINN N. Savvinov, Some less known facts about LGWR

Outline
- Well-known stuff about LGWR
- Not-so-well-known stuff about LGWR
  - log parallelism and how it can backfire
  - contention-related log file sync waits
  - excessive commits aren’t always excessive

What is log file sync (LFS)?
- Changes generate redo (redo makes changes recoverable)
- Redo needs to be written to persistent storage
- For performance reasons, this is done asynchronously
- When a user commits changes, they want to be sure the changes are protected
- LFS is the delay between the commit and the confirmation that the redo is on disk
- Normally, the main part of LFS is log file parallel write (LFPW)
- Other components: latch manipulation, inter-process communication, CPU etc.

Why care about LFS?
- LFS measures the delay introduced by a commit
- I.e. systems that commit a lot can spend a lot of time on LFS
- LFS is critical for low-latency OLTP systems
- LFS is one of the major sources of replication delays
- LFS is responsible for “commit gaps” => errors in logic

How redo generation works
- redo is simple (flows in, flows out)
- log buffer is small
- log files are written to in a circular manner
- redo incoming rate ≈ rate of change
  - can be affected by hot backup
- flush triggered by:
  - commit
  - rollback
  - log buffer 1/3 full
  - log buffer 1MB full
  - every 3 seconds
- no balance between redo in- and out-flow => delays
[Diagram: redo generation rate -> log buffer -> redo flush (redo write speed) -> to log files]
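The flush triggers listed on this slide can be written down as a toy model. This is only a sketch of the slide's list, not Oracle's actual implementation: `LogBufferState`, `flush_reasons`, and all field names are hypothetical, and "log buffer 1MB full" is read here as "at least 1 MB of unflushed redo in the buffer".

```python
from dataclasses import dataclass

ONE_MB = 1024 * 1024


@dataclass
class LogBufferState:
    """Hypothetical snapshot of the log buffer (illustrative names only)."""
    buffer_size: int            # total log buffer size, bytes
    unflushed_bytes: int        # redo not yet written to the log files
    seconds_since_flush: float  # time elapsed since the last flush
    commit_or_rollback: bool    # a session has just committed or rolled back


def flush_reasons(state):
    """Return every trigger from the slide that applies to this state."""
    reasons = []
    if state.commit_or_rollback:
        reasons.append("commit or rollback")
    if state.unflushed_bytes * 3 >= state.buffer_size:
        reasons.append("log buffer 1/3 full")
    if state.unflushed_bytes >= ONE_MB:
        reasons.append("1 MB of redo")
    if state.seconds_since_flush >= 3:
        reasons.append("3 seconds elapsed")
    return reasons


# A 16 MB buffer holding 2 MB of unflushed redo: well below 1/3 full,
# but already past the 1 MB trigger.
busy = LogBufferState(16 * ONE_MB, 2 * ONE_MB, 0.5, False)
print(flush_reasons(busy))
```

With a large log buffer the 1 MB trigger fires long before the 1/3 trigger, which is one way to see why the slide lists both.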

Piggyback commit: mechanics

Averages lie BIG TIME
- (1 ms x 10 + 10 ms x 1) / (10 + 1) ≈ 1.8 ms LFPW
- (1 ms x 10 + 5 ms x 10) / (10 + 10) = 3 ms LFS
- the higher (and the more frequent) the outliers are, the bigger the gap between LFS and LFPW
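The arithmetic on this slide is a pair of weighted averages, reproduced below. The interpretation in the comments is an assumption, since the slide gives only the numbers: LGWR performs ten fast writes and one slow one, while ten further commits arrive during the slow write and wait about half of it on average. `weighted_mean` is a helper introduced for this sketch.

```python
def weighted_mean(samples):
    """samples: list of (latency_ms, count) pairs."""
    total_time = sum(latency * count for latency, count in samples)
    total_count = sum(count for _, count in samples)
    return total_time / total_count


# LGWR's view: ten 1 ms writes plus a single 10 ms outlier.
lfpw_avg = weighted_mean([(1, 10), (10, 1)])

# Sessions' view: ten commits wait 1 ms each, ten more wait about 5 ms each
# (assumed here to be commits caught behind the slow write).
lfs_avg = weighted_mean([(1, 10), (5, 10)])

print(f"avg LFPW = {lfpw_avg:.2f} ms, avg LFS = {lfs_avg:.2f} ms")
# prints: avg LFPW = 1.82 ms, avg LFS = 3.00 ms
```

One slow write counts once in the LFPW average but once per waiting session in the LFS average, which is why the two drift apart as outliers grow.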

I/O related LFS
- most common scenario
- mark of I/O related LFS: LGWR waiting almost exclusively on LFPW
- I/O performance cannot be judged by time alone, need redo size as well
- must take into account RAID write penalties
- synchronous storage-level replication: another common scenario
- a (rather common) special case: storage-level contention
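A small worked example of why write time has to be read together with redo size, and of how a RAID write penalty multiplies the load on the disks. Everything here is illustrative: the function names are hypothetical, the sample figures are invented, and the penalty factors are the commonly quoted rule-of-thumb values for small random writes (large full-stripe writes can behave better).

```python
# Rule-of-thumb back-end I/Os per small front-end write (assumed values).
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def ms_per_kb(avg_write_ms, avg_write_kb):
    """Normalize write latency by the amount of redo written."""
    return avg_write_ms / avg_write_kb


def backend_write_iops(redo_writes_per_sec, raid_level):
    """Estimate disk-level write I/Os generated by LGWR's writes."""
    return redo_writes_per_sec * RAID_WRITE_PENALTY[raid_level]


# System A looks faster by time alone (2 ms vs 4 ms per write) ...
system_a = ms_per_kb(avg_write_ms=2, avg_write_kb=4)
system_b = ms_per_kb(avg_write_ms=4, avg_write_kb=512)
# ... but system B moves far more redo per millisecond.
print(system_a, system_b)

# 500 redo writes per second on RAID 5 become roughly 2000 back-end I/Os.
print(backend_write_iops(500, "RAID5"))
```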

“log file parallel write” => “log file serialized write”
- on the database level, LFPW is a simple single event
- on the OS level: a bunch of write requests to several destinations (multiplexing!)
- LFPW parameters:
    select p1, p1text, p2, p2text, p3, p3text
    from v$active_session_history
    where event = 'log file parallel write'
    ===============================================
    1 files 2050 blocks 2 requests
- when requests > multiplexing, we see log parallelism in action
- introduced to reduce latching, has nasty side effects when there are many CPUs
- _log_parallelism_max, _log_parallelism_dynamic (note 34583.1)
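The rule "requests > multiplexing" can be applied mechanically to sampled LFPW parameters. The sketch below assumes the samples have already been fetched as (files, blocks, requests) tuples, matching p1, p2 and p3 in the query on this slide, and that the files value equals the degree of multiplexing. Both function names are hypothetical.

```python
def uses_log_parallelism(files, requests):
    """True when one LFPW issued more write requests than it had target files."""
    return requests > files


def parallel_write_fraction(samples):
    """Fraction of sampled LFPW waits that show log parallelism in action.

    samples: iterable of (files, blocks, requests) tuples.
    """
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(1 for files, _blocks, requests in samples
               if uses_log_parallelism(files, requests))
    return hits / len(samples)


# The row shown on the slide: 1 file, 2050 blocks, 2 requests.
print(uses_log_parallelism(files=1, requests=2))   # prints: True

# Two of these four invented samples have more requests than files.
print(parallel_write_fraction([(1, 2050, 2), (1, 12, 1), (2, 300, 2), (2, 4000, 4)]))
```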

CPU-related LFS waits
- redo logging is NOT CPU intensive
- still, it does need CPU
- CPU starvation, priority inversion etc. can lead to CPU-related LFS
- identified by % of time spent “ON CPU” by LGWR (or big outliers)
- sometimes can be fixed by changing LGWR priority in the OS, or by using the database parameter _high_priority_processes
- on the plot: spikes correspond to CPU scheduler quanta

Contention-related LFS
- Apart from I/O and CPU, another scenario for LFS is contention
  - e.g. writes to the control file required when switching log files
- Signature: LGWR spending a significant % of time on “enq: CF - contention” (or big outliers)
- Small log files increase the risk of this problem!
- Rather than relying on reducing log switch frequency alone, the best approach is to identify the root cause (e.g. RMAN issues) of the excessive (or slow) writes to the control file that lead to the contention
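The signatures from the I/O, CPU and contention slides can be combined into a rough classifier over LGWR's time profile. This is a sketch under stated assumptions: the input is a mapping from wait event (or "ON CPU") to time spent, with idle time already excluded, and both thresholds are arbitrary illustrative values rather than figures from the talk. `classify_lgwr_profile` is a hypothetical name.

```python
def classify_lgwr_profile(time_by_state, significant=0.2, almost_all=0.9):
    """Guess the LFS scenario from LGWR's non-idle time breakdown.

    time_by_state: dict mapping event name (or "ON CPU") to time spent.
    significant / almost_all: assumed thresholds, not values from the talk.
    """
    total = sum(time_by_state.values())
    if total <= 0:
        return "no data"

    def share(name):
        return time_by_state.get(name, 0.0) / total

    if share("enq: CF - contention") >= significant:
        return "contention"
    if share("ON CPU") >= significant:
        return "CPU"
    if share("log file parallel write") >= almost_all:
        return "I/O"
    return "mixed"


print(classify_lgwr_profile({"log file parallel write": 95, "ON CPU": 5}))
# prints: I/O
print(classify_lgwr_profile({"log file parallel write": 60,
                             "enq: CF - contention": 30,
                             "ON CPU": 10}))
# prints: contention
```

Shares of total time hide the "big outliers" case the slides mention, so a profile classified as "I/O" or "mixed" still deserves a look at the wait-time distribution.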

Excessive commits?
- “You’re committing too much” is one of the most popular responses to complaints about LFS
- Record: the smallest commit frequency that was declared “excessive” by an investigating DBA was … 30 commits per second
- In reality, an Oracle database can handle thousands, and even tens of thousands, of commits per second
- Excessive commits are primarily a problem for transactional integrity
- Excessive commits slow down the process that issues them
- Removing them can turn LFS into LBS (log buffer space waits) or cause a bottleneck elsewhere

Bad reasons to worry about LFS
- because without LFS everything would go faster (no it won’t)
- because it’s in top events in AWR (so what?)
- because LFS is contributing to locking/contention (no it doesn’t)
- because LFS is increasing CPU consumption (no it doesn’t)

Summary
- LFS is a common problem, more common for OLTP
- not always a performance issue as such; it could also be a “staleness” issue, or cause errors in logic
- % DB time in AWR is rarely a useful measure of LFS impact
- caused by redo I/O, CPU/scheduling issues, or contention (e.g. for CF)
- I/O problems can be related to the “serialized parallelism” issue; workaround: disable log parallelism
- reducing commits can transform LFS into LBS
- when troubleshooting LFS, understanding the scope is of key importance (or performance can be made worse instead of better)