Understanding and Handling Database Corruption

Presentation transcript:

Understanding and Handling Database Corruption
David Maxwell
Welcome to the session. (Tell story about client who had 90 days of corrupt backups.) If this is a SQL Saturday: pitch speaking. Please thank the sponsors. Please thank the volunteers.

Who am I?
David M Maxwell
twitter.com/dmmaxwell or twitter.com/upsearchsql
dmmaxwell@gmail.com or david.maxwell@upsearch.com
dmmaxwell.wordpress.com or upsearch.com
Scripts / Slides available here. Visit https://upsearch.com/connect-with-us/ and find your ideal way to stay in touch.
Basic contact stuff and intro.

Agenda
What is Corruption? Planning and Practicing. About CHECKDB. Demonstrations.
First, we’ll define what exactly corruption is, as it can take many different forms. We’ll talk about key concepts related to planning for, and practicing for, corruption scenarios. We’ll dig into the DBCC CHECKDB command a bit and cover what it does, and finally walk through a few sample corrupted databases.

What are Consistency and Corruption?
Consistency is an ACID property of transactions. Corruption is the loss of Consistency. SQL cannot read the data at rest. May be delayed, but not prevented.
Consistency is one of the ACID properties of database transactions, and means that transactions must follow the rules in place. Rules can be user-defined, such as data types, indexes or constraints. Rules can also be system-defined, such as the physical layout of the data within SQL Server’s data files. If these rules are violated, for example by data that does not match the data type of its column or data that is in the wrong place within a file, that is considered corruption.
The end result of corruption is that SQL Server is not able to read the data it requires from wherever that data is sitting, whether on disk or in SQL Server’s buffer pool memory. Queries that rely on that data will fail, and the result can even be permanent loss of data. Causes of corruption range from faulty hardware to firmware bugs or misbehaving storage software. Somewhere between SQL Server’s process and the data at rest, something went wrong.
While corruption can be planned for, it cannot be absolutely prevented. Proactive measures for mitigating corruption include replacing disks before they go bad and ensuring that all software, including hardware firmware, is patched to its most current version. Some High Availability features also offer additional protection against corruption. But sooner or later, it will happen.

Planning for Corruption
Determine the appropriate SLA, then exceed it. Monitor for corruption with Agent jobs and alerts. Test your backups. Document and test your plan regularly. Communication is always key.
Planning for corruption starts with determining the SLA for the application and database you are trying to protect. Think not just about the RPO/RTO of the database, but consider the application as well. Sit down with the application owner, establish the SLA and a plan to meet it. Consider multiple paths to restoring service (i.e., what if plan A fails?). A good rule of thumb is to back up your data at an interval shorter than your recovery SLA allows. Consider performance SLAs as well.
Go “looking for trouble” with SQL Server Agent alerts and jobs. Alerts monitor for specific error messages related to corruption and inform you when those errors occur. Jobs check your databases for consistency on a regular basis. How regular that basis is depends on how much additional load you can place on your SQL Server system at any given time. CHECKDB is a very intense operation: it reads the entire database and performs checks on logical constructs within it, so it is most definitely a performance concern. Run it outside production hours, or in another location against a restored copy of the database.
Test your backups. Please, if you do nothing else, test your backups.
If you’ve sat down with your application administrator or data owner, you should have a plan to recover your database and application. Or do you? As Allan Hirt said, “If you haven’t tested your plan, you don’t have a plan. You have a document.” First, have your plan tested, but not by the person who wrote it. This is the best way to find holes in your plan. Second, test your plan on a regular schedule. Technology changes over time, and so will your environment; regular testing mitigates problems caused by those changes.
In any plan, during any test, and in any actual corruption situation, communication is paramount. Communicate with the people in your plan who can assist you directly, and limit your communication to those people and your boss or whoever is responsible for communicating with everyone else. Consider including contact information in your plan, such as on-call numbers or departments.
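The monitoring and backup-testing advice above can be sketched in T-SQL. This is a minimal sketch, not the presenter's own script: the database name SalesDb, its logical file names, the file paths, and the operator name 'DBA Team' are all hypothetical placeholders, and the operator is assumed to exist already. Errors 823, 824 and 825 are the I/O-related messages SQL Server commonly raises when it has trouble reading data.

```sql
-- Sketch only: all names and paths below are hypothetical placeholders.
USE msdb;
GO

-- Agent alerts on the I/O-related errors 823, 824 and 825.
-- When @message_id is supplied, @severity must be 0.
EXEC dbo.sp_add_alert
    @name = N'Error 823 - I/O error',
    @message_id = 823,
    @severity = 0,
    @include_event_description_in = 1;   -- include error text in e-mail

EXEC dbo.sp_add_alert
    @name = N'Error 824 - logical consistency I/O error',
    @message_id = 824,
    @severity = 0,
    @include_event_description_in = 1;

EXEC dbo.sp_add_alert
    @name = N'Error 825 - read-retry warning',
    @message_id = 825,
    @severity = 0,
    @include_event_description_in = 1;

-- Notify an existing operator by e-mail (1 = e-mail).
EXEC dbo.sp_add_notification
    @alert_name = N'Error 823 - I/O error',
    @operator_name = N'DBA Team',
    @notification_method = 1;
GO

-- Back up with page checksums validated as the backup is written.
BACKUP DATABASE SalesDb
    TO DISK = N'D:\Backups\SalesDb_full.bak'
    WITH CHECKSUM;

-- Quick check that the backup is readable and its checksums are valid.
RESTORE VERIFYONLY
    FROM DISK = N'D:\Backups\SalesDb_full.bak'
    WITH CHECKSUM;

-- The real test: restore under another name, then check consistency.
RESTORE DATABASE SalesDb_Verify
    FROM DISK = N'D:\Backups\SalesDb_full.bak'
    WITH MOVE N'SalesDb'     TO N'D:\Verify\SalesDb_Verify.mdf',
         MOVE N'SalesDb_log' TO N'D:\Verify\SalesDb_Verify.ldf',
         CHECKSUM,
         RECOVERY;

DBCC CHECKDB (N'SalesDb_Verify') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

RESTORE VERIFYONLY only confirms that the backup media can be read; it is the full restore followed by CHECKDB that shows the backup is usable, and running it on a separate server also keeps the load off production.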

About CHECKDB
Checks Catalogs, Allocations, Tables and Logical Consistency. Requires a DB Snapshot. Faster means either more intense, or less comprehensive.
CHECKDB is a DBCC (Database Console Command) command that checks the logical and physical consistency of a given database. You should run CHECKDB on your databases as often as is practical, but bear in mind that it is a very resource-intensive process.
First, CHECKDB reads the basic catalog information of your database to determine what objects exist. Next, it verifies the allocation data for those objects, so that it knows where to look on your storage media for them. CHECKDB then reads the data for each object and performs any logical checks that are required. (An example of a logical check: does the nonclustered index on this table have a row for each corresponding row in the heap or clustered index?)
If either the Catalog or the Allocation phase of CHECKDB fails, then CHECKDB cannot continue. At this point, the database is damaged beyond repair, and you should either recover from backup or extract the data to a new database, whichever is your best option. If those phases pass, then any corruption that is found may be repairable, but repair may still involve data loss. We’ll talk about that more when we get to the demos.
To ensure the database remains in a consistent state for the duration of the check, SQL Server by default creates an internal snapshot of the database. Note that this incurs additional IO and storage overhead. This is another reason to run consistency checks at times when database activity is low and little change is expected.
There are some options, such as trace flags and parallelism, that can make CHECKDB faster. Many of those options cause CHECKDB to check objects in parallel, or to further tax the IO subsystem in order to read more data faster. You can also make CHECKDB faster by skipping specific kinds of checks. You may have to experiment with your systems to see what works for you, but remember that making CHECKDB faster is a trade-off.
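As an illustration of the phases and trade-offs described above, here is a hedged T-SQL sketch of common ways to invoke the check. The database name SalesDb and the table dbo.Orders are hypothetical; the options shown are standard DBCC options, but which combination suits a given system is something to test rather than assume.

```sql
-- Sketch only: SalesDb and dbo.Orders are hypothetical names.

-- Full check: allocation, catalog, per-table and logical checks.
-- NO_INFOMSGS suppresses informational output; ALL_ERRORMSGS reports every error.
DBCC CHECKDB (N'SalesDb') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Less comprehensive but faster: physical structure only, skipping most logical checks.
DBCC CHECKDB (N'SalesDb') WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- More intense: allow more parallelism for the check
-- (the MAXDOP option requires SQL Server 2014 SP2 or later).
DBCC CHECKDB (N'SalesDb') WITH MAXDOP = 4, NO_INFOMSGS;

-- Avoid the internal snapshot by taking locks instead.
-- This blocks concurrent changes, and the catalog check is not run.
DBCC CHECKDB (N'SalesDb') WITH TABLOCK, NO_INFOMSGS;

-- The individual phases can also be run on their own, for example
-- to spread the work across several maintenance windows.
USE SalesDb;
GO
DBCC CHECKALLOC   (N'SalesDb')    WITH NO_INFOMSGS;  -- allocation structures
DBCC CHECKCATALOG (N'SalesDb')    WITH NO_INFOMSGS;  -- system catalog
DBCC CHECKTABLE   (N'dbo.Orders') WITH NO_INFOMSGS;  -- one table and its indexes
```

PHYSICAL_ONLY is a typical "less comprehensive" choice for frequent runs on large databases, with a full check scheduled less often or run against a restored copy.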

Demonstrations
DBCC Commands. Data Purity Error. Nonclustered Index Corruption. Single Page Repair via Restore. Repair with Data Loss.
Now, time for demonstrations. We’ll examine useful DBCC commands and data related to integrity, then walk through sample corruption scenarios: a data purity error, nonclustered index corruption, and then damaged pages fixed by either a restore or a repair.
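The demo scripts themselves are not part of this transcript. The following T-SQL is only a rough sketch of what each listed scenario typically involves, not the presenter's code. The database SalesDb, the index IX_Orders_CustomerId on dbo.Orders, the page ID 1:57 and all file paths are hypothetical, and the page restore assumes the database uses the full or bulk-logged recovery model with an intact chain of log backups.

```sql
-- Sketch only: every object name, page ID and path here is hypothetical.

-- Data related to integrity: pages SQL Server has already flagged as suspect.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;

-- Data purity: explicitly request column-value checks
-- (for example, out-of-range dates or invalid decimals).
DBCC CHECKDB (N'SalesDb') WITH DATA_PURITY, NO_INFOMSGS, ALL_ERRORMSGS;

-- Nonclustered index corruption: the index holds a copy of table data,
-- so it can be rebuilt from the base table. Disabling it first ensures
-- the rebuild does not read the damaged index pages.
USE SalesDb;
GO
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders DISABLE;
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD;
GO

-- Single page repair via restore.
USE master;
GO
RESTORE DATABASE SalesDb PAGE = '1:57'
    FROM DISK = N'D:\Backups\SalesDb_full.bak'
    WITH NORECOVERY;

-- Apply the existing log backups, in order.
RESTORE LOG SalesDb
    FROM DISK = N'D:\Backups\SalesDb_log1.trn'
    WITH NORECOVERY;

-- Take a new log backup, then restore it to bring the page current.
BACKUP LOG SalesDb TO DISK = N'D:\Backups\SalesDb_log_final.trn';
RESTORE LOG SalesDb
    FROM DISK = N'D:\Backups\SalesDb_log_final.trn'
    WITH RECOVERY;

-- Repair with data loss: the last resort when no usable backup exists.
ALTER DATABASE SalesDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (N'SalesDb', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;
ALTER DATABASE SalesDb SET MULTI_USER;
```

REPAIR_ALLOW_DATA_LOSS can deallocate damaged pages and the rows on them, so it is worth taking a backup of the damaged database first and running CHECKDB again afterwards to see what was lost.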

Resources
All About CHECKDB:
http://technet.microsoft.com/en-us/library/ms176064.aspx
http://www.sqlskills.com/blogs/paul/category/checkdb-from-every-angle/
Running CheckDB:
http://ola.hallengren.com/
http://minionware.net/checkdb/
Test your corruption skills:
http://stevestedman.com/2015/04/introducing-the-database-corruption-challenge-dbcc-week-1-challenge/
Emergency Mode Repair Playtime:
http://dmmaxwell.wordpress.com/2012/11/06/emergency-mode-repair-playtime/

Thanks!
David M Maxwell
twitter.com/dmmaxwell or twitter.com/upsearchsql
dmmaxwell@gmail.com or david.maxwell@upsearch.com
dmmaxwell.wordpress.com or upsearch.com
Scripts / Slides available here. Visit https://upsearch.com/connect-with-us/ and find your ideal way to stay in touch.
Thank you, and enjoy the rest of your day.