Understanding and Handling Database Corruption

1 Understanding and Handling Database Corruption
David Maxwell

Welcome to the session. (Tell story about client who had 90 days of corrupt backups.) If this is a SQL Saturday: Pitch speaking. Please thank the sponsors. Please thank the volunteers.

2 Who am I?
David M Maxwell
twitter.com/dmmaxwell or twitter.com/upsearchsql or dmmaxwell.wordpress.com or upsearch.com
Scripts / Slides available here. Visit and find your ideal way to stay in touch.

Basic contact stuff and intro.

3 Agenda
What is Corruption?
Planning and Practicing
About CHECKDB
Demonstrations

First, we’ll define exactly what corruption is, as it can take many different forms. We’ll talk about key concepts related to planning and practicing for corruption scenarios. We’ll dig into the DBCC CHECKDB command and cover what it does, and finally walk through a few sample corrupted databases.

4 What are Consistency and Corruption?
Consistency is an ACID property of transactions.
Corruption is the loss of Consistency.
SQL cannot read the data at rest.
May be delayed, but not prevented.

Consistency is one of the ACID properties of database transactions, and means that transactions must follow the rules in place. Rules can be user-defined, such as data types, indexes or constraints. Rules can also be system-defined, such as the physical layout of the data within SQL Server’s data files. If these rules are violated, for example by data that does not match the data type of its column or data that is in the wrong place within a file, that is considered corruption.

The end result of corruption is that SQL Server is not able to read the data it requires, whether that data is on disk or in SQL Server’s buffer pool memory. This will cause queries that rely on that data to fail, and can even lead to permanent loss of data.

Causes of corruption can be anything from faulty hardware to firmware bugs or misbehaving storage software. Somewhere between SQL Server’s process and the data at rest, something went wrong.

While corruption is something that can be planned for, it is not something that can be absolutely prevented. Proactive measures for mitigating corruption include replacing disks before they go bad and ensuring that all software, including hardware firmware, is patched to its most current version. Some High Availability features also offer additional protection against corruption. But sooner or later, it will happen.
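To make "something went wrong between the process and the data at rest" concrete, here is a minimal Python sketch of how a stored checksum can reveal that a page changed after it was written. It is a simplified illustration only: the function names are hypothetical, CRC32 is used purely for convenience, and this is not SQL Server's actual page verification algorithm. The only detail taken from SQL Server is the 8 KB page size.

```python
import zlib
from typing import Tuple

PAGE_SIZE = 8192  # SQL Server data pages are 8 KB


def write_page(payload: bytes) -> Tuple[bytes, int]:
    """Pad the payload to one page and return the page with its checksum."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload does not fit in one page")
    page = payload.ljust(PAGE_SIZE, b"\x00")
    return page, zlib.crc32(page)


def page_is_consistent(page: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return zlib.crc32(page) == stored_checksum


def flip_bit(page: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Simulate damage at rest by flipping a single bit in the page."""
    damaged = bytearray(page)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


if __name__ == "__main__":
    page, checksum = write_page(b"customer row data")
    print(page_is_consistent(page, checksum))
    print(page_is_consistent(flip_bit(page, 100), checksum))
```

The check can only report that the page no longer matches what was written. It cannot say what caused the change or restore the original bytes, which is why the rest of the session focuses on planning and backups.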

5 Planning for Corruption
Determine the appropriate SLA, then exceed it.
Monitor for corruption with Agent jobs and alerts.
Test your backups.
Document and test your plan regularly.
Communication is always key.

Planning for corruption starts with determining what your SLA is for the application and database you are trying to protect. Think not just of the RPO/RTO of the database, but consider the application as well. Sit down with the application owner, establish the SLA and a plan to meet it. Consider multiple paths to restoring service (i.e., what if plan A fails?). A good rule of thumb is to back up your data at an interval shorter than your recovery SLA. Consider performance SLAs as well.

Go “looking for trouble” with SQL Server Agent alerts and jobs. Alerts monitor for specific error messages related to corruption and inform you when those errors occur. Jobs check your databases for consistency on a regular basis. How regular that basis is depends on how much additional load you can place on your SQL Server system at any given time. CHECKDB is a very intense operation: it reads the entire database and performs checks on logical constructs within it. It is most definitely a performance concern. Run it outside production hours, or in another location against a restored database.

Test your backups. Please, if you do nothing else, test your backups. If you’ve sat down with your application administrator or data owner, you should have a plan to recover your database and application. Or do you? As Allan Hirt said, “If you haven’t tested your plan, you don’t have a plan. You have a document.” First, have your plan tested, but not by the person who wrote it. This is the best way to find holes in your plan. Second, ensure you test your plan on a regular basis. Technology changes over time, and so will your environment. Testing on a regular schedule will mitigate problems caused by changes in your environment.

In any plan, during any test, and in any actual corruption situation, communication is paramount. Communicate with the people in your plan who can assist you directly, and limit your communication to those people and your boss or whoever is responsible for communication with everyone else. Consider having contact information in your plan for on-call numbers or departments.
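The backup rule of thumb from this slide can be checked against real backup history. The sketch below is a hypothetical helper, not part of any SQL Server tooling: it assumes you have already collected backup completion times from somewhere, and it reads "recovery SLA" as a recovery point objective (the maximum tolerable window of lost data). It finds the longest gap between consecutive backups and compares it with that objective.

```python
from datetime import datetime, timedelta
from typing import List


def largest_backup_gap(backup_times: List[datetime]) -> timedelta:
    """Return the longest interval between consecutive backups."""
    if len(backup_times) < 2:
        raise ValueError("need at least two backup timestamps")
    ordered = sorted(backup_times)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_rpo(backup_times: List[datetime], rpo: timedelta) -> bool:
    """True when every gap between backups is shorter than the objective."""
    return largest_backup_gap(backup_times) < rpo


if __name__ == "__main__":
    # Hypothetical backup completion times for one database.
    history = [
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 0, 15),
        datetime(2024, 1, 1, 0, 30),
        datetime(2024, 1, 1, 1, 10),
    ]
    print(largest_backup_gap(history))
    print(meets_rpo(history, timedelta(hours=1)))
```

A check like this only proves that backups were taken on schedule. It says nothing about whether they can be restored, so it complements testing your backups and does not replace it.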

6 About CHECKDB
Checks Catalogs, Allocations, Tables and Logical Consistency.
Requires a DB Snapshot.
Faster means either more intense, or less comprehensive.

CHECKDB is a database console command, or DBCC command, that checks the logical and physical consistency of any given database. You should run CHECKDB as often as is practical on your databases, but bear in mind that CHECKDB is a very resource-intensive process.

First, CHECKDB reads the basic catalog information of your database to determine what objects exist. Next, it verifies the allocation data for those objects, so that it knows where to look on your storage media for them. CHECKDB then reads the data for each object and performs any logical checks that are required. (An example of a logical check would be: does the nonclustered index on this table have a row for each corresponding row in the heap or clustered index?)

If either the Catalog or the Allocation phase of CHECKDB fails, then CHECKDB cannot continue. At this point, the database is damaged beyond repair, and you should either recover from backup or extract the data to a new database, whichever is your best option. If those phases pass, then any corruption that is found may be repairable, but the repair may still involve data loss. We’ll talk about that more when we get to the demos.

To ensure the database remains in a consistent state for the duration of CHECKDB, SQL Server creates a snapshot of the database. Note that this incurs additional IO and storage overhead. This is another reason to run consistency checks at times when database activity is low and little change is expected.

There are some options, such as trace flags and parallelism, that can make CHECKDB faster. Many of those options cause CHECKDB to check objects in parallel, or to further tax the IO subsystem to read more things faster. You can also make CHECKDB faster by skipping specific kinds of checks. You may have to experiment with your systems to see what works for you, but remember that making CHECKDB faster is a trade-off.
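The logical check mentioned above (does every base row have a matching nonclustered index row?) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how CHECKDB is implemented: the base table is a hypothetical dictionary of row id to indexed column value, and the index is a list of (key value, row id) pairs.

```python
from typing import Dict, List, Tuple


def check_index(
    base_rows: Dict[int, str],
    index_entries: List[Tuple[str, int]],
) -> List[str]:
    """Compare a toy nonclustered index against its base table.

    base_rows maps row id -> indexed column value.
    index_entries holds (key value, row id) pairs.
    Returns a list of human-readable inconsistencies (empty if none).
    """
    errors = []
    indexed = {row_id: key for key, row_id in index_entries}

    # Every base row needs an index entry carrying the same key value.
    for row_id, value in base_rows.items():
        if row_id not in indexed:
            errors.append(f"row {row_id}: missing index entry")
        elif indexed[row_id] != value:
            errors.append(
                f"row {row_id}: index key {indexed[row_id]!r} does not match {value!r}"
            )

    # Every index entry must point at a row that actually exists.
    for key, row_id in index_entries:
        if row_id not in base_rows:
            errors.append(f"index entry {key!r} -> {row_id}: no matching base row")

    return errors


if __name__ == "__main__":
    table = {1: "a", 2: "b", 3: "c"}
    damaged_index = [("a", 1), ("x", 2), ("d", 4)]
    for problem in check_index(table, damaged_index):
        print(problem)
```

Even in this toy form the cost is visible: the check has to read every base row and every index entry, which is one reason a full consistency check is so much more expensive than one that skips logical checks.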

7 Demonstrations
DBCC Commands
Data Purity Error
Nonclustered Index Corruption
Single Page Repair via Restore
Repair with Data Loss

Now, time for demonstrations. We’ll examine useful DBCC commands and data related to integrity, then walk through sample corruption scenarios: a data purity error, nonclustered index corruption, and then damaged pages fixed by either a restore or a repair.

8 Resources
All About CHECKDB
Running CheckDB
Test your corruption skills:
Emergency Mode Repair Playtime

9 Thanks!
David M Maxwell
twitter.com/dmmaxwell or twitter.com/upsearchsql or dmmaxwell.wordpress.com or upsearch.com
Scripts / Slides available here. Visit and find your ideal way to stay in touch.

Thank you, and enjoy the rest of your day.

