Module 4 Designing Databases for Optimal Performance


Module 4: Designing Databases for Optimal Performance (Course 50401A)
Presentation: 100 minutes. Lab: 60 minutes.
After completing this module, students will be able to: design indexes; design scalable databases; design a plan guide; and design a partitioning strategy.
Required materials: To teach this module, you need the Microsoft® Office PowerPoint® file 50401A-ENU_Powerpnt_04.ppt. Important: It is recommended that you use PowerPoint 2002 or a later version to display the slides for this course. If you use PowerPoint Viewer or an earlier version of PowerPoint, some features of the slides might not display correctly.
Preparation tasks: To prepare for this module, read all of the materials for the module, practice performing the demonstrations and the lab exercises, and work through the Module Review and Takeaways section, determining how you will use it to reinforce student learning and promote knowledge transfer to on-the-job performance. Make sure that students are aware that additional information and resources for the module are available on the Course Companion CD.

Module Overview (Course 50401A, Module 4: Designing Databases for Optimal Performance)
- Guidelines for Designing Indexes
- Designing a Partitioning Strategy
- Designing a Plan Guide
- Designing Scalable Databases

Lesson 1: Guidelines for Designing Indexes
- Guidelines for Selecting a Clustered Index
- Guidelines for Selecting a Nonclustered Index
- Guidelines for Selecting a Filtered Index
- Guidelines for Selecting a Computed Column Index
- Guidelines for Selecting a Strategy for Index Compression
- Discussion: Using Indexing

Guidelines for Selecting a Clustered Index
Slide points:
- Create a clustered index on frequently used columns
- Consider clustered index data types and column widths
- Consider the frequency of data changes

Mention the following technical facts about clustered indexes. Clustered indexes store data rows at the leaf level of a B-tree structure, which speeds retrieval of rows from tables or indexed views. Clustered indexes determine the physical order of records and pages in a table or indexed view. A table or indexed view can have only one clustered index because the data rows can be sorted in only one order. The total width of the column values included in the index determines how many levels the index tree requires; the more levels in an index, the less efficient the index is. Even the largest Microsoft SQL Server indexes rarely have more than three levels, but you should still minimize the index key width. The width of each nonclustered index key value includes the width of the clustered key value, because nonclustered indexes reference the clustered index (if one exists) and use it to reach data rows: the corresponding clustered index key is copied into the leaf level of every nonclustered index.
Discuss the following guidelines for selecting a clustered index.
Create a clustered index on frequently used columns. Clustered indexes are efficient when they are created on columns that are frequently used in queries that return large numbers of contiguous rows. When designing clustered indexes, check the columns used as search arguments (SARGs) in JOIN or WHERE conditions with the equal to (=), greater than (>), less than (<), and BETWEEN operators. Primary keys that are used in JOIN and WHERE conditions are good candidates for clustered indexes. Columns with date values also make good clustered index keys because they are often used as search ranges in WHERE conditions.
Consider clustered index data types and column widths. The data types and widths of the indexed columns determine the total index width. Avoid choosing a clustered index with a wide key, because a wide key requires more resources to maintain the clustered index and all nonclustered indexes that rely on it.
Consider the frequency of data changes. Avoid clustered indexes on columns that undergo frequent changes. When a clustered index column is modified, SQL Server updates the clustered index structure to move the row from its original position to a new position and updates the references to the clustered key in all related nonclustered indexes. This results in a performance cost, index fragmentation, and page splits. You can reduce the risk of page splits by rebuilding the clustered index regularly with a higher fill factor, but rebuilding indexes increases maintenance cost.
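A minimal T-SQL sketch of these guidelines, using a hypothetical Sales.Orders table (the table and column names are illustrative, not from the course lab):

    -- Narrow, rarely updated, frequently range-searched column: a good clustered index key
    CREATE CLUSTERED INDEX IX_Orders_OrderDate
        ON Sales.Orders (OrderDate);

    -- Range queries such as this one now read contiguous rows in clustered-key order
    SELECT OrderID, CustomerID, TotalDue
    FROM Sales.Orders
    WHERE OrderDate BETWEEN '20090101' AND '20090131';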

Guidelines for Selecting a Nonclustered Index
Slide points:
- Consider performance gain versus maintenance cost
- Index frequently used search arguments
- Consider nonclustered indexes for columns with high selectivity
- Consider placing nonclustered indexes on foreign key columns
- Choose a nonclustered index to cover the query
- Consider using included columns
- Consider using sys.indexes to gather information about an index

Mention the following technical facts about nonclustered indexes. Nonclustered indexes, like clustered indexes, are B-tree structures that speed the retrieval of rows from tables or views. If a table has a clustered index, all nonclustered indexes on the table reference the clustered index. If the columns needed to answer a query are included in the nonclustered index, the server satisfies the query from the nonclustered index alone; if they are not, the server uses the nonclustered index to locate the row and then accesses the clustered index to retrieve the remaining values. This mechanism is known as a bookmark (key) lookup. The only type of view that can have indexes is an indexed view, and an indexed view must have a clustered index before it can have any nonclustered indexes. SQL Server 2008 supports up to 999 nonclustered indexes per table or indexed view.
Discuss the following guidelines for selecting a nonclustered index.
Consider performance gain versus maintenance cost. When you choose a nonclustered index, you must balance performance gain against maintenance cost: nonclustered indexes improve query performance, but they increase the cost of data modifications.
Index frequently used search arguments. When you design nonclustered indexes, look for columns used as search arguments (SARGs) in the WHERE and JOIN clauses of a query. Nonclustered indexes work best when you estimate that the query will return one row or a small number of rows.
Consider nonclustered indexes for columns with high selectivity, that is, a high ratio of distinct values. For example, avoid a nonclustered index on a column such as Gender that contains only two values, such as M and F.
Consider placing nonclustered indexes on foreign key columns. It is common practice to join tables on foreign key values, and if a nonclustered index is placed on the foreign key columns, the optimizer can use it in the join.
Choose a nonclustered index to cover the query. In cases where performance is critical, you can choose a nonclustered index that covers the query. If the nonclustered index contains all the columns involved in the query, SQL Server can satisfy the query from the index alone and does not need to access the underlying table, which would otherwise cause a bookmark lookup.
Consider using included columns. When you have wide tables and a critical query needs to retrieve only some of the column data, you can avoid the cost of indexing all the required columns as key columns by using included columns. You still index the search arguments, but you can include additional columns that are not part of the index key.
Consider using sys.indexes to gather information about an index. The sys.indexes catalog view contains a row for each index or heap of a tabular object, such as a table, view, or table-valued function (the older sys.sysindexes compatibility view exposes similar information but is deprecated). You can also join sys.indexes with other system views to obtain more detailed information.
[Slide diagram: the B-tree structure of a nonclustered index, showing the root page, the nonleaf level, and the leaf level (key values) with row locators.]
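A short illustrative sketch of a covering nonclustered index with included columns and of querying sys.indexes, again using the hypothetical Sales.Orders table:

    -- Key column is the search argument; included columns cover the query without widening the key
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON Sales.Orders (CustomerID)
        INCLUDE (OrderDate, TotalDue);

    -- This query can be answered entirely from the index, with no bookmark lookup
    SELECT OrderDate, TotalDue
    FROM Sales.Orders
    WHERE CustomerID = 1000;

    -- Inspect the indexes defined on the table through the sys.indexes catalog view
    SELECT name, type_desc, is_unique
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('Sales.Orders');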

Guidelines for Selecting a Filtered Index
Slide points:
- Create filtered indexes for heterogeneous data
- Create filtered indexes for subsets of data
- Compare views with filtered indexes
- Compare indexed views with filtered indexes
- Include a small number of key or included columns in a filtered index definition
- Use data conversion operators in the filter predicate
- Use referencing dependencies
- Use filtered indexes when columns contain well-defined subsets of data

Explain filtered indexes. Tell students that to design effective filtered indexes, it is important to understand which queries the application uses and how they relate to subsets of the data. Discuss the following considerations for using filtered indexes.
Create filtered indexes for heterogeneous data. When a table contains heterogeneous data rows, you can create a filtered index for one or more categories of data.
Create filtered indexes for subsets of data. When a column has a small number of values that are relevant for queries, you can create a filtered index on that subset of values.
Compare views with filtered indexes. Compare the functionality available in views with that of filtered indexes to determine which to use.
Compare indexed views with filtered indexes. You can use a filtered index instead of an indexed view when the view references only one table, the queries do not return computed columns, and the view predicate uses simple comparison logic.
Include a small number of key or included columns in a filtered index definition, and incorporate only the columns that the query optimizer needs in order to choose the filtered index for the query execution plan.
Use data conversion operators in the filter predicate correctly. Write the filtered index expression with the data conversion operator (CAST or CONVERT) on the right side of the comparison operator to avoid data conversion errors.
Use referencing dependencies. Use the sys.sql_expression_dependencies catalog view to track each column in the filtered index expression as a referencing dependency.
Use filtered indexes when columns contain well-defined subsets of data that queries reference in SELECT statements.
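A minimal sketch of a filtered index, assuming a Status column whose 'Open' rows form the small, frequently queried subset (both the column and the value are illustrative assumptions):

    -- Filtered index over the subset of rows that queries actually reference
    CREATE NONCLUSTERED INDEX IX_Orders_Open_OrderDate
        ON Sales.Orders (OrderDate)
        INCLUDE (CustomerID)
        WHERE Status = 'Open';

    -- Queries whose predicate matches the filter can use the smaller index
    SELECT OrderID, CustomerID, OrderDate
    FROM Sales.Orders
    WHERE Status = 'Open' AND OrderDate >= '20100101';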

Guidelines for Selecting a Computed Column Index
Slide points:
- Choose a deterministic and precise computed column expression
- Assign only values of other columns in the same row
- Assess benefits for common or important queries
- Assess performance cost against performance gain
- Use only CLR functions that do not perform data access in computed columns

Discuss the following guidelines for selecting a computed column index.
Choose a deterministic and precise computed column expression. An expression is deterministic if all functions (built-in, Transact-SQL, or common language runtime (CLR)) in the expression are deterministic and precise. A function is deterministic if it returns the same value every time it is run with the same parameters; for example, the GETDATE and CURRENT_TIMESTAMP functions are nondeterministic, whereas the ISNULL function is deterministic. An expression is precise if it does not involve floating-point (float or real) data types. When an expression is not precise, the computed column must be PERSISTED to support an index.
Assign only values of other columns in the same row. The computed column expression must be based only on values of other columns in the same row. The expression cannot reference other rows in the same table or columns in other tables. If the computed column references CLR functions, the functions must not perform any system or user data access.
Assess benefits for common or important queries. You can create indexes on computed columns to increase the performance of critical or high-frequency queries. For example, in the OrderDetail table, you can define the ExtendedPrice column with the expression ROUND(Quantity*Price*(1-Discount/100)*(1+Tax/100),4). If ExtendedPrice is used to compute daily product sales, an index on {OrderId, ProductID, ExtendedPrice} can improve the performance of the query; because ExtendedPrice is included as part of the index, it is a covering index.
Assess performance cost against performance gain. Evaluate performance gain against performance cost when defining indexes on computed columns. When computed columns are persisted or used in indexes, the server regenerates the value from the expression every time the row is updated, so if the table is frequently updated, the performance of UPDATE statements is affected. Computed column indexes can be especially helpful for computed columns based on complex CLR functions in tables that are seldom updated. To confirm that the index is used and that a performance gain is achieved, test the queries with and without the index in your prototype.
Use only CLR functions that do not perform data access. If the computed column references CLR functions, they must not perform any system or user data access.
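A sketch of the ExtendedPrice example from the notes, assuming a hypothetical Sales.OrderDetail table whose Quantity, Price, Discount, and Tax columns are exact numeric (decimal) types so that the expression is precise:

    -- Computed column based on the expression given in the notes
    ALTER TABLE Sales.OrderDetail
        ADD ExtendedPrice AS ROUND(Quantity * Price * (1 - Discount / 100.0) * (1 + Tax / 100.0), 4);

    -- Covering index that carries the computed value for daily-sales queries
    CREATE NONCLUSTERED INDEX IX_OrderDetail_DailySales
        ON Sales.OrderDetail (OrderID, ProductID)
        INCLUDE (ExtendedPrice);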

Guidelines for Selecting a Strategy for Index Compression
Slide points:
- Compress nonclustered indexes individually; they do not inherit the table's compression setting
- Rebuild all the nonclustered indexes on the table when you change the compression setting of a heap
- Enable or disable ROW or PAGE compression online or offline
- Non–leaf-level pages do not receive page compression when you compress indexes
- Data compression is not available for data that is stored separately
- Avoid specifying out-of-range partitions
- Rebuild a heap to compress new pages allocated to the heap
- When a list of partitions is specified, set the compression type to ROW, PAGE, or NONE at the individual partition level
- Compress tables only when the maximum row size, plus the compression overhead, fits within 8,060 bytes

Discuss the considerations for index compression. Mention the following points:
Compression is available only in SQL Server 2008 Enterprise and Developer editions.
Compression allows more rows to be stored on a page but does not change the maximum row size of a table or index. A table cannot be enabled for compression when the maximum row size plus the compression overhead exceeds the maximum row size of 8,060 bytes.
When a list of partitions is specified, you can set the compression type to ROW, PAGE, or NONE at the individual partition level. An error is generated if you specify a partition that is out of range.
Nonclustered indexes do not inherit the compression property of the table. To compress indexes, explicitly set the compression property of each index. When a clustered index is created on a heap, the clustered index inherits the compression state of the heap unless an alternative compression state is specified.
When a heap is configured for page-level compression, pages receive page-level compression only in the following ways: data is inserted by using the BULK INSERT syntax; data is inserted by using the INSERT INTO ... WITH (TABLOCK) syntax; or the table is rebuilt by executing the ALTER TABLE ... REBUILD statement with the PAGE compression option. New pages allocated in a heap as part of DML operations do not use PAGE compression until the heap is rebuilt. You can rebuild the heap by removing and reapplying compression, or by creating and removing a clustered index.
Changing the compression setting of a heap requires all nonclustered indexes on the table to be rebuilt so that they have pointers to the new row locations in the heap.
You can enable or disable ROW or PAGE compression online or offline. Enabling compression on a heap is single threaded for an online operation. The disk space requirements for enabling or disabling row or page compression are the same as for creating or rebuilding an index.
To determine the compression state of partitions in a partitioned table, query the data_compression column of the sys.partitions catalog view.
When compressing indexes, leaf-level pages can be compressed with both row and page compression; non–leaf-level pages do not receive page compression.
Data compression is not available for data that is stored separately, such as large-value data types.
Discuss the database objects that can be compressed by using SQL Server 2008, as well as row compression and page compression, and how compression can increase performance and reduce storage cost.
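A minimal sketch of applying and checking compression, assuming an Enterprise or Developer edition instance and the hypothetical objects used in the earlier examples:

    -- Rebuild an existing nonclustered index with page compression
    ALTER INDEX IX_Orders_CustomerID ON Sales.Orders
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Rebuild a heap so that existing and newly allocated pages receive PAGE compression
    ALTER TABLE Sales.OrderArchive
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Check the compression state of each partition
    SELECT object_id, index_id, partition_number, data_compression_desc
    FROM sys.partitions
    WHERE object_id = OBJECT_ID('Sales.Orders');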

Discussion: Using Indexing
Question: Is it necessary for every table to have a clustered index? Justify your answer.
Answer: With few exceptions, yes. Most queries perform better against clustered indexes than against heaps, and a clustered index saves most of the storage space that a nonclustered index against a heap would require.
Question: An Orders table has a clustered index on the InvoiceNumber (int) column. The most frequently executed queries use SARGs on the OrderDate (datetime) column, and a nonclustered index has been created on the OrderDate column. What are the advantages and disadvantages of this indexing design?
Answer: All queries that retrieve ranges of invoice numbers are well served by the clustered index. However, because only a nonclustered index was created on OrderDate, date-based queries have to traverse two indexes, and typical date-based SARGs return a range of data. This table may be a candidate for re-engineering with OrderDate as the clustered index key; more exploration and testing is indicated.

Lesson 2: Designing a Partitioning Strategy
- Overview of Partitioning
- Guidelines for Planning Partitioned Tables and Indexes
- Designing Partitions to Manage Subsets of Data
- Designing Partitions to Improve Query Performance
- Special Guidelines for Partitioned Indexes
- Discussion: Using Partitioning

Overview of Partitioning
Partitioning breaks a large table into multiple physical units without compromising the integrity or structure of the database.
Discuss the following advantages of partitioning:
- Partitioning makes large tables or indexes more manageable. It enables you to manage and access subsets of data quickly and efficiently while maintaining the integrity of the data collection. Large operations, such as loading data from an Online Transaction Processing (OLTP) system into an Online Analytical Processing (OLAP) system, can be performed quickly.
- Partitioned tables and indexes support designing and querying. They support all the properties and features associated with designing and querying standard tables and indexes, including constraints, defaults, identity and timestamp values, and triggers.
- Maintenance operations performed on subsets of data can be performed more efficiently. With partitioning, subsets of data can be separated quickly into staging areas for offline maintenance and then added as partitions to existing partitioned tables, assuming that these tables are all in the same database instance.
- Partitioning a table or index might improve query performance if the partitions are designed correctly for the types of queries you frequently run and for your hardware configuration.
Also discuss when it is advisable to implement partitioning, with scenarios such as:
- The table contains, or is expected to contain, a lot of data that is used in different ways.
- Queries or updates against the table are not performing as intended.
- Maintenance costs exceed predefined maintenance periods.
In general, discuss the concept of partitioning with some real-life examples.

Guidelines for Planning Partitioned Tables and Indexes
Discuss the importance of creating the required database objects, the partition function and the partition scheme, before partitioning a table or index. Explain the following guidelines for each.
Partition function. A partition function defines how the rows of a table or index are mapped to a set of partitions, based on the values of certain columns, called partitioning columns. Consider two factors when planning a partition function: the column whose values determine how the table is partitioned, known as the partitioning column, and the range of values of the partitioning column for each partition.
Partition scheme. A partition scheme maps each partition specified by the partition function to a filegroup. When planning a partition scheme, decide which filegroup or filegroups you want to place your partitions on. The primary reason for placing partitions on separate filegroups is to ensure that you can perform backup operations on partitions independently.
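A minimal sketch of a partition function and scheme; the boundary values, filegroup names, and table are illustrative assumptions:

    -- Partition function: map OrderDate values to yearly ranges
    CREATE PARTITION FUNCTION pf_OrderDate (datetime)
        AS RANGE RIGHT FOR VALUES ('20080101', '20090101', '20100101');

    -- Partition scheme: map the four resulting partitions to filegroups
    CREATE PARTITION SCHEME ps_OrderDate
        AS PARTITION pf_OrderDate
        TO (FG_Archive, FG_2008, FG_2009, FG_2010);

    -- Create the table on the partition scheme, partitioned by OrderDate
    CREATE TABLE Sales.OrdersPartitioned
    (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL,
        TotalDue  money    NOT NULL
    )
    ON ps_OrderDate (OrderDate);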

Designing Partitions to Manage Subsets of Data
Discuss how you can move subsets of data quickly and efficiently by partitioning a table or an index. Explain, with an example, how the Transact-SQL ALTER TABLE...SWITCH statement is used to perform the following actions (see the sketch after this list):
- Adding a table as a partition to an already existing partitioned table
- Switching a partition from one partitioned table to another
- Removing a partition to form a single table
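A hedged sketch of partition switching, reusing the illustrative Sales.OrdersPartitioned table; the staging and archive tables are assumptions and must match the target partition's schema, filegroup, and check constraint:

    -- Add a staging table as a new partition of the partitioned table
    ALTER TABLE Sales.Orders_Staging
        SWITCH TO Sales.OrdersPartitioned PARTITION 4;

    -- Remove the oldest partition into a standalone archive table
    ALTER TABLE Sales.OrdersPartitioned
        SWITCH PARTITION 1 TO Sales.Orders_Archive;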

Designing Partitions to Improve Query Performance
Discuss how partitioning a table or index can improve query performance based on the following factors.
Partitioning for join queries. If you frequently run queries that involve an equi-join between two or more partitioned tables, their partitioning columns should be the same as the columns on which the tables are joined. Additionally, the tables or their indexes should be collocated.
Taking advantage of multiple disk drives. It is better to stripe the data files of your partitions across more than one disk by setting up a RAID. In this way, although SQL Server still sorts data by partition, it can access all drives of each partition simultaneously. This configuration can be designed regardless of whether all partitions are in one filegroup or in multiple filegroups.
Controlling lock escalation behavior. Partitioning tables can improve performance by enabling lock escalation to a single partition instead of the whole table. To reduce lock contention by allowing lock escalation to the partition, use the LOCK_ESCALATION option of the ALTER TABLE statement.
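A one-statement sketch of the lock escalation guidance, using the illustrative partitioned table from the earlier examples:

    -- Allow the Database Engine to escalate locks to the partition level
    -- instead of the whole table (AUTO enables partition-level escalation on partitioned tables)
    ALTER TABLE Sales.OrdersPartitioned
        SET (LOCK_ESCALATION = AUTO);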

Special Guidelines for Partitioned Indexes
Discuss the following guidelines for implementing partitioned indexes.
Partitioning unique indexes. When partitioning a unique index (clustered or nonclustered), the partitioning column must be chosen from among the columns in the unique index key.
Partitioning clustered indexes. When partitioning a clustered index, the clustering key must contain the partitioning column. When partitioning a non-unique clustered index and the partitioning column is not explicitly specified in the clustering key, SQL Server adds the partitioning column to the list of clustered index keys by default. If the clustered index is unique, you must explicitly specify that the clustered index key contains the partitioning column.
Partitioning nonclustered indexes. When partitioning a unique nonclustered index, the index key must contain the partitioning column. When partitioning a non-unique nonclustered index, SQL Server adds the partitioning column by default as a nonkey (included) column of the index to ensure that the index is aligned with the base table. SQL Server does not add the partitioning column to the index if it is already present in the index.
Memory limitations and partitioned indexes. Memory limitations can affect the performance of, or SQL Server's ability to build, a partitioned index. This is especially the case when the index is not aligned with its base table, when the index is not aligned with its clustered index, or when the table already has a clustered index applied to it.
You should also discuss the parallel execution strategy: the query processor determines the table partitions required for the query and the proportion of threads to allocate to each partition; it allocates an equal or almost equal number of threads to each partition and then executes the query in parallel across partitions. In addition, discuss the best practices for improving the performance of queries that access large amounts of data from large partitioned tables and indexes.

Discussion: Using Partitioning
Question: What problems does table partitioning solve? How?
Answer: Table partitioning can improve query performance on very large tables by reducing the set of data examined by a query. You can partition data by age, by location, or by any other way the business segregates its requirements; in addition, most queries are directed to the 'active' data partition.
Question: Please explain how to create a table partition, identifying the T-SQL objects and statement-level support.
Suggested direction for the conversation: The discussion should include the use of partition functions, partition schemes, and the ALTER TABLE ... SWITCH statement. Include in the discussion issues related to partitioning indexes and table lock escalation behavior.

Lesson 3: Designing a Plan Guide
- Overview of Plan Guides
- Guidelines for Designing Plan Guides
- Designing Plan Guides for Parameterized Queries
- Discussion: Using Plan Guides

Overview of Plan Guides
Plan guides in SQL Server are useful when a small subset of queries in a database application deployed from a third-party vendor are not performing as expected. Plan guides influence the optimization of queries by attaching query hints or a fixed query plan to them.
Describe the following types of plan guides:
- OBJECT plan guides match queries that execute in the context of Transact-SQL stored procedures, scalar functions, multistatement table-valued functions, and data manipulation language (DML) triggers.
- SQL plan guides match queries that execute in the context of stand-alone Transact-SQL statements and batches that are not part of a database object. SQL-based plan guides can also be used to match queries that parameterize to a specified form.
- TEMPLATE plan guides match stand-alone queries that parameterize to a specified form. These plan guides are used to override the parameterization behavior of specific query forms.

Guidelines for Designing Plan Guides
Discuss the following guidelines for designing plan guides.
Attach query hints to a plan guide. Plan guides can use any combination of valid query hints. When a plan guide matches a query, the OPTION clause specified in the hints clause of the plan guide is added to the query before it is compiled and optimized. If a query that is matched to a plan guide already has an OPTION clause, the query hints specified in the plan guide replace those in the query. However, for a plan guide to match a query that already has an OPTION clause, you must include the OPTION clause of the query when you specify the text of the query to match in sp_create_plan_guide.
Attach a query plan to a plan guide. Plan guides that apply a fixed query plan are useful when you know of an existing execution plan that performs better than the one the optimizer selects for a particular query. Note that applying a fixed plan to a query means that the query optimizer can no longer adapt the plan to changes in statistics and indexes. When you consider plan guides that use fixed query plans, compare the benefit of applying a fixed plan against the inability to adapt the plan automatically as data distribution and available indexes change. You can attach a specific query plan to a plan guide by specifying the XML Showplan of the plan in the @hints parameter of sp_create_plan_guide, or by specifying the plan handle of a cached plan to sp_create_plan_guide_from_handle. Both methods apply the fixed query plan to the targeted query.
Follow the plan guide scoping and matching requirements. Plan guides are scoped to the database in which they are created, so only plan guides that exist in the current database when a query executes can be matched to the query. For SQL- or TEMPLATE-based plan guides, SQL Server matches the values of the @module_or_batch and @params arguments to the query text character by character, which means you must provide the text exactly as SQL Server receives it in the actual batch.
Evaluate the plan guide's effect on the plan cache. Creating a plan guide removes the query plan for the targeted module or batch from the plan cache. Creating a plan guide of type OBJECT or SQL on a batch removes the query plan for any batch that has the same hash value. Creating a plan guide of type TEMPLATE removes all single-statement batches from the plan cache within that database.
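A minimal sketch of attaching a query hint through a SQL plan guide; the query text and the MAXDOP hint are illustrative, and the text must match the batch exactly as SQL Server receives it:

    EXEC sp_create_plan_guide
        @name            = N'PG_Orders_CustomerTotals',
        @stmt            = N'SELECT CustomerID, SUM(TotalDue) AS Total FROM Sales.Orders GROUP BY CustomerID;',
        @type            = N'SQL',
        @module_or_batch = NULL,
        @params          = NULL,
        @hints           = N'OPTION (MAXDOP 1)';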

Designing Plan Guides for Parameterized Queries
Discuss the reasons a query may be parameterized. A query can be parameterized for any of the following reasons: the query is submitted by using sp_executesql; forced parameterization is enabled in the database, which parameterizes all eligible queries; or a separate plan guide has been created on a class of queries to which this query belongs, specifying that they be parameterized.
To obtain the parameterized form of a query and create a plan guide on it, perform the following steps (a sketch follows this list):
1. Obtain the parameterized form of the query by executing sp_get_query_template.
2. If the query is not already being parameterized by SQL Server (through sp_executesql or the PARAMETERIZATION FORCED database SET option), create a plan guide of type TEMPLATE to force parameterization of that query form.
3. Create a plan guide of type SQL on the parameterized query.
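A hedged sketch of these three steps; the query text, plan guide names, and the OPTIMIZE FOR hint are illustrative assumptions:

    DECLARE @stmt nvarchar(max), @params nvarchar(max);

    -- Step 1: obtain the parameterized form of the query
    EXEC sp_get_query_template
        N'SELECT * FROM Sales.Orders WHERE OrderID = 45;',
        @stmt OUTPUT,
        @params OUTPUT;

    -- Step 2: TEMPLATE plan guide to force parameterization of this query form
    EXEC sp_create_plan_guide
        N'PG_Orders_Template', @stmt, N'TEMPLATE', NULL, @params,
        N'OPTION (PARAMETERIZATION FORCED)';

    -- Step 3: SQL plan guide that attaches a hint to the parameterized form
    -- (@0 is the parameter name generated by forced parameterization)
    EXEC sp_create_plan_guide
        N'PG_Orders_Parameterized', @stmt, N'SQL', NULL, @params,
        N'OPTION (OPTIMIZE FOR (@0 = 45))';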

Discussion: Using Plan Guides
Question: What problems do plan guides solve? How?
Answer: Plan guides let you provide query hints or a fixed plan when the actual T-SQL query code cannot be altered, for example: company policies that prohibit changing any aspect of the application or data environment without regression testing, third-party software, or changes in the underlying data structures and distributions.
Note to the instructor: Explore other scenarios where the participants may find plan guides useful.

Lesson 4: Designing Scalable Databases
- Guidelines for Scaling-Out Databases
- Overview of Federated Databases
- Selecting Federated Databases
- Overview of Scalable Shared Databases
- Guidelines for Selecting Scalable Shared Databases
- Overview of Replication
- Guidelines for Selecting Replication
- Overview of Database Mirroring
- Guidelines for Selecting Database Mirroring
- Discussion: Using Scalable Databases

Guidelines for Scaling-Out Databases
Explain the importance of scaling out databases. Tell students that scalability is an application's ability to use additional resources to do more useful work. Microsoft SQL Server supports two types of scalability solution: scale-out and scale-up. Scaling out improves the processing capacity of a system by adding one or more additional computers, or nodes, instead of upgrading the hardware of a single computer; it adds more resources across which to divide the database workload.
Discuss the guidelines for specific goals, such as:
Choosing multiple data stores: scale out to multiple SQL Server databases in the same instance; scale out to multiple SQL Server instances; scale out to multiple database servers.
Scaling out for performance: understand the requirements of the application; match hardware to the workload; keep most data access operations local.
Scaling out with redundancy: optimize the database to support specific functionality; exploit local autonomy and availability; implement load balancing; tolerate failure; distribute data and minimize latency; store multiple copies of data; manage the additional complexity and security.
In addition to scaling out, you can discuss scaling up: mention how scaling up differs from scaling out and explain the benefits and drawbacks of this approach. To scale up your database, you buy new or improved hardware for your database server, such as faster controllers, a faster disk subsystem, more RAM, and more processors. Adding hardware to a machine accelerates your applications, but hardware for enterprise-level server machines is expensive, so scaling up can be a prohibitively expensive endeavor.

Overview of Federated Databases
SQL Server can share the database processing load across a group of servers that process database requests cooperatively. This cooperative group of servers is called a federation.
Explain that a multiple-tier system balances the processing load for each tier across multiple servers. These servers are managed independently, but cooperate to process database requests from the applications.
Use the slide to explain the differences between a single server tier and a federated server tier:
- Instances: in a single server tier, there is one instance of SQL Server on the production server; in a federated server tier, there is one instance of SQL Server on each member server.
- Data: in a single server tier, the production data is stored in one database; in a federated server tier, each member server has a member database containing a copy of each table, with only the data relevant to that site.
- Tables: in a single server tier, each table is typically a single entity; in a federated server tier, distributed partitioned views are used to make it appear as if there were a full copy of the original table on each member server.
- Connections: in a single server tier, all connections are made to the single server and all SQL statements are processed by the same instance of SQL Server; in a federated server tier, the application layer must be able to direct each SQL statement to the member server that contains most of the data referenced by the statement.

Selecting Federated Databases
Explain that building a federation of database servers involves designing a set of distributed partitioned views that spreads data across servers. Partitioning works well if the tables in the database are naturally divisible into similar partitions where most of the rows accessed by any SQL statement can be placed on the same member server, with tables clustered in related units.
Discuss the following guidelines for selecting symmetric and asymmetric partitions.
Symmetric partitions. Partitioned views are most effective if all tables in a database can be partitioned symmetrically in the following ways:
- Related data is put on the same member server, so that most SQL statements routed to the correct member server have minimal requirements for data on other member servers. A distributed partitioned view design goal can be stated as an 80/20 rule: design partitions so that most SQL statements can be routed to a member server where at least 80 percent of the data resides, and distributed queries are required for 20 percent or less of the data.
- Data is partitioned uniformly across member servers. For example, suppose a company has divided North America into regions. Each employee works in one region, and customers make most of their purchases in the state or province where they live. The region and employee tables are partitioned along regions, and customers are partitioned between regions by state or province. Although some queries require data from multiple regions, the data needed for most queries is on the server for one region.
Asymmetric partitions. Asymmetric partitions cause some member servers to assume larger roles than others. For example, only some of the tables in a database may be partitioned, with the tables that have not been partitioned remaining on the original server. Asymmetric partitions can provide much of the performance of a symmetric partition, and they can improve the performance of a database that cannot be symmetrically partitioned, or partition a large existing system through a series of iterative, asymmetric improvements.
Distributed partitioned views. Explain that distributed partitioned views can be used to implement a federation of database servers. To do this, consider the following (a sketch of a distributed partitioned view follows this list):
- Develop a list of the SQL statements that the application will execute during typical processing periods. Divide the list into SELECT, UPDATE, INSERT, and DELETE categories, and order the list in each category by frequency of execution.
- Find clusters of tables that can be partitioned along the same dimension, such as part number or department number, so that all rows related to an individual occurrence of that dimension end up on the same member server.
- Match the frequency of SQL statements against partitions defined from analyzing foreign keys, and select the partitioning that best supports the mix of SQL statements in your application. If some sets of tables can be partitioned in more than one way, use the frequency of SQL statements to determine which partitioning satisfies the largest number of SQL statements.
- Define the SQL statement routing rules. The routing rules must define which member server can most effectively process each SQL statement. They must establish a relationship between the context of the user's input and the member server that contains the bulk of the data required to complete the statement, so that the application can take a piece of data entered by the user, match it against the routing rules, and determine which member server should process the SQL statement.
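A minimal sketch of a distributed partitioned view, assuming three member servers defined as linked servers (Server1, Server2, Server3), a SalesDB database on each, and member Customers tables whose CustomerID ranges are enforced by CHECK constraints; all names and ranges are illustrative:

    -- On each member server, the local member table carries a CHECK constraint on its range,
    -- for example CHECK (CustomerID BETWEEN 1 AND 32999) on Server1.
    -- The distributed partitioned view unions the member tables across the linked servers:
    CREATE VIEW dbo.Customers
    AS
    SELECT * FROM Server1.SalesDB.dbo.Customers_1_32999
    UNION ALL
    SELECT * FROM Server2.SalesDB.dbo.Customers_33000_65999
    UNION ALL
    SELECT * FROM Server3.SalesDB.dbo.Customers_66000_99999;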

Overview of Scalable Shared Databases
Scalable shared databases let you attach a read-only reporting database to multiple server instances over a storage area network (SAN).
Using the slide, introduce the concept of scalable shared databases and discuss their benefits and limitations.
The scalable shared database feature allows you to scale out a read-only database built for reporting. To be made into a scalable shared database, the reporting database must reside on a set of dedicated, read-only volumes whose primary purpose is hosting the database. Using commodity hardware for servers and volumes, you can scale out a reporting database that provides an identical view of the reporting data on multiple reporting servers. This feature also provides a smooth update path for the reporting database: after the reporting database is built on a set of reporting volumes, the volumes are marked as read-only and mounted to multiple reporting servers. On each reporting server, the reporting database is attached to an instance of Microsoft SQL Server 2005 or 2008 and becomes available as a scalable shared database.
Benefits:
- Allows scale-out of the workload on your reporting databases by using commodity servers and hardware. A scalable shared database is a cost-effective way of making a read-only data mart or data warehouse accessible to multiple server instances for reporting purposes, such as running queries or using Reporting Services.
- Provides workload isolation. Each server uses its own memory, CPU, and tempdb database, which prevents one poorly tuned query from monopolizing all server resources.
- Ensures an identical view of the reporting data from all servers. All server instances are configured identically.
Limitations:
- The database must be on a read-only volume, and the data files are accessible only over a SAN.
- The configuration is supported only on Windows Server 2003 SP1 or later.
- Scalable shared database configurations are limited to eight server instances per shared database.
- Scalable shared databases do not support database snapshots.

Guidelines for Selecting Scalable Shared Databases
Slide points:
- Verify that the reporting servers and the associated reporting database are running on identical platforms
- Update all reporting servers for a scalable shared database uniformly
- Limit your scalable shared database configurations to eight server instances per shared database
- Ensure that the reporting database has the same layout as the production database
- Use a single path for the reporting database and the production database
- Ensure that the scalable shared database is on a read-only volume that is accessible over your SAN from all the reporting servers
- Ensure that all the server instances use the same sort order
- Ensure that all the server instances use the same memory footprint

Discuss the guidelines for selecting scalable shared databases and explain how to ensure that the environment supports them. When explaining the point on using a single path for the reporting and production databases, mention that you should use the same drive letter and the same directory path for the reporting database as for the production database. Explain the considerations that apply if the reporting database uses a different drive letter from the production database: if you build the reporting database by restoring a database backup, your RESTORE DATABASE statement requires a WITH MOVE clause that specifies the full path of each restored data file; if your reporting database is a copy of the production database, the FOR ATTACH clause of your CREATE DATABASE statement must list all files and specify their full paths when you attach the reporting database.
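A hedged sketch of the two relocation options mentioned above; the database, logical file names, and paths are illustrative assumptions:

    -- Build the reporting database on the reporting volume by restoring a production backup,
    -- relocating the files with WITH MOVE
    RESTORE DATABASE SalesReporting
        FROM DISK = N'\\BackupShare\SalesDB.bak'
        WITH MOVE N'SalesDB_Data' TO N'R:\ReportingData\SalesDB.mdf',
             MOVE N'SalesDB_Log'  TO N'R:\ReportingData\SalesDB.ldf',
             RECOVERY;

    -- Alternatively, attach a copy of the production files on each reporting server
    CREATE DATABASE SalesReporting
        ON (FILENAME = N'R:\ReportingData\SalesDB.mdf'),
           (FILENAME = N'R:\ReportingData\SalesDB.ldf')
        FOR ATTACH;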

Overview of Replication
Slide points:
- Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data
- Transactional replication takes an initial snapshot; subsequent data changes and schema modifications are delivered to the Subscriber as they occur
- Merge replication takes an initial snapshot; subsequent data changes and schema modifications are tracked with triggers
- Peer-to-peer replication provides a scale-out and high-availability solution by maintaining copies of data across multiple server instances

Discuss the different types of replication (snapshot, transactional, merge, and peer-to-peer) and point out the scenarios that are suitable for each type.
Use snapshot replication if: data changes occur infrequently; it is acceptable to have copies of data that are out of date with respect to the Publisher for a period of time; small volumes of data are replicated; or a large volume of data changes occurs over a short period of time.
Use transactional replication if: you want incremental changes to be propagated to Subscribers as they occur; the application requires low latency between the time changes are made at the Publisher and the time the changes arrive at the Subscriber; the application requires access to intermediate data states; the Publisher has a high volume of insert, update, and delete activity; or the Publisher or Subscriber is a non-SQL Server database, such as Oracle.
Use merge replication if: multiple Subscribers can update the same data at various times and propagate those changes to the Publisher and to other Subscribers; Subscribers need to receive data, make changes offline, and synchronize changes with the Publisher and other Subscribers; each Subscriber requires a different partition of data; conflicts occur and you need the ability to detect and resolve them; or the application requires the net data change rather than access to intermediate data states.
Use peer-to-peer replication if: you have SQL Server 2008 Enterprise; all participant databases contain identical schema and data; each node uses its own distribution database, which eliminates the potential for a single point of failure; no tables or other objects need to be included in multiple peer-to-peer publications in a single publication database; the publication is enabled for peer-to-peer replication before any subscriptions are created; subscriptions are initialized by using a backup or with the "replication support only" option; and identity columns are not used (when using identities, you must manually manage the ranges assigned to the tables at each participating database).

Guidelines for Selecting Replication Course 50401A Guidelines for Selecting Replication Module 4: Designing Databases for Optimal Performance Snapshot Replication Merge Replication Transactional Peer-to-Peer Use the slide to discuss the guidelines for selecting each type of replication. When discussing the guidelines for selecting Transactional replication, mention the following points relating to triggers on a subscription database: By default, triggers execute with the XACT_ABORT setting ON. If a statement within a trigger causes an error while the Distribution Agent is applying changes at the Subscriber, the entire batch of changes will fail, rather than the individual statement. You should avoid including explicit transactions in triggers at the Subscriber. When discussing the SELECT and INSERT statements for Merge replication, mention that Merge replication uses a globally unique identifier (GUID) column to identify each row during the merge replication process. If a published table does not have a unique identifier column with the ROWGUIDCOL property and a unique index, replication adds the column. If a table is no longer published and replication added the column, the column is removed. If the column already existed, it is not removed. Discuss how peer-to-peer replication supports the core features of transactional replication, but does not support the following options: Initialization and reinitialization with a snapshot Row and column filters Timestamp columns Non-SQL Server Publishers and Subscribers Immediate updating and queued updating subscriptions Anonymous subscriptions Partial subscriptions Attachable subscriptions and transformable subscriptions (Both of these options were deprecated in SQL Server 2005.) Shared Distribution Agents The Distribution Agent parameter -SubscriptionStreams and the Log Reader Agent parameter -MaxCmdsInTran The article properties @destination_owner and @destination_table Explain the following properties that have special considerations: The publication property @allow_initialize_from_backup requires a value of true. The article property @replicate_ddl requires a value of true; @identityrangemanagementoption requires a value of manual; and @status requires that option 24 is set. The value for article properties @ins_cmd, @del_cmd, and @upd_cmd cannot be set to SQL. The subscription property @sync_type requires a value of none or automatic. 
Guidelines on the slide (grouped by replication type: Snapshot, Merge, Transactional, and Peer-to-Peer):
• Create and secure the snapshot folder
• Estimate the disk space required to transfer and store snapshot files
• Schedule snapshots at off-peak hours
• Set up a mail-enabled user account in Active Directory Domain Services (AD DS)
• Use each node for its own distribution database
• Avoid including tables in multiple peer-to-peer publications in a single publication database
• Enable publications for peer-to-peer replication before creating subscriptions
• Initialize subscriptions by using a backup
• Avoid using identity columns
• Ensure adequate space for the transaction log
• Ensure adequate space for the distribution database
• Declare primary keys for each published table
• Consider the issues with using triggers
• Consider using large object (LOB) data types
• Ensure that any SELECT and INSERT statements that reference published tables use column lists
• Filter out timestamp columns during article validation
• Specify a value of TRUE for the @stream_blob_columns parameter of sp_addmergearticle (see the sketch after this list)
• Add a dummy UPDATE statement within a transaction
• Track changes when performing bulk updates
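As referenced in the @stream_blob_columns guideline above, the following is a minimal sketch of adding a merge article that streams its LOB columns during synchronization. The publication (SalesMerge) and table (dbo.Documents) names are assumptions for illustration.

-- Add a merge article and stream binary large object columns instead of
-- building them entirely in memory during the merge process.
EXEC sp_addmergearticle
    @publication         = N'SalesMerge',
    @article             = N'Documents',
    @source_owner        = N'dbo',
    @source_object       = N'Documents',
    @stream_blob_columns = N'true';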

Course 50401A, Module 4: Designing Databases for Optimal Performance
Overview of Database Mirroring

Benefits:
• Improved data protection
• Improved database availability
• Improved availability of the production database during upgrades
• Allows reporting off of the Mirror Server

How database mirroring works (diagram): data flows from the Principal Server to the Mirror Server, with an optional Witness Server monitoring the session.

Provide an overview of database mirroring. Explain that database mirroring is a software solution for increasing database availability. You can implement mirroring on a per-database basis. Explain how database mirroring works and point out that it works only with databases that use the full recovery model.

Discuss the different operating modes that mirroring uses. Explain that a database mirroring session runs with either synchronous or asynchronous operation. Under asynchronous operation, transactions commit without waiting for the mirror server to write the log to disk, which maximizes performance. Under synchronous operation, a transaction is committed on both partners, but transaction latency increases.

Mention that there are two mirroring operating modes: high-safety and high-performance. The high-safety mode supports synchronous operation. Under the high-safety mode, when a session starts, the mirror server synchronizes the mirror database with the principal database as quickly as possible. As soon as the databases are synchronized, a transaction is committed on both partners, at the cost of increased transaction latency.

The high-performance operating mode runs asynchronously. The mirror server tries to keep up with the log records sent by the principal server. The mirror database might lag somewhat behind the principal database. Typically, the gap between the databases is small. However, the gap can become significant if the principal server is under a heavy workload or the system of the mirror server is overloaded. In the high-performance mode, as soon as the principal server sends a log record to the mirror server, the principal server sends a confirmation to the client. It does not wait for an acknowledgement from the mirror server. This means that transactions commit without waiting for the mirror server to write the log to disk. Such asynchronous operation enables the principal server to run with minimum transaction latency, while risking some data loss.

You can continue the discussion by explaining the concept of role switching and its following three forms:
• Automatic failover. This requires high-safety mode and the presence of the mirror server and a witness. The database must already be synchronized, and the witness must be connected to the mirror server. The role of the witness is to verify whether a given partner server is up and functioning. If the mirror server loses its connection to the principal server, but the witness is still connected to the principal server, the mirror server does not initiate a failover.
• Manual failover. This requires high-safety mode. The partners must be connected to each other, and the database must already be synchronized.
• Forced service. This is possible if the principal server has failed and the mirror server is available.
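The notes above describe the modes conceptually; the following is a minimal sketch of establishing a high-safety session with a witness. The database name (Sales), the port (5022), and the host names (principal, mirror, and witness .contoso.com) are assumptions, and the mirroring endpoints are assumed to exist already on each instance. The database must use the full recovery model and must be restored on the mirror WITH NORECOVERY before the partners are joined.

-- On the principal: mirroring requires the full recovery model.
ALTER DATABASE Sales SET RECOVERY FULL;

-- On the mirror (after restoring a full backup of Sales WITH NORECOVERY):
-- point the mirror at the principal first.
ALTER DATABASE Sales SET PARTNER = 'TCP://principal.contoso.com:5022';

-- On the principal: point the principal at the mirror to start the session.
ALTER DATABASE Sales SET PARTNER = 'TCP://mirror.contoso.com:5022';

-- High-safety (synchronous) operation, plus a witness for automatic failover.
ALTER DATABASE Sales SET PARTNER SAFETY FULL;
ALTER DATABASE Sales SET WITNESS = 'TCP://witness.contoso.com:5022';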

Course 50401A, Module 4: Designing Databases for Optimal Performance
Guidelines for Selecting Database Mirroring

• Consider using the high-performance mode for disaster-recovery scenarios in which the principal and mirror servers are separated by a significant distance and where you do not want small errors to impact the principal server
• Consider using log shipping as an alternative to asynchronous database mirroring
• Consider setting the WITNESS property to OFF if the SAFETY property is set to OFF when you use Transact-SQL to configure high-performance mode
• When the principal server fails, you can: leave the database unavailable until the principal server becomes available, or manually update the database and then begin a new database mirroring session
• Sparingly use forced service on the mirror server

Use the slide to explain the guidelines for selecting database mirroring. Mention that the high-performance mode supports only the forced service (with possible data loss) form of role switching. This mode uses the mirror server as a warm standby server. Discuss the scenarios in which you can use the high-performance mode. You can also discuss the impact of a witness server on the high-performance mode. In addition, you can explain the options that database owners can adopt if the principal server fails.
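A short sketch of the Transact-SQL that corresponds to these guidelines, continuing the assumed Sales database from the earlier mirroring sketch.

-- Configure high-performance (asynchronous) mode.
ALTER DATABASE Sales SET PARTNER SAFETY OFF;

-- With SAFETY OFF, turn the witness off, as the guideline suggests.
ALTER DATABASE Sales SET WITNESS OFF;

-- Use sparingly: if the principal has failed, force service on the mirror,
-- accepting possible data loss.
ALTER DATABASE Sales SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;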

Course 50401A, Module 4: Designing Databases for Optimal Performance
Discussion: Using Scalable Databases

• Federated databases can increase the total storage and performance in extremely high-capacity or high-performance systems. What is the single key element necessary to ensure that a query is executed on the server that contains the appropriate data?
• What is the primary problem that scalable shared databases solve?
• A single table from the production database must be copied to a different database, on a different server instance. Select the best solution from the following options, and explain why: (A) Clustering, (B) Mirroring, (C) Replication.

Question: Federated databases can increase the total storage and performance in extremely high-capacity or high-performance systems. What is the single key element necessary to ensure that a query is executed on the server that contains the appropriate data?
Answer: The SQL statement routing rules ensure that a query is executed on the server that contains the appropriate data.

Question: What is the primary problem that scalable shared databases solve?
Answer: A scalable shared database removes the performance impact of reporting off of the production server.

Question: A single table from the production database must be copied to a different database, on a different server instance. Select the best solution from the following options, and explain why: (A) Clustering, (B) Mirroring, (C) Replication.
Answer: The best solution to copy a single table from the production database to a different database is Replication. Clustering copies data at the server level, Mirroring copies data at the database level, and Replication copies data at the table level.
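The routing rules mentioned in the first answer are commonly implemented with distributed partitioned views. The following sketch is illustrative only; the table, column, range boundaries, and linked-server name (Server2) are assumptions. The CHECK constraint on the partitioning column is what lets the optimizer route a query to the member server that holds the requested range.

-- Member table on this server, covering one range of the partitioning column.
CREATE TABLE dbo.Customers_1_to_49999
(
    CustomerID   int NOT NULL PRIMARY KEY
        CHECK (CustomerID BETWEEN 1 AND 49999),
    CustomerName nvarchar(100) NOT NULL
);
GO

-- Distributed partitioned view spanning this server and a linked server.
CREATE VIEW dbo.Customers
AS
SELECT CustomerID, CustomerName FROM dbo.Customers_1_to_49999
UNION ALL
SELECT CustomerID, CustomerName FROM Server2.Sales.dbo.Customers_50000_to_99999;
GO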

Course 50401A, Module 4: Designing Databases for Optimal Performance
Lab 4: Designing Databases for Optimal Performance

Exercise 1: Applying Optimization Techniques
Exercise 2: Creating Plan Guides
Exercise 3: Designing a Partitioning Strategy

In this lab, students will examine the business requirements and identify different ways to improve performance. Students will enhance the database performance by creating appropriate indexes, plan guides, and partitions. They will also define a partitioning strategy for the company database.

Exercise 1: In this exercise, students will review the methods to increase query performance.
Exercise 2: In this exercise, students will:
• Create a plan guide.
• View the created plan guide.
Exercise 3: In this exercise, students will:
• Create filegroups and files.
• Create a partition function.
• Create a partition scheme.
• Create a partitioned table.
• Insert data into the partitioned table.
• View the partitioned data.

Before the students begin the lab, read the scenario associated with each exercise to the class. This will reinforce the broad issue that the students are troubleshooting and will help to facilitate the lab discussion at the end of the module. Remind the students to complete the discussion questions after the last lab exercise.

Note: The lab exercise answer keys are provided on the Course Companion CD. To access the answer key, click the link located at the bottom of the relevant lab exercise page.

Logon Information
Virtual machine: NYC-SQL1
User name: Administrator
Password: Pa$$w0rd

Estimated time: 60 minutes

Course 50401A, Module 4: Designing Databases for Optimal Performance
Lab Scenario

You are a lead database administrator at QuantamCorp. You are working on the Human Resources Vacation and Sick Leave Enhancement (HR VASE) project, which is designed to enhance the current HR system of your organization. This system is based on the QuantamCorp sample database in SQL Server 2008.

The main goals of the HR VASE project are as follows:
• Provide managers with current and historical information about employee vacation and sick-leave data.
• Provide permission to individual employees to view their vacation and sick-leave balances.
• Provide permission to selected employees in the HR department to view and update employee vacation and sick-leave data.
• Provide permission to the HR manager to view and update all data.
• Ensure that the application uses the database in an optimal way, and optimize the performance of reports for managers and HR personnel.

You need to formulate a list of tasks required to ensure optimal query performance. Before finalizing the list, you need to verify the result of each task.

In this lab, you will examine the business requirements and identify different ways to improve performance. You will enhance the database performance by creating appropriate indexes, plan guides, and partitions.

Course 50401A, Module 4: Designing Databases for Optimal Performance
Lab Review

• What is the purpose of examining the database model, schema, data metadata, and dynamic management views before you decide on a course of action to improve query performance?
• What is a plan guide?
• You are developing a partitioning scheme for your application database. The table that you need to partition is sorted according to the date. Users usually access yearly data from that table. How would you design the partitioning scheme?
• You are working on partitioning a data warehouse table by using a column that has the datetime data type. Why would you use RIGHT as the RANGE parameter for the partitioning scheme?

Use the questions on the slide to guide the debriefing after students have completed the lab exercises.

Review Questions:

Question: What is the purpose of examining the database model, schema, data metadata, and dynamic management views before you decide on a course of action to improve query performance?
Answer: Before deciding on a course of action to improve query performance, you should examine the database model, schema, data metadata, and dynamic management views. This helps you develop a high-level understanding of the database so that you can concentrate remediation resources where they maximize the return on investment. In addition, you do not need to spend time solving a problem that does not occur very often, nor do you need to devote resources to solving a problem that will have minimal impact.

Question: What is a plan guide?
Answer: A plan guide is a method of providing an execution guideline to the query optimizer on the manner in which it should run a query.

Question: You are developing a partitioning scheme for your application database. The table that you need to partition is sorted according to the date. Users usually access yearly data from that table. How would you design the partitioning scheme?
Answer: You need 13 partitions for the table: one partition for each month of the previous year and one partition for all older data.

Question: You are working on partitioning a data warehouse table by using a column that has the datetime data type. Why would you use RIGHT as the RANGE parameter for the partitioning scheme?
Answer: The RIGHT parameter of RANGE works better than the LEFT parameter when you are using a column that has the datetime data type as the key for partitioning the table. A RANGE RIGHT function has each partition starting at midnight on the first day of the specified date range. RANGE LEFT requires extensive coding to include the last day of the date range, as well as variations to allow for different time precisions.
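To make the plan guide answer concrete, here is a minimal sketch of creating and verifying a SQL-statement plan guide with sp_create_plan_guide. The guide name, statement, hint, and table are assumptions for illustration; note that the statement text must match the submitted batch exactly for the guide to be applied.

EXEC sp_create_plan_guide
    @name            = N'Guide_OrderCount_Maxdop1',
    @stmt            = N'SELECT COUNT(*) FROM dbo.Orders;',
    @type            = N'SQL',
    @module_or_batch = NULL,   -- NULL means the batch text is the same as @stmt
    @params          = NULL,
    @hints           = N'OPTION (MAXDOP 1)';

-- Verify that the plan guide was created.
SELECT name, scope_type_desc, hints
FROM sys.plan_guides;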

Course 50401A, Module 4: Designing Databases for Optimal Performance
Module Review and Takeaways

Review Questions
Real-world Issues and Scenarios
List of Tools

Review Questions:

Question: Are you allowed to have a multiple-column clustered index?
Answer: Yes. However, it is not advisable, because additional columns increase the index key width and negatively impact the efficiency of each nonclustered index.

Question: Does the column sequence matter in a multiple-column nonclustered index? Why?
Answer: Yes, the first column determines the index organization. You should create the index with the columns in the order most often used in search arguments. If the search argument references only the second column, an index scan is required rather than the more efficient index seek operation that is used for the first column.

Question: For a symmetrically partitioned server, what considerations do you have to take into account?
Answer: For a symmetrically partitioned server, you should take into account the following considerations: most of the data and indexes required to satisfy a query should be placed on the same server, and the data on the servers should be approximately equally distributed.

Question: When you use the BULK INSERT statement, how will you ensure that merge replication on the table still works?
Answer: Merge replication requires the use of triggers. By default, bulk insert operations do not fire triggers. You should use the FIRE_TRIGGERS option of BULK INSERT or bcp. Alternatively, execute the sp_addtabletocontents stored procedure after the insert operation.

Question: Would it help performance if you could utilize partitioned tables in your database? Would the performance gain be substantial on only one hard disk? Why?
Answer: Yes, utilizing partitioned tables in your database can help increase performance. There should be less data in the active partition, requiring less I/O for both tables and indexes. Of course, the gain may also be negligible, depending on the size of the data tables and the hardware performance.

Question: What kind of indexes can you create on a partitioned table? What is the difference between them?
Answer: You can create partitioned (aligned) indexes on a partitioned table. The indexes are created using the same partitioning key values as the table partitions.

Real-world Issues and Scenarios

Question: You are the database administrator in an enterprise environment. There has been a huge amount of data loaded into a few tables. Queries that previously worked have slowed down considerably. What are the tools you can use to identify the queries that are slow, to generate new indexes, or to make changes to existing indexes?
Answer: You could use SQL Server Profiler to identify the queries that are slow. You could also use the dynamic management view sys.dm_exec_query_stats to return query execution data. Then you can use the Database Engine Tuning Advisor, or the dynamic management view sys.dm_db_missing_index_details, to gather information that helps determine the appropriate changes to the indexing strategy.
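The following sketches illustrate the answers above. The DMV query finds the statements with the highest average elapsed time from the plan cache, and the BULK INSERT statement shows the FIRE_TRIGGERS option; the table name, file path, and TOP value are assumptions for illustration.

-- Statements with the highest average elapsed time from cached query stats.
SELECT TOP (10)
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_time DESC;

-- Review missing-index suggestions before changing the indexing strategy.
SELECT * FROM sys.dm_db_missing_index_details;

-- Bulk insert that still fires the merge replication triggers on the table.
BULK INSERT dbo.Orders
FROM 'C:\loads\orders.dat'
WITH (FIRE_TRIGGERS);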

Course 50401A, Module 4: Designing Databases for Optimal Performance
Notes Page Overflow Slide. Do Not Print Slide. See Notes pane.

Question: You have a varchar(max) column that you want to add to a nonclustered index because a high percentage of queries need that column. How will you do that?
Answer: You cannot add the varchar(max) column to the indexed key columns. However, you can add the varchar(max) column to the index as an included column.

Question: List the steps needed to create a partitioned table for the following table.

Column Name     Data Type
OrderID         char(7)
CustID          int
OrderDate       smalldatetime
DeliveryDate    smalldatetime
InvoiceID       int
InvoiceDate     smalldatetime
OrderDetailID   char(10)

The table will be partitioned on the OrderDate column. There will be four partitions with dates starting 2006/01/01, and up until 2009/09/01. You can generate a "next" filegroup if required. For optimal performance, each partition should be located on a separate drive. You have five drives to work with: the E, F, G, H, and I drives.

Answer: The steps needed to create a partitioned table include:
1. Use ALTER DATABASE to add five filegroups.
2. Use ALTER DATABASE to add five files, one for each of the filegroups.
3. Design and create a partition function.
4. Design and create a partition scheme.

CREATE PARTITION FUNCTION [DateRangePF1] (datetime)
AS RANGE RIGHT FOR VALUES ('20060101', '20070101', '20080101', '20090101');

CREATE PARTITION SCHEME DateRangePS1
AS PARTITION DateRangePF1
TO (FileGroup1, FileGroup2, FileGroup3, FileGroup4, FileGroup5);

List of Tools:
SQL Server Profiler
SQL Server Database Engine Tuning Advisor
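A minimal sketch of the included-column answer above; the table and column names (dbo.Documents, Title, DocumentBody) are assumptions for illustration, with DocumentBody standing in for the varchar(max) column.

-- The varchar(max) column cannot be an index key column, but it can be stored
-- at the leaf level of the nonclustered index as an included column, so that
-- queries needing the column can be covered by the index.
CREATE NONCLUSTERED INDEX IX_Documents_Title
ON dbo.Documents (Title)
INCLUDE (DocumentBody);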