IMS 4212: Indexes (Indices) 1 Dr. Lawrence West, Management Dept., University of Central Florida Indexes—Topics Reasons for concern Data.


Indexes: Topics
Reasons for concern
Data Volume Analysis
Data Usage Analysis
Index Design

How Computers Work
Data flows through different parts of the computer as application instructions are executed

SQL Server Data Storage
Data in tables is stored on pages, and there are eight pages per extent
When more space is needed, an entire extent is added to the database
Each row (record) in the database is physically stored on a page, and that page belongs to an extent
Each row has a RowID that identifies it and its location on the page

SQL Server Data Storage (cont.)
Without a clustered index (covered soon), rows are added to pages in the order of insertion
When pages are full, rows are added to the next page in the extent
When extents are full, new extents are created
Tables keep track of the sequence of extents that contain their contents

Data Retrieval
By default, a query on a table requires that each page be loaded into memory in sequence and each row be examined to see whether it meets the query conditions
This is a full table scan

Data Retrieval (cont.)
The page is the basic unit of IO
  – The entire page is moved from physical storage to RAM for evaluation
In a pure table scan (the default method of retrieval), each record is examined to see whether it matches the WHERE clause conditions (if any)
  – The test value and the column value are moved to the CPU for testing
  – Records where the condition is TRUE are added to the result set
Pages are cached, and the cached copy will be read if it is available and needed
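
A minimal sketch of the full table scan described above, assuming a table can be modeled as a Python list of pages, each page a list of row dicts. The function `full_table_scan`, its `cache` set (a crude stand-in for the buffer cache), and the sample rows are hypothetical illustrations, not SQL Server internals. It counts only physical page reads, the cost the slides call most important.

```python
def full_table_scan(pages, predicate, cache=None):
    """Hypothetical helper: test every row on every page.

    pages     -- list of pages in storage order, each a list of row dicts
    predicate -- the WHERE condition, written as a function of one row
    cache     -- set of page numbers already in RAM (simplified buffer cache)
    Returns (matching rows, number of pages read from disk).
    """
    cache = set() if cache is None else cache
    physical_reads = 0
    result = []
    for page_no, page in enumerate(pages):
        if page_no not in cache:      # page is not in RAM yet: one IO to fetch it
            physical_reads += 1
            cache.add(page_no)
        for row in page:              # every row is tested, even if none match
            if predicate(row):
                result.append(row)
    return result, physical_reads


# Hypothetical two-page table
pages = [[{"id": 1, "city": "Orlando"}, {"id": 2, "city": "Tampa"}],
         [{"id": 3, "city": "Orlando"}]]
cache = set()
rows, reads = full_table_scan(pages, lambda r: r["city"] == "Orlando", cache)
rows_again, reads_again = full_table_scan(pages, lambda r: r["city"] == "Orlando", cache)
```

The second scan finds the same rows with no physical reads because both pages are already cached. The CPU work of testing every row is still repeated.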

Data Retrieval (cont.)
In SQL Server, page sizes are fixed at 8 KB
  – (An entire extent is 64 KB)
  – Some DBMSs use different sizes
  – Some DBMSs allow tuning on a table-by-table basis
  – A row's in-row data is limited to 8,060 bytes, slightly less than the 8 KB page because part of each page is reserved for page overhead
The number of records on a page depends on record size
  – Sum of the data sizes of each column (plus per-row overhead)
IO time for a pure scan increases with
  – Number of records
  – Record size
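
A rough sketch of how record size drives scan IO. It assumes the approximation used in SQL Server's guidance for estimating heap size: about 8,096 usable bytes per 8 KB page (8,192 bytes minus a 96-byte page header), and 2 extra bytes per row for the page's row-offset entry. The helper names are hypothetical, and `row_size_bytes` is assumed to already include per-row overhead.

```python
import math

PAGE_DATA_BYTES = 8096   # assumption: usable bytes per 8 KB page after the page header
ROW_OFFSET_BYTES = 2     # assumption: each row also needs a 2-byte row-offset entry


def rows_per_page(row_size_bytes: int) -> int:
    """Hypothetical helper: how many rows of this size fit on one page (rounded down)."""
    return PAGE_DATA_BYTES // (row_size_bytes + ROW_OFFSET_BYTES)


def scan_pages(row_count: int, row_size_bytes: int) -> int:
    """Hypothetical helper: pages a full scan reads, assuming fully packed pages."""
    return math.ceil(row_count / rows_per_page(row_size_bytes))


narrow_rows = scan_pages(1_000_000, 200)   # 200-byte rows
wide_rows = scan_pages(1_000_000, 400)     # same row count, rows twice as wide
```

Under these assumptions, doubling the row size from 200 to 400 bytes halves the rows per page (40 to 20) and doubles the pages a scan must read (25,000 to 50,000).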

Data Retrieval Costs
Two levels of cost are associated with data retrieval
  – Most important: IO moving a page from disk storage to RAM
  – Less important: CPU effort to evaluate records
  – In default mode, records cannot be evaluated until they have been moved into RAM
We also care about physical storage space
  – Less important as a performance issue
We also care about the costs of reorganizing data as it is added to the DB or updated (later)

Data Retrieval Costs (cont.)
ALL retrieval enhancement mechanisms must be evaluated on the dimensions from the previous slide
None of the enhancements comes without cost
Decisions are affected by how the data is used, not just by pure database characteristics
  – Understanding organizational tasks and priorities is key
  – Requires a balance between technical and organizational knowledge
  – MIS graduates are ideally positioned to participate in this analysis

Data Retrieval Costs (cont.)
The degree of the cost changes with many factors
  – Table sizes
  – Access mechanisms (paths, more later)
  – Nature of the query
  – Number of tables needed in the query
  – Nature of the enhancement approach
Remember that our DB design goal of minimizing storage space and redundancy (normalization) spreads data around the database
  – More tables containing transaction logic
  – More complicated queries

Indexes
If SQL Server knows the extent address, page address, and RowID of the desired data, it can go directly to the page in question (one page read into memory) and directly to the desired record
Indexes are separate storage structures that map values in table columns to the page and RowID of the row from which each value was taken
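
A minimal sketch of the mapping described above: an index as a separate structure from column value to row address. Here a Python dict stands in for the index (SQL Server uses B-trees, covered next), and a (page_no, slot_no) pair stands in for the page address and RowID. The helpers `build_index` and `index_lookup` are hypothetical, and the sketch counts only data-page reads, not reads of the index itself.

```python
from collections import defaultdict


def build_index(pages, column):
    """Hypothetical helper: map each column value to the (page_no, slot_no) of every row holding it."""
    index = defaultdict(list)
    for page_no, page in enumerate(pages):
        for slot_no, row in enumerate(page):
            index[row[column]].append((page_no, slot_no))
    return index


def index_lookup(pages, index, value):
    """Hypothetical helper: fetch rows by address; return (rows, distinct data pages read)."""
    addresses = index.get(value, [])             # .get does not add missing keys
    pages_read = {page_no for page_no, _ in addresses}
    rows = [pages[page_no][slot_no] for page_no, slot_no in addresses]
    return rows, len(pages_read)


pages = [[{"id": 1}, {"id": 2}], [{"id": 3}, {"id": 4}], [{"id": 5}]]
idx = build_index(pages, "id")
found, data_pages_read = index_lookup(pages, idx, 5)   # goes straight to page 2, slot 0
```

A full scan for `id = 5` would read all three pages. The lookup reads only the one page that holds the row.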

Indexes (cont.)
Indexes let the system search small records (index entries) to find the exact address of a large record (the data row)
Because index entries are small, many more of them fit on a page than rows of the main table

Indexes (cont.)
There are a multitude of algorithms and techniques for implementing indexes
Computer scientists develop, test, and evaluate various indexing methods
Our indexing techniques will usually be determined by our choice of RDBMS

The B-Tree (Balanced Tree) Index
[Figure: B-tree index structure showing the root page, the leaf pages, and the data pages they point to]

The B-Tree Index (cont.)
Rows in each index page are in order according to the column(s) on which the index was created
Upper-level pages have sparse populations of index values
  – Not all values are listed
  – Each entry points to the page with denser values
Leaf pages (nodes) contain all values within a range
Leaf pages point to the actual data page and RowID from which the index value came
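
A simplified two-level sketch of the structure described above, assuming unique keys: a root page holding only the first key of each leaf (the sparse upper level), and sorted leaf pages holding every key with its row address. The names `build_two_level_btree` and `btree_search` are hypothetical, and a real B-tree also handles insertion, deletion, and any number of levels.

```python
from bisect import bisect_left, bisect_right


def build_two_level_btree(entries, leaf_capacity):
    """Hypothetical helper: entries are (key, row_address) pairs with unique keys."""
    entries = sorted(entries)
    leaves = [entries[i:i + leaf_capacity] for i in range(0, len(entries), leaf_capacity)]
    root_keys = [leaf[0][0] for leaf in leaves]      # sparse: first key of each leaf only
    return root_keys, leaves


def btree_search(root_keys, leaves, key):
    """Hypothetical helper: return (row_address or None, index pages read)."""
    pages_read = 1                                   # read the root page
    child = bisect_right(root_keys, key) - 1         # last leaf whose first key <= key
    if child < 0:
        return None, pages_read                      # key is below every indexed value
    leaf = leaves[child]
    pages_read += 1                                  # read exactly one leaf page
    keys = [k for k, _ in leaf]
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return leaf[pos][1], pages_read
    return None, pages_read


# Hypothetical data: 500 even keys, 50 entries per leaf, so 10 leaf pages
root, leaves = build_two_level_btree([(k, ("page", k // 40)) for k in range(0, 1000, 2)], 50)
address, reads = btree_search(root, leaves, 498)
```

Finding key 498 reads the root and one leaf (2 index pages) instead of examining all 10 leaf pages.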

Clustered Index
In a clustered index, the data rows are physically in the order specified by the index key
The leaf nodes in the index are actually the data pages

CustomerID  CompanyName
ALFKI       Alfreds Futterkiste
ANATR       Ana Trujillo Emparedados y helados
ANTON       Antonio Moreno Taquería
AROUT       Around the Horn
BERGS       Berglunds snabbköp
BLAUS       Blauer See Delikatessen
BLONP       Blondesddsl père et fils
BOLID       Bólido Comidas preparadas
BONAP       Bon app'
BOTTM       Bottom-Dollar Markets

Clustered Index (cont.)
Because data rows are physically ordered by the index value, records must be moved around to allow insertions

CustomerID  CompanyName
ALFKI       Alfreds Futterkiste
ANATR       Ana Trujillo Emparedados y helados
ANTON       Antonio Moreno Taquería
AROUT       Around the Horn
BERGS       Berglunds snabbköp
BERNI       Bernie’s Fish-O-Rama               (insertion)
BLAUS       Blauer See Delikatessen            (this and the following records must be moved)
BLONP       Blondesddsl père et fils
BOLID       Bólido Comidas preparadas
BONAP       Bon app'
BOTTM       Bottom-Dollar Markets
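
A simplified sketch of the insertion shown above, assuming a page can be modeled as a Python list kept in clustered-key order. The helper `clustered_insert` is hypothetical, and it models the logical order of rows rather than how many bytes SQL Server physically moves inside a page.

```python
from bisect import bisect_left


def clustered_insert(rows, new_row):
    """Hypothetical helper: insert (key, data) in key order; return how many rows shift down."""
    keys = [key for key, _ in rows]
    pos = bisect_left(keys, new_row[0])
    rows.insert(pos, new_row)
    return len(rows) - 1 - pos            # every row after the insertion point moves


customers = [
    ("ALFKI", "Alfreds Futterkiste"), ("ANATR", "Ana Trujillo Emparedados y helados"),
    ("ANTON", "Antonio Moreno Taquería"), ("AROUT", "Around the Horn"),
    ("BERGS", "Berglunds snabbköp"), ("BLAUS", "Blauer See Delikatessen"),
    ("BLONP", "Blondesddsl père et fils"), ("BOLID", "Bólido Comidas preparadas"),
    ("BONAP", "Bon app'"), ("BOTTM", "Bottom-Dollar Markets"),
]
moved = clustered_insert(customers, ("BERNI", "Bernie’s Fish-O-Rama"))
```

Inserting BERNI shifts the five rows from BLAUS through BOTTM, while a key that sorts last would shift none. This is why keys that arrive in their natural order are cheaper to insert.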

Clustered Indexes (cont.)
When a clustered index page is full, it must "split"
  – Half of the records are moved to a new page and half remain in place
  – New pages may end up in new extents
  – Pointers must link the pages in the logical order of the data
Tables with extensive insertions that do not arrive in clustered index order can take extensive processing time
  – E.g., adding Employees with an SSN primary key
Page splits may cascade upward into splits of index pages
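
A minimal sketch of a 50/50 page split, assuming each page is a sorted Python list of unique keys, the list of pages is already in logical order, and `capacity` is a hypothetical number of rows per page. The helper `insert_with_split` is illustrative only. It does not model extents, page pointers, or splits cascading into the index pages above.

```python
from bisect import insort


def insert_with_split(pages, key, capacity):
    """Hypothetical helper: insert key, splitting an overfull page; return 1 if a split happened."""
    target = 0
    for i, page in enumerate(pages):      # last page whose first key <= key
        if page and page[0] <= key:
            target = i
    page = pages[target]
    insort(page, key)
    if len(page) <= capacity:
        return 0
    mid = len(page) // 2
    pages[target:target + 1] = [page[:mid], page[mid:]]   # half stay, half go to a new page
    return 1


pages = [[10, 20, 30, 40]]
first_split = insert_with_split(pages, 25, capacity=4)    # page is full, so it splits

# Scattered keys (like SSNs) land on random pages and keep triggering splits
scattered = [[]]
splits = sum(insert_with_split(scattered, (i * 7919) % 10007, 8) for i in range(500))
```

After the first insert, the pages are `[[10, 20], [25, 30, 40]]`. Both halves now have free space, which is the storage cost a split leaves behind.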

Clustered Indexes (cont.)
Clustered indexes have significant advantages when performing range queries or when the index value is a 'natural' sequence for the data
  – Timestamp
  – CustomerID
There can be only one clustered index per table (Why?)
Nonclustered indexes on a table with a clustered index store the clustered index key as their row locator, so a lookup finishes by navigating the clustered index down to its leaf node (the data page)
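
A small sketch of why range queries favor clustered indexes. It assumes rows are stored in key order (for example, by timestamp), so knowing only the first key on each data page is enough to locate a contiguous run of pages. The helper `clustered_range_pages` and the page boundaries are hypothetical.

```python
from bisect import bisect_right


def clustered_range_pages(page_first_keys, lo, hi):
    """Hypothetical helper: page numbers touched by WHERE key BETWEEN lo AND hi, rows in key order."""
    first = max(bisect_right(page_first_keys, lo) - 1, 0)   # page that could hold lo
    last = bisect_right(page_first_keys, hi) - 1            # page that could hold hi
    return list(range(first, last + 1))


# Hypothetical table: 5 data pages, each holding 100 consecutive key values
first_keys = [0, 100, 200, 300, 400]
touched = clustered_range_pages(first_keys, 150, 260)
```

The range 150 to 260 touches only pages 1 and 2. Without clustering, matching rows could sit on any page, so the same range could require reading all five.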

Implementing Indexes
Use the Manage Indexes & Keys window in Enterprise Manager
The default for the PK index is to make it clustered
  – Override this if you don't want it
  – Do not automatically accept the default

Using Indexes
SQL Server will automatically select indexes to use in queries
  – WHERE clauses
  – INNER JOIN clauses
  – ORDER BY clauses
The first column of the index must match the criteria
Additional columns will be used if available
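
SQL Server's optimizer cannot be run from a self-contained snippet, so this sketch substitutes SQLite through Python's built-in `sqlite3` module, which follows a similar leading-column rule for composite indexes. The `Customer` table, `idx_customer_name` index, and sample names are hypothetical. `EXPLAIN QUERY PLAN` wording varies between SQLite versions, so the sketch only looks for the index name in the plan text.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row (last column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical table and composite index on (LastName, FirstName)
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, "
             "LastName TEXT, FirstName TEXT, Phone TEXT)")
conn.execute("CREATE INDEX idx_customer_name ON Customer (LastName, FirstName)")

by_last = plan(conn, "SELECT * FROM Customer WHERE LastName = 'Smith'")
by_both = plan(conn, "SELECT * FROM Customer WHERE LastName = 'Smith' AND FirstName = 'Ann'")
by_first = plan(conn, "SELECT * FROM Customer WHERE FirstName = 'Ann'")   # no leading column

for label, detail in [("last", by_last), ("both", by_both), ("first", by_first)]:
    print(label, detail)
```

The first two queries search the index because `LastName` (its first column) is in the criteria. The third filters only on `FirstName`, so the planner falls back to scanning the table.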

Indexes (cont.)
Places to consider implementing indexes
  – Primary keys (required in most RDBMSs)
  – Foreign keys
  – Other 'access fields'
      E.g., customer phone number if used as a lookup field
Look at data usage analysis for other potential targets
  – Fields in WHERE clauses of SQL statements
  – Fields in ORDER BY clauses of SQL queries

Indexes (cont.)
Contraindications for indexes
  – Very little variation among the attribute values in the indexed field(s)
      Class (Freshman, Sophomore, etc.)
      Gender
  – Many null values in the indexed field(s)
  – Small tables (the index may be as large as the table)
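
A quick way to check the first two contraindications during data usage analysis is to profile a sample of a column's values. This is a hypothetical helper, with `None` standing in for SQL NULL. There is no universal cutoff, so the ratios are inputs to judgment rather than a rule.

```python
def column_profile(values):
    """Hypothetical helper: return (distinct_ratio, null_ratio) for a column sample."""
    if not values:
        return 0.0, 0.0
    non_null = [v for v in values if v is not None]
    distinct_ratio = len(set(non_null)) / len(values)
    null_ratio = (len(values) - len(non_null)) / len(values)
    return distinct_ratio, null_ratio


# Hypothetical 10,000-student table
class_level = ["Freshman", "Sophomore", "Junior", "Senior"] * 2500
student_id = list(range(10_000))
middle_name = [None] * 6000 + [f"M{i}" for i in range(4000)]

for name, col in [("class_level", class_level), ("student_id", student_id),
                  ("middle_name", middle_name)]:
    print(name, column_profile(col))
```

With only 4 distinct values, each class-level lookup matches about a quarter of the table, so an index saves little over a scan. `student_id` is unique, the ideal case, and `middle_name` is 60% NULL.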

Indexes (cont.)
Don't forget to index the second (or later) FK columns of a composite PK in an associative entity when both PK elements are also FKs
  – E.g., a composite PK of (OrderID, ProductID)
  – Searching for OrderID will use the PK index
  – Searching for ProductID cannot seek on the PK index, so it needs its own index
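
A sketch of the associative-entity case, again substituting SQLite (via Python's `sqlite3`) for SQL Server. SQLite creates an automatic index for the composite primary key. The `OrderDetails` table mirrors the slide's OrderID/ProductID example, while the index name `idx_orderdetails_product` and the literal key values are hypothetical. Plan wording differs across SQLite versions, so only stable fragments are checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE OrderDetails (
    OrderID   INTEGER NOT NULL,
    ProductID INTEGER NOT NULL,
    Quantity  INTEGER,
    PRIMARY KEY (OrderID, ProductID))""")


def detail(sql):
    """First 'detail' string from EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)][0]


by_order = detail("SELECT * FROM OrderDetails WHERE OrderID = 10248")    # leading PK column
before = detail("SELECT * FROM OrderDetails WHERE ProductID = 11")       # second PK column
# Hypothetical index on the second FK column
conn.execute("CREATE INDEX idx_orderdetails_product ON OrderDetails (ProductID)")
after = detail("SELECT * FROM OrderDetails WHERE ProductID = 11")
```

`by_order` searches the PK's index, `before` scans the whole table, and `after` searches the new ProductID index.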

Index Benefits
Avoid table scans
Quick location of the record address: one page read to get the data
  – Small row size for each index entry → many fewer page reads to find the record address
  – The B-tree algorithm discards a high percentage of records with each level of index pages evaluated
SQL stops looking when it knows it has finished, and indexes let it determine this
Indexes may be used for IF EXISTS queries without accessing data pages (referential integrity checking)
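
Back-of-the-envelope arithmetic for the B-tree benefit, assuming each index page holds about 400 entries and each data page about 40 rows. Both numbers are hypothetical and depend on key and row sizes. Each index level keeps 1 of roughly 400 branches, discarding about 99.75% of the remaining candidates.

```python
import math


def btree_levels(row_count: int, entries_per_page: int) -> int:
    """Hypothetical helper: fewest index levels whose capacity (entries_per_page ** levels) covers row_count."""
    levels, reachable = 1, entries_per_page
    while reachable < row_count:          # integer loop avoids floating-point log rounding
        levels += 1
        reachable *= entries_per_page
    return levels


rows = 1_000_000
index_reads = btree_levels(rows, 400) + 1     # one page per index level, plus the data page
scan_reads = math.ceil(rows / 40)             # a full scan touches every data page
discarded_per_level = 1 - 1 / 400
```

Under these assumptions, a lookup takes 4 page reads (3 index levels plus 1 data page), while a scan takes 25,000.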

Index Costs
Extra storage space
Each of a table's indexes must be updated with each data modification to the table
  – Increased processing time
Easy to implement and sometimes overused

Index Tricks and Techniques
Consider dropping and then rebuilding indexes when bulk updates are required
Nonclustered indexes can have additional data included in the leaf node
  – Avoids retrieval of the main data page
  – Increases index size and therefore reduces efficiency
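
In SQL Server, extra leaf-level data is added with the `INCLUDE` clause of `CREATE NONCLUSTERED INDEX`. SQLite has no `INCLUDE`, so this substitute sketch (Python's `sqlite3`) appends the extra column as a trailing index key, which has the same effect for this query: the index alone answers it. The table, index names, and phone value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID TEXT PRIMARY KEY, "
             "CompanyName TEXT, Phone TEXT, Address TEXT)")
query = "SELECT Phone, CompanyName FROM Customer WHERE Phone = '555-0100'"


def detail(sql):
    """First 'detail' string from EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)][0]


# Hypothetical narrow index: finds the row address, but CompanyName needs a table read
conn.execute("CREATE INDEX idx_customer_phone ON Customer (Phone)")
narrow = detail(query)

# Hypothetical wider index: CompanyName travels in the index, so no data page is read
conn.execute("DROP INDEX idx_customer_phone")
conn.execute("CREATE INDEX idx_customer_phone_cover ON Customer (Phone, CompanyName)")
covering = detail(query)
```

SQLite reports the second plan as using a covering index. The trade-off from the slide still applies: every added column makes each index entry larger, so fewer entries fit per page.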