Search Algorithm Lecture-14 1. Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection:

Slides:



Advertisements
Similar presentations
CSE 1302 Lecture 23 Hashing and Hash Tables Richard Gesick.
Advertisements

Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Copyright 2003Curt Hill Hash indexes Are they better or worse than a B+Tree?
Data Structures Using C++ 2E
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Hashing as a Dictionary Implementation
Advanced Databases: Lecture 2 Query Optimization (I) 1 Query Optimization (introduction to query processing) Advanced Databases By Dr. Akhtar Ali.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
BTrees & Bitmap Indexes
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
ACS-4902 Ron McFadyen Chapter 15 Algorithms for Query Processing and Optimization.
Hash Tables1 Part E Hash Tables  
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
Hash Tables1 Part E Hash Tables  
ACS-4902 Ron McFadyen Chapter 15 Algorithms for Query Processing and Optimization See Sections 15.1, 2, 3, 7.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
CSE 373 Data Structures and Algorithms Lecture 18: Hashing III.
Chapter 19 Query Processing and Optimization
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
1 Index Structures. 2 Chapter : Objectives Types of Single-level Ordered Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Query Processing and Optimization
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
CHAPTER 8 SEARCHING CSEB324 DATA STRUCTURES & ALGORITHM.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CSE 373 Data Structures and Algorithms Lecture 17: Hashing II.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
Advance Database Systems Query Optimization Ch 15 Department of Computer Science The University of Lahore.
1 B + -Trees: Search  If there are n search-key values in the file,  the path is no longer than  log  f/2  (n)  (worst case).
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CSE 373: Data Structures and Algorithms Lecture 16: Hashing III 1.
CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]
Sets and Maps Chapter 9.
TCSS 342, Winter 2006 Lecture Notes
Database Management Systems (CS 564)
Efficiency add remove find unsorted array O(1) O(n) sorted array
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
Lecture 2- Query Processing (continued)
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Advance Database Systems
Database Design and Programming
Sets and Maps Chapter 9.
Database Administration
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Data Structures and Algorithm Analysis Hashing
CSE 373: Data Structures and Algorithms
Presentation transcript:

Search Algorithm Lecture-14 1

Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection: S1. Linear search (brute force): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition. S2. Binary search : If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used. S3. Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index (or a hash key), use the primary index (or the hash key) to retrieve the record.

Chapter 15-3 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation (cont.): Search Methods for Simple Selection: S4. Using a primary index to retrieve multiple records: If the comparison condition is >, ≥, <, or ≤ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, then retrieve all subsequent records in the (ordered) file. S5. Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition. S6. Using a secondary (B+-tree) index : On an equality comparison, this search method can be used to retrieve a single record if the indexing field has unique values (is a key) or to retrieve multiple records if the indexing field is not a key. In addition, it can be used to retrieve records on conditions involving >,>=, <, or <=. (FOR RANGE QUERIES)

Chapter 15-4 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation (cont.): Search Methods for Complex Selection: S7. Conjunctive selection : If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S6, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition. S8. Conjunctive selection using a composite index : If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index (or hash structure) exists on the combined field, we can use the index directly.

Chapter 15-5 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation (cont.): Search Methods for Complex Selection: S9. Conjunctive selection by intersection of record pointers : This method is possible if secondary indexes are available on all (or some of) the fields involved in equality comparison conditions in the conjunctive condition and if the indexes include record pointers (rather than block pointers). Each index can be used to retrieve the record pointers that satisfy the individual condition. The intersection of these sets of record pointers gives the record pointers that satisfy the conjunctive condition, which are then used to retrieve those records directly. If only some of the conditions have secondary indexes, each retrieved record is further tested to determine whether it satisfies the remaining conditions.

Chapter 15-6 Algorithms for SELECT and JOIN Operations (7) Implementing the SELECT Operation (cont.): Whenever a single condition specifies the selection, we can only check whether an access path exists on the attribute involved in that condition. If an access path exists, the method corresponding to that access path is used; otherwise, the “brute force” linear search approach of method S1 is used. For conjunctive selection conditions, whenever more than one of the attributes involved in the conditions have an access path, query optimization should be done to choose the access path that retrieves the fewest records in the most efficient way.

Chapter 15-7 Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL JOIN) – two–way join: a join on two files e.g. R A=B S – multi-way joins: joins involving more than two files. e.g. R A=B S C=D T Examples (OP6): EMPLOYEE DNO=DNUMBER DEPARTMENT (OP7): DEPARTMENT MGRSSN=SSN EMPLOYEE

Chapter 15-8 Algorithms for SELECT and JOIN Operations (9) Implementing the JOIN Operation (cont.): Methods for implementing joins: J1.Nested-loop join (brute force): For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B]. J2.Single-loop join (Using an access structure to retrieve the matching records): If an index (or hash key) exists for one of the two join attributes — say, B of S — retrieve each record t in R, one at a time, and then use the access structure to retrieve directly all matching records s from S that satisfy s[B] = t[A].

Chapter 15-9 Algorithms for SELECT and JOIN Operations (10) Implementing the JOIN Operation (cont.): Methods for implementing joins: J3.Sort-merge join: If the records of R and S are physically sorted (ordered) by value of the join attributes A and B, respectively, we can implement the join in the most efficient way possible. Both files are scanned in order of the join attributes, matching the records that have the same values for A and B. In this method, the records of each file are scanned only once each for matching with the other file— unless both A and B are non-key attributes, in which case the method needs to be modified slightly.

Chapter Algorithms for SELECT and JOIN Operations (11) Implementing the JOIN Operation (cont.): Methods for implementing joins: J4.Hash-join: The records of files R and S are both hashed to the same hash file, using the same hashing function on the join attributes A of R and B of S as hash keys. A single pass through the file with fewer records (say, R) hashes its records to the hash file buckets. A single pass through the other file (S) then hashes each of its records to the appropriate bucket, where the record is combined with all matching records from R.

BTREE fundamentals Lecture

B-TREE A B-tree is a specialized multiway tree designed especially for use on disk In a B-tree each node may contain a large number of keys Only a small number of nodes must be read from disk to retrieve an item The goal is to get fast access to the data 12

A multiway tree of order m is an ordered tree where each node has at most m children. For each node, if k is the actual number of children in the node, then k - 1 is the number of keys in the node. If the keys and subtrees are arranged in the fashion of a search tree, then this is called a multiway search tree of order m. For example, the following is a multiway search tree of order 4. Note that the first row in each node shows the keys, while the second row shows the pointers to the child nodes. Of course, in any useful application there would be a record of data associated with each key, so that the first row in each node might be an array of records where each record contains a key and its associated data. Another approach would be to have the first row of each node contain an array of records where each record contains a key and a record number for the associated data record, which is found in another file. This last method is often used when the data records are large. 13 B-TREE

Hashing Functions Lecture

Hash tables hash table: an array of some fixed size, that positions elements according to an algorithm called a hash function 15 elements (e.g., strings) 0 … length –1 hash func. h(element) hash table

Hashing, hash functions The idea: somehow we map every element into some index in the array ("hash" it); this is its one and only place that it should go – Lookup becomes constant-time: simply look at that one slot again later to see if the element is there – add, remove, contains all become O(1) ! For now, let's look at integers ( int ) – a "hash function" h for int is trivial: store int i at index i (a direct mapping) if i >= array.length, store i at index (i % array.length) – h(i) = i % array.length 16

Hash function in action Add these elements to the hash table: – 89 – 18 – 49 – 58 – 9 17

Hash function example elements = Integers h(i) = i % 10 add 41, 34, 7, and 18 constant-time lookup: – just look at i % 10 again later We lose all ordering information: – getMin, getMax, removeMin, removeMax – the various ordered traversals – printing items in sorted order

Hash collisions collision: the event that two hash table elements map into the same slot in the array example: add 41, 34, 7, 18, then 21 – 21 hashes into the same slot as 41! – 21 should not replace 41 in the hash table; they should both be there collision resolution: means for fixing collisions in a hash table

Linear probing linear probing: resolving collisions in slot i by putting the colliding element into the next available slot (i+1, i+2,...) – add 41, 34, 7, 18, then 21, then collides (41 is already there), so we search ahead until we find empty slot 2 57 collides (7 is already there), so we search ahead twice until we find empty slot 9 – lookup algorithm becomes slightly modified; we have to loop now until we find the element or an empty slot what happens when the table gets mostly full?

Clustering problem clustering: nodes being placed close together by probing, which degrades hash table's performance – add 89, 18, 49, 58, 9 – now searching for the value 28 will have to check half the hash table! no longer constant time

Quadratic probing quadratic probing: resolving collisions on slot i by putting the colliding element into slot i+1, i+4, i+9, i+16,... – add 89, 18, 49, 58, 9 49 collides (89 is already there), so we search ahead by +1 to empty slot 0 58 collides (18 is already there), so we search ahead by +1 to occupied slot 9, then +4 to empty slot 2 9 collides (89 is already there), so we search ahead by +1 to occupied slot 0, then +4 to empty slot 3 – clustering is reduced – what is the lookup algorithm?

Quadratic probing in action 23

Chaining chaining: All keys that map to the same hash value are kept in a linked list

Load factor load factor: ratio of elements to capacity – load factor = size / capacity = 6 / 10 =

Rehashing, hash table size rehash: increasing the size of a hash table's array, and re-storing all of the items into the array using the hash function – can we just copy the old contents to the larger array? – When should we rehash? Some options: when load reaches a certain level (e.g., = 0.5) when an insertion fails What is the cost (Big-Oh) of rehashing? what is a good hash table array size? – how much bigger should a hash table get when it grows? 26

Hash table removal lazy removal: instead of actually removing elements, replace them with a special REMOVED value – avoids expensive re-shuffling of elements on remove – example: remove 18 --> – lookup algorithm becomes slightly modified what should we do when we hit a slot containing the REMOVED value? – keep going – add algorithm becomes slightly modified what should we do when we hit a slot containing the REMOVED value? – use that slot, replace REMOVED with the new value – add(17) --> slot REMOVED 957

Hashing practice problem Draw a diagram of the state of a hash table of size 10, initially empty, after adding the following elements: 7, 84, 31, 57, 44, 19, 27, 14, and 64 Assume that the hash table uses linear probing. Assume that rehashing occurs at the start of an add where the load factor is Repeat the above problem using quadratic probing. 28

Writing a hash function If we write a hash table that can store objects, we need a hash function for the objects, so that we know what index to store them We want a hash function to: 1.be simple/fast to compute 2.map equal elements to the same index 3.map different elements to different indexes 4.have keys distributed evenly among indexes 29

Hash function for strings elements = Strings let's view a string by its letters: – String s : s 0, s 1, s 2, …, s n-1 how do we map a string into an integer index? (how do we "hash" it?) one possible hash function: – treat first character as an int, and hash on that h(s) = s 0 % array.length is this a good hash function? When will strings collide? 30

Better string hash functions view a string by its letters: – String s : s 0, s 1, s 2, …, s n-1 another possible hash function: – treat each character as an int, sum them, and hash on that h(s) = % array.length what's wrong with this hash function? When will strings collide? a third option: – perform a weighted sum of the letters, and hash on that – h(s) = % array.length 31

Analysis of hash table search load: the load of a hash table is the ratio:  no. of elements  array size analysis of search, with linear probing: – unsuccessful:  – successful:  32

Analysis of hash table search analysis of search, with chaining: – unsuccessful: (the average length of a list at hash(i)) – successful: 1 + ( /2) (one node, plus half the avg. length of a list (not including the item)) 33

Compound collections Collections can be nested to represent more complex data example: A person can have one or many phone numbers – want to be able to quickly find all of a person's phone numbers, given their name implement this example as a HashMap of Lists – keys are Strings (names) – values are Lists (e.g ArrayList) of Strings, where each String is one phone number StringList name -->list of phone numbers "Phil" -->[" ", " ",...] 34

35 The Tuning Cycle Lecture -17

Performance tuning Performance tuning is the main activity associated with performance management. Reduced to its most basic level, tuning consists of finding and eliminating bottlenecks — a condition that occurs, and is revealed, when a piece of hardware or software in a server approaches the limits of its capacity. 36

The Tuning Cycle The tuning cycle, which is an iterative series of controlled performance experiments.. Repeat the four phases of the tuning cycle shown below until you achieve the performance goals that you established prior to starting the tuning process. Tuning Cycle: 37

Collecting The Collecting phase is the starting point of any tuning exercise. Gathering data with the collection of performance counters that are chosen for a specific part of the system. Analyzing After you have collected the performance data that you require for tuning the selected part of the system, you need to analyze the data to determine bottlenecks. 38 Tuning Cycle

Configuring After you have collected your data and completed the analysis of the results, you can determine which part of the system is the best candidate for a configuration change and implement this change. Testing After implementing a configuration change, you must complete the appropriate level of testing to determine the impact of the change on the system that you are tuning. 39

When testing, be sure to: Check the correctness and performance of the application that you are using for testing by looking for memory leaks and inordinate delays in response to client requests. Ensure that all tests are working correctly. Make sure you can repeat all tests by using the same transaction mix and the same clients generating the same load. Document changes and results. 40 Tuning Cycle

Database Performance Tuning DBMS Installation – Setting installation parameters Memory Usage – Set cache levels – Choose background processes Input / Output (I/O) Contention – Distribution of heavily accessed files CPU Usage – Monitor CPU load Application tuning – Modification of SQL code in applications 41