Kruse/Ryba ch09: Object Oriented Data Structures. Tables and Information Retrieval: Rectangular Tables, Tables of Various Shapes, Radix Sort, Hashing.

Presentation transcript:

What is an INDEX?
An index lets you impose order on a file without actually rearranging the file.
An index gives keyed access to fixed-length or variable-length record files.

Simple Index
A simple index uses a simple array to implement the index.
IBM calls this approach ISAM (Indexed Sequential Access Method).

Figure: index file and data file
Index file (Key, Reference), keys in sorted order: ANG, COL, COL, DG, DG, FF, LON, MER, RCA, WAR
Data file (actual data records), in entry order:
LON|2312|Romeo and Juliet|...
RCA|2626|Quartet in C Sharp|...
WAR|23699|Touchstone|...
ANG|3795|Symphony No. 9|...
COL|38358|Nebraska|...
DG|18807|Symphony No. 9|...
MER|75016|Coq d'or Suite|...
COL|31809|Symphony No. 9|...
DG|139201|Violin Concerto|...
FF|245|Good News|...
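As a rough illustration of the simple index, the sketch below keeps the index as a sorted array in memory and finds a record reference by binary search. The names `IndexEntry` and `find_offset`, and the use of a byte offset as the reference field, are assumptions made for this example and do not come from the slides. It needs C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// One index record: a primary key (label + ID) and a reference into the
// data file. The byte offset is a hypothetical choice of reference.
struct IndexEntry {
    std::string key;
    long offset;
};

// Binary search over an index that is kept sorted by key.
// Returns the data-file offset, or std::nullopt if the key is absent.
std::optional<long> find_offset(const std::vector<IndexEntry>& index,
                                const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) return it->offset;
    return std::nullopt;
}
```

The data file itself stays in entry order; only the small in-memory array is sorted.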

Concerns
There are two files to deal with.
The index file is easier to deal with than the data file because it has fixed-length records.
Fixed-length fields impose limits on the size of keys.
In the example, the index carries no information other than the keys and the reference fields. Other data could be included (for example, the record length).

Basic Operations
Create the original empty index and data files.
Load the index file into memory before using it.
Rewrite the index file from memory after using it.
Add records to the data file and index.
Delete records from the data file.
Update records in the data file.

Creating the Files
Create both the index and data files as empty files.
Write headers to both files.

Loading the Index into Memory
Assume that the index file is small enough to fit into RAM.
Each array element is an index record.

Safety Mechanisms
Know when the index is out of date.
Be able to reconstruct the index from the data file.
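Reconstructing the index from the data file could look roughly like the sketch below. It assumes, purely for illustration, that the records are already read into memory as '|'-delimited strings in file order, that the primary key is the first two fields joined (label + ID), and that each record is followed by a one-byte newline. `primary_key` and `rebuild_index` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long offset;  // assumed reference: byte offset of the record
};

// "LON|2312|Romeo and Juliet|..." -> "LON2312"
std::string primary_key(const std::string& record) {
    std::size_t first = record.find('|');
    if (first == std::string::npos) return record;
    std::size_t second = record.find('|', first + 1);
    std::string id = (second == std::string::npos)
                         ? record.substr(first + 1)
                         : record.substr(first + 1, second - first - 1);
    return record.substr(0, first) + id;
}

// Scan the data records in file order, then sort the entries by key.
std::vector<IndexEntry> rebuild_index(const std::vector<std::string>& records) {
    std::vector<IndexEntry> index;
    long offset = 0;
    for (const auto& r : records) {
        index.push_back({primary_key(r), offset});
        offset += static_cast<long>(r.size()) + 1;  // +1: assumed newline
    }
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; });
    return index;
}
```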

Record Addition
Adding a new record to the data file requires that we also add a record to the index file.
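A minimal sketch of the index side of an addition, assuming the index is held in memory as a sorted array: the new entry is inserted at its sorted position, which shifts the entries after it. `IndexEntry` and `add_to_index` are illustrative names, and rejecting duplicate primary keys is an assumption of this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long offset;  // assumed reference: byte offset of the new data record
};

// Insert a new entry while keeping the index sorted by key.
// Returns false (and changes nothing) if the key is already present.
bool add_to_index(std::vector<IndexEntry>& index,
                  const std::string& key, long offset) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) return false;
    index.insert(it, IndexEntry{key, offset});  // later entries shift right
    return true;
}
```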

Figure: the index file and data file after adding the record LON|783|Sweet Somethings|... The new record is appended to the data file, and its key is inserted into the index file in sorted order.

Record Deletion
Any of the methods discussed in chapter 5 could be used. However, the index file must now be considered.
The index entry could be removed and the array adjusted, or the index entry could just be marked as deleted.
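The two options for the index entry can be sketched as follows. This is only an illustration under the assumption of a sorted in-memory index; the `deleted` flag and the function names are hypothetical, and it assumes C++14 or later.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long offset;
    bool deleted = false;  // hypothetical "marked as deleted" flag
};

// Locate a key in the sorted index; returns index.end() if it is absent.
std::vector<IndexEntry>::iterator locate(std::vector<IndexEntry>& index,
                                         const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    return (it != index.end() && it->key == key) ? it : index.end();
}

// Option 1: remove the entry and close the gap in the array.
bool erase_entry(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = locate(index, key);
    if (it == index.end()) return false;
    index.erase(it);
    return true;
}

// Option 2: leave the entry in place and only mark it as deleted.
bool mark_deleted(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = locate(index, key);
    if (it == index.end() || it->deleted) return false;
    it->deleted = true;
    return true;
}
```

Erasing keeps the array compact at the cost of shifting entries; marking is cheap but leaves dead entries that later searches must skip.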

Record Updating
Updating changes the key field
  – Conceptually, this is best thought of as a deletion followed by an addition.
Updating does not change the key field
  – This will not cause any changes in the index file, but could well cause changes in the data file if the size of the record changes.

Indexes too large to fit in RAM
Essentially, the later text material deals with this problem:
  – Hashed organization
  – Tree structures

Access by Multiple Keys
Secondary key index organized by composer:
Secondary key         Primary key
BEETHOVEN             ANG3795
BEETHOVEN             DG
BEETHOVEN             DG18807
BEETHOVEN             RCA2626
COREA                 WAR23699
DVORAK                COL31809
PROKOFIEV             LON2312
RIMSKY-KORSAKOV       MER75016
SPRINGSTEEN           COL38358
SWEET HONEY IN THE    FF245

Record Addition
Additional indices imply additional overhead when new records are added.

Record Deletion
Deleting a record usually implies removing all references to that record in the file system.
Alternatively, the secondary index entries can be left in place: since the primary index does reflect the deletion, a request that arrives from a secondary index fails in the primary index, which implies the record has been deleted.
Such a method results in wasted space in the secondary index.

Record Updating
If the update changes the secondary key
  – It may be necessary to rearrange the secondary key index so it stays in sorted order.
If the update changes the primary key
  – This has a major impact on the secondary indices.
If the update is confined to other fields
  – Updates that do not affect either the primary or secondary key fields do not affect the secondary key index.

Access by Multiple Keys
Secondary key index organized by recording title:
Secondary key         Primary key
COQ D'OR SUITE        MER75016
GOOD NEWS             FF245
NEBRASKA              COL38358
QUARTET IN C SHAR     RCA2626
ROMEO AND JULIET      LON2312
SYMPHONY NO. 9        ANG3795
SYMPHONY NO. 9        COL31809
SYMPHONY NO. 9        DG18807
TOUCHSTONE            WAR23699
VIOLIN CONCERTO       DG

Access by Multiple Keys
Find all data records with composer = BEETHOVEN and title = SYMPHONY NO. 9.
Matches in the title index for SYMPHONY NO. 9: ANG3795, COL31809, DG18807

Access by Multiple Keys
Same query, composer index. Matches for composer = BEETHOVEN: ANG3795, DG, DG18807, RCA2626

LOGICAL AND
Only the primary keys matched in both secondary indexes (ANG3795 and DG18807) satisfy the query. They are looked up in the primary index file (Key, Reference) to reach the actual data records in the data file.
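The logical AND of two secondary-index lookups amounts to intersecting two sorted lists of primary keys. A minimal sketch, where `match_both` is an illustrative name and both inputs are assumed to be sorted the same way:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Intersect two sorted lists of primary keys (logical AND of two lookups).
std::vector<std::string> match_both(const std::vector<std::string>& a,
                                    const std::vector<std::string>& b) {
    std::vector<std::string> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

For the query in the slides, the composer matches would be {ANG3795, DG139201, DG18807, RCA2626} (assuming the bare DG entry is the DG|139201 record from the data file) and the title matches {ANG3795, COL31809, DG18807}; the keys common to both are then fetched through the primary index.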

Problems
We have to rearrange the index file every time a new record is added to the file, even if the new record has an existing secondary key.

A Better Solution: Linking the List of References
Inverted lists work their way backward from a secondary key to the primary key to the record itself.

Figure: each secondary key points to its own list of primary key references
BEETHOVEN: ANG3795, DG, DG18807, RCA2626
COREA: WAR23699
DVORAK: COL31809
PROKOFIEV: LON2312

Drawback: this might create a large number of small files, one for each composer.

Improved Version
Redefine the secondary key index so it consists of records with two fields: a secondary key field, and a field containing the relative record number of the first corresponding primary key reference in the inverted list.
The actual primary key references associated with each secondary key would be stored in a separate entry-sequenced file.

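Figure: Secondary Index File and Label ID List File
Secondary Index File (secondary key, relative record number of first reference):
BEETHOVEN             3
COREA                 2
DVORAK                7
PROKOFIEV             10
RIMSKY-KORSAKOV       6
SPRINGSTEEN           4
SWEET HONEY IN        9
Label ID List File (entry-sequenced, relative record numbers 0 to 10; the link field of each record is not shown):
0   LON2312
1   RCA2626
2   WAR23699
3   ANG3795
4   COL38358
5   DG18807
6   MER75016
7   COL31809
8   DG
9   FF245
10  ANG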
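The improved inverted-list layout can be sketched with two in-memory arrays standing in for the two files. Everything named here (`SecondaryEntry`, `ListRecord`, `primary_keys_for`, `add_reference`, and the use of -1 as the end-of-list marker) is an assumption for illustration. As a simplification, new references are linked at the head of the chain rather than kept in primary-key order, and the secondary index is searched linearly.

```cpp
#include <string>
#include <vector>

// Secondary index record: secondary key plus the relative record number
// (RRN) of the first reference in the list file.
struct SecondaryEntry {
    std::string secondary_key;
    int first;
};

// List file record: one primary key reference plus the RRN of the next
// reference for the same secondary key (-1 marks the end of the chain).
struct ListRecord {
    std::string primary_key;
    int next;
};

// Follow the chain for one secondary key and collect its primary keys.
std::vector<std::string> primary_keys_for(const std::vector<SecondaryEntry>& index,
                                          const std::vector<ListRecord>& list,
                                          const std::string& key) {
    std::vector<std::string> out;
    for (const auto& e : index) {
        if (e.secondary_key != key) continue;
        for (int rrn = e.first; rrn != -1; rrn = list[rrn].next)
            out.push_back(list[rrn].primary_key);
        break;
    }
    return out;
}

// Append a reference to the entry-sequenced list file and link it in.
// An existing secondary key only has its `first` field updated.
void add_reference(std::vector<SecondaryEntry>& index,
                   std::vector<ListRecord>& list,
                   const std::string& secondary_key,
                   const std::string& primary_key) {
    int rrn = static_cast<int>(list.size());
    for (auto& e : index) {
        if (e.secondary_key == secondary_key) {
            list.push_back({primary_key, e.first});
            e.first = rrn;
            return;
        }
    }
    list.push_back({primary_key, -1});
    index.push_back({secondary_key, rrn});  // a real index would stay sorted
}
```

The point of the layout shows up in `add_reference`: adding a record for a composer who is already indexed appends one list record and rewrites one field, with no rearranging of the secondary index.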

Hash Functions
Truncation
  – Ignore part of the key and use the rest as the index.
Folding
  – Partition the key into parts and combine them.
Modular Arithmetic
Perfect Hash Function
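The first three methods can be illustrated on an 8-digit integer key. The sketch below is one possible reading of each method, not the textbook's code: the choice of digits for truncation, the three-digit groups taken from the right for folding, and the function names are all assumptions.

```cpp
// Truncation: keep only the 5th, 2nd and 1st digits (counted from the
// right) and ignore the rest. Gives an index in 0..999.
int hash_by_truncation(unsigned long key) {
    int d1 = static_cast<int>(key % 10);
    int d2 = static_cast<int>((key / 10) % 10);
    int d5 = static_cast<int>((key / 10000) % 10);
    return d5 * 100 + d2 * 10 + d1;
}

// Folding: split the key into three-digit groups (from the right),
// add the groups, and reduce the sum to the table size.
int hash_by_folding(unsigned long key, int hash_size) {
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return static_cast<int>(sum % static_cast<unsigned long>(hash_size));
}

// Modular arithmetic: the remainder after dividing by the table size.
// A prime hash_size usually spreads keys more evenly.
int hash_by_modulus(unsigned long key, int hash_size) {
    return static_cast<int>(key % static_cast<unsigned long>(hash_size));
}
```

For the key 62538194, truncation as defined here yields 394, folding into a table of size 1000 yields 194 + 538 + 62 = 794, and the modulus with table size 997 yields 372.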

C++ Example

    int hash(const Key &target)
    {
        int value = 0;
        // Fold each of the first 8 key characters into value.
        for (int position = 0; position < 8; position++)
            value = 4 * value + target.key_letter(position);
        // Modular arithmetic maps value into the table.
        return value % hash_size;
    }

Collision Resolution
Linear Probing
  – Clustering
Rehashing
Increment Functions
Quadratic Probing
  – h + i^2
Key-Dependent Increments
  – Increment = (int)the_data.key_letter(0);
Random Probing

    Error_code Hash_table::insert(const Record &new_entry)
    {
        Error_code result = success;
        int probe_count,  // Counter to be sure that table is not full.
            increment,    // Increment used for quadratic probing.
            probe;        // Position currently probed.
        Key null;         // Null key for comparison purposes.
        null.make_blank();
        probe = hash(new_entry);
        probe_count = 0;
        increment = 1;
        while (table[probe] != null                     // Is the location empty?
               && table[probe] != new_entry             // Duplicate key?
               && probe_count < (hash_size + 1) / 2) {  // Has overflow occurred?
            probe_count++;
            probe = (probe + increment) % hash_size;
            increment += 2;  // Prepare increment for next iteration.
        }
        if (table[probe] == null) table[probe] = new_entry;
        else if (table[probe] == new_entry) result = duplicate_error;
        else result = overflow;  // The table is full.
        return result;
    }
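The insertion code adds the odd increments 1, 3, 5, ... to the probe position, which visits h, h + 1, h + 4, h + 9, ..., that is, h + i^2 modulo the table size. The sketch below isolates that probe sequence so it can be inspected; `probe_sequence` and `distinct_positions` are illustrative helpers, not part of the slides. When the table size is prime, quadratic probing reaches exactly (hash_size + 1) / 2 distinct positions, which is why the loop gives up after that many probes.

```cpp
#include <set>
#include <vector>

// First `count` positions probed from home address h, using the same
// increment scheme (1, 3, 5, ...) as quadratic probing.
std::vector<int> probe_sequence(int h, int hash_size, int count) {
    std::vector<int> positions;
    int probe = h % hash_size;
    int increment = 1;
    for (int i = 0; i < count; i++) {
        positions.push_back(probe);              // equals (h + i*i) % hash_size
        probe = (probe + increment) % hash_size;
        increment += 2;
    }
    return positions;
}

// Number of different table positions the sequence can ever reach.
int distinct_positions(int h, int hash_size) {
    std::vector<int> seq = probe_sequence(h, hash_size, hash_size);
    return static_cast<int>(std::set<int>(seq.begin(), seq.end()).size());
}
```

With h = 3 and hash_size = 11, the first six probes are 3, 4, 7, 1, 8, 6, and no later probe reaches a new position.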

Figure: Collision Resolution with Buckets (table positions 0, 1, 2, ...)

Figure: Collision Resolution by Chaining

Collision Resolution by Chaining
Advantages
  – Saving of space
  – Simple, efficient collision handling
  – Size of hash table does not need to exceed the number of records
  – Deletion becomes quick and easy
Disadvantage
  – Links require space
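A chained hash table can be sketched as an array of linked lists. The class below is a minimal illustration, not the textbook's implementation: `ChainedTable` is a hypothetical name, keys are plain strings, and `std::hash` stands in for the hash function.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t hash_size) : chains_(hash_size) {}

    // Returns false if the key is already stored.
    bool insert(const std::string& key) {
        std::list<std::string>& chain = chains_[index_of(key)];
        for (const auto& k : chain)
            if (k == key) return false;
        chain.push_front(key);  // colliding keys simply share a chain
        return true;
    }

    bool contains(const std::string& key) const {
        for (const auto& k : chains_[index_of(key)])
            if (k == key) return true;
        return false;
    }

    // Deletion only unlinks a node from one chain.
    bool remove(const std::string& key) {
        std::list<std::string>& chain = chains_[index_of(key)];
        std::size_t before = chain.size();
        chain.remove(key);
        return chain.size() != before;
    }

private:
    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % chains_.size();
    }
    std::vector<std::list<std::string>> chains_;
};
```

Because each slot holds a list, the table can store more records than it has slots, and removing a key never disturbs the probe path of any other key.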

Theoretical Comparison (table): successful search, expected number of probes at each load factor, for chaining, open addressing with random probes, and open addressing with linear probes.

Theoretical Comparison (table): unsuccessful search, expected number of probes at each load factor, for chaining, open addressing with random probes, and open addressing with linear probes.
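The slides' table values are not available here, but the standard textbook approximations for the expected number of probes can be evaluated directly. The functions below are a sketch of those well-known formulas as a function of the load factor lambda (records stored divided by table size); the function names are illustrative. The open-addressing formulas assume 0 < lambda < 1, and the chaining figures count key comparisons along the chain.

```cpp
#include <cmath>

// Successful search
double chaining_successful(double lambda) { return 1.0 + lambda / 2.0; }
double random_successful(double lambda) {
    return std::log(1.0 / (1.0 - lambda)) / lambda;
}
double linear_successful(double lambda) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Unsuccessful search
double chaining_unsuccessful(double lambda) { return lambda; }
double random_unsuccessful(double lambda) { return 1.0 / (1.0 - lambda); }
double linear_unsuccessful(double lambda) {
    double d = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At lambda = 0.5 these give 1.25, about 1.39, and 1.5 probes for a successful search, and 0.5, 2, and 2.5 for an unsuccessful one. Linear probing degrades fastest as the table fills, which reflects the clustering noted under linear probing.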

Empirical Comparison (table): successful search, average number of probes at each load factor, for chaining, open addressing with quadratic probes, and open addressing with linear probes.

Empirical Comparison (table): unsuccessful search, average number of probes at each load factor, for chaining, open addressing with quadratic probes, and open addressing with linear probes.

Highlights

Chapter 9 - The End