Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.

Digital Data  In earlier work with BSTs and various balanced trees, we compared keys for order or equality  Here, we take advantage of structure of key  Use it as an index, or  Decompose string key into characters, or  Treat key as numerical quantity on which we can perform operations

Assumptions  We will construct and manipulate sets that  Are drawn from a universe U of size N  U = {u 0, …u N-1 }  A relatively simple procedure exists by which we can compute, for an element u  U, the index i such that u = u i.  Easy if U is set of integers  Also easy if U is set of characters with character codes in a contiguous interval

Bit Vector  Used to represent a subset S  U  A table of N bits, Bits[0.. N-1]  Bits[i] == 1 if u i  S  Bits[i] == 0 if u i  S  Example: today’s attendance student number 1 = present 0 = absent

Bit Vectors  Assume:  determining element index takes constant time  accessing position in table takes constant time  May actually take several ops, and depend somewhat on N(size of universe), but not on size of set represented  Then:  Insert, Delete, Member are constant time ops

Bit Vectors  A subset of a set of size N always takes N bits to represent, independent of size of subset  Makes sense if:  N is not too large  need to represent sets of size comparable to N

Storage Efficiency  Bit Vector vs. Binary Trees  Binary Tree, set of size n  Requires n(2p + K) bits  K >= lg N, size of field to represent key value  p = number of bits in a pointer  Bit Vector, takes N bits  If n  N, then bit vector more efficient  If p = K = 32, then tree becomes more space efficient when n/N  1%  Actually, when n(2p + K) = N, which is when n/N = 1/96

When to use Bit Vectors?
- When the universe is relatively small
- When sets are large in relation to the size of the universe

Advantages of Bit Vectors
- O(1) implementation of Insert, Delete, Member
- Union and Intersection are easy
  - implement them with bitwise Boolean operations: OR for Union, AND for Intersection
  - may actually take less than one operation per element, since each operation acts on a full machine word
  - if the machine word is 32 bits, one machine operation handles 32 potential elements of the set
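The word-at-a-time Union and Intersection can be sketched as follows, again assuming a 32-bit word; `from_indices` and `to_indices` are hypothetical helpers for building and reading back sets:

```python
WORD = 32  # assumed machine-word width


def from_indices(indices, n):
    """Build a word list for a universe of size n containing the given indices."""
    words = [0] * ((n + WORD - 1) // WORD)
    for i in indices:
        words[i // WORD] |= 1 << (i % WORD)
    return words


def union(a, b):
    # One OR per word processes WORD potential elements at once.
    return [x | y for x, y in zip(a, b)]


def intersection(a, b):
    # One AND per word, likewise.
    return [x & y for x, y in zip(a, b)]


def to_indices(words):
    """List the element indices present, in increasing order."""
    return [w * WORD + bit for w, word in enumerate(words)
            for bit in range(WORD) if (word >> bit) & 1]


a = from_indices([1, 5, 40], 64)
b = from_indices([5, 40, 63], 64)
print(to_indices(union(a, b)))         # [1, 5, 40, 63]
print(to_indices(intersection(a, b)))  # [5, 40]
```

For a universe of 64 elements, each operation above is just two word operations.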

Disadvantages of Bit Vectors
- On some computers, access to individual bits requires shifting and masking operations (expensive)
  - As a result, Member may be much more expensive per element than Union
- Initialization takes Θ(N): all the bits in the vector must be zeroed
  - A constant-time initialization algorithm can be used instead
  - But that raises the storage requirement to 2p + 1 bits per element
  - So in practice, just use machine operations to set the words to zero, which is efficient
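The slide does not spell out the constant-time initialization algorithm. One standard technique with exactly the 2p + 1 bits-per-element cost is sketched below; treating it as the slide's algorithm is an assumption. Each element has a bit plus two index fields (`where`, `stack`), and an entry is trusted only if the two fields point at each other below the counter `top`. Python cannot allocate uninitialized memory, so random values stand in for garbage (filling them is itself O(N) here, purely as a simulation):

```python
import random


class ConstantInitBitVector:
    """Bit vector whose initialization is O(1): an entry counts only if it
    has been written since creation. Storage per element: 1 bit plus two
    p-bit indices, i.e. 2p + 1 bits."""

    def __init__(self, n: int, garbage_seed: int = 0):
        rng = random.Random(garbage_seed)
        # Simulated uninitialized memory: arbitrary leftover values.
        self.bits = [rng.randint(0, 1) for _ in range(n)]
        self.where = [rng.randrange(-n, 2 * n) for _ in range(n)]
        self.stack = [rng.randrange(-n, 2 * n) for _ in range(n)]
        self.top = 0  # the only field genuinely initialized

    def _touched(self, i: int) -> bool:
        j = self.where[i]
        return 0 <= j < self.top and self.stack[j] == i

    def _touch(self, i: int) -> None:
        if not self._touched(i):
            self.where[i] = self.top
            self.stack[self.top] = i
            self.top += 1

    def insert(self, i: int) -> None:
        self._touch(i)
        self.bits[i] = 1

    def delete(self, i: int) -> None:
        self._touch(i)
        self.bits[i] = 0

    def member(self, i: int) -> bool:
        return self._touched(i) and self.bits[i] == 1
```

Whatever garbage `bits` holds, an untouched element is never reported as a member, because its `where`/`stack` pair cannot match below `top`.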

Tries and Digital Search Trees  If the key can be decomposed into characters, then the characters of the key can be used as indices  Tries are based on this idea  “trie” is the middle symbol of retrieval, a pun on tree, but pronounced “try”

Tries  Assume k possible character values  A trie is a (k+1)-ary tree  each node a table of k+1 pointers  One pointer for each possible character  One for the end of string character, 

Trie Example

Tries  Path for key of m characters is length m, with pointer at   Don’t need to store key itself.. It is the path followed.  Info field might be pointed to by  element

Tries: Analysis  Let:  n be the number of keys stored in a trie  l be the length(in characters) of the longest key  s be the number of nodes in the trie  k be the size of the alphabet  Pro:  Access time is O(l), independent of k, n and s  Con:  Size -- requires (k+1) * s * p bits  Most pointers are null, so lots of wasted space

Strategies for reducing storage requirements of tries
1. Implement a k-ary trie with m nodes as a 2-D, m-by-k table
[Figure: the trie as a table with one row per node and one column per character A, B, C, D, E, …, M, …, P, …, T, …]

Table approach  Number the nodes in the diagram of slide 13 from 1 to m  The table entry corresponding to j th child of i th node is the index of the child node  How does that save space? Just as many nodes and elements as on slide 13  … need only ceil(lg(m)) bits to represent, smaller than a pointer …

Patricia Tree: another strategy for reducing space in a trie
- Patricia: Practical Algorithm To Retrieve Information Coded In Alphanumeric
- Eliminate nodes with only one nonempty child
  - In our example, we can now skip right from T to the end-of-string entry in TURING
  - and skip from MA … to E or the end-of-string entry in the MENDEL, MENDELEEV chain
- But each node must then store the index of the character on which it discriminates
- And the key itself must be stored at the leaf
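A minimal sketch of the idea at the character level, as the slide describes it (classic Patricia discriminates on bits rather than characters). It builds a static tree from a fixed key set, with insertion omitted. Each branch stores the index of the character it discriminates on, each leaf stores its full key, and the `"\0"` end-of-string sentinel is an assumption:

```python
END = "\0"  # assumed end-of-string sentinel; keys must not contain it


class Leaf:
    def __init__(self, key):
        self.key = key  # the whole key is stored at the leaf


class Branch:
    def __init__(self, index, children):
        self.index = index        # position of the character this node discriminates on
        self.children = children  # dict: character at that position -> subtree


def char_at(key, i):
    return key[i] if i < len(key) else END


def build(keys, depth=0):
    """Build a compressed trie from a non-empty list of distinct keys that
    all agree on their first `depth` characters."""
    if len(keys) == 1:
        return Leaf(keys[0])
    i = depth
    while len({char_at(k, i) for k in keys}) == 1:
        i += 1  # every key agrees here, so no one-child node is created
    groups = {}
    for k in keys:
        groups.setdefault(char_at(k, i), []).append(k)
    return Branch(i, {c: build(g, i + 1) for c, g in groups.items()})


def search(root, key):
    node = root
    while isinstance(node, Branch):
        node = node.children.get(char_at(key, node.index))
        if node is None:
            return False
    # Skipped positions were never examined, so compare the full key.
    return node.key == key


root = build(["TURING", "MENDEL", "MENDELEEV"])
print(root.index, root.children["M"].index)            # 0 6
print(search(root, "MENDEL"), search(root, "MANDEL"))  # True False
```

From T, the search goes straight to the TURING leaf. From M, it jumps to position 6, where MENDEL (end of string) and MENDELEEV (E) part ways.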

Patricia tree

de la Briandais trees  Another strategy to save space vs. standard tries  Use a linked list instead of a table at the node level  Each pointer labeled with the character it indexes  longer search time than tries; depends on size of character set  saves significant amounts of memory

de la Briandais

Another strategy …  Use tries at the first few levels  Use ordinary BSTs or de la Briandais at the lower levels  reasoning:  speed advantage at the top, but not too much extra memory required  save space at lower levels

Digital Search Trees  Treat keys as bit strings  (strings over the alphabet {0,1})  Binary tree – search directed left on 0, right on 1  Each node contains not only two pointers, but also contains a key that matches that string prefix  Compare for equality before searching left or right  If frequencies are known, store higher frequency keys nearer root  Can be grown dynamically  Expected Search time: O(log n)

Digital Search Tree