Forms of Retrieval: Sequential Retrieval, Two-Step Retrieval, Retrieval with Indexed Cases

Sources:
- Textbook, Chapter 7
- Davenport & Prusack’s book on Advanced Data Structures
- Samet’s book on Data Structures

Range Search
[Figure: the space of known problems, with an example query path: "Red light on? Yes", "Beeping? Yes", …, "Transistor burned!"]

k-d Trees
Idea: partition the case base into smaller fragments.
- Represents a k-dimensional space as a binary tree
- Similar to a decision tree: the query is compared with the nodes
- During retrieval:
  - Search for a leaf, but
  - Unlike in decision trees, backtracking may occur

Definition: k-d Trees
Given:
- k types T1, …, Tk for the attributes A1, …, Ak
- A case base CB containing cases in T1 × … × Tk
- A parameter b (the bucket size)
A k-d tree T(CB) for a case base CB is a binary tree defined as follows:
- If |CB| < b, then T(CB) is a leaf node (a bucket)
- Otherwise, T(CB) is a tree such that:
  - The root is marked with an attribute Ai and a value v of Ai, and
  - The two k-d trees T({c ∈ CB: c.i-attribute < v}) and T({c ∈ CB: c.i-attribute ≥ v}) are the left and right subtrees of the root
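The recursive definition can be sketched in Python as follows. The slides do not say how the attribute Ai and the value v are chosen, so this sketch assumes the attributes are cycled by depth and v is the median value; the names `Leaf`, `Node`, and `build` are illustrative only. It also falls back to a bucket when a split would leave one side empty, which the definition does not cover.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Case = Tuple[float, ...]


@dataclass
class Leaf:
    cases: List[Case]      # the bucket


@dataclass
class Node:
    attr: int              # index i of the attribute A_i marking this node
    value: float           # split value v
    left: "Tree"           # subtree for cases with c[attr] <  value
    right: "Tree"          # subtree for cases with c[attr] >= value


Tree = Union[Leaf, Node]


def build(cases: Sequence[Case], b: int, depth: int = 0) -> Tree:
    """Build T(CB) for the case base `cases` with bucket size b."""
    if len(cases) < b:                      # |CB| < b: leaf node (bucket)
        return Leaf(list(cases))
    k = len(cases[0])
    attr = depth % k                        # assumption: cycle through attributes
    values = sorted(c[attr] for c in cases)
    value = values[len(values) // 2]        # assumption: split at the median
    left = [c for c in cases if c[attr] < value]
    right = [c for c in cases if c[attr] >= value]
    if not left or not right:               # no useful split: keep as a bucket
        return Leaf(list(cases))
    return Node(attr, value,
                build(left, b, depth + 1),
                build(right, b, depth + 1))
```

Other split policies (for example, choosing the attribute with the largest spread) fit the same definition; only the two lines marked as assumptions would change.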

BWB-Check
Ball-Within-Bounds check:
- Suppose the algorithm reaches a leaf node M (with at most b cases) while searching for the case most similar to P
- Let c be a case in M such that dist(c,P) is the smallest
- Then c is a candidate nearest neighbor (NN) for P
- If dist(P,B) > dist(c,P) for every boundary B of M, then c is the NN
- But if dist(P,B) < dist(c,P) for any boundary B of M, then the algorithm needs to backtrack and check whether the region beyond B contains a better candidate
- To compute distances, let f^-1 be the inverse of the distance-similarity compatible function f: distance(P,C) = f^-1(sim(P,C))
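A minimal sketch of the check for one bucket, assuming Euclidean distance and a bucket region given as an axis-aligned box with `lower` and `upper` corners (both assumptions; the function names are illustrative). The similarity function used in `distance_from_sim` is only one possible choice of f, picked here because its inverse is easy to write down.

```python
import math


def closest_in_bucket(p, bucket):
    """Return the candidate c in the bucket with the smallest dist(c, p)."""
    return min(bucket, key=lambda c: math.dist(c, p))


def bwb(p, radius, lower, upper):
    """Ball-within-bounds: True if every boundary of the box [lower, upper]
    is farther from p than `radius`, so no backtracking is needed."""
    return all(p[i] - lower[i] > radius and upper[i] - p[i] > radius
               for i in range(len(p)))


def distance_from_sim(sim):
    """Assumed example: sim = f(d) = 1 / (1 + d), hence d = f^-1(sim) = 1/sim - 1."""
    return 1.0 / sim - 1.0


# Bucket {Denver, Omaha} with region 0 <= A1 < 35, 0 <= A2 <= 100.
p = (32, 45)
bucket = [(5, 45), (25, 35)]
c = closest_in_bucket(p, bucket)
needs_backtracking = not bwb(p, math.dist(c, p), (0, 0), (35, 100))
```

Here the boundary A1 = 35 is only 3 units from P, while the best candidate in the bucket is about 12.2 units away, so the ball crosses the boundary and the search has to backtrack.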

BOB-Check
Ball-Out-of-Bounds check:
- Used during backtracking
- Checks whether dist(P,B) < dist(c,P) holds for the boundary B defined in the node
- Here c is the current candidate for the best case (e.g., the closest case to P in the initial bucket)
- If the condition is true, the algorithm needs to check whether the region on the other side of that boundary contains a better candidate
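One way to read this check is as the test that decides, on the way back up the tree, whether the other subtree of a node must be visited. The sketch below assumes Euclidean distance and a simplified tree encoding that is not from the slides: a bucket is a Python list of points, and an inner node is a tuple `(attr, value, left, right)`. The distance to the boundary is taken as the distance to the split plane, which never overestimates the distance to the region behind it.

```python
import math


def bob(p, attr, value, best_dist):
    """Ball-out-of-bounds: True if the ball around p with radius best_dist
    crosses the boundary (split plane) defined in the node."""
    return abs(p[attr] - value) < best_dist


def nearest(tree, p, best=None):
    """Return the case closest to p, backtracking only when bob() fires."""
    if isinstance(tree, list):                       # bucket: scan its cases
        for c in tree:
            if best is None or math.dist(c, p) < math.dist(best, p):
                best = c
        return best
    attr, value, left, right = tree
    # Descend first into the side of the split that contains p.
    near, far = (left, right) if p[attr] < value else (right, left)
    best = nearest(near, p, best)
    # Backtracking step: visit the other side only if the ball crosses over.
    if best is None or bob(p, attr, value, math.dist(best, p)):
        best = nearest(far, p, best)
    return best


tree = (0, 35, [(5, 45), (25, 35)], [(35, 40), (50, 10)])
result = nearest(tree, (32, 45))
```

For the query (32, 45) the first bucket yields (25, 35) as candidate; the split plane at 35 is closer than that candidate, so the right subtree is searched and (35, 40) replaces it.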

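Example
Cases (cities) in the plane with corners (0,0), (0,100), and (100,0):
- Denver (5,45)
- Omaha (25,35)
- Chicago (35,40)
- Mobile (50,10)
- Toronto (60,75)
- Buffalo (80,65)
- Atlanta (85,15)
- Miami (90,5)
Query: P(32,45)
k-d tree:
A1 (split at 35)
  <35: bucket {Denver, Omaha}
  ≥35: A2 (split at 40)
    <40: A1 (split at 85)
      <85: bucket {Mobile}
      ≥85: bucket {Atlanta, Miami}
    ≥40: A1 (split at 60)
      <60: bucket {Chicago}
      ≥60: bucket {Toronto, Buffalo}
Notes: Priority lists are used for computing kNN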
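The example can be replayed with a small sketch of kNN retrieval that keeps the k best cases in a priority list. The tree encoding (a list of city names for a bucket, a tuple `(attr, value, left, right)` for an inner node), the use of Euclidean distance, and the use of Python's `heapq` as the priority list are assumptions made for illustration, not details given in the slides.

```python
import heapq
import math

CITIES = {
    "Denver": (5, 45), "Omaha": (25, 35), "Chicago": (35, 40),
    "Mobile": (50, 10), "Toronto": (60, 75), "Buffalo": (80, 65),
    "Atlanta": (85, 15), "Miami": (90, 5),
}

# The example tree; attribute index 0 is A1 and index 1 is A2.
TREE = (0, 35,
        ["Denver", "Omaha"],
        (1, 40,
         (0, 85, ["Mobile"], ["Atlanta", "Miami"]),
         (0, 60, ["Chicago"], ["Toronto", "Buffalo"])))


def knn(tree, p, k, heap):
    """Collect the k nearest cases in `heap` as (-distance, name) pairs,
    so the worst of the current k candidates is always heap[0]."""
    if isinstance(tree, list):                        # bucket
        for name in tree:
            d = math.dist(CITIES[name], p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, name))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, name))
        return
    attr, value, left, right = tree
    near, far = (left, right) if p[attr] < value else (right, left)
    knn(near, p, k, heap)
    # Backtrack if fewer than k cases were found so far, or if the ball
    # around p through the worst candidate crosses the split boundary.
    if len(heap) < k or abs(p[attr] - value) < -heap[0][0]:
        knn(far, p, k, heap)


def k_nearest(p, k):
    heap = []
    knn(TREE, p, k, heap)
    return [name for _, name in sorted(heap, reverse=True)]  # closest first


print(k_nearest((32, 45), 1))  # ['Chicago']
print(k_nearest((32, 45), 3))  # ['Chicago', 'Omaha', 'Denver']
```

For P(32,45) the search starts in the bucket {Denver, Omaha}, then backtracks across the boundary A1 = 35 and finds Chicago at a distance of about 5.8.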

Using Decision Trees as Index
- Standard decision tree: a node for attribute Ai with one branch per value v1, v2, …, vn
- Variant (InReCA tree): the same node with an additional "unknown" branch
- Can be combined with numeric attributes: a node for Ai with branches ≤v1, (>v1, ≤v2), …, >vn, and unknown
Notes:
- Supports Hamming distance
- May require backtracking (using the BOB-check)
- Operates in a similar fashion to k-d trees
- Priority lists are used for computing kNN

Properties of Retrieval with Indexed Cases
Advantages:
- Efficient retrieval
- Incremental: the index does not need to be rebuilt every time a new case is entered
- α-error does not occur
Disadvantages:
- Cost of construction is high
- Only works for monotonic similarity relations