Forms of Retrieval: Sequential Retrieval, Two-Step Retrieval, Retrieval with Indexed Cases


1 Forms of Retrieval
- Sequential Retrieval
- Two-Step Retrieval
- Retrieval with Indexed Cases

2 Sources:
- Textbook, Chapter 7
- Davenport & Prusack’s book on Advanced Data Structures
- Samet’s book on Data Structures

3 Range Search
[Figure: the symptoms "Red light on? Yes", "Beeping? Yes", … lead to the diagnosis "Transistor burned!" within the space of known problems.]

4 k-d Trees
Idea: partition the case base into smaller fragments.
- Representation of a k-dimensional space in a binary tree
- Similar to a decision tree: the query is compared with the nodes
- During retrieval:
  - Search for a leaf, but
  - Unlike in decision trees, backtracking may occur

5 Definition: k-d Trees
Given:
- k types T_1, …, T_k for the attributes A_1, …, A_k
- A case base CB containing cases in T_1 × … × T_k
- A parameter b (the bucket size)
A k-d tree T(CB) for a case base CB is a binary tree defined as follows:
- If |CB| < b, then T(CB) is a leaf node (a bucket)
- Otherwise, T(CB) is a tree such that:
  - The root is marked with an attribute A_i and a value v of A_i, and
  - The two k-d trees T({c ∈ CB : c.A_i < v}) and T({c ∈ CB : c.A_i ≥ v}) are the left and right subtrees of the root, where c.A_i is the value of attribute A_i in case c
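The recursive definition above can be sketched in a few lines of Python. The slide does not say how the attribute A_i and the value v are chosen, so this sketch assumes one common heuristic: cycle through the attributes by depth and split at the median. The names `Leaf`, `Node`, and `build` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Case = Tuple[float, ...]


@dataclass
class Leaf:
    cases: List[Case]          # a bucket of cases


@dataclass
class Node:
    attr: int                  # index i of the attribute A_i tested here
    value: float               # split value v
    left: "Tree"               # cases with c[attr] <  value
    right: "Tree"              # cases with c[attr] >= value


Tree = Union[Leaf, Node]


def build(cases: List[Case], b: int, depth: int = 0) -> Tree:
    """Build T(CB) following the definition: leaf if |CB| < b, else split."""
    if len(cases) < b:
        return Leaf(list(cases))
    k = len(cases[0])
    attr = depth % k                       # assumption: cycle through attributes
    values = sorted(c[attr] for c in cases)
    value = values[len(values) // 2]       # assumption: split at the median
    left = [c for c in cases if c[attr] < value]
    right = [c for c in cases if c[attr] >= value]
    if not left:
        # The median equals the minimum, so this split separates nothing;
        # this sketch simply keeps an oversized bucket instead of recursing.
        return Leaf(list(cases))
    return Node(attr, value, build(left, b, depth + 1), build(right, b, depth + 1))


# Usage: four one-dimensional cases with bucket size b = 2.
tree = build([(1,), (2,), (3,), (4,)], b=2)
```

Other split strategies (for example, choosing the attribute with the largest spread) fit the same definition; only the two assumption lines would change.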

6 BWB-Check
Ball-Within-Bounds check:
- Suppose the algorithm reaches a leaf node M (with at most b cases) while searching for the case most similar to P
- Let c be a case in M such that dist(c,P) is the smallest
- Then c is a candidate nearest neighbor (NN) for P
- If dist(P,B) > dist(c,P) for each boundary B of M, then c is the NN
- But if dist(P,B) < dist(c,P) for any boundary B of M, then the algorithm needs to backtrack and check whether the region beyond B contains a better candidate
- To compute distances, let f^-1 be the inverse of the distance-similarity compatible function f: distance(P,C) = f^-1(sim(P,C))
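As a rough illustration of the BWB check, the sketch below assumes numeric attributes, Euclidean distance, and a bucket whose region is an axis-parallel box given by `lower` and `upper` bounds. It also assumes one particular compatible function, sim = f(d) = 1 / (1 + d), purely as an example. The function names and the sample numbers are hypothetical.

```python
import math


def distance_from_similarity(sim: float) -> float:
    """Invert the assumed mapping sim = f(d) = 1 / (1 + d)."""
    return 1.0 / sim - 1.0


def ball_within_bounds(p, radius, lower, upper) -> bool:
    """True if the ball around p with the given radius lies strictly inside
    the axis-parallel region [lower, upper] of the bucket."""
    return all(
        p[i] - lower[i] > radius and upper[i] - p[i] > radius
        for i in range(len(p))
    )


# Hypothetical bucket region: A1 in [0, 35), A2 in [0, 100].
p = (32, 45)
candidate = (25, 35)                      # closest case found in the bucket
radius = math.dist(p, candidate)          # dist(c, P)
needs_backtracking = not ball_within_bounds(p, radius, (0, 0), (35, 100))
```

Here the boundary at A1 = 35 is only 3 units from P, which is less than the distance to the candidate, so the check fails and the search would have to backtrack.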

7 BOB-Check
Ball-Out-of-Bounds check:
- Used during backtracking
- Checks whether dist(P,B) < dist(c,P) for the boundary B defined in the node
- Here c is the current candidate for the best case (e.g., the closest case to P in the initial bucket)
- If the condition is true, the algorithm needs to check whether the region beyond that boundary contains a better candidate
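For a single inner node, the condition above reduces to a one-line test. This is a minimal sketch, assuming numeric attributes and Euclidean distance, so that the distance from P to the boundary defined by a node (attribute A_i, value v) is |P_i - v|. The name `must_check_other_side` is hypothetical.

```python
def must_check_other_side(p, candidate_dist, attr, value) -> bool:
    """BOB-style test at one node: does the ball around p with radius
    dist(c, P) reach across the boundary 'attribute attr = value'?"""
    return abs(p[attr] - value) < candidate_dist


# Query P = (32, 45) with a current candidate at distance 12.2:
# the boundary A1 = 35 is 3 units away, so the other subtree must be visited.
visit = must_check_other_side((32, 45), 12.2, attr=0, value=35)
```

The test is conservative: a ball that crosses the boundary does not guarantee a better case on the other side, it only means one cannot be ruled out.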

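8 Example
Cases (A1, A2), plotted on axes running from (0,0) to (100,0) and (0,100):
- Denver (5,45)
- Omaha (25,35)
- Chicago (35,40)
- Mobile (50,10)
- Atlanta (85,15)
- Miami (90,5)
- Toronto (60,75)
- Buffalo (80,65)
k-d tree:
- A1 < 35: bucket {Denver, Omaha}
- A1 ≥ 35: split on A2
  - A2 < 40: split on A1
    - A1 < 85: bucket {Mobile}
    - A1 ≥ 85: bucket {Atlanta, Miami}
  - A2 ≥ 40: split on A1
    - A1 < 60: bucket {Chicago}
    - A1 ≥ 60: bucket {Toronto, Buffalo}
Query: P(32,45)
Notes: Priority lists are used for computing kNN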
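The example can be traced with a short script. This is a sketch, not the slide author's implementation: it hard-codes the cities and the tree as read from the slide, assumes Euclidean distance, and uses a bounded heap as the priority list for kNN. The names `CITIES`, `TREE`, `knn`, and `nearest` are hypothetical.

```python
import heapq
import math

CITIES = {
    "Denver": (5, 45), "Omaha": (25, 35), "Chicago": (35, 40),
    "Mobile": (50, 10), "Atlanta": (85, 15), "Miami": (90, 5),
    "Toronto": (60, 75), "Buffalo": (80, 65),
}

# Inner node: (attribute index, split value, left subtree, right subtree).
# Leaf (bucket): list of city names. Attribute 0 is A1, attribute 1 is A2.
TREE = (0, 35,
        ["Denver", "Omaha"],
        (1, 40,
         (0, 85, ["Mobile"], ["Atlanta", "Miami"]),
         (0, 60, ["Chicago"], ["Toronto", "Buffalo"])))


def knn(tree, p, k, best=None):
    """Collect the k nearest cities in 'best', a max-heap of (-dist, name)."""
    if best is None:
        best = []
    if isinstance(tree, list):                       # bucket: scan its cases
        for name in tree:
            d = math.dist(p, CITIES[name])
            if len(best) < k:
                heapq.heappush(best, (-d, name))
            elif d < -best[0][0]:                    # beats the worst kept case
                heapq.heapreplace(best, (-d, name))
        return best
    attr, value, left, right = tree
    near, far = (left, right) if p[attr] < value else (right, left)
    knn(near, p, k, best)
    # Backtrack if fewer than k candidates exist, or if the ball around p
    # (radius = distance to the worst kept candidate) crosses the boundary.
    if len(best) < k or abs(p[attr] - value) < -best[0][0]:
        knn(far, p, k, best)
    return best


def nearest(p, k=1):
    """Names of the k nearest cities, closest first."""
    return [name for _, name in sorted(knn(TREE, p, k), reverse=True)]


print(nearest((32, 45)))       # ['Chicago']
print(nearest((32, 45), k=2))  # ['Chicago', 'Omaha']
```

For P(32,45) the search first lands in the bucket {Denver, Omaha} and picks Omaha (distance about 12.2). The boundary A1 = 35 is only 3 units away, so the search backtracks into the right subtree and finds Chicago (distance about 5.8).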

9 Using Decision Trees as Index
[Figure: three kinds of node, each testing an attribute A_i]
- Standard decision tree: branches v_1, v_2, …, v_n
- Variant, InReCA tree: branches v_1, v_2, …, v_n plus an "unknown" branch
- Can be combined with numeric attributes: branches for value ranges bounded by v_1, v_2, …, v_n (the last being > v_n) plus an "unknown" branch
Notes:
- Supports Hamming distance
- May require backtracking (using BOB-check)
  - Operates in a similar fashion to k-d trees
- Priority lists are used for computing kNN

10 Properties of Retrieval with Indexed Cases
Advantages:
- Efficient retrieval
- Incremental: no need to rebuild the index every time a new case is entered
- α-error does not occur
Disadvantages:
- Cost of construction is high
- Only works for monotonic similarity relations

