Efficient Case Retrieval Sources: –Chapter 7 –www.iiia.csic.es/People/enric/AICom.html –www.ai-cbr.org.

Slides:



Advertisements
Similar presentations
K-Nearest Neighbors (kNN) Given a case base CB, a new problem P, and a similarity metric sim Obtain: the k cases in CB that are most similar to P according.
Advertisements

Forms of Retrieval Sequential Retrieval Two-Step Retrieval Retrieval with Indexed Cases.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
CMPT 225 Priority Queues and Heaps. Priority Queues Items in a priority queue have a priority The priority is usually numerical value Could be lowest.
Binary Heaps CSE 373 Data Structures Lecture 11. 2/5/03Binary Heaps - Lecture 112 Readings Reading ›Sections
FALL 2004CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
Binary Search Trees vs. Binary Heaps. Binary Search Tree Condition Given a node i –i.value is the stored object –i.left and i.right point to other nodes.
Lec 6 Feb 17, 2011  Section 2.5 of text (review of heap)  Chapter 3.
Techniques and Data Structures for Efficient Multimedia Similarity Search.
Week 10: Heap and Priority queue. Any feature here?
PQ, binary heaps G.Kamberova, Algorithms Priority Queue ADT Binary Heaps Gerda Kamberova Department of Computer Science Hofstra University.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
CS Data Structures Chapter 5 Trees. Additional Binary Tree Operations (1/7)  Copying Binary Trees  we can modify the postorder traversal algorithm.
Priority Queues, Heaps & Leftist Trees
Fundamental Structures of Computer Science March 02, 2006 Ananda Guna Binomial Heaps.
Module 04: Algorithms Topic 07: Instance-Based Learning
Geoff Holmes and Bernhard Pfahringer COMP206-08S General Programming 2.
Efficient Case Retrieval Sources: –Chapter 7 – –
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Merge Sort. What Is Sorting? To arrange a collection of items in some specified order. Numerical order Lexicographical order Input: sequence of numbers.
The Binary Heap. Binary Heap Looks similar to a binary search tree BUT all the values stored in the subtree rooted at a node are greater than or equal.
Priority Queues and Binary Heaps Chapter Trees Some animals are more equal than others A queue is a FIFO data structure the first element.
Data Structures Week 8 Further Data Structures The story so far  Saw some fundamental operations as well as advanced operations on arrays, stacks, and.
1 Heaps and Priority Queues Starring: Min Heap Co-Starring: Max Heap.
Sorting. Pseudocode of Insertion Sort Insertion Sort To sort array A[0..n-1], sort A[0..n-2] recursively and then insert A[n-1] in its proper place among.
Data Structure Introduction.
CS 61B Data Structures and Programming Methodology July 21, 2008 David Sun.
Algorithms and data structures Protected by
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
Priority Queues and Heaps. October 2004John Edgar2  A queue should implement at least the first two of these operations:  insert – insert item at the.
1 Heaps and Priority Queues v2 Starring: Min Heap Co-Starring: Max Heap.
CMPS 3130/6130 Computational Geometry Spring 2015
CHAPTER 5 PRIORITY QUEUES (HEAPS) §1 ADT Model Objects: A finite ordered list with zero or more elements. Operations:  PriorityQueue Initialize( int.
Priority Queues Two kinds of priority queues: Min priority queue. Max priority queue. Nov 4,
Multi-dimensional Search Trees CS302 Data Structures Modified from Dr George Bebis.
8 January Heap Sort CSE 2011 Winter Heap Sort Consider a priority queue with n items implemented by means of a heap  the space used is.
Prof. Amr Goneid, AUC1 Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid Department of Computer Science, AUC Part R3. Priority Queues.
Data Structures and Algorithms Searching Algorithms M. B. Fayek CUFE 2006.
FALL 2005CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
Heaps © 2010 Goodrich, Tamassia. Heaps2 Priority Queue ADT  A priority queue (PQ) stores a collection of entries  Typically, an entry is a.
Today’s Material Sorting: Definitions Basic Sorting Algorithms
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
Amortized Analysis and Heaps Intro David Kauchak cs302 Spring 2013.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Heaps, Heap Sort, and Priority Queues. Background: Binary Trees * Has a root at the topmost level * Each node has zero, one or two children * A node that.
Sorting With Priority Queue In-place Extra O(N) space
Data Structures and Algorithms for Information Processing
Heaps (8.3) CSE 2011 Winter May 2018.
Priority Queues An abstract data type (ADT) Similar to a queue
CMPS 3130/6130 Computational Geometry Spring 2017
Priority Queues Chuan-Ming Liu
Sorting.
ADT Heap data structure
original list {67, 33,49, 21, 25, 94} pass { } {67 94}
Chapter 8 – Binary Search Tree
Priority Queues.
Priority Queues.
Heap Sort CSE 2011 Winter January 2019.
Priority Queues An abstract data type (ADT) Similar to a queue
Dr.Surasak Mungsing CSE 221/ICT221 Analysis and Design of Algorithms Lecture 05-2: Analysis of time Complexity of Priority.
CENG 351 Data Management and File Structures
Amortized Analysis and Heaps Intro
CO4301 – Advanced Games Development Week 4 Binary Search Trees
Heaps & Multi-way Search Trees
A Heap Is Efficiently Represented As An Array
Heaps Section 6.4, Pg. 309 (Section 9.1).
CMPT 225 Lecture 16 – Heap Sort.
Presentation transcript:

Efficient Case Retrieval Sources: –Chapter 7 – –

Themes Sequential Retrieval Two-Step Retrieval Retrieval with Indexed Cases

Problem Description Input: a collection of cases CB = {C 1, …C n } and a new problem P Output:  The most similar case (nearest neighbor, NN): A case C i in CB such that sim(C i, P) is maximal, or  A collection of k most similar cases (k-nearest neighbors, k-NN) in CB {C i_1,…, C i_k }, or  A sufficiently similar case: case C i in CB such that sim(C i, P) > th where th is a predefined threshold

Priority Queues Typical example: These priorities may override the FIFO policy of the queues Operations supported in a priority queue: Insert a new element Extract/Delete of the element with the lowest priority printing in a Unix/Linux environment. Printing jobs have different priorities.

Implementing Priority Queues: HEAPS A heap is a binary tree T such that: The key at each node is less or equal than the key of its children T is complete We assume that each node has a key, K, and other information, I. K is the priority of I.

(Non) Examples > 9! Tree is not complete Complexity of insert, remove is O(log 2 n). Top element: constant

Sequential Retrieval The case base is organized as a a sequence of cases with no order among themselves (i.e., no indexing) Retrieving the most similar or one that passes the threshold condition is straightforward Let us discuss the straightforward procedure What is the complexity of this procedure? How about if we want to retrieve the k most similar cases?  How to implement an efficient procedure to retrieve the k-most similar cases:  Use heaps!

Sequential Retrieval (II) How to implement an efficient procedure to retrieve the m most similar cases to a problem P:  Keep all candidate cases in a heap, H  Initially fill H with the first k cases. Each with a priority:  When looking at a case C in the case base, compared it against the one on the top, T, of the heap H  If then remove T from H and add C with a priority sim(C,P) sim(, P) sim(C,P) > sim(T,P)

Properties of Sequential Retrieval Complexity of retrieval of straightforward k most similar = O(  k )  Complexity of retrieval of k most similar with heaps: O(  log 2 (k))  Drawbacks:  If is too large, retrieval time may take too long (despite the low complexity)  Retrieval time is independent from the problem  Advantages:  Independent of the similarity  Simple; doesn’t require complex indexing code

Two-Step Retrieval The Mac/Fac approach (many are called/few are chosen): 1.Compute the set CAND with all cases in CB such that SIM(C,P) holds 2.Search for similar case in CAND  The question is how to define the relation SIM such that 1. Computing the relation SIM should require much less effort than computing the similarity metric, sim 2. SIM doesn’t eliminate good candidates

Possible Definitions of SIM  Partial equality: SIM(C,P) iff C and P have the same value on a pre-selected attribute  Relaxed similarity: SIM(C,P) iff the values of attributes in C and P are sufficiently similar  Domain-specific SIM: (example: in the restaurant domain if P indicates the type of restaurant, one may pre-select all cases of the same type) (example: if sim = sqr((x-v) 2 + (y-v) 2 ) ) one could define SIM iff |x-v| < 10) (example: Process Planning domain) Heuristics

Example of a “Good” SIM pa 1 pa 2 Any case for which pa 1 is not ordered before pa 2 should not be retrieved One can use these dependencies as criterion for construction SIM (Munoz-Avila & Huellen, ICCBR-1995) Problem:

Two-Step Retrieval With Databases Cases are stored in a database (e.g., MySQL) as regular records In the first step, a pre-selection is made based on cases matching some attributes (e.g., hyper-rectangles): SELECT a 1,..., a m FROM CaseBase WHERE (a 1 ≥ min 1 AND a 1 ≤ max 1 )... AND (a m ≥ min m AND a m ≤ max m ) Find k nearest neighbors from pre-selected cases Process can be refined by iteratively relaxing the query (see Section 7.6)

Properties of Two-Step Retrieval Advantage:  Disadvantages:  May lead to a significant reduction in retrieval time May result in  -error: cases very similar to the problem are discarded Obtaining SIM maybe difficult

Homework 1.Construct an example of a k-d tree and a query case such as the BOB test succeeds and hence the NN retrieval algorithm needs to backtrack. Indicate the NN for the query. 2.For the same k-d tree as 1, make a query case such as the BWB succeeds and hence the NN retrieval algorithm doesn’t need to backtrack. Indicate the NN for the query. 3.Describe a domain of your choice, describe cases in that domain including the attributes, their types and the case’s similarity metric, sim. Describe a SIM relation in that domain that will not result in the  -error.