CS 682, AI: Case-Based Reasoning, Prof. Cindy Marling

Slide 1. Chapter 8: Organizational Structures and Retrieval Algorithms
This chapter deals with how to find and retrieve cases from the case base for use in later problem solving or situation assessment. You want to find the cases that are most useful for your present purposes. Typically, but not always, these are the most similar cases. We use matching and ranking procedures to compare cases and determine which will be most useful. Matching and ranking are covered in the next chapter.

Slide 2. High-Level Overview
At a high level, we want to:
1) Assess the new situation: find its important features, that is, its indexes.
2) Search the case base for partially matching cases.
3) Retrieve the cases found.
4) Choose the best case(s).
The last two steps may be sequential or interleaved.
Inserting new cases works similarly. The first two steps are the same, but instead of retrieving cases, you insert the new case near the closest matching case.
How you search a case base depends on its organization.
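The four steps and the insertion variant can be sketched in Python as below. This is a minimal illustration under assumptions not taken from the slides. A case is a dictionary of feature values, and the indexes are a hypothetical hand-picked set `INDEX_FEATURES`. A partial match is any case sharing at least one index value, and the best case is the one sharing the most. Steps 3 and 4 run sequentially here.

```python
# Sketch of the four retrieval steps and of insertion on a simple list.
# Assumptions (not from the slides): a case is a dict of feature -> value,
# INDEX_FEATURES is a hypothetical hand-picked set of index features, and the
# "best" case is the one sharing the most index values with the new situation.

INDEX_FEATURES = {"dispute", "parties", "object"}


def assess(situation):
    """Step 1: keep only the features that serve as indexes."""
    return {f: v for f, v in situation.items() if f in INDEX_FEATURES}


def overlap(indexes, case):
    """Count the index values a stored case shares with the new situation."""
    return sum(1 for f, v in indexes.items() if case.get(f) == v)


def search(case_base, indexes):
    """Step 2: positions of partially matching cases (at least one shared index)."""
    return [i for i, case in enumerate(case_base) if overlap(indexes, case) > 0]


def retrieve(case_base, situation, k=1):
    """Steps 3 and 4, done sequentially: fetch the matches, then keep the k best."""
    indexes = assess(situation)
    found = [case_base[i] for i in search(case_base, indexes)]
    found.sort(key=lambda case: overlap(indexes, case), reverse=True)
    return found[:k]


def insert(case_base, new_case):
    """Same first two steps, then place the new case right after its closest match."""
    indexes = assess(new_case)
    hits = search(case_base, indexes)
    if not hits:
        case_base.append(new_case)
        return
    closest = max(hits, key=lambda i: overlap(indexes, case_base[i]))
    case_base.insert(closest + 1, new_case)
```

Consider a hypothetical list of dispute cases. For a sibling ownership dispute over candy, `retrieve` returns the stored sibling dispute over an orange, because it shares two of the three index values. Placing a new case beside its closest match matters little in a flat list. It pays off in the organized memories described later in the chapter.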

Slide 3. Example from MEDIATOR
[Figure not reproduced in this transcript.]

Slide 4. Example from MEDIATOR (continued)
[Figure not reproduced in this transcript.]

Slide 5. Flat Memory, Serial Search
The organization shown at the bottom of the previous slide is called flat memory, serial search. All cases are stored in an array-like structure. A new case is matched against each case in the case base, and the best match or matches are returned.
Details come later, but the goodness of a match depends on:
1) How closely the cases match along each dimension
2) How important each dimension is
This is the simplest organization and retrieval method. Inserting new cases is also simple: they can be put anywhere at all.
Note that case base organization is independent of case structure. The cases themselves don't have to be simple to use flat memory, serial search.
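A minimal sketch of flat memory, serial search follows. It assumes three things: numeric features are pre-scaled to [0, 1], symbolic features match only when equal, and the importance weights are chosen by hand. The slides defer the real matching functions to the next chapter. `match_score` combines the two factors listed above: closeness along each dimension and the importance of each dimension.

```python
# Flat memory, serial search: a sketch of weighted nearest-neighbour matching.
# Assumptions: numeric features are pre-scaled to [0, 1], symbolic features
# match only when equal, and the importance weights are chosen by hand.

def feature_similarity(a, b):
    """How closely two values match along one dimension, from 0 to 1."""
    numbers = (int, float)
    if isinstance(a, numbers) and isinstance(b, numbers):
        return max(0.0, 1.0 - abs(a - b))
    return 1.0 if a == b else 0.0


def match_score(query, case, weights):
    """Importance-weighted average of per-dimension similarity."""
    total = sum(weights.get(f, 1.0) for f in query)
    if total == 0:
        return 0.0
    score = sum(weights.get(f, 1.0) * feature_similarity(v, case.get(f))
                for f, v in query.items())
    return score / total


def serial_search(case_base, query, weights, k=1):
    """Match the query against every stored case and return the k best."""
    ranked = sorted(case_base, key=lambda case: match_score(query, case, weights),
                    reverse=True)
    return ranked[:k]
```

Insertion is just `case_base.append(new_case)`. Changing the weights can change the winner. Take a hypothetical query for a fairly spicy Chinese dish. A high weight on cuisine favors a Chinese case, while a high weight on spiciness can favor a French case whose spiciness matches exactly.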

Slide 6. Speeding Up Flat Memory, Serial Search
A drawback of this approach has been speed. There are three ways to make it faster, short of choosing a different approach:
1) Shallow indexing: Create a separate small file containing only indexes. Each index points to the cases that include it. Search the small index file, and fully consider only the cases pointed to by the indexes. This achieves speed at the expense of accuracy, because a case that shares no index with the new situation is never fully considered.
2) Partitioning the case base: Divide the case base into smaller case bases along some important dimension. CHEF can do this because stir-fry, noodle, and soufflé dishes are independent. This works if the partitions are truly disjoint and if the partitions themselves don't grow too large.
3) Parallel processing: Search in parallel using multiple processors. The only downside is the expense of the multiprocessor system.
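The first two speed-ups can be sketched as follows; parallel processing is omitted. The names and data model are assumptions for illustration. An index is a (feature, value) pair, and `score` stands in for whatever full matching function the system uses. `partition` assumes every case has a value for the chosen dimension.

```python
from collections import defaultdict

# Sketches of two speed-ups for flat memory, serial search.
# Assumptions: cases are dicts, an index is a (feature, value) pair, and the
# caller supplies the full matching function `score`.


def build_shallow_index(case_base, index_features):
    """The small 'index file': each (feature, value) maps to the cases that include it."""
    index = defaultdict(set)
    for i, case in enumerate(case_base):
        for f in index_features:
            if f in case:
                index[(f, case[f])].add(i)
    return index


def shallow_search(case_base, index, query, score):
    """Fully score only the cases pointed to by the query's indexes.

    Cases sharing no index with the query are never scored, which is where
    speed is traded for accuracy: a good match can be skipped.
    """
    candidates = set()
    for f, v in query.items():
        candidates |= index.get((f, v), set())
    return sorted(candidates, key=lambda i: score(query, case_base[i]), reverse=True)


def partition(case_base, dimension):
    """Split one case base into smaller, disjoint ones along a single dimension."""
    parts = defaultdict(list)
    for case in case_base:
        parts[case[dimension]].append(case)
    return dict(parts)
```

Suppose `meat` and `vegetable` are the only indexes. A query for a beef-and-broccoli stir-fry then never scores a chicken-and-peppers stir-fry, even though both share the dish type, because `dish` is not indexed. After `partition(cases, "dish")`, a stir-fry query would search only the stir-fry partition.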

Slide 7. Hierarchical Memory Organizations
In a shared feature network, you cluster cases so that cases sharing many features are near each other in memory. You build a tree with features on internal nodes and cases as leaves.
It's quicker to search a tree than a list.
It's harder to build the tree than the list, because new cases must be inserted in the right places, and the tree needs to stay balanced as it grows.
If you don't check every case, you can't guarantee that you haven't missed a good one.
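The sketch below builds a small shared feature network under assumptions not taken from the slides. Each node records the feature values shared by all cases below it. Each split uses the (feature, value) pair shared by the most cases, and a leaf holds at most `max_leaf` cases. Retrieval descends to the child whose shared features best match the query, so only one leaf is examined. That shows why a good case filed elsewhere can be missed.

```python
from collections import Counter
from dataclasses import dataclass, field

# Shared feature network sketch. Assumptions (not from the slides): cases are
# dicts, a node records the feature values shared by every case beneath it,
# and each split uses the (feature, value) pair shared by the most cases.


@dataclass
class Node:
    shared: dict                                   # features common to all cases below
    children: list = field(default_factory=list)   # internal nodes only
    cases: list = field(default_factory=list)      # leaves only


def build(cases, shared=None, max_leaf=2):
    """Recursively cluster cases so that cases sharing features end up together."""
    shared = dict(shared or {})
    counts = Counter((f, v) for c in cases for f, v in c.items() if f not in shared)
    for (f, v), n in counts.items():
        if n == len(cases):            # every case here has it: record it on this node
            shared[f] = v
    splits = [(n, fv) for fv, n in counts.items() if 1 < n < len(cases)]
    if len(cases) <= max_leaf or not splits:
        return Node(shared, cases=list(cases))
    _, (f, v) = max(splits, key=lambda s: s[0])
    with_fv = [c for c in cases if c.get(f) == v]
    without = [c for c in cases if c.get(f) != v]
    return Node(shared, children=[build(with_fv, {**shared, f: v}, max_leaf),
                                  build(without, shared, max_leaf)])


def search(node, query):
    """Descend toward the child whose shared features best match the query.

    Only one leaf is examined, so a good case filed elsewhere can be missed.
    """
    while node.children:
        node = max(node.children,
                   key=lambda child: sum(query.get(f) == v for f, v in child.shared.items()))
    return node.cases
```

This builder works in one batch. It does not handle incremental insertion or rebalancing, which are the hard parts the slide points out.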

Slide 8. A Shared Feature Network for MEDIATOR
[Figure not reproduced in this transcript.]

Slide 9. Prioritized Shared Feature Network
A twist on the shared feature network is the prioritized shared feature network. Here, you organize the tree with the most important feature at the root and more important features higher in the tree. This helps to ensure that you consider all the cases that match on the most important features.
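A prioritized variant can be sketched by branching on features in a fixed order of importance. The `priority` list and the dictionary-based nodes are assumptions for illustration. Retrieval follows the query's values from the root. At the first mismatch it returns every case beneath that node, so no case that agrees with the query on the features tested above that point is overlooked.

```python
# Prioritized shared feature network sketch. Assumption: `priority` lists the
# features from most to least important (a hypothetical, hand-made ordering).


def build_prioritized(cases, priority):
    """Branch on the most important feature at the root, the next one below it, ..."""
    if not priority or len(cases) <= 1:
        return {"cases": cases, "children": {}}
    feature, rest = priority[0], priority[1:]
    groups = {}
    for case in cases:
        groups.setdefault(case.get(feature), []).append(case)
    children = {value: build_prioritized(group, rest) for value, group in groups.items()}
    return {"feature": feature, "cases": cases, "children": children}


def retrieve_prioritized(node, query):
    """Follow the query's values downward; stop at the first mismatch.

    Everything below the stopping node is returned, so every case that agrees
    with the query on the features matched so far (the most important ones)
    is considered.
    """
    while node["children"]:
        value = query.get(node["feature"])
        if value not in node["children"]:
            break
        node = node["children"][value]
    return node["cases"]
```

With `["type", "parties"]` as the priority order, an ownership query with an unseen party still returns every ownership case.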

Slide 10. A Prioritized Shared Feature Network for MEDIATOR
[Figure not reproduced in this transcript.]

Slide 11. Discrimination Networks
The most popular type of hierarchical memory organization is the discrimination network. Cases are organized by features that help to tell cases apart. Internal nodes ask questions, and cases are filed under the nodes according to their answers to those questions.
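A discrimination network for a small troubleshooting domain might look like the sketch below. The questions, the features they read, and the cases are hypothetical, and each question simply reads one feature of the case. `retrieve` returns `None` when a question cannot be answered. That illustrates the problem of getting stuck on an unanswered question.

```python
# Discrimination network sketch for troubleshooting. Assumptions: each
# question reads one feature, the question order is fixed, and the questions,
# features, and cases used with it are hypothetical.

QUESTIONS = [
    ("Does the engine crank?", "cranks"),
    ("Are the lights dim?", "lights_dim"),
]


def insert(tree, case, questions=QUESTIONS):
    """File a case under the branch for each of its answers (all must be known)."""
    node = tree
    for _, feature in questions[:-1]:
        node = node.setdefault(case[feature], {})
    node.setdefault(case[questions[-1][1]], []).append(case)


def retrieve(tree, situation, questions=QUESTIONS):
    """Answer the questions in order and return the cases filed under the answers.

    Returns None when an answer is unknown or has no branch: the search is stuck.
    """
    node = tree
    for _, feature in questions:
        answer = situation.get(feature)
        if answer is None or answer not in node:
            return None
        node = node[answer]
    return node
```

Each internal level of the nested dictionary corresponds to one question, and its keys are the possible answers.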

Slide 12. Discrimination Network for MEDIATOR
[Figure not reproduced in this transcript.]

Slide 13. More on Discrimination Networks
One advantage of this approach is that machine learning algorithms can build discrimination trees automatically. Another advantage is that this approach is especially natural for troubleshooting.
The main disadvantage is that when you don't know the answer to a question, you get stuck. In medical domains, for example, you seldom have answers to all of the questions, but you still want to do partial matching based on incomplete information.
Multiple discrimination networks help to counter this problem. Here, you use multiple trees with the questions arranged in different orders and search all the trees in parallel.
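Multiple discrimination networks can be sketched by building one tree per question order and keeping the result of whichever tree gets furthest. Several assumptions here are not in the slides:
- questions are feature names;
- every stored case answers every question;
- a stuck search falls back to all cases beneath the stuck node;
- the trees are searched one after another rather than in parallel (a thread or process pool could run them concurrently).

Building every permutation is only practical for a handful of questions.

```python
from itertools import permutations

# Multiple discrimination networks sketch. Assumptions (not from the slides):
# questions are feature names, every stored case answers every question, and
# when a search gets stuck it falls back to all cases below the stuck node.


def build(cases, order):
    """Nested-dict discrimination tree that asks the features in `order`."""
    if not order:
        return cases
    feature, rest = order[0], order[1:]
    groups = {}
    for case in cases:
        groups.setdefault(case[feature], []).append(case)
    return {"ask": feature,
            "branches": {answer: build(group, rest) for answer, group in groups.items()}}


def cases_below(node):
    """All cases filed anywhere beneath a node."""
    if isinstance(node, list):
        return node
    return [c for child in node["branches"].values() for c in cases_below(child)]


def search(node, situation):
    """Follow the known answers; return (questions answered, candidate cases)."""
    answered = 0
    while isinstance(node, dict):
        answer = situation.get(node["ask"])
        if answer not in node["branches"]:
            break
        node = node["branches"][answer]
        answered += 1
    return answered, cases_below(node)


def search_all(networks, situation):
    """Search every network (one after another here) and keep the deepest result."""
    return max((search(net, situation) for net in networks), key=lambda r: r[0])[1]
```

Suppose fever is unknown. A tree that asks about fever first stops at the root and returns every case. A tree that asks about rash and cough first narrows the candidates to the cases matching those answers, which matching and ranking would then compare.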

Slide 14. More on Discrimination Networks (continued)
CHEF used multiple discrimination networks, and CHEF's indexes were the internal nodes of the trees. The indexes were used both to tell what was important about cases and, for efficiency, as physical pointers to cases.
Hierarchical memory organizations matter less for efficiency than they used to. However, when you use a flat memory organization, it's important that the information that would otherwise be stored in hierarchical indexes be moved into the cases themselves.