Chapter 9: Matching and Ranking Cases


Matching is the process of comparing two cases to each other and determining their degree of similarity. Ranking is the process of ordering partially matching cases according to the goodness of match, or usefulness.

To compute the degree of match between cases, you need to:
- Determine which features of the two cases correspond to each other
- Compute the degree of match between each pair of corresponding features
- Determine how important each feature is in assigning an overall degree of match

CS 682, AI: Case-Based Reasoning, Prof. Cindy Marling

Types of Matching Schemes

- Dimensional matching is the ability to compare two individual features
- Aggregate matching is the ability to compare two whole cases
- Aggregate matching involves dimensional matching
- Dimensional matching can be used alone, as in traversing a hierarchical memory structure and comparing the dimensions stored at each node
- In static matching schemes, the matching criteria are established in advance and hard coded
- In dynamic matching schemes, the criteria may change according to the present purpose
- Sometimes, you can hard code different schemes and choose among them dynamically
- Sometimes, important features are determined on the fly, during situation assessment
- Some flexibility can be achieved by determining the important features in advance, but weighting them differently each time

Types of Matching Schemes (continued)

- In absolute matching, you compute a score for how well each case matches the new one, independently of all the other cases
- In relative matching, you arrange the cases in order from best to worst, without quantifying the goodness of each one
- Relative matching requires dynamically comparing and contrasting cases to each other, and so is more difficult than absolute matching
- If you use absolute matching, ranking becomes trivial: any sort routine can arrange cases from best to worst based on their absolute scores
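The point that ranking is trivial under absolute matching can be sketched in a few lines. The case names and scores below are hypothetical; any comparison sort works once each case carries an absolute score.

```python
# Ranking from absolute match scores is a plain sort.
# Case names and scores are made up for illustration.
cases = [("case-A", 0.72), ("case-B", 0.95), ("case-C", 0.41)]

# Sort descending by score: best match first.
ranked = sorted(cases, key=lambda c: c[1], reverse=True)
print(ranked)
```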

Input to Matching and Ranking Functions

The inputs to matching and ranking functions are:
- The new case, analyzed in terms of its important features, or indexes
- The purpose you have in using the new case -- you can skip this if your system always performs the same task
- The recalled cases -- this may be a subset of the case base or all cases
- The indexes of the recalled cases
- Reasonable criteria for determining the goodness of match -- you may want the best case, all relevant cases, or any case that could be adapted to your purpose

Feature Correspondence

To match and rank cases, you need to know which features correspond to each other.
- In some domains, this is very easy. Example: to help a buyer select a new car, desired price corresponds to actual price, desired make and model correspond to actual make and model, and so on.
- In some domains, this can be hard. CASEY had to compute correspondences, because its problem description was just a list of patient symptoms -- symptoms that were not identical could still correspond, due to the nature of heart disease.

Computing Similarity Among Corresponding Features

Next, you need to determine how similar the values are for corresponding features. You are usually looking for some measure of distance on a qualitative or quantitative scale. Most systems hard code this on a feature-by-feature basis.

Example: A user might be asked if a desired restaurant should be Inexpensive, Moderate, Expensive, or Very Expensive.
- This might be translated to < $15, $15 - $30, $30 - $50, and > $50, with each restaurant classified as belonging to one category
- If a user asks for a category, say Moderate, and a restaurant is in that category, we have an exact match
- If the restaurant is one category away, say Inexpensive or Expensive, we have a partial match
- If the restaurant is more than one category away, we have no match
- You can use four or five categories if it makes sense for your domain
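The restaurant example above can be sketched directly in code. The category boundaries follow the slide; the 0.5 score for a partial match is an assumption, since the slide only distinguishes exact, partial, and no match.

```python
# Sketch of the price-category match from the slide: exact match in the
# same category, partial match one category away, no match otherwise.
CATEGORIES = ["Inexpensive", "Moderate", "Expensive", "Very Expensive"]

def price_category(price):
    """Map a dollar price to one of the four categories (< $15, $15-$30,
    $30-$50, > $50, as in the slide)."""
    if price < 15:
        return "Inexpensive"
    elif price <= 30:
        return "Moderate"
    elif price <= 50:
        return "Expensive"
    return "Very Expensive"

def category_match(desired, actual):
    """1.0 for an exact match, 0.5 (an assumed partial-match score) for
    one category away, 0.0 otherwise."""
    distance = abs(CATEGORIES.index(desired) - CATEGORIES.index(actual))
    if distance == 0:
        return 1.0
    if distance == 1:
        return 0.5
    return 0.0

print(category_match("Moderate", price_category(22)))  # exact match
print(category_match("Moderate", price_category(12)))  # one category away
print(category_match("Moderate", price_category(60)))  # two categories away
```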

Numeric Features

When values for features are naturally represented as numbers, you may still need special routines for comparing the numbers. Pitfalls to avoid are absolute comparisons and fixed ranges.

Example: Say your feature is age and two patients are ten years apart.
- If one is 60 and the other 70, you have at least a partial match and maybe a pretty good match
- If one is 1 and the other 11, you may have no match at all

Example: Say you try to set up ranges, like young < 30, old > 50, and middle-aged everything in between.
- Then, 31-year-olds will match 49-year-olds better than they match 29-year-olds

We deal with this using normalization and/or point ranges. We could say ages within 5 years are close matches for adults and ages within 1 year are close matches for children.
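One way to sketch the point-range idea: the 5-year and 1-year tolerances come from the slide, while the adult cutoff of 18 and the linear fall-off beyond the tolerance are assumptions made for illustration.

```python
# Point-range comparison for age: within 5 years is a close match for
# adults, within 1 year for children (per the slide). The age-18 cutoff
# and the linear fall-off beyond the tolerance are assumed.
def age_similarity(a, b):
    tolerance = 1 if min(a, b) < 18 else 5
    diff = abs(a - b)
    if diff <= tolerance:
        return 1.0
    # Fall off linearly, reaching zero at four times the tolerance.
    return max(0.0, 1.0 - (diff - tolerance) / (3 * tolerance))

print(age_similarity(60, 70))  # adults ten years apart: partial match
print(age_similarity(1, 11))   # children ten years apart: no match
```

With this scheme, 60 vs. 70 yields a partial match while 1 vs. 11 yields none, matching the slide's intuition, and the hard boundary at 30 vs. 31 disappears.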

Abstraction Hierarchies

When qualitative or quantitative measures don't suit your domain, you may need to organize values hierarchically. How can a system tell if spinach is closer to broccoli or to hamburger?
- In general, the higher up you have to go in a hierarchy to find a node in common, the worse the match
- Exactly how you traverse the hierarchy to find the degree of match will depend on your domain

Example: Abstraction Hierarchy

Food
- Vegetable
  - Green: Spinach, Broccoli, Peas
  - Yellow: Squash
- Fruit
  - Citrus: Orange
  - Berry: Strawberry
- Meat
  - Beef: Hamburger, Roast, Steak
  - Veal
  - Lamb
  - Pork: Chop
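A minimal sketch of hierarchy-based similarity: the deeper the lowest common ancestor, the better the match. The parent links below are one reading of the example hierarchy, and the depth-ratio scoring is just one of many possible traversal schemes.

```python
# Child -> parent links for (a reading of) the food hierarchy above.
PARENT = {
    "Vegetable": "Food", "Fruit": "Food", "Meat": "Food",
    "Green": "Vegetable", "Yellow": "Vegetable",
    "Citrus": "Fruit", "Berry": "Fruit",
    "Beef": "Meat", "Veal": "Meat", "Lamb": "Meat", "Pork": "Meat",
    "Spinach": "Green", "Broccoli": "Green", "Peas": "Green",
    "Squash": "Yellow", "Orange": "Citrus", "Strawberry": "Berry",
    "Hamburger": "Beef", "Roast": "Beef", "Steak": "Beef", "Chop": "Pork",
}

def ancestors(node):
    """Path from a node up to the root, including the node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def similarity(a, b):
    """Depth of the lowest common ancestor, over the deeper node's depth."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = next(n for n in path_a if n in path_b)
    lca_depth = len(ancestors(common)) - 1
    max_depth = max(len(path_a), len(path_b)) - 1
    return lca_depth / max_depth

print(similarity("Spinach", "Broccoli"))   # share 'Green': close match
print(similarity("Spinach", "Hamburger"))  # share only 'Food': poor match
```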

Importance of Features

Besides considering how well features match, we also need to consider how important each feature is.
- In an e-commerce application, you could ask the user how important each feature is to them
- In some systems, the same features keep the same importance
- In other systems, features change in importance depending on the task at hand

Kolodner uses the example of determining a salary for a professional baseball player:
- If you want to hire a fielder, then how well he bats is important
- If you want to hire a pitcher, then batting is unimportant, but the speed of his fast ball becomes very important

Matching and Importance

We need to get a handle on two things at once:
- How closely features match
- How important it is for them to match

Numeric conventions are used to indicate degree of match and degree of importance:
- For match, 0 means no match, 1 means exact match, and numbers in between indicate degree of partial match
- For importance, 0 means unimportant, 1 means of utmost importance, and numbers in between indicate degree of importance

Note: This is not something that can be carried out to many decimal places. We often use rough estimates like .25, .5, and .75.

The nearest neighbor algorithm is often used in practice to combine feature similarity and importance.
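The nearest neighbor combination of similarity and importance is usually a weighted sum, normalized by the total weight. The feature names, weights, and similarity values below are hypothetical, in the spirit of the baseball example.

```python
# Weighted nearest neighbor score: sum of importance * similarity over all
# features, divided by the sum of the importance weights.
def nn_score(weights, similarities):
    """weights, similarities: dicts keyed by feature name, values in [0, 1]."""
    total_weight = sum(weights.values())
    weighted = sum(weights[f] * similarities[f] for f in weights)
    return weighted / total_weight

# Hypothetical comparison of a new player against one stored case.
weights = {"batting": 0.75, "fielding": 0.5, "salary": 0.25}
similarities = {"batting": 1.0, "fielding": 0.5, "salary": 0.0}
print(round(nn_score(weights, similarities), 2))
```

Because the score is normalized, it stays in [0, 1] and can be compared across cases, which is what makes the trivial sort-based ranking possible.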

[Three example slides, not captured in this transcript, apply the nearest neighbor algorithm to baseball player cases.]

Pitfalls of Applying Nearest Neighbor

The example was purposely chosen to point out some pitfalls of the nearest neighbor algorithm. The obvious problem is that the three old cases are equally similar to the new case. When this happens:
- It's possible that the cases really are very similar, and our case base is just too small to contain distinguishable players
- It's also possible that we're comparing the wrong features, computing the wrong comparison values, or using the wrong importance weights

In our example, we're using the wrong features for comparison. We are comparing RBIs and strikeouts without considering how many games a player has been in or how many at bats he's had. We need to consider the ratio of successful attempts to opportunities.

Moral of the story: It's easy to crunch numbers, but it's not easy to know which numbers to crunch.
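The suggested fix, comparing rates rather than raw counts, can be shown in a few lines. The player numbers below are made up for illustration.

```python
# Compare rates, not raw counts: raw RBI totals mislead when players have
# had different numbers of at bats. Dividing successful attempts by
# opportunities puts both players on a common scale.
def rbi_rate(rbis, at_bats):
    return rbis / at_bats

veteran = rbi_rate(80, 500)   # many RBIs, but from many at bats
rookie = rbi_rate(20, 100)    # fewer RBIs, from far fewer chances
print(veteran, rookie)        # the rookie's rate is actually higher
```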