Case Based Reasoning Lecture 7: CBR Competence of Case-Bases.


Outline
- The Utility Problem & Case Deletion
- A First Model of Case Competence
- Case Competence Categories
- Competence-Preserving Deletion
- A Second Model of Case Competence
- Competence Groups
- Reading

Case-Base Maintenance
- Case redundancy
  - Duplicates or unnecessary near neighbours in the case-base may evolve during retain
  - Redundant cases may not harm decision making, but can slow down the system
  - Consult domain experts: are potentially redundant cases harming performance, or may they be useful in future?
- Case utilisation statistics
  - How many times is each case retrieved?
  - If a case is never retrieved over a period of time, it may be redundant
  - If a case is retrieved very frequently, this may indicate poor case coverage
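The utilisation statistics above can be sketched in a few lines of Python. This is only an illustration: the retrieval log format, the function name `utilisation_report` and the `frequent_threshold` parameter are assumptions, not something the lecture specifies.

```python
from collections import Counter

def utilisation_report(case_ids, retrieval_log, frequent_threshold):
    """Return (never_retrieved, frequently_retrieved) lists of case ids."""
    counts = Counter(retrieval_log)  # a Counter gives 0 for ids that never appear
    never = [c for c in case_ids if counts[c] == 0]
    frequent = [c for c in case_ids if counts[c] >= frequent_threshold]
    return never, frequent

# Hypothetical case ids and a log with one entry per retrieval.
case_ids = ["c1", "c2", "c3"]
retrieval_log = ["c1", "c1", "c1", "c2"]

never, frequent = utilisation_report(case_ids, retrieval_log, frequent_threshold=3)
print(never)     # ['c3']  candidate redundant case
print(frequent)  # ['c1']  may indicate poor coverage around c1
```

Both lists are only prompts for a closer look; as the slide says, a domain expert should decide whether a flagged case is really redundant.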

The Utility Problem
- The utility problem occurs when the cost of searching for relevant knowledge outweighs the benefit of applying that knowledge
- In CBR, large case-bases mean expensive retrieval
- To cope with the CBR utility problem, delete any cases that do not affect
  - the competence to solve problems
  - the performance (time)
- i.e. keep case-bases lean

A First Case Competence Model (Smyth & Keane)
- Not all cases are equal
- Case competence categories
  - Pivotal cases contribute to competence
  - Auxiliary cases contribute to performance
  - Intermediate categories: spanning cases and support cases
- Competence-preserving deletion
  - Categorise cases
  - Order cases for deletion in terms of their contribution to competence

Case Competence: The Basics
- Ideal measure of case coverage: the set of target problems that the case solves
- For a case c and a target problem t, solves(c, t) means c solves t, i.e.
  - c is retrieved for t
  - c can be adapted to solve t
- For a case c and a target problem set T: coverage(c) = {t ∈ T : solves(c, t)}
- It is infeasible to generate the set of all targets T: the space of target problems is too vast
[Figure: the coverage of a case is the set of target problems it solves]

Case Competence: The Basics
- Practical measure of case coverage: the set of cases in the case-base that the case solves
- Assumes the case-base C is a representative sample of T
- coverage(c) = {c' ∈ C : solves(c, c')}, the set of cases c' in the case-base C that
  - c is retrieved for
  - c can be adapted to solve
[Figure: the coverage of a case is the set of cases it solves]

Case Competence: The Basics
- Reachability of a target problem t: the set of cases in C that provide a solution for it
- reachability(t) = {c ∈ C : solves(c, t)}
- We are interested in reachability(c) for c ∈ C: reachability(c) = {c' ∈ C : solves(c', c)}
[Figure: the reachability of a case is the set of cases that solve it]
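The practical coverage and reachability sets can be computed directly from a solves predicate. The sketch below assumes a toy solves relation given as explicit (case, target) pairs; in a real CBR system solves(c, t) would involve retrieval and adaptation, and the function names here are hypothetical.

```python
def coverage_sets(cases, solves):
    """coverage(c) = {c' in C : solves(c, c')} for every case c."""
    cases = list(cases)
    return {c: {t for t in cases if solves(c, t)} for c in cases}

def reachability_sets(cases, solves):
    """reachability(c) = {c' in C : solves(c', c)} for every case c."""
    cases = list(cases)
    return {t: {c for c in cases if solves(c, t)} for t in cases}

# Hypothetical solves relation, listed as (case, target) pairs.
SOLVES = {(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 3), (4, 4)}

def solves(c, t):
    return (c, t) in SOLVES

case_base = [1, 2, 3, 4]
coverage = coverage_sets(case_base, solves)
reachability = reachability_sets(case_base, solves)

print(coverage[2])      # {1, 2, 3}
print(reachability[3])  # {2, 3}
```

Coverage looks outward from a case (what it solves), while reachability looks inward (what solves it); the two are the same relation read in opposite directions.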

Competence Categories: Pivotal
- A case is pivotal if it is reachable by no other case but itself
- pivotal(c) iff reachability(c) = {c}
- Pivotal cases are generally outliers: too isolated to be solved by any other case
- Target problems falling within the region of a pivot can be solved only by that pivot
- Deletion of pivotal cases reduces competence
[Figure: pivotal and auxiliary cases with their coverage sets]

Competence Categories: Auxiliary
- A case is auxiliary if its coverage set is a subset of the coverage of one of its reachable cases
- auxiliary(c) iff ∃ c' ∈ reachability(c), c' ≠ c, such that coverage(c) ⊆ coverage(c')
- Auxiliary cases tend to lie in clusters of cases
- Deletion of auxiliary cases makes no difference to competence
- If one is deleted, a nearby case can be used to solve any target that the deleted auxiliary could solve
[Figure: pivotal and auxiliary cases]
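The two definitions so far translate into simple set tests. The sketch below assumes precomputed coverage and reachability sets (the values are a small hypothetical example) and lumps every case that is neither pivotal nor auxiliary into an "intermediate" bucket; the function name `categorise` is an assumption.

```python
# Hypothetical coverage and reachability sets for four cases.
coverage = {1: {1, 2}, 2: {1, 2, 3}, 3: {3}, 4: {4}}
reachability = {1: {1, 2}, 2: {1, 2}, 3: {2, 3}, 4: {4}}

def categorise(c, coverage, reachability):
    """Classify case c as pivotal, auxiliary or intermediate."""
    if reachability[c] == {c}:
        return "pivotal"  # no other case can solve c
    others = reachability[c] - {c}
    # Subset test (<=); switch to < if a strict subset is wanted.
    if any(coverage[c] <= coverage[other] for other in others):
        return "auxiliary"
    return "intermediate"  # spanning or support, not distinguished here

categories = {c: categorise(c, coverage, reachability) for c in coverage}
print(categories)
# {1: 'auxiliary', 2: 'intermediate', 3: 'auxiliary', 4: 'pivotal'}
```

Case 4 is an isolated pivot, while cases 1 and 3 are subsumed by case 2, which is why deleting either of them would not reduce competence.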

Competence Categories: Spanning
- Spanning cases have coverage sets that link (span) other regions of the problem space
- Example (figure): coverage(2) spans the coverage of cases 1 and 3
  - Case 2 provides no more coverage than cases 1 and 3 together
  - But if case 3 is deleted, case 2 is needed
- Spanning cases do not directly affect competence
- But if cases from the linked regions are deleted, the spanning case may become necessary
[Figure: pivotal, spanning and auxiliary cases]

Competence Categories: Support
- Support cases are a special kind of spanning case
  - They exist in groups
  - Each support case provides similar coverage to the others in its group
- Deletion of any single case in a support group does not reduce competence
- Deletion of all cases in the group is equivalent to deleting a pivot
[Figure: a support group of cases 1, 2 and 3]
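Once cases are categorised, competence-preserving deletion needs an order in which to consider them. The sketch below assumes the order auxiliary, support, spanning, pivotal, which matches the descriptions of the four categories (least contribution to competence first) but is stated here as an assumption; the case ids and the `DELETION_RANK` table are hypothetical.

```python
# Assumed ranking: lower rank = smaller contribution to competence = deleted first.
DELETION_RANK = {"auxiliary": 0, "support": 1, "spanning": 2, "pivotal": 3}

def deletion_order(categories):
    """Return case ids sorted so the least competence-critical come first."""
    return sorted(categories, key=lambda c: DELETION_RANK[categories[c]])

# Hypothetical categorisation of four cases.
categories = {"a": "pivotal", "b": "auxiliary", "c": "spanning", "d": "support"}
print(deletion_order(categories))  # ['b', 'd', 'c', 'a']
```

In practice the categories would need recomputing after each deletion, because removing one case can change the coverage relationships of the cases that remain.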

A Second Case Competence Model (Smyth & McKenna)
- Competence group: a collection of related cases
- Two cases belong to the same group if their coverage sets overlap, i.e. the two cases exhibit shared coverage
- Every case belongs to one and only one competence group
- Example (figure, two groups labelled Group 1 and Group 2): Coverage(1) = {1, 2}, Coverage(2) = {1, 2, 3}, Coverage(3) = {3}, Coverage(4) = {4}
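Competence groups can be found by treating shared coverage as a link between cases and collecting the linked clusters. The sketch below uses the coverage sets from the slide's example and assumes that group membership is transitive (if 1 overlaps 2 and 2 overlaps 3, all three share a group), which is what makes every case fall into exactly one group; the function name is hypothetical.

```python
def competence_groups(coverage):
    """Partition cases into groups linked by overlapping coverage sets."""
    unassigned = set(coverage)
    groups = []
    while unassigned:
        seed = unassigned.pop()
        group, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            # Cases not yet grouped whose coverage overlaps the current case's.
            linked = {c for c in unassigned if coverage[current] & coverage[c]}
            unassigned -= linked
            group |= linked
            frontier.extend(linked)
        groups.append(group)
    return groups

coverage = {1: {1, 2}, 2: {1, 2, 3}, 3: {3}, 4: {4}}
groups = competence_groups(coverage)
print(sorted(sorted(g) for g in groups))  # [[1, 2, 3], [4]]
```

Cases 1 and 3 have no coverage in common, yet they end up together because both overlap with case 2; case 4 shares nothing and forms its own group.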

A Second Case Competence Model
- Group coverage is
  - proportional to the size of the group: larger groups cover more target problems
  - inversely proportional to the density of cases: denser groups cover smaller regions
- Case-base coverage = Σ group coverage (summed over all competence groups)
- Predicted competence = case-base coverage
- How does real competence relate to predicted competence?
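The slide gives only the qualitative behaviour of group coverage, so the sketch below picks one simple formula with those two properties: 1 + |G| * (1 - density), with density taken as the average pairwise similarity inside the group. The formula, the zero density for single-case groups, the similarity function and all names are assumptions made for illustration, not the lecture's definition.

```python
def group_density(group, sim):
    """Average pairwise similarity within a group (assumed density measure)."""
    n = len(group)
    if n < 2:
        return 0.0  # assumption: a single-case group has zero density
    total = sum(sim(group[i], group[j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def group_coverage(group, sim):
    """Grows with group size, shrinks as density rises (assumed form)."""
    return 1 + len(group) * (1 - group_density(group, sim))

def case_base_coverage(groups, sim):
    """Predicted competence: the sum of the group coverages."""
    return sum(group_coverage(g, sim) for g in groups)

# Hypothetical numeric cases with a similarity in [0, 1].
def sim(a, b):
    return max(0.0, 1.0 - abs(a - b))

groups = [[0.0, 0.5], [5.0]]
print(group_coverage(groups[0], sim))   # 2.0
print(case_base_coverage(groups, sim))  # 4.0
```

A tightly packed group scores little more than a single case, while a spread-out group of the same size scores higher, which is the behaviour the slide describes.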

Predicted vs True Competence
- Experiments
  - 1000 different cases
  - 300 chosen randomly as unseen problems
  - The other 700 used to build case-bases
- True competence (% accuracy on unseen problems) is compared with predicted competence
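The experimental setup can be sketched as a random split followed by an accuracy measurement. The numeric toy cases, the solves predicate and the function names below are hypothetical stand-ins; only the 1000 / 300 / 700 split comes from the slide.

```python
import random

def split_cases(cases, n_unseen, seed=0):
    """Randomly hold out n_unseen cases; return (build_pool, unseen)."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[n_unseen:], shuffled[:n_unseen]

def true_competence(case_base, unseen, solves):
    """Percentage of unseen problems solved by at least one case."""
    solved = sum(1 for t in unseen if any(solves(c, t) for c in case_base))
    return 100.0 * solved / len(unseen)

# Hypothetical data: 1000 numeric cases; a case solves targets within distance 1.
all_cases = list(range(1000))

def solves(c, t):
    return abs(c - t) <= 1

build_pool, unseen = split_cases(all_cases, n_unseen=300)
case_base = build_pool[:200]  # one of many case-bases built from the 700
print(len(build_pool), len(unseen))  # 700 300
print(true_competence(case_base, unseen, solves))
```

Repeating this for case-bases of different sizes gives the true competence values that the predicted case-base coverage is compared against.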

Competence Holes
- What is a competence hole?
  - Any uncovered region of the target space
- What makes a competence hole interesting?
  - The size of the hole
  - Its relevance to target problems

Types of Competence Holes
- Type 1: lost coverage
  - There are insufficient cases within the case-base
- Type 2: no lost coverage
  - The hole is due to domain constraints (impossible value combinations)

Identifying Interesting Holes
- Methodology
  - Competence groups that are close may merge into a single group
  - The missing cases are competence-rich spanning cases
  - Search for new spanning cases in the regions between nearby competence groups

Identifying Interesting Holes
- Boundary cases
  - Each pair of groups G, H has a pair of maximally similar cases, one from each group: these are the boundary cases for G and H
  - Each group has n-1 boundary cases, corresponding to the n-1 other groups
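Finding the boundary cases for two groups amounts to a search over all cross-group pairs for the most similar one. The sketch below assumes numeric cases and a similarity equal to negative distance; both, along with the function name, are illustrative choices.

```python
from itertools import product

def boundary_pair(group_g, group_h, sim):
    """Return the maximally similar (g, h) pair, one case from each group."""
    return max(product(group_g, group_h), key=lambda pair: sim(*pair))

# Hypothetical numeric cases; similarity is negative distance.
def sim(a, b):
    return -abs(a - b)

G = [0.0, 0.5, 1.2]
H = [3.0, 5.0]
print(boundary_pair(G, H, sim))  # (1.2, 3.0)
```

The exhaustive comparison costs |G| * |H| similarity evaluations per pair of groups, which is acceptable for a sketch but may need an index for large case-bases.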

Identifying Interesting Holes
- For each group, search for new spanning cases between it and its nearest neighbour group
- The new case lies between the boundary cases

Case Authoring
- Building new cases to fill the competence holes in the case-base
- Methodology
  - Generate a new case from the feature values of the boundary pair cases
  - For nominal features: choose the most frequent value
  - For continuous features: choose the mean value
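The authoring rule is easy to sketch when cases are represented as feature dictionaries. The representation, the feature names and the function `author_case` are assumptions; when the nominal values tie (as they can with only two boundary cases), this sketch simply keeps the value seen first.

```python
from collections import Counter
from statistics import mean

def author_case(source_cases, nominal, continuous):
    """Build a new case: mode for nominal features, mean for continuous ones."""
    new_case = {}
    for feature in nominal:
        values = Counter(case[feature] for case in source_cases)
        new_case[feature] = values.most_common(1)[0][0]  # ties: first seen wins
    for feature in continuous:
        new_case[feature] = mean(case[feature] for case in source_cases)
    return new_case

# Hypothetical boundary pair with one nominal and one continuous feature.
boundary_cases = [
    {"colour": "red", "price": 10.0},
    {"colour": "red", "price": 20.0},
]
new_case = author_case(boundary_cases, nominal=["colour"], continuous=["price"])
print(new_case)  # {'colour': 'red', 'price': 15.0}
```

The authored case sits midway between the boundary cases on continuous features; its solution would still have to be supplied or validated before the case is retained.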

Summary
- Competence
  - Competence groups
  - Competence holes
- Competence-based maintenance
  - Case deletion
  - Case authoring
    - Boundary cases
    - Spanning cases between boundary cases
- Increasing the competence of case-bases

Reading
Research papers
- B. Smyth, M.T. Keane. Remembering To Forget: A Competence-Preserving Case Deletion Policy for CBR Systems. In Proceedings of IJCAI, Canada. keane.pdf
- B. Smyth, E. McKenna. Building Compact Competent Case-bases. In Proceedings of ICCBR, Munich, Germany. Springer Verlag. smyth99building.pdf