1 Automated Discovery of Recommendation Knowledge David McSherry School of Computing and Information Engineering University of Ulster

2 Overview  Approaches to retrieval in recommender systems  Rule-based retrieval (of cases) in Rubric  Automating the discovery of recommendation rules  Role of default preferences in rule discovery  Related work  Conclusions

3 The Recommendation Challenge Often we expect salespersons to make reliable recommendations based on limited information: ☺ I'm looking for a 3-bedroom detached property To recommend an item with confidence, a salesperson has to consider:  The customer's known preferences  The available alternatives  All features of the recommended item, including features not mentioned by the customer

4 Are Recommender Systems Reliable? Features not mentioned in the user's query are typically ignored in:  Nearest neighbour (NN) retrieval  Decision tree approaches  Multi-criterion decision making  Assumed (or default) preferences are sometimes used for attributes like price  But for many attributes, no assumptions can be made about the user's preferences

5 Preferences Pyramid [Figure: a pyramid relating the user's known preferences (beds = 3, type = detached), default preferences (..., reasonably priced, ...), and unknown preferences (..., location = A, ...)]

6 CBR Recommender Systems  Descriptions of available products (e.g. houses) are stored as cases in a product dataset, e.g.

          Loc  Beds  Type
Weight:   (3)  (2)   (1)
Case 1:   A    3     semi
Case 2:   B    4     det
Case 3:   B    3     det

and retrieved in response to user queries

7 Inductive Retrieval

Bedrooms?
  4 → Case 2 (B, 4, det)
  3 → Type?
        det  → Case 3 (B, 3, det)
        semi → Case 1 (A, 3, semi)

Not only are the user's unknown preferences ignored - the user is prevented from expressing them

8 Inductive Retrieval

Bedrooms?
  4 → Case 2 (B, 4, det)
  3 → Type?
        det  → Case 3 (B, 3, det)
        semi → Case 1 (A, 3, semi)

The recommended case exactly matches the user's known preferences - but what if she prefers location A?

9 Nearest Neighbour Retrieval The standard CBR approach is to recommend the most similar case  The similarity of a case C to a query Q over a subset A_Q of the product attributes A is:

Sim(C, Q) = Σ a ∈ A_Q  w_a × sim_a(C, Q)

where w_a is the weight assigned to a and sim_a is the similarity of the values of C and Q for a

10 Incomplete Queries in NN

          Loc  Beds  Type
          (3)  (2)   (1)    Sim
Q:             3     det
Case 1:   A    3     semi    2
Case 2:   B    4     det     1
Case 3:   B    3     det     3

most-similar(Q) = {Case 3}

11 Incomplete Queries in NN

          Loc  Beds  Type
          (3)  (2)   (1)    Sim
Q:             3     det
Case 1:   A    3     semi    2
Case 2:   B    4     det     1
Case 3:   B    3     det     3

most-similar(Q) = {Case 3}  Again, Case 3 is a good recommendation if the user happens to prefer location B

12 Incomplete Queries in NN

          Loc  Beds  Type
          (3)  (2)   (1)    Sim
Q*:       A    3     det
Case 1:   A    3     semi    5
Case 2:   B    4     det     1
Case 3:   B    3     det     3

most-similar(Q*) = {Case 1}  But not if she prefers location A
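As a concrete check, the toy case base and the weighted similarity measure of the preceding slides can be sketched in Python. The exact-match local similarity (1 if the values are equal, 0 otherwise) is an assumption, but it reproduces the Sim columns shown above.

```python
# Toy property case base with the slides' attribute weights: Loc (3), Beds (2), Type (1).
WEIGHTS = {"loc": 3, "beds": 2, "type": 1}
CASES = {
    "Case 1": {"loc": "A", "beds": 3, "type": "semi"},
    "Case 2": {"loc": "B", "beds": 4, "type": "det"},
    "Case 3": {"loc": "B", "beds": 3, "type": "det"},
}

def sim(case, query):
    """Sim(C, Q): weighted sum over queried attributes, exact-match local similarity."""
    return sum(WEIGHTS[a] for a, v in query.items() if case[a] == v)

def most_similar(query):
    """Names of the cases tied for maximum similarity to the query."""
    best = max(sim(c, query) for c in CASES.values())
    return {name for name, c in CASES.items() if sim(c, query) == best}

print(most_similar({"beds": 3, "type": "det"}))              # {'Case 3'}
print(most_similar({"loc": "A", "beds": 3, "type": "det"}))  # {'Case 1'}
```

Extending the query with loc = A flips the recommendation from Case 3 to Case 1, which is exactly the reliability problem the talk addresses.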

13 Rule-Based Retrieval in Rubric  In rule-based retrieval, a possible recommendation rule for Case 3 might be: Rule 1: if beds = 3 and type = det then Case 3  Given a target query, a product dataset, and a set of recommendation rules, Rubric:  Retrieves the case recommended by the first rule that covers the target query  If none of the available rules covers the target query, it abstains from making a recommendation
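Rubric's retrieve-or-abstain behaviour can be sketched as follows. The encodings (rules and queries as attribute-value dictionaries, rules tried in order) are illustrative assumptions, not Rubric's actual implementation; a rule covers a query when all of its conditions appear in the query.

```python
# Illustrative recommendation rules: (conditions, recommended case).
RULES = [
    ({"beds": 3, "type": "det"}, "Case 3"),  # Rule 1 from the slide
]

def rubric_retrieve(query, rules):
    """Return the case recommended by the first rule covering the query, else abstain."""
    for conditions, case in rules:
        if all(query.get(a) == v for a, v in conditions.items()):
            return case
    return None  # no rule covers the query: abstain

print(rubric_retrieve({"beds": 3, "type": "det"}, RULES))  # Case 3
print(rubric_retrieve({"beds": 4}, RULES))                 # None (abstains)
```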

14 Dominance Rules  For any case C and query Q, we say that Q → C is a dominance rule if: most-similar(Q*) = {C} for all extensions Q* of Q  As Rule 1 is not a dominance rule for Case 3, it is potentially unreliable: Rule 1: if beds = 3 and type = det then Case 3

15 A Dominance Rule for Case 3

          Loc  Beds  Type
          (3)  (2)   (1)    Sim
Q:        B    3
Case 1:   A    3     semi    2
Case 2:   B    4     det     3
Case 3:   B    3     det     5

most-similar(Q) = {Case 3}

16 A Dominance Rule for Case 3

          Loc  Beds  Type
          (3)  (2)   (1)    Sim  Max
Q:        B    3
Case 1:   A    3     semi    2    3
Case 2:   B    4     det     3    4
Case 3:   B    3     det     5

 As Cases 1 and 2 can never equal the similarity of Case 3, a dominance rule for Case 3 is: Rule 2: if loc = B and beds = 3 then Case 3

17 Coverage of a Dominance Rule  A dominance rule Q → C can be applied to any query Q* such that Q ⊆ Q*, since by definition: most-similar(Q*) = {C}  Also by definition, most-similar(Q**) = {C} for any extension Q** of Q*  So no other case can equal the similarity of C regardless of the user's unknown preferences

18 The Role of Case Dominance  A given case C1 dominates another case C2 with respect to a query Q if: Sim(C1, Q*) > Sim(C2, Q*) for all extensions Q* of Q (McSherry, IJCAI-03)  So Q → C is a dominance rule if and only if C dominates all other cases with respect to Q  This is not the same as Pareto dominance

19 Identifying Dominated Cases  A given case C1 dominates another case C2 with respect to a query Q if and only if:

Sim(C1, Q) > Sim(C2, Q) + Σ a ∈ A − A_Q  w_a

i.e., C1's known similarity exceeds the maximum similarity C2 can reach on any extension of Q (McSherry, IJCAI-03)  Cases dominated by a given case can thus be identified with modest computational effort
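A sketch of the dominated-case test, assuming the exact-match local similarities used in the earlier tables: a competing case can gain at most w_a for each attribute absent from the query, so the target dominates it once the target's current similarity exceeds that upper bound (the Max column of slide 16).

```python
WEIGHTS = {"loc": 3, "beds": 2, "type": 1}
CASES = {
    "Case 1": {"loc": "A", "beds": 3, "type": "semi"},
    "Case 2": {"loc": "B", "beds": 4, "type": "det"},
    "Case 3": {"loc": "B", "beds": 3, "type": "det"},
}

def sim(case, query):
    return sum(WEIGHTS[a] for a, v in query.items() if case[a] == v)

def max_sim(case, query):
    """Upper bound on Sim(case, Q*) over all extensions Q* of the query."""
    slack = sum(w for a, w in WEIGHTS.items() if a not in query)
    return sim(case, query) + slack

def dominates(c1, c2, query):
    """c1 dominates c2: no extension of the query can make c2 as similar as c1."""
    return sim(c1, query) > max_sim(c2, query)

q = {"loc": "B", "beds": 3}
print(dominates(CASES["Case 3"], CASES["Case 1"], q))  # True  (5 > 3)
print(dominates(CASES["Case 3"], CASES["Case 2"], q))  # True  (5 > 4)
```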

20 Dominance Rule Discovery (McSherry & Stretch, IJCAI-05) Our algorithm targets maximally general dominance rules Q → C such that Q ⊆ description(C)

Description of Case 3: {B, 3, det}

            {B, 3, det}
     {B, 3}   {B, det}   {3, det}
       {B}      {3}       {det}
                nil

Case 3 dominates Case 1 and Case 2 with respect to the query {B, 3}
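The lattice search above can be sketched end-to-end for the toy case base. The exact-match similarity and the helper names are illustrative, not the paper's algorithm; on this data the sketch recovers Rule 2 (loc = B and beds = 3) as the single maximally general dominance rule for Case 3.

```python
from itertools import combinations

WEIGHTS = {"loc": 3, "beds": 2, "type": 1}
CASES = {
    "Case 1": {"loc": "A", "beds": 3, "type": "semi"},
    "Case 2": {"loc": "B", "beds": 4, "type": "det"},
    "Case 3": {"loc": "B", "beds": 3, "type": "det"},
}

def sim(case, query):
    return sum(WEIGHTS[a] for a, v in query.items() if case[a] == v)

def max_sim(case, query):
    # upper bound on similarity over all extensions of the query
    return sim(case, query) + sum(w for a, w in WEIGHTS.items() if a not in query)

def dominates_all(target, query):
    """Does the target case dominate every other case with respect to the query?"""
    return all(sim(CASES[target], query) > max_sim(c, query)
               for name, c in CASES.items() if name != target)

def discover_rules(target):
    """Maximally general dominance rules Q -> target with Q a subquery of its description."""
    desc = CASES[target]
    winning = [frozenset(s) for r in range(len(desc) + 1)
               for s in combinations(desc, r)
               if dominates_all(target, {a: desc[a] for a in s})]
    # keep only maximally general rules: no proper subquery also wins
    return [{a: desc[a] for a in desc if a in s} for s in winning
            if not any(o < s for o in winning)]

print(discover_rules("Case 3"))  # [{'loc': 'B', 'beds': 3}]
```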

21 Complexity of Rule Discovery Our discovery algorithm is applied with each case in turn as the target case For a product dataset with n cases and k attributes, where n ≥ 2^k, the worst-case complexity is: O(k × n² × 2^k) If n < 2^k, the worst-case complexity is: O(k × n × 2^2k)

22 Maximum Rule-Set Size In a dataset with k attributes, the number of rules discovered for a target case can never be more than C(k, ⌊k/2⌋) (McSherry & Stretch, IJCAI-05) With 1,000 products and 9 attributes, the maximum number of discovered rules is 126,000 Rule-set sizes tend to be much smaller in practice

No. of Attributes:   1   2   3   4   5    6    7    8    9
Maximum:             1   2   3   6   10   20   35   70   126
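The 126,000 figure is easy to check directly: `math.comb` computes the central binomial coefficient C(k, ⌊k/2⌋), the per-case maximum.

```python
from math import comb

k = 9
per_case_max = comb(k, k // 2)   # C(9, 4) = 126 rules per target case
print(per_case_max * 1000)       # 126000 for 1,000 products
```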

23 Digital Camera Case Base Source: McCarthy et al. (IUI-2005) No of cases: 210 Attributes (weights): make (9), price (8), style (7), resolution (6), optical zoom (5), digital zoom (1), weight (4), storage type (2), memory (3) Discovered Rule: if make = toshiba and style = ultra compact and optical zoom = 3 then Case 201

24 Discovered Rule-Set Sizes [Chart: Digital Camera Case Base (k = 9)]

25 Lengths of Discovered Rules [Chart: Digital Camera Case Base (k = 9)]

26 Limitations of Discovered Rules Example Rule: if make = sony and price = 336 and style = compact and resolution = 5 and weight = 236 then Case 29 Problem: Exact numeric values (e.g., price, weight) make the rule seem unnatural/unrealistic They also limit its coverage Solution: Assume the preferred price and weight are the same for all users

27 LIB and MIB Attributes  A less-is-better (LIB) attribute is one that most users would prefer to minimise e.g. price, weight  A more-is-better (MIB) attribute is one that most users would prefer to maximise e.g. resolution, optical zoom, digital zoom, memory  Often in NN retrieval, LIB and MIB attributes are treated as nearer-is-better attributes: ☺ How much would you like to pay? 300

28 LIB and MIB Attributes  A less-is-better (LIB) attribute is one that most users would prefer to minimise e.g. price, weight  A more-is-better (MIB) attribute is one that most users would prefer to maximise e.g. resolution, optical zoom, digital zoom, memory  Often in NN retrieval, LIB and MIB attributes are treated as nearer-is-better attributes: ☺ How much would you like to pay? 300 This doesn't make sense, as it implies that the user would prefer to pay 310 rather than 280

29 Role of Default Preferences in Rule Discovery (McSherry & Stretch, AI-2005)  We assume the preferred value of a LIB/MIB attribute is the lowest/highest value in the case base  These preferences are represented in a default query: Q_D: price = 106, memory = 64, resolution = 14, optical zoom = 10, digital zoom = 8, weight = 100  In the dominance rules Q → C now targeted by our algorithm, Q includes the default preferences in Q_D  Thus the assumed preferences are implicit in the discovered rules

30 Similarity to the Default Query  We use the standard measure for numeric attributes:

sim(x, y) = 1 − |x − y| / (max − min)

where x is the value in a given case, y is the preferred value, and max and min are the largest and smallest values in the case base  For a LIB attribute, y = min, so:

sim(x) = (max − x) / (max − min)
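The standard measure and its LIB/MIB specialisations can be sketched as follows. The price range used in the example (106 to 1000) is an assumption; only the minimum, 106, appears in the default query on the slides.

```python
def numeric_sim(x, y, lo, hi):
    """Standard measure: sim(x, y) = 1 - |x - y| / (hi - lo)."""
    return 1 - abs(x - y) / (hi - lo)

def lib_sim(x, lo, hi):
    """LIB attribute: the preferred value is the minimum, y = lo."""
    return numeric_sim(x, lo, lo, hi)   # equals (hi - x) / (hi - lo)

def mib_sim(x, lo, hi):
    """MIB attribute: the preferred value is the maximum, y = hi."""
    return numeric_sim(x, hi, lo, hi)   # equals (x - lo) / (hi - lo)

# Assumed price range 106..1000:
print(lib_sim(106, 106, 1000))   # 1.0 - the cheapest camera is most preferred
print(lib_sim(1000, 106, 1000))  # 0.0 - the dearest is least preferred
```

Under a nearer-is-better treatment of price with y = 300, a 310 camera would score higher than a 280 one; the LIB version avoids exactly that anomaly.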

31 Digital Camera Case Base No of cases: 210 Attributes: make, price, style, resolution, optical zoom, digital zoom, weight, storage type, memory LIB attributes: price, weight MIB attributes: resolution, optical, digital, memory Discovered Rule: if make = sony and style = compact then Case 29

32 Q D  {sony, compact, memory stick} Q D  {sony, compact} Q D  {sony, memory stick} Q D  {compact, memory stick} Q D  {sony} Q D  {compact} Q D  {memory stick} Q D Reduced Complexity of Rule Discovery (e.g., from 512 candidate queries to 8) Dominance Rule Discovery for Case 29

33 Reduced Length of Discovered Rules [Chart] (DPs = Default Preferences)

34 Recommendability of Cases  Only 56 of the 210 cases can be the most similar case for any query that includes the default query Q D  The reason is that most cases are dominated with respect to the default query  For most of the 56 non-dominated cases, only a single dominance rule was discovered  The discovered rules cover 29% of all queries over the attributes make, style, and storage type

35 Retrieving Stories for Case-Based Teaching (Burke & Kass, 1996)  Rule-based retrieval of stories or lessons learned by experienced salespersons  Retrieval is conservative, opportunistic, and non-mandatory  A story is retrieved at the system's initiative and only if highly relevant  By design, retrieval in Rubric is also conservative and non-mandatory (and potentially opportunistic)  Easily combined with NN retrieval of a less strongly recommended case if no rule covers a given query

36 Incremental Nearest Neighbour (iNN) (McSherry, IJCAI-03, AICS-05, AIR 2005)  A conversational CBR approach in which:  Question selection is goal driven (i.e., maximise number of cases dominated by a target case)  Dialogue continues until it can be safely terminated (i.e., no other case can exceed the similarity of the target case)  Relevance of any question can be explained (e.g., ability to confirm the target case)  Recommendations can be justified (i.e., unknown preferences cannot affect the outcome)

37 Demand Driven Discovery of Recommendation Knowledge in Top Case

Top Case: What is the preferred make?
User:     sony
Top Case: The target case is:
          Case 40: sony, 455, ultra compact, 5, 4, 4, 298, MS, 32
          What is the preferred style?
User:     why
Top Case: Because if style = ultra compact this will confirm Case 40 as the recommended case
          What is the preferred style?
User:     compact
Top Case: The recommended case is:
          Case 29: sony, 336, compact, 5, 3, 4, 236, MS, 32

38 Conclusions  Benefits of retrieval based on dominance rules:  Provably reliable because account is taken of the user’s unknown preferences  Benefits of default preferences:  An often dramatic reduction in average length of the discovered rules  Increased coverage of queries representing the user’s personal preferences  Reduced complexity of rule discovery

39 References
Burke, R. and Kass, A. (1996) Retrieving Stories for Case-Based Teaching. In Leake, D. (ed.) Case-Based Reasoning: Experiences, Lessons & Future Directions. Cambridge, MA: AAAI Press
McCarthy, K., Reilly, J., McGinty, L. and Smyth, B. (2005) Experiments in Dynamic Critiquing. Proceedings of the International Conference on Intelligent User Interfaces
McSherry, D. (2003) Increasing Dialogue Efficiency in Case-Based Reasoning without Loss of Solution Quality. Proceedings of the 18th International Joint Conference on Artificial Intelligence
McSherry, D. (2005) Explanation in Recommender Systems. Artificial Intelligence Review 24 (2)
McSherry, D. (2005) Incremental Nearest Neighbour with Default Preferences. Proceedings of the 16th Irish Conference on Artificial Intelligence and Cognitive Science, 9-18
McSherry, D. and Stretch, C. (2005) Automating the Discovery of Recommendation Knowledge. Proceedings of the 19th International Joint Conference on Artificial Intelligence, 9-14
McSherry, D. and Stretch, C. (2005) Recommendation Knowledge Discovery. Proceedings of the 25th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence

40 Acknowledgements  Thanks to:  Eugene Freuder, Barry O’Sullivan, Derek Bridge, Eleanor O’Hanlon (4C)  Chris Stretch (co-author, IJCAI-05 and AI-2005)  Kevin McCarthy, Lorraine McGinty, James Reilly, Barry Smyth (UCD) for the digital camera case base

41 Compromise-Driven Retrieval (McSherry, ICCBR-03, UKCBR-05)  Similarity and compromise (unsatisfied constraints) play complementary roles  Queries can include upper/lower limits for LIB/MIB attributes (used only in assessment of compromise)  Every case in the product data set is covered by one of the recommended cases  That is, one of the recommended cases is at least as similar and involves the same or fewer compromises