2 Approaches to Modeling and Learning User Preferences
Marie desJardins, University of Maryland Baltimore County
Presented at SRI International AI Center, March 10, 2008
Joint work with Fusun Yaman, Michael Littman, and Kiri Wagstaff

3 Overview
- Representing Preferences
- Learning Planning Preferences
- Preferences over Sets
- Directions / Conclusions

4 Representing Preferences

5 What is a Preference?
- (Partial) ordering over outcomes
  - Feature vector representation of "outcomes" (aka "objects")
  - Example: taking a vacation. Features:
    - Who (alone / family)
    - Where (Orlando / Paris)
    - Flight type (nonstop / one-stop / multi-stop)
    - Cost (low / medium / high)
    - ...
- Languages:
  - Weighted utility function
  - CP-net
  - Lexicographic ordering

6 Weighted Utility Functions
- Each value v_ij of feature f_i has an associated utility u_ij
- Utility of object o_j: U_j = ∑_i w_i u_ij
- Commonly used in preference elicitation
  - Easy to model
  - Independence of features is convenient
- Flight example: U(flight) = 0.8*u(Who) + 0.8*u(Cost) + 0.6*u(Where) + 0.4*u(Flight Type) + ...
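As a concrete illustration, here is a minimal Python sketch of an additive weighted utility function for the flight example; the feature weights come from the slide, but the per-value utilities are made-up numbers, not from the talk.

```python
# Minimal sketch of an additive weighted utility function (illustrative values only).
FEATURE_WEIGHTS = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}

# Utility u_ij of each value of each feature (hypothetical numbers).
VALUE_UTILITY = {
    "who": {"family": 1.0, "alone": 0.3},
    "cost": {"low": 1.0, "medium": 0.5, "high": 0.1},
    "where": {"Orlando": 0.9, "Paris": 0.7},
    "flight_type": {"nonstop": 1.0, "one-stop": 0.6, "multi-stop": 0.2},
}

def utility(outcome: dict) -> float:
    """U_j = sum_i w_i * u_ij over the features of the outcome."""
    return sum(FEATURE_WEIGHTS[f] * VALUE_UTILITY[f][v] for f, v in outcome.items())

print(utility({"who": "family", "cost": "low", "where": "Orlando", "flight_type": "nonstop"}))
```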

7 CP-Nets
- Conditional Preference Network
  - Intuitive, graphical representation of conditional preferences under a ceteris paribus ("all else being equal") assumption
- Example network (who -> where):
  - who: family > alone ("I prefer to take a vacation with my family, rather than going alone")
  - where, given family: Orlando > Paris ("If I am with my family, I prefer Orlando to Paris")
  - where, given alone: Paris > Orlando ("If I am alone, I prefer Paris to Orlando")

8 Induced Preference Graph
- Every CP-net induces a preference graph on outcomes
- The partial ordering of outcomes is given by the transitive closure of the preference graph
- [Figure: the example CP-net (who: family > alone; where: family: Orlando > Paris, alone: Paris > Orlando) and its induced preference graph over the four outcomes alone∧Orlando, family∧Orlando, alone∧Paris, family∧Paris]
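To make the induced ordering concrete, here is a small Python sketch (not from the talk) that enumerates the outcomes of the example CP-net, builds the preference graph from single-variable "improving flips", and takes its transitive closure to get the partial order.

```python
from itertools import product

# Example CP-net: 'who' is unconditional; 'where' is conditioned on 'who'.
WHO_ORDER = ["family", "alone"]                      # family > alone
WHERE_ORDER = {"family": ["Orlando", "Paris"],       # family: Orlando > Paris
               "alone":  ["Paris", "Orlando"]}       # alone:  Paris > Orlando

outcomes = list(product(WHO_ORDER, ["Orlando", "Paris"]))

def better_edges():
    """Yield (worse, better) pairs that differ in exactly one variable (a single flip)."""
    edges = set()
    for who, where in outcomes:
        # Flip 'who' toward its preferred value, holding 'where' fixed.
        if who != WHO_ORDER[0]:
            edges.add(((who, where), (WHO_ORDER[0], where)))
        # Flip 'where' toward the value preferred given 'who'.
        preferred_where = WHERE_ORDER[who][0]
        if where != preferred_where:
            edges.add(((who, where), (who, preferred_where)))
    return edges

def transitive_closure(edges):
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

for worse, better in sorted(transitive_closure(better_edges())):
    print(f"{better} is preferred to {worse}")
```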

9 Lexicographic Orderings
- Features are prioritized with a total ordering f_1, ..., f_k
- Each value of each feature is prioritized with a total ordering, v_i1 > ... > v_im
- To compare o_1 and o_2:
  - Find the first feature in the feature ordering on which o_1 and o_2 differ
  - Choose the outcome with the preferred value for that feature
- Travel example: Who > Where > Cost > Flight-Type > ...
  - Family > Alone
  - Orlando > Paris
  - ...
  - Cheap > Expensive
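The comparison rule is easy to state in code; below is a minimal Python sketch of a lexicographic comparator for the travel example (the feature ordering and preferred values are illustrative, not taken verbatim from the talk).

```python
# Illustrative lexicographic preference model for the travel example.
FEATURE_ORDER = ["who", "where", "cost", "flight_type"]   # most to least important
PREFERRED = {
    "who":   ["family", "alone"],
    "where": ["Orlando", "Paris"],
    "cost":  ["low", "medium", "high"],
    "flight_type": ["nonstop", "one-stop", "multi-stop"],
}

def lex_prefer(o1: dict, o2: dict):
    """Return the preferred outcome, or None if the two outcomes are identical."""
    for f in FEATURE_ORDER:
        if o1[f] != o2[f]:
            # The outcome whose value ranks earlier in PREFERRED[f] wins.
            return o1 if PREFERRED[f].index(o1[f]) < PREFERRED[f].index(o2[f]) else o2
    return None

a = {"who": "family", "where": "Paris", "cost": "low", "flight_type": "nonstop"}
b = {"who": "family", "where": "Orlando", "cost": "high", "flight_type": "multi-stop"}
print(lex_prefer(a, b))   # b: 'where' is the first differing feature and Orlando > Paris
```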

10 Representation Tradeoffs
- Each representation has some limitations
- Additive utility functions can't capture conditional preferences, and can't easily represent "hard" constraints or preferences
- CP-nets, in general, only give a partial ordering, can't model integer/real features easily, and can't capture tradeoffs
- Lexicographic preferences can't capture tradeoffs, and can't represent conditional preferences

11 Learning Planning Preferences

12 Planning Algorithms
- Domain-independent
  - Inputs: initial state, goal state, possible actions
  - Domain-independent but not efficient
- Domain-specific
  - Works for only one domain
  - (Near-)optimal reasoning
  - Very fast
- Domain-configurable
  - Uses additional planning knowledge to customize the search automatically
  - Broadly applicable and efficient

13 Domain Knowledge for Planning
- Provide search control information
  - Hierarchy of abstract actions (HTN operators)
  - Logical formulas (e.g., temporal logic)
- Experts must provide planning knowledge
  - May not be readily available
  - Difficult to express knowledge declaratively

14 Learning Planning Knowledge
- Alternative: learn planning knowledge by observation (i.e., from example plans)
- Possibly even learn from a single complex example
  - DARPA's Integrated Learning Program
- Our focus: learn preferences at various decision points
  - Charming Hybrid Adaptive Ranking Model (CHARM)
  - Currently: learns preferences over variable bindings
  - Future: learn goal and operator preferences

15 HTN: Hierarchical Task Network
- Objectives are specified as high-level tasks to be accomplished
- Methods describe how high-level tasks are decomposed down to primitive tasks
- [Figure: HTN for the high-level task travel(X,Y) with two methods: short-distance travel decomposes into getTaxi(X), rideTaxi(X,Y), payDriver; long-distance travel decomposes into buyTicket(Ax,Ay), travel(X,Ax), fly(Ax,Ay), travel(Ay,Y). HTN operators decompose high-level tasks into primitive actions.]
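As an illustration of the method structure (not the talk's actual planner representation), the two travel methods from the figure might be encoded roughly as follows; the task and subtask names mirror the figure, and uppercase names stand for variables bound during planning.

```python
# Rough sketch of the HTN methods from the travel example, as plain data.
HTN_METHODS = {
    "travel(X,Y)": [
        {   # Short-distance travel: take a taxi all the way.
            "name": "short-distance travel",
            "subtasks": ["getTaxi(X)", "rideTaxi(X,Y)", "payDriver"],
        },
        {   # Long-distance travel: fly between airports AX and AY.
            "name": "long-distance travel",
            "subtasks": ["buyTicket(AX,AY)", "travel(X,AX)", "fly(AX,AY)", "travel(AY,Y)"],
        },
    ],
}

# A planner repeatedly replaces a high-level task with one method's subtasks
# until only primitive actions (getTaxi, rideTaxi, payDriver, buyTicket, fly) remain.
for method in HTN_METHODS["travel(X,Y)"]:
    print(method["name"], "->", ", ".join(method["subtasks"]))
```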

16 CHARM: Charming Hybrid Adaptive Ranking Model
- Learns preferences in HTN methods
  - Which objects to choose when using a particular method? (Which flight to take? Which airport to choose?)
  - Which goal to select next during planning?
  - Which method to choose to achieve a task? (By plane or by train?)
- Preferences are expressed as lexicographic orderings
  - A natural choice for many (not all) planning domains

17 Summary of CHARM
- CHARM learns a preference rule for each method
  - Given: an HTN, initial state, and the plan tree
  - Find: an ordering on variable values for each decision point (planning context)
- CHARM has two modes
  - Gather training data for each method (e.g., Orlando = (tropical, family-oriented, expensive) is preferred to Boise = (cold, outdoors-oriented, cheap))
  - Learn a preference rule in each method

18 Preference Rules
- A preference rule is a function that, given two objects represented as vectors of attributes, returns the preferred one
- Assumption: preference rules are lexicographic
  - For every attribute there is a preferred value
  - There is a total order on the attributes representing the order of importance
- Example: a warm destination is preferred to a cold one; among destinations of the same climate, an inexpensive one is better than an expensive one; ...

19 Learning Lexicographic Preference Models
- Existing algorithms return one of many models consistent with the data
- The worst-case performance of such algorithms is worse than random selection
  - Higher probability of poor performance if there are fewer training observations
- A novel democratic approach: Variable Voting
  - Sample the possible consistent models
  - Implicit sampling: models that satisfy certain properties are permitted to vote
  - Preference decision is based on the majority of votes

20 Variable Voting
- Given a partial order, <, on the attributes and two objects, A and B:
  - D = { attributes that differ between A and B }
  - D* = { most salient attributes in D with respect to < }
  - The object with the largest number of preferred values for the attributes in D* is the preferred object
- Example objects (see the sketch below):

      X1  X2  X3  X4  X5
  A    1   0   1   0   0
  B    0   0   1   1   1
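Below is a minimal Python sketch of the voting decision described above, under the simplifying assumptions that value 1 is the preferred value of every attribute and that salience is given by a rank number (lower = more salient); it is illustrative, not the paper's exact procedure.

```python
# Illustrative Variable Voting decision (not the exact published algorithm).
# Assumptions: each attribute's preferred value is 1, and 'rank' gives salience
# (lower rank = more salient); the differing attributes of minimal rank form D*.
def variable_vote(a, b, rank):
    # D: attributes on which the two objects differ.
    D = [i for i in range(len(a)) if a[i] != b[i]]
    if not D:
        return None
    # D*: the most salient differing attributes.
    best_rank = min(rank[i] for i in D)
    D_star = [i for i in D if rank[i] == best_rank]
    # Count votes: each attribute in D* votes for the object holding the preferred value (1).
    votes_a = sum(1 for i in D_star if a[i] == 1)
    votes_b = sum(1 for i in D_star if b[i] == 1)
    if votes_a == votes_b:
        return None          # tie
    return "A" if votes_a > votes_b else "B"

A = [1, 0, 1, 0, 0]
B = [0, 0, 1, 1, 1]
rank = [1, 1, 1, 1, 1]       # all attributes currently equally important
print(variable_vote(A, B, rank))   # B wins 2 votes to 1 on the differing attributes
```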

21 Learning Variable Ranks
- Initially, all attributes are equally important
- Loop until ranks converge:
  - Given two objects, predict a winner using the current beliefs
  - If the prediction was wrong, decrease the importance of the attribute values that led to the wrong prediction
- The importance of an attribute never goes beyond its actual place in the order of attributes
- Mistake-bound algorithm: learns from its mistakes
  - Mistake bound is O(n^2), where n is the number of attributes
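A minimal Python sketch of the mistake-driven update loop is shown below; the prediction reuses a Variable-Voting-style vote, and the demotion rule (add 1 to the rank of each differing attribute whose preferred value sat in the losing object) is a simplification for illustration, not CHARM's exact update.

```python
# Illustrative mistake-driven rank learning (simplified; not CHARM's exact rule).
# Training data: (a, b, winner) triples, where winner is "A" or "B" and
# value 1 is assumed to be the preferred value of every attribute.
def predict(a, b, rank):
    D = [i for i in range(len(a)) if a[i] != b[i]]
    if not D:
        return "A"                            # identical objects: arbitrary choice
    best = min(rank[i] for i in D)
    D_star = [i for i in D if rank[i] == best]
    va = sum(a[i] == 1 for i in D_star)
    vb = sum(b[i] == 1 for i in D_star)
    return "A" if va >= vb else "B"           # ties broken toward A for simplicity

def learn_ranks(examples, n_attrs, max_passes=50):
    rank = [1] * n_attrs                      # start with all attributes equally important
    for _ in range(max_passes):
        mistakes = 0
        for a, b, winner in examples:
            if predict(a, b, rank) != winner:
                mistakes += 1
                loser = b if winner == "A" else a
                # Demote the differing attributes whose preferred value is in the losing object.
                for i in range(n_attrs):
                    if a[i] != b[i] and loser[i] == 1:
                        rank[i] += 1          # larger rank = less important
        if mistakes == 0:
            break
    return rank

examples = [([1, 0], [0, 1], "B"),            # attribute 2 should outrank attribute 1
            ([1, 1], [0, 1], "A")]
print(learn_ranks(examples, n_attrs=2))       # [2, 1]: attribute 2 learned to be more important
```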

22 Democracy vs. Autocracy
- [Figure: experimental results for Variable Voting]

23 Preferences Over Sets

24 Preferences over Sets
- Subset selection applications: remote sensing, sports teams, music playlists, planning
- Ranking, like a search engine? Doesn't capture dependencies between items
- Encode, apply, learn set-based preferences
  - Complementarity
  - Redundancy

25 User Preferences
- Depth: utility function (desirable values)
- Diversity: variety and coverage
  - Geologist: near + far views (context)
- Example: prefer images with more rock than sky
  - [Figure: two example images, one with Rock: 25%, Soil: 75%, Sky: 0% and the other with Rock: 10%, Soil: 50%, Sky: 40%]

26 Encoding User Preferences
- DD-PREF: a language for expressing preferred depth and diversity for sets
- [Figure: example per-feature utility functions over Sky, Soil, and Rock, illustrating depth vs. diversity]

27 Finding the Best Subset
- Maximize the subset valuation under the given subset preference, which combines:
  - Depth: utility of subset s (per-item utility)
  - Diversity: diversity value of s (per-feature diversity, defined as 1 - skew)
- [Equation/figure: the valuation of a subset combining the depth and diversity terms]
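The slide only names the ingredients, so the following Python sketch is a guess at how a depth/diversity valuation could be combined, not the actual DD-PREF formula: depth is the average per-item utility, per-feature diversity is taken as 1 minus a simple skew-like concentration measure, and a mixing weight alpha trades the two off. All function definitions here are illustrative assumptions.

```python
# Hypothetical set valuation in the spirit of depth + diversity (NOT the exact DD-PREF formula).
# Each item is a dict of feature -> value in [0, 1]; utility functions map a value to [0, 1].

def depth(subset, utility_fns):
    """Average per-item utility across the subset."""
    per_item = [sum(u(item[f]) for f, u in utility_fns.items()) / len(utility_fns)
                for item in subset]
    return sum(per_item) / len(per_item)

def diversity(subset, feature):
    """Per-feature diversity taken as 1 - skew, where 'skew' here is just how
    concentrated the values are around their mean (an illustrative stand-in)."""
    vals = [item[feature] for item in subset]
    mean = sum(vals) / len(vals)
    spread = sum(abs(v - mean) for v in vals) / len(vals)   # 0 when all values are identical
    return min(1.0, 2 * spread)                             # rescaled into [0, 1]

def valuation(subset, utility_fns, alpha=0.5):
    """Trade off depth against average per-feature diversity with mixing weight alpha."""
    div = sum(diversity(subset, f) for f in utility_fns) / len(utility_fns)
    return (1 - alpha) * depth(subset, utility_fns) + alpha * div

images = [{"rock": 0.25, "soil": 0.75, "sky": 0.0},
          {"rock": 0.10, "soil": 0.50, "sky": 0.4}]
utility_fns = {"rock": lambda v: v, "soil": lambda v: 0.5, "sky": lambda v: 1 - v}
print(valuation(images, utility_fns))
```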

28 Learning Preferences from Examples
- Hard for users to specify quantitative values (especially with more general quality functions)
- Instead, adopt a machine learning approach:
  1. Users provide example sets with high valuation
  2. System infers: utility functions, desired diversity, feature weights
  3. Once trained, the system can select subsets of new data (blocks, images, songs, food)

29 Learning a Preference Model
- Depth: utility functions
  - Probability density estimation: KDE (kernel density estimation) [Duda et al., 01]
- Diversity: average of observed diversities
- Feature weights: minimize difference between computed valuation and true valuation
  - BFGS bounded optimization [Gill et al., 81]
- [Figure: estimated utility curves over % Sky and % Rock]
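As an illustration of the KDE step (the feature name and data are made up, and scipy's gaussian_kde stands in for whatever estimator was actually used), a utility function for one feature can be read off a density fit to the values observed in the user's example sets:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Fraction of rock in the images a user picked for their example sets (made-up data).
observed_rock = np.array([0.20, 0.25, 0.30, 0.28, 0.35, 0.22])

# Kernel density estimate over the observed values; regions the user picked often
# get high density, which can be rescaled into a per-feature utility in [0, 1].
kde = gaussian_kde(observed_rock)
grid = np.linspace(0.0, 1.0, 101)
density = kde(grid)
utility = density / density.max()

print(f"u(rock=0.05) ~ {utility[5]:.2f},  u(rock=0.25) ~ {utility[25]:.2f}")
```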

30 Results: Blocks World
- Compute valuation of sets chosen by true preference, learned preference, and random selection
- As more training sets are available, performance increases (learned approximates true)
- [Figure: learning curves for the Mosaic and Tower tasks, with random selection as the lower baseline]

31 Rover Image Experiments
- Methodology
  - Six users: 2 geologists, 4 computer scientists
  - Five sets of 20 images each; each user selects a subset of 5 images from each set
- Evaluation
  - Learn preferences on (up to 4) example sets, select a new subset from a held-out set
  - Metrics: valuation of the selected subset; functional similarity between learned preferences

32 Learned Preferences
- Training input: a subset of 5 images, chosen by a geologist, from 20 total
- Learned diversities: Rock 0.8, Soil 0.9, Sky 0.5
- Learned feature weights: Rock 0.3, Soil 0.1, Sky 1.0
- Learned utility functions: [Figure: utility curves for Sky, Soil, and Rock]

33 Subset Selection
- [Figure: three image sets side by side]
  - Training: subset of 5 images, chosen by a geologist, from 20 total
  - System: 5 images chosen from 20 new images, using greedy DD-Select and the learned preferences
  - Reference: 5 images chosen by the same geologist from the same 20 new images

34 Current Work
- Extending to document data
  - Text (discrete) features
  - Menu World: Chinese restaurant dish selection
- How do you combine multiple preferences with different priorities?
  - Rover: dust devils, carbonate rocks, cross-bedding
  - Priorities that can change over time

35 Future Directions

36
- Hybrid preference representation
  - Decision tree with lexicographic orderings at the leaves
  - Permits conditional preferences
  - How to learn the "splits" in the tree?
- Support operator and goal orderings for planning
- Incorporate the concept of set-based preferences into planning domains

