1
LEARNING USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS
Nan Li, William Cushing, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281 USA
nan.li.3@asu.edu, wcushing@asu.edu, rao@asu.edu, Sungwook.Yoon@asu.edu
2
USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS
"I prefer to travel by train." "Train tickets are too expensive for me. Maybe I should just take the bus."
Preferred behavior:
P_bus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) * 1
P_train: Buyticket(train), Getin(train, source), Getout(train, dest) * 4
P_plane: Buyticket(plane), Getin(plane, source), Getout(plane, dest) * 12
Obfuscated behavior:
P_bus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) * 6
P_train: Buyticket(train), Getin(train, source), Getout(train, dest) * 5
P_plane: Buyticket(plane), Getin(plane, source), Getout(plane, dest) * 3
3
LEARNING USER PLAN PREFERENCES OBFUSCATED BY FEASIBILITY CONSTRAINTS (IJCAI '09)
Input plans: P_plane * 3, P_train * 5, P_bus * 6
Rescale observed plans: undo the filtering caused by feasibility constraints
Rescaled plans: P_plane * 12, P_train * 4, P_bus * 1
Base learner: acquires the true user preferences from the adjusted plan frequencies, producing the user preference model
4
RESCALE OBSERVED PLANS
Situation 1: P_plane * 3, P_train * 1
Situation 2: P_train * 4, P_bus * 1
Situation 3: P_bus * 5
Clustering: group occurrences of the same plan across situations (P_train in Situations 1 and 2; P_bus in Situations 2 and 3)
Transitive closure: chain the ratios through shared plans to combine all situations
All situations combined: P_plane * 12, P_train * 4, P_bus * 1
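The rescaling step above can be sketched in a few lines: within each situation only the feasible plans are observed, so we chain the frequency ratios that neighboring situations share (the transitive-closure step) to put all plans on one scale. The situation counts are the ones on this slide; the `rescale` function and its representation are our illustration, not the paper's implementation.

```python
from fractions import Fraction

# Situation counts from the slide: each dict holds only the feasible plans.
situations = [
    {"plane": 3, "train": 1},  # Situation 1: plane observed 3:1 over train
    {"train": 4, "bus": 1},    # Situation 2: train observed 4:1 over bus
    {"bus": 5},                # Situation 3: only bus is feasible
]

def rescale(situations):
    """Chain shared-plan ratios across situations onto one common scale."""
    scale = {p: Fraction(c) for p, c in situations[0].items()}
    pending = situations[1:]
    while pending:
        for sit in pending:
            shared = set(sit) & set(scale)
            if shared:
                p = shared.pop()
                factor = scale[p] / sit[p]  # align the plan both scales share
                for q, c in sit.items():
                    scale.setdefault(q, Fraction(c) * factor)
                pending.remove(sit)
                break
        else:
            break  # remaining situations share no plan with the scale so far
    lo = min(scale.values())
    return {p: v / lo for p, v in scale.items()}  # smallest weight becomes 1

print(rescale(situations))  # plane:train:bus = 12:4:1, as on the slide
```

Situation 3 contributes no new ratio here, but Situation 2's shared P_train links bus to the plane:train scale, recovering 12:4:1 from observations of 3, 5, and 6.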
5
EVALUATION
Ideal: user studies (too hard)
Our approach: assume a model H* represents the user's true preferences
Generate "worst case" random solution plans using H* (H* → Sol)
Pick the selected plan from each solution set using H* (Sol → O)
From O, learn H_lO using the original algorithm and H_lE using the extended algorithm (O → H_lO, O → H_lE)
Compare H_lO and H_lE:
Randomly generate plan pairs
Ask H_lO and H_lE to pick the preferred plan of each pair
Use H* to check whether the answer is correct
(Figure: H* yields situations S_1: P_1, S_2: P_2, ..., S_n: P_n, from which the learner induces H_l for comparison against H*.)
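The comparison loop at the end of this protocol can be sketched as follows. A fixed ranking stands in for H*, and `accuracy` plays the role of the plan-pair test; all names here are illustrative, not from the paper.

```python
import random

# Illustrative stand-in for H*: a fixed most- to least-preferred ordering.
H_STAR = ["plane", "train", "bus"]

def h_star_picks(a, b):
    """The reference model H* prefers whichever plan ranks earlier."""
    return a if H_STAR.index(a) < H_STAR.index(b) else b

def accuracy(learned_picks, trials=1000, seed=0):
    """Fraction of random plan pairs on which a learned model agrees with H*."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a, b = rng.sample(H_STAR, 2)  # a random pair of distinct plans
        if learned_picks(a, b) == h_star_picks(a, b):
            correct += 1
    return correct / trials

print(accuracy(h_star_picks))    # 1.0: a perfect learner always agrees with H*
print(accuracy(lambda a, b: a))  # about 0.5: an uninformed learner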
6
RATE OF LEARNING AND SIZE DEPENDENCE
Randomly generated domains
Rate of learning: the extended algorithm captures nearly the full user preferences as training data increases; the original algorithm performs slightly worse than random chance.
Size dependence: the extended algorithm outperforms the original algorithm across various domain sizes.
7
“BENCHMARK” DOMAINS
Logistics Planning. H*: move by plane or truck; prefer plane; prefer fewer steps. Score: no rescaling 0.342, rescaling 0.847.
Gold Miner. H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall. Score: no rescaling 0.605, rescaling 0.706.
8
CONCLUSIONS
Learn user plan preferences obfuscated by feasibility constraints
Adjust the observed frequencies of plans to fit the user's true preferences
Evaluate predictive power using a "worst case" model
Show that rescaling before learning is significantly more effective
9
LEARNING USER PLAN PREFERENCES
"Hitchhike? No way!"
P_bus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) * 2
P_train: Buyticket(train), Getin(train, source), Getout(train, dest) * 8
P_hike: Hitchhike(source, dest) * 0
10
TWO TALES OF HTN PLANNING
Top-down: abstraction, efficiency (most existing work)
Bottom-up: preference handling, plan quality (our work)
How should learning proceed?
11
LEARNING USER PLAN PREFERENCES AS pHTNs
Given a set O of plans executed by the user, find a generative model H_l:
H_l = argmax_H p(O | H)
Probabilistic Hierarchical Task Networks (pHTNs):
S → A1 B1 (0.2)    S → A2 B2 (0.8)
B1 → A2 A3 (1.0)   B2 → A1 A3 (1.0)
A1 → Getin (1.0)   A2 → Buyticket (1.0)   A3 → Getout (1.0)
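A pHTN is generative, so the rules on this slide can be run forward to sample plans. The rules and probabilities below are the ones shown above; the dictionary encoding and `sample_plan` name are our sketch.

```python
import random

# The pHTN from the slide as a probabilistic context-free grammar:
# lhs -> list of (probability, body) alternatives.
RULES = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S", rng=random):
    """Expand non-terminals left to right; anything without a rule is an action."""
    if symbol not in RULES:
        return [symbol]
    probs, bodies = zip(*RULES[symbol])
    body = rng.choices(bodies, weights=probs)[0]
    return [action for s in body for action in sample_plan(s, rng)]

# With probability 0.8 this is the train-style plan Buyticket, Getin, Getout,
# and with probability 0.2 the bus-style plan Getin, Buyticket, Getout.
print(sample_plan())
```

Note how the 0.2/0.8 split on S encodes exactly the bus-versus-train preference from the earlier slides.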
12
LEARNING pHTNs
HTNs can be seen as providing a grammar of desired solutions:
Actions ↔ Words
Plans ↔ Sentences
HTNs ↔ Grammars
HTN learning ↔ Grammar induction
pHTN learning by probabilistic context-free grammar (pCFG) induction
Assumptions: parameter-less, unconditional schemas
13
A TWO-STEP ALGORITHM
Greedy Structure Hypothesizer: hypothesizes the schema structure
Expectation-Maximization (EM) phase: refines schema probabilities and removes redundant schemas
Generalizes the Inside-Outside algorithm (Lari & Young, 1990)
14
GREEDY STRUCTURE HYPOTHESIZER
Structure learning, bottom-up
Prefers recursive to non-recursive schemas
15
EM PHASE
E step: plan parse-tree computation (the most probable parse tree)
M step: update the selection probability p of each schema s: a_i → a_j a_k
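The core quantity the E step rests on is the probability that the current pHTN generates a given plan. For schemas that are binary or single-action, as in the grammar slide, this is the classic inside (CKY-style) recursion; the encoding and names below are our sketch, not the paper's code.

```python
from collections import defaultdict

# Same grammar as on the pHTN slide: lhs -> [(probability, body), ...].
RULES = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def inside(plan, start="S"):
    """Inside probability: p(plan | grammar), bottom-up over spans."""
    n = len(plan)
    # table[i][j][X] = probability that non-terminal X derives plan[i:j]
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n)]
    for i, action in enumerate(plan):                 # single-action schemas
        for lhs, bodies in RULES.items():
            for p, body in bodies:
                if body == [action]:
                    table[i][i + 1][lhs] += p
    for span in range(2, n + 1):                      # binary schemas, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for lhs, bodies in RULES.items():
                for p, body in bodies:
                    if len(body) == 2:
                        left, right = body
                        for k in range(i + 1, j):     # split point of the span
                            table[i][j][lhs] += p * table[i][k][left] * table[k][j][right]
    return table[0][n][start]

print(inside(["Buyticket", "Getin", "Getout"]))  # 0.8
print(inside(["Getin", "Buyticket", "Getout"]))  # 0.2
```

Since this grammar is unambiguous, each plan has one parse tree and its inside probability is the probability of that tree, which is what the M step reweights.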