
Learning Probabilistic Hierarchical Task Networks to Capture User Preferences. Nan Li, Subbarao Kambhampati, and Sungwook Yoon, School of Computing and Informatics, Arizona State University.


1 Learning Probabilistic Hierarchical Task Networks to Capture User Preferences
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281 USA
nan.li.3@asu.edu, rao@asu.edu, Sungwook.Yoon@asu.edu
Special thanks to William Cushing.
A riddle for you: what is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners?

2 It's a Bird, It's a Rocket, It's a Plane...
What is more efficient, and more expressive, than vanilla planning? ... HTN planning!
So efficient that all "real" planners use HTNs... and yet undecidable! (Vanilla planning is PSPACE-complete.)
HTN planning: impossible?

3 Two Tales of HTN Planning
Abstraction: efficiency, top-down, most prior work.
Preference handling: quality, bottom-up learning, our work.

4 Learning User Plan Preferences
Hitchhike? No way!
Observed plans, with counts:
P_bus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) (observed 2 times)
P_train: Buyticket(train), Getin(train, source), Getout(train, dest) (observed 8 times)
P_hike: Hitchhike(source, dest) (observed 0 times)

5 Learning User Preferences as pHTNs
Given a set O of plans executed by the user, find a generative model H_l:
H_l = argmax_H p(O | H)
Probabilistic Hierarchical Task Networks (pHTNs):
S → 0.2, A1 B1        S → 0.8, A2 B2
B1 → 1.0, A2 A3       B2 → 1.0, A1 A3
A1 → 1.0, Getin       A2 → 1.0, Buyticket       A3 → 1.0, Getout
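A pHTN like the one above doubles as a generative model, so plans can be sampled from it. The following is a minimal illustrative sketch (ours, not the authors' code); the dictionary encoding and the names `PHTN` and `sample_plan` are our own choices.

```python
import random

# The slide's example pHTN: each non-primitive symbol maps to a list of
# (selection probability, right-hand side) schemas; primitive actions
# are plain strings that never appear as keys.
PHTN = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Expand `symbol` top-down, picking each schema with its probability."""
    if symbol not in PHTN:            # primitive action: emit it
        return [symbol]
    schemas = PHTN[symbol]
    weights = [p for p, _ in schemas]
    _, rhs = random.choices(schemas, weights=weights)[0]
    return [action for s in rhs for action in sample_plan(s)]
```

Sampling many plans from this model yields the train plan (Buyticket, Getin, Getout) about 80% of the time and the bus plan (Getin, Buyticket, Getout) about 20% of the time, matching the 0.8/0.2 selection probabilities.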

6 Learning pHTNs
HTNs can be seen as providing a grammar of desired solutions:
Actions → Words
Plans → Sentences
HTNs → Grammar
HTN learning → Grammar induction
pHTN learning by probabilistic context-free grammar (pCFG) induction.
Assumptions: parameter-less, unconditional schemas.
S → 0.2, A1 B1        S → 0.8, A2 B2
B1 → 1.0, A2 A3       B2 → 1.0, A1 A3
A1 → 1.0, Getin       A2 → 1.0, Buyticket       A3 → 1.0, Getout

7 A Two-Step Algorithm
Greedy Structure Hypothesizer: hypothesizes the schema structure.
Expectation-Maximization (EM) phase: refines schema probabilities and removes redundant schemas.
Generalizes the Inside-Outside algorithm (Lari & Young, 1990).

8 Greedy Structure Hypothesizer
Structure learning:
Bottom-up.
Prefers recursive to non-recursive schemas.
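One way to picture bottom-up structure hypothesizing is a byte-pair-style greedy merge: repeatedly abstract the most frequent adjacent action pair into a fresh non-primitive symbol. This is a hedged sketch under that assumption; the paper's Greedy Structure Hypothesizer additionally prefers recursive schemas, which this toy version omits, and `hypothesize_structure` is a name of our choosing.

```python
from collections import Counter

def hypothesize_structure(plans, max_schemas=10):
    """Greedily introduce non-primitive symbols bottom-up.

    Each round finds the most frequent adjacent pair of symbols across
    all plans, creates a fresh non-primitive for it, and rewrites the
    plans. Returns the hypothesized schemas and the reduced plans.
    """
    plans = [list(p) for p in plans]
    schemas = {}                      # new symbol -> (left, right)
    for i in range(max_schemas):
        pairs = Counter(pair for plan in plans
                        for pair in zip(plan, plan[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:                 # no repeated structure left to abstract
            break
        sym = f"N{i}"
        schemas[sym] = (a, b)
        for plan in plans:            # collapse every occurrence of the pair
            j = 0
            while j < len(plan) - 1:
                if plan[j] == a and plan[j + 1] == b:
                    plan[j:j + 2] = [sym]
                else:
                    j += 1
    return schemas, plans
```

On the bus/train example, the two observed plan types each collapse to a single top-level symbol, giving the skeleton that the EM phase then attaches probabilities to.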

9 EM Phase
E step: plan parse tree computation (the most probable parse tree for each observed plan).
M step: update the selection probabilities of the schemas s: a_i → p, a_j a_k.
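The M step can be pictured as re-estimating each schema's selection probability from its usage counts across the chosen parse trees. The paper's algorithm generalizes Inside-Outside and works with expected counts; the sketch below (function name and nested-tuple tree encoding are ours) uses hard counts from a single most-probable parse per plan, Viterbi-style.

```python
from collections import Counter, defaultdict

def update_probabilities(parse_trees):
    """Re-estimate schema probabilities from parse-tree usage counts.

    Trees are nested tuples (head, child, child, ...); leaves are
    primitive action strings. A schema's probability is its count
    divided by the total count of schemas with the same head symbol.
    """
    counts = Counter()
    def walk(tree):
        if isinstance(tree, str):     # primitive action: nothing to count
            return
        head, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        counts[(head, rhs)] += 1
        for c in children:
            walk(c)
    for t in parse_trees:
        walk(t)
    totals = defaultdict(int)
    for (head, _), n in counts.items():
        totals[head] += n
    return {(head, rhs): n / totals[head]
            for (head, rhs), n in counts.items()}
```

Feeding in 2 parses of the bus plan and 8 of the train plan recovers the 0.2/0.8 split on the two S schemas from the earlier slides.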

10 Evaluation
Ideal: user studies (too hard).
Our approach:
Assume H* represents the user's preferences.
Generate observed plans using H* (H* → O).
Learn H_l from O (O → H_l).
Compare H* and H_l via their plan distributions (H* → T*, H_l → T_l).
Syntactic similarity is not important, only the distribution is: use the KL-divergence between T* and T_l, which measures the distance between the two distributions.
Domains: randomly generated domains, Logistics Planning, Gold Miner.
(Figure: H* generates plans P1, ..., Pn; the learner produces H_l.)
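The KL-divergence used for the comparison is the standard definition. A minimal sketch, with an epsilon floor of our own choosing to guard against plans that one model assigns zero probability (the paper may handle zeros differently):

```python
from math import log

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for plan distributions given as {plan: probability} dicts.

    Sums only over plans with nonzero probability under p; plans that q
    misses entirely are floored at eps rather than causing log(x/0).
    """
    total = 0.0
    for plan, p_plan in p.items():
        if p_plan > 0.0:
            total += p_plan * log(p_plan / q.get(plan, eps))
    return total
```

For the bus/train example, the true distribution {bus: 0.2, train: 0.8} has divergence 0 from itself and about 0.19 from a uniform learner, so a score like the 0.04 reported for Logistics indicates a close match.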

11 Rate of Learning and Conciseness
Randomly generated domains.
Rate of learning: more training plans yield better schemas.
Conciseness: in small domains, the learned model has 1 or 2 more non-primitive actions than H*; in large domains, many more. Refine structure learning?

12 Effectiveness of EM
Randomly generated domains.
Compare the greedy (pre-EM) schemas with the learned schemas: the EM step is very effective in capturing user preferences.

13 "Benchmark" Domains
Logistics Planning. H*: move by plane or truck; prefer plane; prefer fewer steps. KL-divergence: 0.04. Recovers plane > truck and fewer steps > more steps.
Gold Miner. H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall. KL-divergence: 0.52. Reproduces the basic strategy.

14 Conclusions & Extensions
Learning user plan preferences: the learned HTNs capture preferences rather than domain abstractions.
Evaluating predictive power: compare distributions rather than structures.
Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car. See "Learning User Plan Preferences Obfuscated by Feasibility Constraints," ICAPS 2009.

