AI - Week 17
Machine Learning Applied to AI Planning: LOCM
Lee McCluskey, room 2/09
Will we always need to engineer knowledge bases for planners?

E.g. (PDDL, Pipes World):

……
(:durative-action PUSH-START
  :parameters (?pipe - pipe
               ?batch-atom-in - batch-atom
               ?from-area - area
               ?to-area - area
               ?first-batch-atom - batch-atom
               ?product-batch-atom-in - product
               ?product-first-batch - product)
  :duration (= ?duration (/ 1 (speed ?pipe)))
  :condition (and (over all (normal ?pipe))
                  (at start (first ?first-batch-atom ?pipe))
                  (at start (connect ?from-area ?to-area ?pipe))
                  (at start (on ?batch-atom-in ?from-area))
                  (at start (not-unitary ?pipe))
                  (at start (is-product ?batch-atom-in ?product-batch-atom-in))
                  (at start (is-product ?first-batch-atom ?product-first-batch))
                  (at start (may-interface ?product-batch-atom-in ?product-first-batch)))
  :effect (and (at end (push-updating ?pipe))
               (at end (not (normal ?pipe)))
               (at end (first ?batch-atom-in ?pipe))
               (at start (not (first ?first-batch-atom ?pipe)))
               (at end (follow ?first-batch-atom ?batch-atom-in))
               (at start (not (on ?batch-atom-in ?from-area))))
)

ENGINEERED WITH VARIABLE DOMAINS, RELATIONS, PROPERTIES, CONDITIONS ...
Machine Learning applied to AI Planning

Automated Knowledge Acquisition: learning the domain model.

One promising direction:
- Give a learning system a number of PLANS, or let it “observe” plans.
- Get the learning system to learn (“mine”) the details of the actions by “inducing” the operator schema.
This could be termed “process mining”.

Example applications:
- Learn the effect of operating system instructions
- Learn moves / rules in a game
- Learn the meaning of actions in a work-flow
- Learn the meaning of business processes
Learning PDDL Domain Models: Where would training plans come from?

Training plan scripts could come from several types of activity:
1. (Goal-directed) solutions from current planners using existing domain models
2. Random plan scripts generated using existing domain models
3. Harvested plan scripts from human activities such as game playing
4. Recorded or logged plan scripts from computer or natural processes
Example System: LOCM, learning of object-centred models
Will use “tyre world” as a running example

Trace of changing a car wheel:
  fetch_jack jack1 boot1
  remove_wheel wheel0 hub0 jack0
  jack_up hub1 jack1
  put_on_wheel wheel2 hub0 jack0
  fetch_wrench wrench0 boot0
  jack_down hub1 jack1
  putaway_wrench wrench1 boot0
  ……………….. etc.

From this trace, LOCM induces the meaning of the “remove_wheel” action in planning-ready format:

(:action remove_wheel
  :parameters (?Wheel1 - wheel ?Hub2 - hub ?Jack3 - jack)
  :precondition (and (wheel_state2 ?Wheel1 ?Hub2)
                     (hub_state1 ?Hub2 ?Jack3 ?Wheel1)
                     (jack_state1 ?Jack3 ?Hub2))
  :effect (and (wheel_state1 ?Wheel1)
               (not (wheel_state2 ?Wheel1 ?Hub2))
               (hub_state0 ?Hub2 ?Jack3)
               (not (hub_state1 ?Hub2 ?Jack3 ?Wheel1))
               (jack_state0 ?Jack3 ?Hub2)
               (not (jack_state1 ?Jack3 ?Hub2)))
)
Inducing Action Semantics from Traces

LOCM assumptions (‘sort’ roughly means ‘class’)

  fetch_jack jack1 boot1
  remove_wheel wheel0 hub0 jack0
  jack_up hub1 jack1
  put_on_wheel wheel2 hub0 jack0
  fetch_wrench wrench0 boot0
  jack_down hub1 jack1
  putaway_wrench wrench1 boot0

- The behaviour of objects in a sort can be represented by an FSM.
- The output state of an object after one action is the same as its input state in the next action that involves it.
- Each occurrence of the same action has the same number of objects of the same sorts as arguments.
- The same name is used for the same action.
- Objects that occur together over 2 or more actions indicate associations between object sorts.
LOCM - assumptions

INPUTS: traces of “plans”, e.g. one plan in the tyre world might be:
  open(c1); fetchjack(j1,c1); fetchwrench(wr1,c1); close(c1); open(c2); fetchwrench(wr2,c2); fetchjack(j2,c2); close(c2); close(c3); open(c3)

OUTPUTS: PDDL Domain Model

LOCM assumptions to do with regularity:
1. Each sequence contains action names, each followed by a list of parameters which are the objects used by that action
2. Different instances of an action have the same number of parameters, in the same order, and of the same type (sort)
3. Sequences are SOUND (the actions can be executed in turn)
4. The objects referred to by the training plans can thus be partitioned into a set of distinct “sorts”.

Cresswell, S.N., McCluskey, T.L. and West, Margaret M. (2013) Acquiring planning domain models using LOCM. Knowledge Engineering Review.
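The regularity assumptions suggest one way the sorts could be recovered from a trace: any two objects that ever fill the same argument position of the same action are placed in the same sort. The sketch below does this with a small union-find. It is an illustration under that reading, not the published LOCM code, and the names `parse_trace` and `induce_sorts` are hypothetical.

```python
import re
from collections import defaultdict

def parse_trace(text):
    """Parse 'open(c1); fetchjack(j1,c1)' into [(name, [args...]), ...]."""
    steps = []
    for name, args in re.findall(r"(\w+)\s*\(([^)]*)\)", text):
        steps.append((name, [a.strip() for a in args.split(",") if a.strip()]))
    return steps

def induce_sorts(steps):
    """Group objects: objects sharing an (action, position) slot share a sort."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for name, args in steps:
        for pos, obj in enumerate(args, start=1):
            # Slots are union-find nodes too, so objects meet through them.
            parent[find(("obj", obj))] = find(("slot", name, pos))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "obj":
            groups[find(node)].add(node[1])
    return sorted(sorted(g) for g in groups.values())

# The example plan from the slide.
TRACE = ("open(c1); fetchjack(j1,c1); fetchwrench(wr1,c1); close(c1); "
         "open(c2); fetchwrench(wr2,c2); fetchjack(j2, c2); close(c2); "
         "close(c3); open(c3)")

SORTS = induce_sorts(parse_trace(TRACE))
print(SORTS)  # three groups: the boots, the jacks and the wrenches
```

The induced sorts carry no names; calling them “boot”, “jack” and “wrench” is something a human (or a naming heuristic) adds afterwards.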
LOCM - more assumptions

  open(c1); fetchjack(j1,c1); fetchwrench(wr1,c1); close(c1); open(c2); fetchwrench(wr2,c2); fetchjack(j2,c2); close(c2); close(c3); open(c3)

LOCM assumptions to do with behaviour:
1. All objects of the same sort behave in the same way, and the states of an object can be described by an FSM
2. For every specific object: the output state of one action is the SAME as the input state of the next action affecting it
3. For each action, the object in a given argument position always starts in one particular state and always finishes in one particular state (transitions are 1-1)

[Figure: induced states of the car “boot” sort. States: boot0 (closed), boot1 (open). Transitions: open.1, close.1, fetch_jack.2, fetch_wrench.2]
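The boot machine can be written down as a transition table and used to replay what happens to one boot. The arrow directions of the original diagram are not recoverable from the text, so the table below is an assumption read off the example trace (fetches only happen while the boot is open). The names `BOOT_FSM` and `boot_path` are hypothetical.

```python
# Assumed transition table for the "boot" sort: (state, action) -> next state.
BOOT_FSM = {
    ("closed", "open"): "open",
    ("open", "fetchjack"): "open",
    ("open", "fetchwrench"): "open",
    ("open", "close"): "closed",
}

def boot_path(actions, start):
    """Follow the FSM from `start`; return the final state,
    or None if some action is not applicable in the current state."""
    state = start
    for act in actions:
        state = BOOT_FSM.get((state, act))
        if state is None:
            return None
    return state

# Actions touching c1 in the example plan, assuming c1 starts closed.
print(boot_path(["open", "fetchjack", "fetchwrench", "close"], "closed"))
# Actions touching c3, which must start open for the plan to be sound.
print(boot_path(["close", "open"], "open"))
```

Behaviour assumption 2 is what makes this replay possible: the state left by one action is exactly the state the next action on the same boot starts from.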
LOCM - create state machines

  open(c1); fetchjack(j1,c1); fetchwrench(wr1,c1); close(c1); open(c2); fetchwrench(wr2,c2); fetchjack(j2,c2); close(c2); close(c3); open(c3)

From the first 4 actions there are 8 possible states of a boot (e.g. c1), S1-S8:
  S1 => open.1 => S2
  S3 => close.1 => S4
  S5 => fetch_jack.2 => S6
  S7 => fetch_wrench.2 => S8

These collapse to 2 STATES when applying behaviour assumptions 2 and 3.

[Figure: induced states of the car “boot” sort. States: boot0 (closed), boot1 (open). Transitions: open.1, close.1, fetch_jack.2, fetch_wrench.2]
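The collapse can be sketched as a merging step. Each transition (action name plus argument position) gets one start state and one end state, which encodes assumption 3. Then, for every object, the end state of one transition is unified with the start state of the next transition on that object, which encodes assumption 2. This is a minimal sketch under those two assumptions, with hypothetical names `parse_trace` and `induce_fsm`; run over the whole example plan it ends with two boot states.

```python
import re

def parse_trace(text):
    """Parse 'open(c1); fetchjack(j1,c1)' into [(name, [args...]), ...]."""
    return [(n, [a.strip() for a in args.split(",") if a.strip()])
            for n, args in re.findall(r"(\w+)\s*\(([^)]*)\)", text)]

def induce_fsm(steps, sort_objects):
    """Return {(action, position): (start_state, end_state)} for one sort."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    last = {}         # object -> latest transition it took part in
    transitions = []  # transitions in order of first appearance
    for name, args in steps:
        for pos, obj in enumerate(args, start=1):
            if obj not in sort_objects:
                continue
            trans = (name, pos)
            if trans not in transitions:
                transitions.append(trans)
            if obj in last:
                # Assumption 2: end of previous transition == start of this one.
                parent[find(("end", last[obj]))] = find(("start", trans))
            last[obj] = trans

    ids, fsm = {}, {}
    for trans in transitions:
        start = ids.setdefault(find(("start", trans)), len(ids))
        end = ids.setdefault(find(("end", trans)), len(ids))
        fsm[trans] = (start, end)
    return fsm

TRACE = ("open(c1); fetchjack(j1,c1); fetchwrench(wr1,c1); close(c1); "
         "open(c2); fetchwrench(wr2,c2); fetchjack(j2, c2); close(c2); "
         "close(c3); open(c3)")

BOOT_FSM = induce_fsm(parse_trace(TRACE), {"c1", "c2", "c3"})
for trans, (start, end) in BOOT_FSM.items():
    print(trans, ":", start, "->", end)
```

The state numbers are arbitrary labels assigned in order of first appearance; only the grouping matters.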
LOCM - inductive generalisation

New example:
  open(c1); putawayjack(j1,c1); close(c1); open(c2); putawayjack(j2,c2); open(c1); fetchjack(j1,c1); fetchwrench(wr1,c1); fetchjack(j2,c2); close(c1);

Consider one action A taking an object x of sort S1 into a state T, and another action B taking object x out of that state. Assume that A and B both also refer to some other sort S2.

If *every time* in training it is observed that, when A and then B are executed on the same object of sort S1, the SAME object of sort S2 is recorded, then we induce that state T has an association with objects of sort S2.

Example above: putawayjack(j2,c2) …. fetchjack(j2,c2): the same object c2 is referred to, hence we induce an association with sort “boot”.

Not so for e.g. fetchjack(j2,c1) .. putawayjack(j1,c1): here we can’t say that boot is associated with jack.

(Cresswell, McCluskey and West, 2013)
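The “every time” test can be sketched as hypothesis elimination: for each pair of consecutive actions A, B on the same object, hypothesise that some other argument of A equals some other argument of B, and drop the hypothesis the first time the trace contradicts it. The sketch below is a simplification (it compares objects directly rather than checking sorts first), its names are hypothetical, and it reads the sixth action of the example as open(c1).

```python
import re

def parse_trace(text):
    """Parse 'open(c1); fetchjack(j1,c1)' into [(name, [args...]), ...]."""
    return [(n, [a.strip() for a in args.split(",") if a.strip()])
            for n, args in re.findall(r"(\w+)\s*\(([^)]*)\)", text)]

def induce_associations(steps):
    """Return the hypotheses ((A, p, k), (B, q, l)) never contradicted:
    'when A (object at position p) is followed by B (same object at position q),
    argument k of A and argument l of B are the same object'."""
    last = {}   # object -> (action, position, args) of its latest action
    holds = {}  # hypothesis -> True while no counter-example has been seen
    for name, args in steps:
        for pos, obj in enumerate(args, start=1):
            if obj in last:
                prev_name, prev_pos, prev_args = last[obj]
                for k, a in enumerate(prev_args, start=1):
                    for l, b in enumerate(args, start=1):
                        if k == prev_pos or l == pos:
                            continue  # skip the shared object itself
                        hyp = ((prev_name, prev_pos, k), (name, pos, l))
                        holds[hyp] = holds.get(hyp, True) and a == b
            last[obj] = (name, pos, args)
    return {h for h, ok in holds.items() if ok}

TRACE = ("open(c1); putawayjack(j1, c1); close(c1); open(c2); "
         "putawayjack(j2, c2); open(c1); fetchjack(j1,c1); "
         "fetchwrench(wr1, c1); fetchjack(j2, c2); close(c1);")

ASSOCIATIONS = induce_associations(parse_trace(TRACE))
# A jack put away in a boot is later fetched from that same boot.
print((("putawayjack", 1, 2), ("fetchjack", 1, 2)) in ASSOCIATIONS)
```

A surviving hypothesis for putawayjack followed by fetchjack is what justifies giving the “jack stored” state a boot parameter.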
LOCM - Parameterised FSM => PDDL

[Figure: induced states of every sort (boot, jack, wheel, hub, nuts, wrench, …). Shown for boot: states boot1 (closed), boot0 (open); transitions open.1, close.1, fetch_jack.2, fetch_wrench.2]

(:action fetch_jack
  :parameters (?Jack1 - jack ?Boot2 - boot)
  :precondition (and (zero_state0)
                     (jack_state4 ?Jack1 ?Boot2)
                     (boot_state0 ?Boot2))
  :effect (and (jack_state3 ?Jack1)
               (not (jack_state4 ?Jack1 ?Boot2)))
)

(:action open_container
  :parameters (?Boot1 - boot)
  :precondition (and (zero_state0)
                     (boot_state1 ?Boot1))
  :effect (and (boot_state0 ?Boot1)
               (not (boot_state1 ?Boot1)))
)
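The step from state machines to PDDL can be sketched as a simple template: each argument's start state becomes a precondition, and if the state changes, the end state is added and the start state deleted. The sketch below is a simplification with a hypothetical function name `action_to_pddl`; it emits only unary state predicates, so it omits the state parameters (such as the boot in jack_state4) and the zero_state0 predicate seen in the slide.

```python
def action_to_pddl(name, params):
    """params: one (sort, start_state, end_state) triple per argument position."""
    variables, pre, eff = [], [], []
    for i, (sort, start, end) in enumerate(params, start=1):
        var = f"?{sort.capitalize()}{i}"
        variables.append(f"{var} - {sort}")
        pre.append(f"({sort}_state{start} {var})")
        if start != end:
            # The object changes state: add the new state, delete the old one.
            eff.append(f"({sort}_state{end} {var})")
            eff.append(f"(not ({sort}_state{start} {var}))")
    return (f"(:action {name}\n"
            f"  :parameters ({' '.join(variables)})\n"
            f"  :precondition (and {' '.join(pre)})\n"
            f"  :effect (and {' '.join(eff)}))")

# The boot goes from state 1 (closed) to state 0 (open), as in the slide.
OPEN_PDDL = action_to_pddl("open_container", [("boot", 1, 0)])
print(OPEN_PDDL)
```

An argument whose state does not change, like the boot in fetch_jack, contributes a precondition but no effect, which matches the shape of the induced action above.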
Inducing Domain Models - Game Example

Trace of FreeCell game:
  Send to home ..
  Move to free cell ....
  Move to free cell ....
  Move to column ....
  Send to home ...

Induction gives the meaning of the “Send to home” action in planning-ready format:

(:action sendtohome
  :parameters (?card - card ?suit - suitsort ?vcard - denomination
               ?homecard - card ?vhomecard ?cols ?ncols - denomination)
  :precondition (and (clear ?card)
                     (bottomcol ?card)
                     (home ?homecard)
                     (suit ?card ?suit)
                     (suit ?homecard ?suit)
                     (value ?card ?vcard)
                     (value ?homecard ?vhomecard)
                     (successor ?vcard ?vhomecard)
                     (colspace ?cols)
                     (successor ?ncols ?cols))
  :effect (and (home ?card)
               (colspace ?ncols)
               (not (home ?homecard))
               (not (clear ?card))
               (not (bottomcol ?card))
               (not (colspace ?cols))))
Problems/Future Work:
- When is the induction finished, i.e. how big do the training sequences have to be?
- LOCM can’t (yet) induce static knowledge
- Need to find “naturally occurring” sources of planning traces
- What are the theoretical limits to the expressiveness of the induced language?
Conclusion
- KE is hard and inflexible, so techniques that can learn knowledge are important
- Learning mechanisms like LOCM can exploit
  - regularity in training examples
  - physical constraints
  - inductive generalisations about associations between objects
  - assumptions about the form of actions
  - assumptions about the state change behaviour of groups of objects
  in order to learn structures such as PDDL