Learning in the Large Information Processing Technology Office Learning Workshop April 12, 2004 Seedling Overview Learning in the Large MIT CSAIL PIs: Leslie Pack Kaelbling, Tomás Lozano-Pérez, Tommi Jaakkola
Three Subprojects Learning to behave in huge domains Transfer of learned knowledge across problems and domains Learning to recognize objects and interpret scenes
Learning Objective Learn to act effectively in highly complex dynamic domains –Learn models of complex world dynamics involving objects, properties, and relations –Learn “meta-cognition” strategies for deciding how to focus computational attention for action selection Learning is crucial for both problems because human designers are unable to build appropriate models by hand
What Is Being Learned? Learning probabilistic dynamic rules
pickup(X): on(X,Y), clear(X), table(Z), inhand-nil
  0.8: inhand(X), ¬on(X,Y), clear(Y), ¬clear(X), ¬inhand-nil
  0.2: ¬on(X,Y), clear(Y), on(X,Z)
Important goal is to learn partial models: some aspects will be easy to learn to predict, others will take longer Take advantage of partial models as soon as they’re learned
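The pickup rule can be written down as plain data and sampled from. This is only a minimal sketch, not the project's implementation: the dictionary layout and the helper names (`ground`, `applicable`, `apply_outcome`, `sample_next_state`) are hypothetical, states are assumed to be sets of ground atoms, and the variable binding is assumed to be supplied by the caller rather than found by matching.

```python
import random

# The slide's rule as data. Atoms are tuples: (predicate, arg1, arg2, ...).
# Negated literals in an outcome go in "delete", positive ones in "add".
PICKUP = {
    "action": ("pickup", "X"),
    "context": [("on", "X", "Y"), ("clear", "X"), ("table", "Z"), ("inhand-nil",)],
    "outcomes": [
        (0.8, {"add": [("inhand", "X"), ("clear", "Y")],
               "delete": [("on", "X", "Y"), ("clear", "X"), ("inhand-nil",)]}),
        (0.2, {"add": [("clear", "Y"), ("on", "X", "Z")],
               "delete": [("on", "X", "Y")]}),
    ],
}

def ground(atom, binding):
    """Replace variables such as 'X' with the objects they are bound to."""
    pred, *args = atom
    return (pred, *[binding.get(a, a) for a in args])

def applicable(rule, state, binding):
    """The rule applies when every grounded context atom holds in the state."""
    return all(ground(a, binding) in state for a in rule["context"])

def apply_outcome(state, outcome, binding):
    """Return the successor state for one specific outcome."""
    deleted = {ground(a, binding) for a in outcome["delete"]}
    added = {ground(a, binding) for a in outcome["add"]}
    return (set(state) - deleted) | added

def sample_next_state(rule, state, binding, rng):
    """Sample one outcome by its probability; leave the state alone if the rule does not apply."""
    if not applicable(rule, state, binding):
        return set(state)
    weights = [p for p, _ in rule["outcomes"]]
    effects = [e for _, e in rule["outcomes"]]
    chosen = rng.choices(effects, weights=weights, k=1)[0]
    return apply_outcome(state, chosen, binding)

# Example: block a sits on block b, t is the table, the hand is empty.
state = {("on", "a", "b"), ("clear", "a"), ("table", "t"), ("inhand-nil",)}
binding = {"X": "a", "Y": "b", "Z": "t"}
print(sorted(sample_next_state(PICKUP, state, binding, random.Random(0))))
```

Because the rule only mentions the atoms it changes, everything else in the state carries over untouched, which is one way a partial model can still be used for prediction.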
How is it Being Learned? Search in rule space –logic-based methods for learning structure –convex optimization for probabilities Effectiveness of learned models tested using planner to select actions Learning is automatic Amount of data needed depends on the frequency and reliability of phenomenon being modeled
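For the probability step, one plausible formulation (an assumption, since the slide does not spell it out) is maximum likelihood with the rule structure held fixed: each observed transition is consistent with one or more outcomes, and the log-likelihood, the sum over examples of log(sum of p_j over the consistent outcomes j), is concave in the outcome probabilities. The sketch below climbs it with EM-style updates, which is a swapped-in technique chosen for brevity rather than the solver used in the project. The function name `fit_outcome_probs` and the `covers` input format are hypothetical.

```python
def fit_outcome_probs(covers, n_outcomes, iters=200):
    """Estimate outcome probabilities for one rule.

    covers[i] is the set of outcome indices consistent with training
    example i. Assumes every example is covered by at least one outcome.
    """
    p = [1.0 / n_outcomes] * n_outcomes
    for _ in range(iters):
        counts = [0.0] * n_outcomes
        for c in covers:
            total = sum(p[j] for j in c)
            for j in c:
                # Split credit for this example among the outcomes that explain it.
                counts[j] += p[j] / total
        n = sum(counts)
        p = [x / n for x in counts]
    return p

# Eight pickups succeeded (outcome 0), two dropped the block (outcome 1).
print(fit_outcome_probs([{0}] * 8 + [{1}] * 2, n_outcomes=2))
```

When every example is explained by exactly one outcome, the estimate reduces to normalized counts; the iterations only matter when outcomes overlap.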
How is the Knowledge Represented? Probabilistic dynamics rules No background knowledge currently, but it would be easy to build in some rules Knowledge is task-independent (though we may use utility to focus learning) Models can account for only parts of the state evolution; and they’re probabilistic Currently, no
What is the Domain? Currently: physics simulator of blocks world Would like simulation of more complex environment, e.g., –battlefield –disaster relief –making breakfast
How is Progress Being Measured? First, by human inspection of rules for plausibility Second, by performance of agent using rules for planning Nothing changes in the experimental set-up except the learned rules Metrics: –utility gained by the agent –computation speed Easily done overnight on a workstation
What are the Technical Milestones? Defined by model sophistication rather than overt performance in the task –Learn rules with quantifiers –Learn to ground symbolic predicates in perception –Learn rules in partially observable environments –Postulate hidden causes –Focus rule-learning based on utility
What is Being Learned? Learning to formulate a small planning problem from a huge state space and competing goals –what are useful subgoals? –when is it appropriate to ignore certain aspects of the domain? [Diagram labels: learning, inference, planning, perception, action]
How is it Being Learned? Learning parameters in abstract models –partial observability makes it hard –gradient descent works, but may be weak –take advantage of Russell’s methods? Compare speed and utility of resulting action-selection system Learning is automatic Amount of data needed depends on the frequency and reliability of phenomenon being modeled
How is the Knowledge Represented? Parameters in strategies for building abstractions Currently most of the abstraction structure is hand-coded The knowledge depends on the distribution of problems an agent has to solve, but not on particular low-level tasks Uncertainty isn’t represented explicitly, but is handled implicitly in statistical learning We are learning at multiple levels of abstraction
What is the Domain? Nethack Would like more complex simulated domain
What are the Technical Milestones? Meta-learning –Learn parameters in hand-built abstractions for MDPs –Learn new abstractions for MDPs –Learn to compose abstractions –Do it all for POMDPs