Learning Integrated Symbolic and Continuous Action Models
Joseph Xu & John Laird
May 29, 2013

Action Models

Benefits
Accurate action models allow for:
– Internal simulation
– Backtracking planning
– Learning policies via trial and error without incurring real-world cost
[diagram: an agent exploring the real world for reward vs. an agent exploring an internal model to learn a policy]

Requirements
Model learning should be:
– Accurate: predictions made by the model should be close to reality
– Fast: learns from few examples
– General: models should make good predictions in many situations
– Online: models shouldn't require sampling the entire space of possible actions before becoming useful

Continuous Environments
– Discrete objects with continuous properties (geometry, position, rotation)
– Input and output are vectors of continuous numbers
– Agent runs in lock-step with the environment
– Fully observable
[diagram: the environment sends the agent an input vector of object positions (px, py, pz) and rotations (rx, ry, rz) for objects A and B; the agent sends back an output vector]
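To make the representation concrete, here is a minimal sketch (the scene contents and values are hypothetical) of how a two-object state flattens into the agent's input vector:

```python
import numpy as np

# Hypothetical two-object scene: each object contributes its position
# (px, py, pz) and rotation (rx, ry, rz) to one flat continuous vector.
scene = {
    "A": {"pos": (1.0, 2.0, 0.0), "rot": (0.0, 0.0, 0.0)},
    "B": {"pos": (4.0, 0.5, 0.0), "rot": (0.0, 0.0, 1.2)},
}
x = np.array([v for obj in scene.values() for v in (*obj["pos"], *obj["rot"])])
# x = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.5, 0.0, 0.0, 0.0, 1.2]
# The agent's output (its motor command) is an analogous continuous vector.
```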

Action Modeling in Continuous Domains

Locally Weighted Regression

[diagram: to predict at a query point x, find its k nearest neighbors and fit a weighted linear regression over them]
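A minimal sketch of the technique, not the authors' implementation; the Gaussian weighting kernel is an assumption, since the slides don't specify one:

```python
import numpy as np

def lwr_predict(X, y, query, k=10):
    # Find the k training points nearest to the query.
    dists = np.linalg.norm(X - query, axis=1)
    nn = np.argsort(dists)[:k]
    # Weight neighbors by proximity (assumed Gaussian kernel).
    w = np.exp(-dists[nn] ** 2)
    # Fit a weighted linear (affine) model to the neighbors:
    # beta = (X'WX)^-1 X'Wy
    Xn = np.hstack([X[nn], np.ones((k, 1))])
    W = np.diag(w)
    beta = np.linalg.pinv(Xn.T @ W @ Xn) @ (Xn.T @ W @ y[nn])
    # Evaluate the local model at the query point.
    return np.append(query, 1.0) @ beta
```

Because each query re-fits a local model over stored examples, LWR must keep every training example around, a cost that resurfaces in the Coals slide at the end.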

LWR Shortcomings
– LWR generalizes based on proximity in pose space
– Smooths together qualitatively distinct behaviors
– Generalizes using examples that are close in absolute coordinates rather than similar in object relationships

Our Approach
The type of motion depends on relationships between objects, not absolute positions. Learn models that exploit the relational structure of the environment:
– Segment behaviors into qualitatively distinct linear motions (modes)
– Classify which mode is in effect using relational structure
Examples: flying mode (no contact), ramp-rolling mode (touching ramp), bouncing mode (touching a flat surface)

Learning Multi-Modal Models
[diagram: the scene graph provides both a continuous state and a relational state (e.g. intersect(A,B), above(A,B), ball(A)). Segmentation (RANSAC + EM) splits the continuous behavior over time into distinct modes (mode I, mode II); classification (FOIL) learns a relational mode classifier that maps relational states to modes]

Predict with Multi-Modal Models
[diagram: at prediction time the scene graph's relational state (e.g. ~intersect(A,B)) is fed to the relational mode classifier, which selects the active mode; that mode's function is applied to the continuous state to produce the prediction]
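As a sketch in code (interface names are illustrative, not the actual SVS implementation):

```python
import numpy as np

def predict(relations, x, mode_classifier, mode_functions):
    # Classify the active mode from the relational state
    # (FOIL clauses with one-vs-one voting, per the later slides).
    mode = mode_classifier(relations)
    # Apply that mode's linear function to the continuous state.
    w, b = mode_functions[mode]
    return np.dot(w, x) + b
```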

Walkthrough: a ball falls past a platform and a ramp. Each training example records the continuous state (bx, by, vy), the relational state (~(b,p) = ball not touching platform, ~(b,r) = ball not touching ramp), and the target value. The first examples all land in the noise bucket:

  state              relations        targ
  bx01, by01, vy01   ~(b,p), ~(b,r)   t01
  bx02, by02, vy02   ~(b,p), ~(b,r)   t02
  bx03, by03, vy03   ~(b,p), ~(b,r)   t03

RANSAC fits a line to the noise examples: t = vy – 0.98

RANSAC
Discovers new modes:
1. Choose a random set of noise examples
2. Fit a line to the set
3. Add all noise examples that also fit the line
4. If the set is large (> 40 examples), create a new mode with those examples
5. Otherwise, repeat
[diagram: a new mode separated from the remaining noise]
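A sketch of this loop; the iteration cap, seed-set size, and residual tolerance are assumed values (the slides specify only the >40 set-size test):

```python
import numpy as np

def ransac_new_mode(X, y, n_iters=100, n_seed=3, tol=1e-3, min_size=40):
    n = len(y)
    Xa = np.hstack([X, np.ones((n, 1))])            # affine features
    for _ in range(n_iters):
        # 1. Choose a random set of noise examples.
        seed = np.random.choice(n, n_seed, replace=False)
        # 2. Fit a line to the set.
        beta, *_ = np.linalg.lstsq(Xa[seed], y[seed], rcond=None)
        # 3. Add all noise examples that also fit the line.
        inliers = np.where(np.abs(Xa @ beta - y) < tol)[0]
        # 4. If the set is large enough, create a new mode.
        if len(inliers) > min_size:
            return beta, inliers
    return None                                     # no mode found this round
```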

Three more falling examples arrive, and EM associates all six examples with the newly created mode:

  state              relations        targ
  bx01, by01, vy01   ~(b,p), ~(b,r)   t01
  bx02, by02, vy02   ~(b,p), ~(b,r)   t02
  bx03, by03, vy03   ~(b,p), ~(b,r)   t03
  bx04, by04, vy04   ~(b,p), ~(b,r)   t04
  bx05, by05, vy05   ~(b,p), ~(b,r)   t05
  bx06, by06, vy06   ~(b,p), ~(b,r)   t06

Mode 1: t = vy – 0.98

Expectation Maximization
Simultaneously learns:
– The association between training data and modes
– The parameters of the mode functions
Expectation: assume the mode functions are correct and compute the likelihood that each mode generated each data point.
Maximization: assume the likelihoods are correct and fit the mode functions to maximize likelihood.
Iterate until convergence to a local maximum.
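A compact sketch of the E/M loop over linear modes; Gaussian residual noise with a fixed variance is my assumption, and the real system interleaves this with RANSAC and the noise bucket:

```python
import numpy as np

def em_linear_modes(X, y, betas, n_iters=20, noise_var=0.01):
    Xa = np.hstack([X, np.ones((len(y), 1))])       # affine features
    for _ in range(n_iters):
        # E-step: assume mode functions are correct; compute the
        # likelihood that each mode generated each data point.
        resid = np.stack([Xa @ b - y for b in betas])
        lik = np.exp(-resid ** 2 / (2 * noise_var))
        resp = lik / lik.sum(axis=0, keepdims=True)
        # M-step: assume likelihoods are correct; refit each mode
        # by responsibility-weighted least squares.
        for i in range(len(betas)):
            W = np.diag(resp[i])
            betas[i] = np.linalg.pinv(Xa.T @ W @ Xa) @ (Xa.T @ W @ y)
    return betas, resp.argmax(axis=0)               # functions, assignments
```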

A new example arrives in which the ball touches the platform ((b,p)). It does not fit mode 1, so it goes to the noise bucket, and FOIL learns a clause separating mode 1's examples from it:

  state              relations        targ      bucket
  bx01–bx06, …       ~(b,p), ~(b,r)   t01–t06   mode 1: t = vy – 0.98
  bx07, by07, vy07   (b,p), ~(b,r)    t07       noise

FOIL clause for mode 1: ~(b,p)

FOIL
Learns classifiers to distinguish between two modes (positives and negatives) based on relations:
– Outer loop: iteratively add clauses that cover the most positive examples
– Inner loop: iteratively add literals that rule out negative examples
Object names are variablized for generality.

  Clause                                   # pos. ex.
  ~intersect(target, any)                  16221
  ~z-overlap(target, any)                  6162
  east-of(target, x)                       36
  ~x-overlap(target, any)                  21
  east-of(x, target)                       9
  ~ontop(target, any) & above(target, x)   2
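A rough sketch of the two covering loops; real FOIL scores literals by information gain and variablizes object names, while this simplification scores by raw coverage and treats each literal as a boolean test over an example's relations:

```python
def learn_clauses(pos, neg, literals):
    """pos/neg: sets of examples (e.g. frozensets of ground relations);
    literals: candidate tests, e.g. lambda e: "intersect(b,p)" not in e."""
    clauses, pos = [], set(pos)
    while pos:                                   # outer loop: cover positives
        cp, cn, pool, clause = set(pos), set(neg), list(literals), []
        while cn and cp and pool:                # inner loop: rule out negatives
            # Greedily pick the literal that keeps positives, drops negatives.
            best = max(pool, key=lambda l: sum(map(l, cp)) - sum(map(l, cn)))
            pool.remove(best)
            clause.append(best)
            cp, cn = set(filter(best, cp)), set(filter(best, cn))
        if not cp:                               # clause covers no positives
            break
        clauses.append(clause)                   # conjunction of literals
        pos -= cp                                # cover remaining positives next
    return clauses                               # disjunction of clauses
```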

FOIL
FOIL learns binary classifiers, but there can be many modes. Use a one-vs-one strategy:
– Learn a classifier between each pair of modes
– Each classifier votes between its two modes
– The mode with the most votes wins
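A sketch of the voting step (the pairwise-classifier table is an assumed representation):

```python
from collections import Counter
from itertools import combinations

def classify_mode(relations, mode_ids, pairwise):
    # pairwise[(i, j)] returns True if mode i beats mode j
    # for this relational state; each pair casts one vote.
    votes = Counter()
    for i, j in combinations(mode_ids, 2):
        votes[i if pairwise[(i, j)](relations) else j] += 1
    return votes.most_common(1)[0][0]            # mode with most votes wins
```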

More ball-on-platform examples accumulate in the noise bucket until RANSAC discovers a second line:

  state              relations        targ      bucket
  bx01–bx06, …       ~(b,p), ~(b,r)   t01–t06   mode 1: t = vy – 0.98, clause ~(b,p)
  bx07–bx09, …       (b,p), ~(b,r)    t07–t09   noise

RANSAC fits: t = vy

EM assigns the platform examples to the new mode, and FOIL learns a clause for it:

  state              relations        targ      bucket
  bx01–bx06, …       ~(b,p), ~(b,r)   t01–t06   mode 1: t = vy – 0.98, clause ~(b,p)
  bx07–bx13, …       (b,p), ~(b,r)    t07–t13   mode 2: t = vy, clause (b,p)

Examples in which the ball touches the ramp ((b,r)) accumulate in the noise bucket, and RANSAC discovers a third line:

  state              relations        targ      bucket
  bx01–bx06, …       ~(b,p), ~(b,r)   t01–t06   mode 1: t = vy – 0.98, clause ~(b,p)
  bx07–bx13, …       (b,p), ~(b,r)    t07–t13   mode 2: t = vy, clause (b,p)
  bx14–bx16, …       ~(b,p), (b,r)    t14–t16   noise

RANSAC fits: t = vy – 0.7

Finally, FOIL learns the clause for the third mode:

  state              relations        targ      bucket
  bx01–bx06, …       ~(b,p), ~(b,r)   t01–t06   mode 1: t = vy – 0.98, clause ~(b,p)
  bx07–bx13, …       (b,p), ~(b,r)    t07–t13   mode 2: t = vy, clause (b,p)
  bx14–bx16, …       ~(b,p), (b,r)    t14–t16   mode 3: t = vy – 0.7, clause (b,r)

Demo
Physics simulation with ramp, box, and ball; learn models for x and y velocities. (link)

Physics Simulation Experiment
– 2D physics simulation with gravity
– 40 possible configurations
– Training/testing blocks run for 200 time steps
– 40 configs × 3 seeds = 120 training blocks
– Test over all 40 configs using a different seed
– Repeat with 5 reorderings
[diagram: a configuration with gravity, a random offset, and the origin]

Learned Modes
X velocity, expected modes: flying or rolling on a flat surface; rolling or bouncing on a ramp; bouncing against a vertical surface.
Y velocity, expected modes: rolling and bouncing on a flat surface; flying under the influence of gravity; rolling or bouncing on a ramp.
[table: the original slide compared these expected modes with the modes actually learned]

Prediction Accuracy
Compare overall accuracy against a single smooth-function learner (LWR).

Classifier Accuracy
Compare FOIL performance against classifiers using absolute coordinates (SVM, KNN).

Nuggets
The multi-modal approach addresses the shortcomings of LWR:
– Doesn't smooth over examples from different modes
– Uses relational similarity to generalize behaviors
It satisfies the requirements:
– Accurate: new modes are learned for inaccurate predictions
– Fast: linear modes are learned from (too) few examples
– General: each mode generalizes to all relationally analogous situations
– Online: modes are learned incrementally and can immediately make predictions

Coals
– Slows down with more learning (keeps every training example)
– Assumes linear modes
– RANSAC, EM, and FOIL are computationally expensive