Thomas Trappenberg Autonomous Robotics: Supervised and unsupervised learning

Three kinds of learning:
1. Supervised learning: a detailed teacher provides the desired output y for a given input x. Training set {x, y} → find an appropriate mapping function y = h(x; w) [= W φ(x)].
2. Unsupervised learning: the most general learning circumstance. Unlabeled samples are provided from which the system has to figure out good representations. Training set {x} → find sparse basis functions b_i so that x = Σ_i c_i b_i.
3. Reinforcement learning: delayed feedback from the environment in the form of reward/punishment when reaching state s with action a. Reward r(s, a) → find an optimal policy a = π*(s).

Some Pioneers

1. Supervised learning
Maximum Likelihood (ML) estimation: given a hypothesis h(y|x; θ), what are the best parameters that describe the training data?
Bayesian networks: how to formulate detailed causal models with graphical means.
Universal learners (neural networks, SVMs & kernel machines): what if we do not have a good hypothesis?

Sources of fluctuations: fundamental stochasticity, irreducible indeterminacy, epistemological limitations → a probabilistic framework.
Goal of learning: make predictions! (learning vs. memory)

Goal of learning: a plant equation for the robot, e.g. the distance traveled when both motors are running at power 50.

Maximum Likelihood Estimation
Hypothesis: a parameterized model h(y|x; θ).
Learning: choose the parameters that make the training data most likely. Assume independence of the training examples and consider the likelihood as a function of the parameters (the log likelihood).
The hard problem: how to come up with a useful hypothesis.
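To make the ML idea concrete, here is a minimal sketch (not from the slides) that fits a linear hypothesis y = w0 + w1*x under an assumed Gaussian noise model; maximizing the log likelihood of independent examples is then equivalent to least squares. The data values and variable names are purely illustrative.

```python
import numpy as np

# Hypothetical training set {x, y}; x could be motor power, y the distance traveled.
x = np.array([10., 20., 30., 40., 50.])
y = np.array([0.9, 2.1, 2.9, 4.2, 5.1])

# Hypothesis h(x; w) = w0 + w1 * x with Gaussian noise of variance sigma^2.
# Log likelihood of independent examples:
#   log L(w) = -n/2 * log(2*pi*sigma^2) - 1/(2*sigma^2) * sum_i (y_i - h(x_i; w))^2
# Maximizing over w is equivalent to minimizing the squared error, so the
# least-squares solution is also the ML estimate.
X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)     # ML estimate of the weights

residuals = y - X @ w_ml
sigma2_ml = np.mean(residuals**2)                # ML estimate of the noise variance

print("w_ML =", w_ml, " sigma^2_ML =", sigma2_ml)
```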

Minimize the MSE:
1. Random search
2. Look where the gradient is zero
3. Gradient descent, with a learning rule that moves the parameters along the negative gradient (a small sketch follows below)
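As a concrete counterpart to option 3, a gradient-descent sketch for the same linear hypothesis; the learning rate, step count, and data are assumptions, not values from the slides.

```python
import numpy as np

# Illustrative data and linear hypothesis h(x; w) = w0 + w1 * x.
x = np.array([10., 20., 30., 40., 50.])
y = np.array([0.9, 2.1, 2.9, 4.2, 5.1])
X = np.column_stack([np.ones_like(x), x])

w = np.zeros(2)          # initial parameters
eta = 1e-4               # learning rate (assumed value)

for step in range(10000):
    err = X @ w - y                       # prediction error
    grad = 2.0 / len(y) * X.T @ err       # gradient of the MSE
    w -= eta * grad                       # learning rule: step against the gradient

print("w after gradient descent:", w)
```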

Nonlinear regression: Bias-variance tradeoff
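One quick way to see the bias-variance tradeoff (an illustrative sketch, not the slides' figure) is to fit polynomials of increasing degree to a few noisy samples of a smooth function: a low degree underfits (high bias), a very high degree overfits (high variance). The target function and noise level are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth target function (assumed example).
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```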

Feedback control; adaptive control.

MLE only looks at the data… What if we have some prior knowledge of θ? Bayes' theorem → maximum a posteriori (MAP) estimation.
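The formulas on this slide were images and did not survive extraction; the standard relations they refer to are, as a hedged reconstruction in LaTeX:

```latex
% Bayes' theorem for the parameters \theta given data D
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}

% Maximum a posteriori (MAP) estimate: the likelihood weighted by the prior
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\; p(D \mid \theta)\, p(\theta)
  = \arg\max_{\theta}\; \bigl[\log p(D \mid \theta) + \log p(\theta)\bigr]
```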

How about building more elaborate multivariate models? Causal (graphical) models (Judea Pearl). The parameters of the conditional probability tables (CPTs) are usually learned from data!

Hidden Markov Model (HMM) for localization
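A minimal sketch of what HMM-based localization computes (an assumed toy corridor, not the slides' example): Bayesian filtering over discrete positions with a transition model for the motion and an observation model for a door sensor.

```python
import numpy as np

# Toy corridor with 5 discrete cells; the robot tends to move one cell to the right.
n = 5
T = np.zeros((n, n))                 # T[i, j] = p(next state j | current state i)
for i in range(n):
    T[i, min(i + 1, n - 1)] += 0.8   # intended motion
    T[i, i] += 0.2                   # sometimes the robot stays put

# Observation model: the sensor reports "door" at cells 0 and 3 (assumed map).
doors = np.array([1, 0, 0, 1, 0])
def obs_likelihood(z):
    """p(z | state) for a noisy door sensor."""
    return np.where(doors == z, 0.9, 0.1)

belief = np.full(n, 1.0 / n)         # uniform prior over positions
for z in [1, 0, 0, 1]:               # a sequence of sensor readings
    belief = belief @ T              # prediction step (motion update)
    belief *= obs_likelihood(z)      # correction step (sensor update)
    belief /= belief.sum()           # normalize
    print(np.round(belief, 2))
```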

How about building more general multivariate models? Eduardo Renato Caianiello, 1961: Outline of a Theory of Thought-Processes and Thinking Machines. Neuronic & mnemonic equations, reverberation, oscillations, reward learning. But: not stochastic (only small noise in the weights). Stochastic networks: the Boltzmann machine (Hinton & Sejnowski, 1983).

McCulloch-Pitts neuron. Perceptron learning rule (a sketch follows below).
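For concreteness, the classic perceptron learning rule in a few lines, in its usual textbook form; the toy data and learning rate are assumptions.

```python
import numpy as np

# Toy linearly separable data: class +1 roughly above the line x1 + x2 = 1.
X = np.array([[0.2, 0.3], [0.4, 0.1], [0.9, 0.8], [0.7, 0.9]])
y = np.array([-1, -1, 1, 1])
X = np.column_stack([np.ones(len(X)), X])   # add a bias input

w = np.zeros(X.shape[1])
eta = 0.1                                   # learning rate (assumed)

for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        y_hat = np.sign(w @ xi) or -1.0     # thresholded (McCulloch-Pitts style) output
        if y_hat != yi:
            w += eta * (yi - y_hat) * xi    # perceptron learning rule
            errors += 1
    if errors == 0:                         # converged: all examples classified correctly
        break

print("learned weights:", w)
```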

Multilayer Perceptron (MLP): a universal approximator (learner), but: overfitting; needs meaningful input; unstructured learning; only deterministic units. A stochastic version can represent density functions (just use the chain rule).
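A tiny MLP trained by backpropagation (gradients via the chain rule) on XOR, only to make the "universal approximator" claim tangible; the layer sizes, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR: not linearly separable, but learnable by an MLP with one hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 8)); b1 = np.zeros(8)   # hidden layer (8 units, assumed)
W2 = rng.normal(scale=1.0, size=(8, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
eta = 0.5                                                   # learning rate (assumed)

for step in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: apply the chain rule layer by layer (squared-error loss)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;   b1 -= eta * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```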

Linear large-margin classifiers → Support Vector Machines (SVMs). MLP: minimize the training error (the empirical risk). SVM: minimize the generalization error.

Linear-in-parameters (LIP) learning + constraints (thanks to Doug Tweet, UoT, for pointing out LIP). Linear hypothesis vs. non-linear hypothesis; the SVM in dual form.

Linear-in-parameters learning: the primal problem and its dual problem (subject to constraints).
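The primal and dual formulas on this slide were images and did not survive extraction; the standard soft-margin SVM forms they correspond to are, as a hedged reconstruction:

```latex
% Primal (soft-margin) problem
\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \tfrac{1}{2}\lVert\mathbf{w}\rVert^2
  + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% Dual problem: the data enter only through inner products x_i^T x_j
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i^\top \mathbf{x}_j
\quad \text{subject to} \quad
  0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0
```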

Nonlinear large-margin classifiers → the kernel trick. Transform the attributes (or create new feature values from the attributes) and then use linear optimization. This can be implemented efficiently with kernels in SVMs, since the data only appear as inner products; for example, the quadratic kernel.
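A small numeric check (with assumed example vectors) of the point that the data only enter as inner products: the quadratic kernel k(x, z) = (x·z)² equals the inner product of explicit quadratic feature maps, so the feature space never has to be constructed inside the SVM.

```python
import numpy as np

def phi(v):
    """Explicit quadratic feature map for a 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

def quad_kernel(x, z):
    """Quadratic kernel computed directly in input space."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])   # assumed example vectors
z = np.array([3.0, -1.0])

print(phi(x) @ phi(z))     # inner product in feature space
print(quad_kernel(x, z))   # same value, without ever forming phi(.)
```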

2. Sparse Unsupervised Learning

Major issues not addressed by supervised learning: how to scale to real (large) learning problems; structured (hierarchical) internal representations; what are good features; lots of unlabeled data; top-down (generative) models; the temporal domain.

What is a good representation? Sparse features are useful.

Horace Barlow, "Possible principles underlying the transformations of sensory messages" (1961): "… reduction of redundancy is an important principle guiding the organization of sensory messages …" Sparseness & overcompleteness. The Ratio Club.

Minimizing reconstruction error and sparsity (compare PCA, which minimizes reconstruction error alone).
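In symbols, the objective alluded to here is, in a standard sparse-coding formulation (written out as an assumption, since the slide's formula was lost), a tradeoff between reconstruction error and sparsity of the coefficients; PCA corresponds to keeping only the reconstruction term with an orthogonal basis.

```latex
\min_{\{\mathbf{b}_i\},\{c_i\}} \;
  \sum_{\text{samples}} \Bigl\lVert \mathbf{x} - \sum_i c_i \mathbf{b}_i \Bigr\rVert^2
  \;+\; \lambda \sum_i \lvert c_i \rvert
```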

Self-organized feature representation by hierarchical generative models

Restricted Boltzmann Machine (RBM). Update rule: probabilistic units (Caianiello: the neuronic equation). Training rule: contrastive divergence (Caianiello: the mnemonic equation), using alternating Gibbs sampling.
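A compact sketch of RBM training with one step of alternating Gibbs sampling (CD-1); the layer sizes, learning rate, and toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

n_vis, n_hid = 6, 3                                    # assumed layer sizes
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
eta = 0.1                                              # learning rate (assumed)

# Tiny binary "data set" (illustrative): two repeated patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 20, dtype=float)

for epoch in range(200):
    for v0 in data:
        # positive phase: sample hidden units given the data (probabilistic units)
        p_h0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(n_hid) < p_h0).astype(float)
        # one step of alternating Gibbs sampling: reconstruct v, then h
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        v1 = (rng.random(n_vis) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W + b_hid)
        # contrastive-divergence update: positive minus negative statistics
        W += eta * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b_vis += eta * (v0 - v1)
        b_hid += eta * (p_h0 - p_h1)

v = np.array([1., 1., 1., 0., 0., 0.])
recon = sigmoid(sigmoid(v @ W + b_hid) @ W.T + b_vis)   # mean-field reconstruction
print("reconstruction of", v, "->", np.round(recon, 2))
```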

Geoffrey E. Hinton. Deep belief networks: the stacked Restricted Boltzmann Machine.

Sparse and Topographic RBM … with Paul Hollensen

Map Initialized Perceptron (MIP) … with Pitoyo Hartono

RBM features