Learning to Learn By Exploiting Prior Knowledge

Presentation transcript:

Learning to Learn By Exploiting Prior Knowledge. Tatiana Tommasi, Idiap Research Institute and École Polytechnique Fédérale de Lausanne, Switzerland. Oxford, October 22, 2012.

Example – Learning. Task: “I want to learn Italian”. Training experience: “Bionji”… “Buonyo”… “Buongiorno”. A performance measure. An agent learns if its performance at a task improves with experience (Mitchell, 1996).

Example – Learning to Learn. Tasks: “I want to learn Italian and French”. Training experience: It: “Buongiorno”; Fr: “Bonjour”. Performance measures. An agent learns to learn if its performance at each task improves with experience and with the number of tasks (Thrun, 1996).

What is this? A fruit. Does it look like some other fruit?

Does it look similar to something else? Analogical reasoning: if we already know the appearance of some objects, we can use it as reference information when learning something new.

Knowledge Transfer: storing knowledge gained while solving one problem and applying it to a different but related problem. Source/Sources. Target: Guava. Learning to learn: some transfer must occur between multiple tasks, with a positive impact on performance.

Domain Adaptation: needed when the data distribution of the test domain differs from that of the training domain. Source/Sources. Target.

Multi-Task Learning: learning over multiple tasks at the same time by exploiting a symmetric sharing of information. Task 1, Task 2, Task 3.

Learning to Learn. Sharing Information (Exploit Prior Knowledge): Knowledge Transfer, Domain Adaptation, Multi-Task Learning. Dynamic Process: Online Learning (continuous update of the current knowledge); Active Learning (interactively querying an oracle to obtain the desired outputs at new data points).

Knowledge Transfer: Advantages. Particularly useful when few target training samples are available: it boosts the learning process.

Knowledge Transfer: Challenges. What to transfer? Specify the form of the knowledge to transfer: instances, features, or models. How to transfer? Define a learning algorithm able to exploit prior knowledge. When to transfer? Evaluate task relatedness, keep useful knowledge, and reject bad information (avoid negative transfer).

My choices. What to transfer? Learning models. How to transfer? A discriminative learning approach. When to transfer? Automatic evaluation. Intuition.


Target Problem. I want to learn … vs … Given a set of data, find a function that minimizes the structural risk. Linear models, with a feature mapping. Optimization problem.
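The formulas for this slide are not part of the transcript. As a sketch only, one standard way to write the regularized-risk problem named above is given below. All notation is assumed rather than taken from the slides: N labelled samples, a feature map \phi, a generic loss \ell, and a trade-off parameter C.

```latex
% Assumed notation, not recovered from the slides.
\begin{align*}
&\text{Data: } \{(x_i, y_i)\}_{i=1}^{N}, \qquad y_i \in \{-1, +1\} \\
&\text{Linear model in feature space: } f(x) = w \cdot \phi(x) + b \\
&\text{Optimization problem: } \min_{w,\,b} \;\; \frac{1}{2}\,\|w\|^{2} + C \sum_{i=1}^{N} \ell\big(f(x_i),\, y_i\big)
\end{align*}
```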

Source Problem. I already know … vs … A source: a set of data with a pre-learned model, the solution of the learning problem on the source.

What to Transfer? Discriminative models. Consider J source models, each the solution of the learning problem on the j-th source, expressed as a weighted sum of kernel functions. Use them as reference knowledge when learning on the target.

How and When to Transfer. How: adaptive regularization. When and how much: reweighted source knowledge. Evaluate the relevance of each source, then solve the target learning problem. We name the resulting Knowledge Transfer approach KT. [T. Tommasi and B. Caputo, BMVC 2009] [T. Tommasi et al., CVPR 2010]

Solve the target learning problem. Use the square loss and solve an adaptive Least-Squares Support Vector Machine (LS-SVM; Suykens et al., 2002). Square loss: predict each sample correctly. Not sparse: all the training samples are considered. Solution: a set of linear equations.
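The objective itself is missing from the transcript. A plausible form of an adaptively regularized LS-SVM, written under assumed notation (source models \hat{w}_j, source weights \beta_j, slack variables \xi_i), is sketched here; it should be read as an illustration of the idea, not as the exact formula shown on the slide.

```latex
% Sketch under assumed notation: the regularizer pulls w toward a
% weighted combination of the J source models instead of toward zero.
\begin{align*}
\min_{w,\,b} \;\; & \frac{1}{2}\,\Big\| w - \sum_{j=1}^{J} \beta_j\, \hat{w}_j \Big\|^{2}
                   + \frac{C}{2} \sum_{i=1}^{N} \xi_i^{2} \\
\text{s.t.} \;\; & y_i = w \cdot \phi(x_i) + b + \xi_i, \qquad i = 1, \dots, N
\end{align*}
```

With every \beta_j set to zero, this reduces to the standard LS-SVM problem, i.e. learning from scratch.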

Solving Procedure. In matrix form, the model parameters can be calculated by matrix inversion, which yields the solution and the classifier.
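A minimal NumPy sketch of such a solving procedure follows. It assumes the adaptive LS-SVM form in which the target model is a weighted sum of source models plus a kernel expansion on the target samples; the function names, the argument layout (`src_pred` holds each source model's output on the target samples), and the use of `np.linalg.solve` in place of an explicit inverse are all choices made for this illustration.

```python
import numpy as np

def fit_adaptive_lssvm(K, y, src_pred, beta, C=1.0):
    """Solve the bordered linear system of an LS-SVM with a source prior.

    K        : (N, N) kernel matrix on the target training samples
    y        : (N,)   labels in {-1, +1}
    src_pred : (N, J) outputs of the J source models on the target samples
    beta     : (J,)   source weights (hypothetical name)
    """
    N = K.shape[0]
    M = np.zeros((N + 1, N + 1))
    M[:N, :N] = K + np.eye(N) / C   # kernel block plus ridge term from the square loss
    M[:N, N] = 1.0                  # column for the bias b
    M[N, :N] = 1.0                  # constraint sum(alpha) = 0
    rhs = np.append(y - src_pred @ beta, 0.0)  # residual left after the source prior
    sol = np.linalg.solve(M, rhs)
    return sol[:N], sol[N]          # alpha, b

def predict(K_test, src_pred_test, alpha, b, beta):
    """Classifier output: source prior + kernel expansion + bias."""
    return src_pred_test @ beta + K_test @ alpha + b
```

Because of the square loss, the training residual of each sample equals `alpha[i] / C`, which is one quick sanity check on an implementation.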

Leave-One-Out Prediction. We can train the learning method on N samples and obtain, as a byproduct, the prediction for each training sample as if it had been left out of the training set. The leave-one-out error is an almost unbiased estimator of the generalization error (Luntz and Brailovsky, 1969).
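For LS-SVM-type models this byproduct has a known closed form: the leave-one-out prediction for sample i is y_i minus alpha_i divided by the i-th diagonal entry of the inverse system matrix. The sketch below applies that identity to the adaptive formulation assumed in this transcript's context; the function name and argument layout are hypothetical.

```python
import numpy as np

def loo_predictions(K, y, src_pred, beta, C=1.0):
    """Closed-form leave-one-out predictions for an LS-SVM with a source prior.

    K        : (N, N) kernel matrix on the target training samples
    y        : (N,)   labels
    src_pred : (N, J) source-model outputs on the target samples
    beta     : (J,)   source weights
    """
    N = K.shape[0]
    M = np.zeros((N + 1, N + 1))
    M[:N, :N] = K + np.eye(N) / C
    M[:N, N] = 1.0
    M[N, :N] = 1.0
    P = np.linalg.inv(M)  # a single inversion serves all N left-out samples
    alpha = (P @ np.append(y - src_pred @ beta, 0.0))[:N]
    # Leaving sample i out changes its prediction by alpha_i / P_ii.
    return y - alpha / np.diag(P)[:N]
```

The result matches what explicit retraining on the N-1 remaining samples would give, at the cost of one matrix inversion instead of N separate fits.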

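Evaluate the relevance of each source. The best values for β are those for which the product of the true label and its leave-one-out prediction is positive for each sample i. To obtain a convex formulation, we consider a convex loss on these predictions and solve the resulting optimization problem.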
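One way to make this concrete is sketched below. It relies on the fact that the leave-one-out predictions are an affine function of the source weights, and then minimizes a hinge loss over them. The hinge loss, the unit-ball constraint on the weights, and the projected subgradient solver are assumptions chosen for this illustration; the slides do not show which loss, constraint, or solver is used.

```python
import numpy as np

def loo_as_affine_function_of_beta(K, y, src_pred, C=1.0):
    """Return (a, B) such that the LOO predictions equal a + B @ beta."""
    N = K.shape[0]
    M = np.zeros((N + 1, N + 1))
    M[:N, :N] = K + np.eye(N) / C
    M[:N, N] = 1.0
    M[N, :N] = 1.0
    P = np.linalg.inv(M)[:N, :N]
    d = np.diag(P)
    a = y - (P @ y) / d               # LOO predictions with no source knowledge
    B = (P @ src_pred) / d[:, None]   # contribution of each source model
    return a, B

def loo_hinge(beta, a, B, y):
    """Hinge loss on the LOO predictions (assumed convex surrogate)."""
    return np.maximum(0.0, 1.0 - y * (a + B @ beta)).sum()

def fit_source_weights(a, B, y, steps=500, lr=0.05):
    """Projected subgradient descent with ||beta||_2 <= 1 (assumed constraint)."""
    beta = np.zeros(B.shape[1])
    best, best_loss = beta.copy(), loo_hinge(beta, a, B, y)
    for t in range(steps):
        active = y * (a + B @ beta) < 1.0            # samples inside the margin
        grad = -(B[active] * y[active, None]).sum(axis=0)
        beta = beta - lr / np.sqrt(t + 1.0) * grad
        norm = np.linalg.norm(beta)
        if norm > 1.0:
            beta = beta / norm                       # project back onto the unit ball
        loss = loo_hinge(beta, a, B, y)
        if loss < best_loss:
            best, best_loss = beta.copy(), loss
    return best
```

A source whose predictions disagree with the target labels receives a small or zero weight under this scheme, which is the mechanism that limits negative transfer.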

Experiments – Mixed Classes. Visual object classification on Caltech-256. Binary problems: object vs. non-object. Features: PHOG, SIFT, Region Covariance, LBP. 10 mixed classes: one target and nine sources.

Results – Mixed Classes

Experiments – 6 Unrelated Classes. Visual object classification on Caltech-256. Binary problems: object vs. non-object. Features: PHOG, SIFT, Region Covariance, LBP. 6 unrelated classes: one target and five sources.

Results – 6 Unrelated Classes

Experiments – 2 Unrelated Classes. Visual object classification on Caltech-256. Binary problems: object vs. non-object. Features: SIFT. 2 unrelated classes: one target and one source.

Results – 2 Unrelated Classes

Transfer Weights and Semantic Similarity. Use the transfer weight vectors β to define a matrix of class dissimilarities. Apply multidimensional scaling (two dimensions).
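The two steps on this slide can be sketched as follows. How the dissimilarity matrix is derived from the transfer weights is not stated in the transcript, so `dissimilarity_from_weights` is a hypothetical choice (symmetrize the weights, then invert them so that a large weight means a small dissimilarity). The embedding uses classical (Torgerson) multidimensional scaling, which may differ from the MDS variant used in the talk.

```python
import numpy as np

def dissimilarity_from_weights(W):
    """Hypothetical mapping from transfer weights to class dissimilarities.

    W[i, j] is the weight given to source class j when learning target class i.
    """
    S = 0.5 * (W + W.T)        # symmetrize: similarity between classes i and j
    D = S.max() - S            # larger weight -> smaller dissimilarity
    np.fill_diagonal(D, 0.0)   # a class is at distance zero from itself
    return D

def classical_mds(D, dims=2):
    """Classical MDS: embed points so Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:dims]   # keep the largest eigenvalues
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return evecs[:, order] * scale
```

Plotting the rows of `classical_mds(dissimilarity_from_weights(W))` places classes that transfer strongly to each other close together in the plane.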


Extension: Multiclass Domain Adaptation. The classes g = 1, ..., G are fixed for both source and target. The g-th model discriminates class g, taken as positive, from all the others, taken as negative. Class prediction. Leave-one-out predictions.

When and How Much to Transfer. We suffer a loss that is linearly proportional to the difference between the confidence of the correct label and the maximum among the confidences of the other labels. Final objective function.
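The loss described here has the shape of a multiclass hinge loss. A small sketch is given below, assuming a unit margin and a score matrix with one column per class; both the function name and the margin value are illustrative choices, not taken from the slides.

```python
import numpy as np

def multiclass_margin_loss(scores, labels, margin=1.0):
    """Per-sample multiclass hinge loss.

    scores : (N, G) confidences, e.g. leave-one-out predictions of G one-vs-all models
    labels : (N,)   integer index of the correct class for each sample
    """
    scores = np.array(scores, dtype=float)   # copy, so the caller's array is untouched
    rows = np.arange(scores.shape[0])
    correct = scores[rows, labels]           # confidence of the correct label
    scores[rows, labels] = -np.inf           # exclude it from the maximum
    runner_up = scores.max(axis=1)           # highest confidence among the other labels
    return np.maximum(0.0, margin - (correct - runner_up))
```

The loss is zero once the correct class leads by at least the margin, and grows linearly as the gap shrinks or reverses.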

Three Possible Schemes: 1, 2, 3.

Application: personalization of a pre-existing model. Task: hand posture classification. Electrodes applied on the forearm collect sEMG signals. Goals: reduce the training time of a mechanical hand prosthesis through adaptive learning over several known subjects; augment the control abilities over hand prostheses. [T. Tommasi et al., IEEE Transactions on Robotics 2012]

Experimental setup: 10 healthy subjects, 7 sEMG electrodes, 3 grasping actions plus rest.

Experimental results

More Subjects and Postures: 20 healthy subjects, 10 sEMG electrodes, 6 actions plus rest.

Leveraging over source models: Limits. Restriction to binary problems (transfer learning) or to multiclass problems with the same set of classes in the source and in the target (domain adaptation). The source and the target models must live in the same space: same features and learning parameters. Batch method: the relevance of each source must be re-evaluated every time a new training sample is available.

Feature Transfer. Use the source models as experts that predict on the target samples. Use the outputs of the predictions as additional feature elements. Cast the problem in the multi-kernel learning framework: Multiple Kernel Transfer Learning (MKTL). Principled multiclass formulation. [L. Jie*, T. Tommasi*, B. Caputo, ICCV 2011]
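The feature-augmentation step can be illustrated with a short sketch. Source models are assumed here to be plain callables that map a sample matrix to one score per sample, and the kernel combination uses fixed weights and linear kernels; MKTL itself learns the kernel weights, which this sketch does not attempt.

```python
import numpy as np

def augment_with_source_scores(X, source_models):
    """Append each source model's prediction as an extra feature column.

    X             : (N, D) target samples
    source_models : list of callables, each mapping (N, D) -> (N,) scores
    """
    scores = np.column_stack([model(X) for model in source_models])
    return np.hstack([X, scores])

def combined_linear_kernel(A, B, n_original, weights=(0.5, 0.5)):
    """Weighted sum of two linear kernels: original features and source scores.

    The weights are fixed by hand here; a multi-kernel learner would fit them.
    """
    A_x, A_s = A[:, :n_original], A[:, n_original:]
    B_x, B_s = B[:, :n_original], B[:, n_original:]
    return weights[0] * (A_x @ B_x.T) + weights[1] * (A_s @ B_s.T)
```

Keeping the source scores in their own kernel lets the learner weight prior knowledge separately from the raw features, and it does not require source and target models to share a feature space.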

Online Learning. Combine Online Learning and Knowledge Transfer so that each benefits from the other. Avoid re-evaluating the relevance of source knowledge at each step. Obtain an online learning approach with robust generalization capacity. Transfer Initialized Online Learning (TROL) [T. Tommasi et al., BMVC 2012]

Cross-Database Generalization. Exploit existing visual resources (partially overlapping label sets). General case of many visual datasets (tasks) with some common classes. No explicit class alignment. No model already learned; only samples are available, possibly represented with different feature descriptors. Define a representation that decomposes into two orthogonal parts: one shared and one private for each task. Use the generic knowledge coded in the shared part when learning on a new target problem. Multi-Task Unaligned Shared Knowledge Transfer (MUST) [T. Tommasi et al., ACCV 2012]

Take Home Message. It is possible to define learning algorithms that automatically evaluate the relevance of prior knowledge when addressing a new target problem with few training examples. The described approaches consistently outperform learning from scratch in both transfer learning and domain adaptation problems. It is possible to artificially reproduce different aspects of the “human analogical reasoning process”.

Questions? More details in my thesis... Tatiana Tommasi, ttommasi@idiap.ch, http://www.idiap.ch/~ttommasi/