Regularized Adaptation for Discriminative Classifiers. Xiao Li and Jeff Bilmes, University of Washington, Seattle.

Presentation transcript:

Regularized Adaptation for Discriminative Classifiers Xiao Li and Jeff Bilmes University of Washington, Seattle

This work
- Investigates links between a number of discriminative classifiers
- Presents a general adaptation strategy: "regularized adaptation"

Adaptation for generative models
- The target sample distribution is different from that of the training data
- Adaptation has long been studied in speech recognition for generative models:
  - Maximum likelihood linear regression
  - Maximum a posteriori
  - Eigenvoice

Discriminative classifiers
- Directly model the conditional relation of a label given features
- Often yield more robust classification performance than generative models
- Popularly used:
  - Support vector machines (SVM)
  - Multi-layer perceptrons (MLP)
  - Conditional maximum entropy models

Existing Discriminative Adaptation Strategies
- SVMs:
  - Combine SVs with selected adaptation data (Matic 93)
  - Combine selected SVs with adaptation data (Li 05)
- MLPs:
  - Linear input network (Neto 95, Abrash 97)
  - Retrain both layers from the unadapted model (Neto 95)
  - Retrain part of the last layer (Stadermann 05)
  - Retrain the first layer
- Conditional MaxEnt:
  - Gaussian prior (Chelba 04)

SVMs and MLPs – Links
- Binary classification: training pairs (x_t, y_t)
- Discriminant function f(x) built on a nonlinear transform Φ_θ of the input
- Accuracy-regularization objective: min_w Σ_t Q(y_t f(x_t)) + λ R(w), i.e. empirical risk plus regularizer
- The regularizer takes different forms:
  - SVM: maximum margin
  - MLP: weight decay
  - MaxEnt: Gaussian smoothing

SVMs and MLPs – Differences
- SVMs: nonlinear transform Φ_θ = reproducing kernel; typical loss func. Q = hinge loss; typical training = quadratic programming
- MLPs: nonlinear transform Φ_θ = input-to-hidden layer; typical loss func. Q = log loss; typical training = gradient descent
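To make the shared structure of the last two slides concrete, here is a minimal numpy sketch (an illustration added here, not part of the slides) of the accuracy-regularization objective for a linear discriminant, instantiated with the SVM hinge loss and the MLP/MaxEnt-style log loss; the function names and the regularization weight lam are illustrative choices.

```python
import numpy as np

def hinge_loss(margins):
    # SVM-style loss on the margin y_t * f(x_t)
    return np.maximum(0.0, 1.0 - margins)

def log_loss(margins):
    # MLP/MaxEnt-style loss on the margin y_t * f(x_t)
    return np.log1p(np.exp(-margins))

def objective(w, X, y, loss_fn, lam=0.1):
    """Accuracy-regularization objective: empirical risk + regularizer."""
    margins = y * (X @ w)               # y_t * f(x_t) for a linear discriminant
    empirical_risk = loss_fn(margins).sum()
    regularizer = lam * np.dot(w, w)    # ||w||^2: max margin (SVM) / weight decay (MLP)
    return empirical_risk + regularizer

# Tiny usage example on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.sign(rng.normal(size=20))
w = rng.normal(size=5)
print(objective(w, X, y, hinge_loss), objective(w, X, y, log_loss))
```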

Adaptation
- Adaptation data:
  - May be available only in a small amount
  - May be unbalanced in classes
- We intend to utilize:
  - The unadapted model w_0
  - The adaptation data (x_t, y_t), t = 1..T

Regularized Adaptation
- Generalized objective w.r.t. the adaptation data: min_w Σ_t Q(y_t f(x_t)) + λ ||w - w_0||², i.e. empirical risk on the adaptation data plus a regularizer penalizing margin errors and deviation from the unadapted model w_0
- Relations with existing SVM adaptation algorithms:
  - Hinge loss: retrain the SVM
  - Hard boosting of margin errors (Matic 93)
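A minimal sketch of the adaptation idea for a linear classifier, assuming the regularizer penalizes the squared distance to the unadapted weights w0 as reconstructed above; the log loss, learning rate, and lam values are illustrative, and this plain gradient-descent loop is not the paper's actual solver.

```python
import numpy as np

def adapt_linear(w0, X, y, lam=1.0, lr=0.05, steps=200):
    """Adapt a linear classifier to (X, y) while staying close to w0.

    Minimizes  sum_t log(1 + exp(-y_t * w.x_t))  +  lam * ||w - w0||^2
    by plain gradient descent (illustrative only).
    """
    w = w0.copy()
    for _ in range(steps):
        margins = y * (X @ w)
        # gradient of the log-loss empirical risk
        g_loss = -(X.T @ (y / (1.0 + np.exp(margins))))
        # gradient of the regularizer toward the unadapted model
        g_reg = 2.0 * lam * (w - w0)
        w -= lr * (g_loss + g_reg)
    return w

# Usage: a few adaptation samples nudge w0 without discarding it
rng = np.random.default_rng(1)
w0 = rng.normal(size=5)                  # unadapted (speaker-independent) model
X_adapt = rng.normal(size=(10, 5))       # small, possibly class-skewed adaptation set
y_adapt = np.sign(X_adapt @ w0 + 0.5)    # synthetic labels for the demo
w_adapted = adapt_linear(w0, X_adapt, y_adapt, lam=1.0)
```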

New Regularized Adaptation for SVMs
- Soft boosting: combine margin errors
- [Figure: adaptation data, margin d_0, and the decision function obtained using the adaptation data only]

Regularized Adaptation for SVMs (Cont.)
- Theorem for linear SVMs [statement not transcribed]
- In practice, we use α = 1

Regularized Adaptation for MLPs
- Extend this to a two-layer MLP
- Relations with existing MLP adaptation algorithms:
  - Linear input network: μ → ∞
  - Retrain from the SI model: μ = 0, ν = 0
  - Retrain the last layer: μ = 0, ν → ∞
  - Retrain the first layer: μ → ∞, ν = 0
  - Regularized: choose μ, ν on a dev set
- This also relates to MaxEnt adaptation using Gaussian priors
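A hedged PyTorch sketch of the two-layer MLP case, under the assumption (suggested by the special cases above) that μ penalizes deviation of the first, input-to-hidden layer and ν deviation of the hidden-to-output layer from the speaker-independent (SI) weights; the layer sizes, activation, loss, and optimizer settings are illustrative rather than the paper's exact recipe.

```python
import copy
import torch
import torch.nn as nn

def adapt_mlp(si_model, X, y, mu=1.0, nu=1.0, lr=0.05, steps=100):
    """Adapt a two-layer MLP while regularizing each layer toward the SI model.

    Large mu keeps the first layer near its SI weights; large nu does the same
    for the last layer; mu = nu = 0 is plain retraining from the SI model.
    """
    model = copy.deepcopy(si_model)
    si_first = si_model[0].weight.detach().clone()
    si_last = si_model[2].weight.detach().clone()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        risk = loss_fn(model(X), y)                                  # empirical risk
        reg = mu * (model[0].weight - si_first).pow(2).sum() \
            + nu * (model[2].weight - si_last).pow(2).sum()          # per-layer regularizers
        (risk + reg).backward()
        opt.step()
    return model

# Usage with an illustrative 50-hidden-node MLP (input dim 182 is assumed, not from the slides)
si_model = nn.Sequential(nn.Linear(182, 50), nn.Sigmoid(), nn.Linear(50, 4))
X_adapt = torch.randn(30, 182)            # small adaptation set
y_adapt = torch.randint(0, 4, (30,))
adapted = adapt_mlp(si_model, X_adapt, y_adapt, mu=1.0, nu=1.0)
```

Choosing mu and nu on a dev set, as the slide suggests, then amounts to a small grid search over this single function.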

Experiments – Vowel Classification
- Application: the Vocal Joystick
  - A voice-based computer interface for individuals with motor impairments
  - Vowel quality is mapped to movement angle
- Data set (extended):
  - Train/dev/eval: 21/4/10 speakers
  - 6-fold cross-validation
- MLP configuration:
  - 7 frames of MFCC + deltas as input
  - 50 hidden nodes
- Metric: frame-level classification error rate
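The MLP input described above stacks a context window of acoustic frames; the following numpy sketch shows one plausible way to build 7-frame inputs from per-frame MFCC + delta features. The 26-dimensional per-frame size and the edge padding are assumptions for illustration, not details from the slides.

```python
import numpy as np

def stack_frames(feats, context=3):
    """Stack 2*context+1 consecutive frames (7 for context=3) per time step.

    feats: (T, D) array of per-frame features (e.g. MFCCs + deltas).
    Returns: (T, (2*context+1)*D) array; edges are handled by repeating
    the first/last frame (an illustrative choice, not from the slides).
    """
    T, D = feats.shape
    padded = np.vstack([np.repeat(feats[:1], context, axis=0),
                        feats,
                        np.repeat(feats[-1:], context, axis=0)])
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

# Example: 100 frames of 26-dim features -> 100 x 182 MLP inputs
mlp_inputs = stack_frames(np.random.randn(100, 26), context=3)
print(mlp_inputs.shape)  # (100, 182)
```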

Varying Adaptation Time
[Table: frame-level error rate (%) for the 4-class and 8-class tasks with 1s, 2s, and 3s of adaptation data; SI baseline 7.60 ± ... for 4-class; the remaining values were not preserved in the transcript]

Varying # vowels in adaptation (3s each)
[Figure: error rate vs. number of vowel classes in the adaptation data, 3s per vowel; SI baseline: 32%]

Varying # vowels in adaptation (3s total)
[Figure: error rate vs. number of vowel classes in the adaptation data, 3s in total; SI baseline: 32%]

Summary
- Drew links between discriminative classifiers
- Presented a general notion of "regularized adaptation" for discriminative classifiers:
  - Natural adaptation strategies for SVMs and MLPs, justified using a maximum margin argument
  - A unified view of different adaptation algorithms
- MLP experiments show superior performance, especially for class-skewed data