CS 189 Brian Chu Slides at: brianchu.com/ml/ Office Hours: Cory 246, 6-7p Mon. (hackerspace lounge)

Questions?

Hot stock tip: you should attend more than one section. Each of us has a completely different perspective / background / experience.

Feedback

Agenda
- Dual clarification
- LDA
- Generative vs. discriminative models
- PCA
- Supervised vs. unsupervised
- Spectral Theorem / eigendecomposition
- Worksheet

A dual form exists for any weight vector that is a linear combination of the training examples:
- gradient descent (additive updates)
- other cases
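
A sketch of why this matters (notation mine, not from the slides): if the weights are such a combination, predictions depend on the data only through inner products between examples, which is what makes kernelization possible.

```latex
w = \sum_{i=1}^{n} \alpha_i x_i
\quad\Rightarrow\quad
w^{\top} x = \sum_{i=1}^{n} \alpha_i \, \langle x_i, x \rangle
```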

Covariance matrix: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E[X_i X_j] - E[X_i]\,E[X_j]$
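
As an illustration (my code, not from the slides), the same identity computed empirically with NumPy on toy data:

```python
import numpy as np

# Toy data: n = 100 samples (rows), d = 3 features (columns)
X = np.random.randn(100, 3)

# Empirical covariance via Sigma_ij = E[X_i X_j] - E[X_i] E[X_j]
n = X.shape[0]
mean = X.mean(axis=0)
cov_manual = (X.T @ X) / n - np.outer(mean, mean)

# Sanity check against NumPy's built-in (bias=True divides by n, matching above)
assert np.allclose(cov_manual, np.cov(X, rowvar=False, bias=True))
```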

LDA
Assume the data for each class is drawn from a Gaussian, with different means but the same covariance.
Use that assumption to find a separating decision boundary.
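
A minimal sketch of a two-class fit under exactly these assumptions (Gaussian classes, shared covariance). The function and variable names are mine, and the class priors are estimated from class frequencies:

```python
import numpy as np

def fit_lda(X0, X1):
    """Two-class LDA: Gaussians with different means, shared covariance.
    Returns (w, b) for the linear rule: predict class 1 if w @ x + b > 0."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled (shared) covariance estimate over both classes
    Sigma = (n0 * np.cov(X0, rowvar=False, bias=True) +
             n1 * np.cov(X1, rowvar=False, bias=True)) / (n0 + n1)
    Sigma_inv = np.linalg.inv(Sigma)
    # Equal covariances make the quadratic terms cancel -> linear boundary
    w = Sigma_inv @ (mu1 - mu0)
    b = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu0 @ Sigma_inv @ mu0) + np.log(n1 / n0)
    return w, b
```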

Generative vs. discriminative
Some key ideas:
- Bias vs. variance
- Parametric vs. nonparametric
- Generative vs. discriminative

Generative vs. discriminative
Generative: use P(X|Y) and P(Y) → P(Y|X)
Discriminative: skip straight to P(Y|X) – just tell me Y!
Q: How are they different? Are these generative or discriminative: Gaussian classifier, logistic regression, linear regression?
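
The bridge between the two is Bayes' rule (a standard identity, stated here for reference):

```latex
P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}
            = \frac{P(X \mid Y)\, P(Y)}{\sum_{y} P(X \mid Y = y)\, P(Y = y)}
```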

Spectral Theorem / eigendecomposition
Any symmetric real matrix X can be decomposed as $X = U \Lambda U^{\top}$, where
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ (on the diagonal are n real eigenvalues)
$U = [v_1, \ldots, v_n]$ = n orthonormal eigenvectors
- Orthonormal → $U^{\top} U = U U^{\top} = I$
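
A quick NumPy check of the theorem on a random symmetric matrix (illustrative only):

```python
import numpy as np

# Symmetrize a random matrix so the spectral theorem applies
A = np.random.randn(4, 4)
X = (A + A.T) / 2

# eigh is specialized for symmetric matrices: real eigenvalues,
# orthonormal eigenvectors returned as the columns of U
lam, U = np.linalg.eigh(X)

assert np.allclose(X, U @ np.diag(lam) @ U.T)   # X = U Lambda U^T
assert np.allclose(U.T @ U, np.eye(4))          # U^T U = I
```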

PCA
Find the principal components (the axes of highest variance).
Use eigenvectors/eigenvalues: the eigenvectors of the covariance matrix with the highest eigenvalues.
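
A minimal PCA sketch along these lines; the function name and the choice of k are illustrative:

```python
import numpy as np

def pca(X, k):
    """Project n x d data X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center the data
    Sigma = np.cov(Xc, rowvar=False, bias=True)  # d x d covariance matrix
    lam, U = np.linalg.eigh(Sigma)               # eigenvalues in ascending order
    top_k = U[:, np.argsort(lam)[::-1][:k]]      # eigenvectors for the k largest
    return Xc @ top_k                            # n x k projection
```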

Supervised vs. unsupervised
LDA = supervised
PCA = unsupervised (analysis, dimensionality reduction)

Worksheet
Bayes risk = optimal risk (the minimal possible risk)
Bayes classifier = what's our decision boundary?
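
Written out (standard definitions under 0-1 loss, added for reference):

```latex
f^{*}(x) = \arg\max_{y} P(Y = y \mid X = x),
\qquad
R^{*} = \mathbb{E}_{X}\left[ 1 - \max_{y} P(Y = y \mid X = x) \right]
```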

Worksheet