Lecture 15 Factor Analysis.

Factor Analysis
Just as clustering is classification without knowing the labels, factor analysis is regression without knowing the targets. The observed attributes are noisy linear functions of unobserved factors: the noise model p(x|y) is Gaussian and the prior p(y) is Gaussian. All factors cooperate to generate every output (unlike a mixture of Gaussians, where each data point is generated by a single component). We use fewer factors than attributes, so the model performs dimensionality reduction. Geometrically, a sphere of factor values is warped into an ellipse, after which independent noise is added to each attribute. The resulting marginal distribution over the attributes is again Gaussian, so we can remove the mean first: centering.
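A minimal sketch of this generative story in NumPy, assuming a loading matrix A (d x k), a diagonal noise variance vector psi, and zero-mean data; the names and sizes are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 2000                     # attributes, factors, samples

A = rng.normal(size=(d, k))              # loadings: warp the factor sphere into an ellipse
psi = rng.uniform(0.1, 0.5, size=d)      # independent noise variance per attribute

# Generative model: y ~ N(0, I_k),  x | y ~ N(A y, diag(psi))
y = rng.normal(size=(n, k))
x = y @ A.T + rng.normal(size=(n, d)) * np.sqrt(psi)

# Marginally x is Gaussian with covariance A A^T + diag(psi);
# the sample covariance should be close to it.
print(np.abs(np.cov(x, rowvar=False, bias=True) - (A @ A.T + np.diag(psi))).max())
```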

EM
E-step: we compute the posteriors p(y|x) using standard algebra for Gaussians. Since we know the posteriors must be Gaussian, we only need to collect the quadratic form in the exponent. There are also tricks (the matrix inversion lemma) to replace d x d inverses by smaller k x k inverses (k = #factors, d = #attributes).
M-step: take derivatives of the expected complete log-likelihood, set them to zero, and solve; the updates are analytic.
Note that the model is not identified: we can rotate A and y together and the distribution over the data does not change.
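A compact sketch of these EM updates, assuming centered data X (n x d), loadings A (d x k), and a diagonal noise covariance stored as the vector psi; only a k x k matrix is inverted in the E-step, as mentioned above (function and variable names are illustrative):

```python
import numpy as np

def factor_analysis_em(X, k, n_iter=100, seed=0):
    """EM for factor analysis on centered data X of shape (n, d) with k factors."""
    n, d = X.shape
    A = np.random.default_rng(seed).normal(size=(d, k))   # loadings
    psi = np.var(X, axis=0)                                # diagonal noise variances

    for _ in range(n_iter):
        # E-step: p(y | x_n) = N(m_n, Sigma), using a k x k inverse instead of d x d.
        Sigma = np.linalg.inv(np.eye(k) + A.T @ (A / psi[:, None]))   # posterior covariance
        M = X @ (A / psi[:, None]) @ Sigma                            # rows m_n = E[y_n | x_n]

        # M-step: set derivatives of the expected complete log-likelihood to zero.
        Eyy = n * Sigma + M.T @ M                  # sum_n E[y_n y_n^T]
        A = (X.T @ M) @ np.linalg.inv(Eyy)         # new loadings
        psi = np.mean(X**2, axis=0) - np.sum(A * (X.T @ M), axis=1) / n
        psi = np.maximum(psi, 1e-6)                # guard against numerical underflow

    return A, psi
```

Because of the rotational freedom noted above, the recovered A is only determined up to a k x k rotation of its columns.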