Maximizing the Variance

Presentation transcript:

Maximizing the Variance of X o d = F_d(X) = DPP_d(X), the column of dot product projections (x1 o d, x2 o d, ..., xN o d)ᵀ of the rows of X onto d = (d1, ..., dn).

Given any table, X(X1, ..., Xn), with N rows, and any unit vector, d, in n-space, let

    V(d) ≡ Var(DPP_d(X)) = mean((X o d)²) − (mean(X o d))²

         = (1/N) Σ_{i=1..N} ( Σ_{j=1..n} x_{i,j} d_j )² − ( Σ_{j=1..n} X̄_j d_j )²

         = (1/N) Σ_i [ Σ_j x_{i,j}² d_j² + 2 Σ_{j<k} x_{i,j} x_{i,k} d_j d_k ] − [ Σ_j X̄_j² d_j² + 2 Σ_{j<k} X̄_j X̄_k d_j d_k ]

         = Σ_{j=1..n} ( mean(X_j²) − X̄_j² ) d_j² + 2 Σ_{j<k} ( mean(X_j X_k) − X̄_j X̄_k ) d_j d_k,

subject to Σ_{i=1..n} d_i² = 1. Writing A for the n×n matrix with entries a_{i,j} = mean(X_i X_j) − X̄_i X̄_j, this is the quadratic form

    V(d) = dᵀ o A o d = Var(DPP_d(X)).

We can write this separating out the diagonal elements or not:

    V(d) = Σ_j a_{j,j} d_j² + Σ_{j≠k} a_{j,k} d_j d_k = Σ_{i,j} a_{i,j} d_i d_j.

The gradient is ∇V(d) = 2 A o d, whose i-th component is 2 a_{i,i} d_i + 2 Σ_{j≠i} a_{i,j} d_j. Starting from any unit vector d0, one can hill-climb to locally maximize the variance V as follows: d1 ≡ ∇V(d0), d2 ≡ ∇V(d1), ..., renormalizing each iterate to a unit vector.

Ubhaya Theorem 1: ∃ k ∈ {1, ..., n} such that hill-climbing from d = e_k takes V to its global maximum.

Ubhaya Theorem 2: Let d = e_k where a_{k,k} is a maximal diagonal element of A; then hill-climbing from d = e_k takes V to its global maximum.

How do we use this theory?

For DPP-gap based Clustering, we start from the e_k whose diagonal entry a_{k,k} is maximal and hill-climb to a d that gives us the global maximum variance. Heuristically, higher variance means more prominent gaps.

For DPP-gap based Classification, we can start with the table of the C Training Set Class Means: X ≡ (X1, ..., Xn) with rows M1, M2, ..., MC, where Mk ≡ the mean vector of class k. Then X̄_i = mean_{k=1..C}(M_{k,i}) and mean(X_i X_j) = mean_{k=1..C}(M_{k,i} M_{k,j}), where M_{k,i} denotes the i-th component of Mk. These computations are O(C) (the number of classes) and are therefore essentially instantaneous. Once we have the matrix A, we apply UT-2 (Ubhaya Theorem 2) and hill-climb to obtain a d that maximizes the variance of the class means.
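To make the hill-climb concrete, here is a minimal NumPy sketch (the helper names covariance_matrix, dpp_variance, hill_climb_max_variance and the toy data are assumptions for illustration, not the FAUST code) that builds A from a table X, starts at the e_k whose diagonal entry a_{k,k} is maximal per Ubhaya Theorem 2, and iterates d ← unit(∇V(d)) = unit(2 A o d):

```python
# A minimal sketch of the variance hill-climb described above (assumed names,
# not the FAUST implementation).
import numpy as np

def covariance_matrix(X):
    """A with a[j,k] = mean(Xj*Xk) - Xbar_j * Xbar_k over the columns of X."""
    X = np.asarray(X, dtype=float)
    col_means = X.mean(axis=0)
    return (X.T @ X) / X.shape[0] - np.outer(col_means, col_means)

def dpp_variance(X, d):
    """V(d) = Var(DPP_d(X)): population variance of the projections X o d."""
    return (X @ d).var()

def hill_climb_max_variance(A, max_iter=1000, tol=1e-12):
    """Hill-climb d1 = unit(grad V(d0)), d2 = unit(grad V(d1)), ... starting
    from d0 = e_k where a_kk is a maximal diagonal element (Ubhaya Theorem 2)."""
    d = np.zeros(A.shape[0])
    d[np.argmax(np.diag(A))] = 1.0                 # d0 = e_k
    for _ in range(max_iter):
        g = 2.0 * (A @ d)                          # gradient of V(d) = d^T A d
        g /= np.linalg.norm(g)                     # renormalize to the unit sphere
        if np.linalg.norm(g - d) < tol:
            break
        d = g
    return d

# Clustering use: find the unit d with the maximum projection variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))       # toy table X(X1..X4)
A = covariance_matrix(X)
d = hill_climb_max_variance(A)
print("variance of X o d:", dpp_variance(X, d))
print("largest eigenvalue of A:", np.linalg.eigvalsh(A)[-1])  # should match
```

Because each step replaces d with a renormalized multiple of A o d, the iteration is power iteration on A, so the returned d should agree with the principal eigenvector of the covariance matrix and the printed variance with its largest eigenvalue.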
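For the classification use, the same machinery is applied to the C × n table of class means. The self-contained sketch below (toy labels, toy data, and variable names are illustrative assumptions) builds that table, forms A in O(C) work per entry, and hill-climbs from the Ubhaya Theorem 2 starting vector:

```python
# Classification-use sketch: maximize the variance of the projected class means.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) + np.repeat(np.eye(3) * 4.0, 100, axis=0)  # three shifted classes
y = np.repeat([0, 1, 2], 100)                                            # toy class labels

# Rows Mk of the class-means table (C = 3 rows, n = 3 columns).
M = np.vstack([X[y == c].mean(axis=0) for c in np.unique(y)])

# a[i,j] = mean_k(M[k,i]*M[k,j]) - mean_k(M[k,i])*mean_k(M[k,j]); O(C) per entry.
A = (M.T @ M) / M.shape[0] - np.outer(M.mean(axis=0), M.mean(axis=0))

# Ubhaya Theorem 2 start: d = e_k with a_kk maximal, then hill-climb d <- unit(2 A o d).
d = np.eye(A.shape[0])[np.argmax(np.diag(A))]
for _ in range(200):
    d = 2.0 * (A @ d)
    d /= np.linalg.norm(d)

print("unit d maximizing the variance of the class means:", d)
print("variance of M o d:", (M @ d).var())
```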