Maximizing the Variance

Presentation transcript:

Maximizing the Variance of X o d = F_d(X) = DPP_d(X), the column of dot product projections (x1 o d, x2 o d, ..., xN o d)ᵀ of the rows of X onto d = (d1, ..., dn).

Given any table, X(X1, ..., Xn), with N rows, and any unit vector, d, in n-space, let

    V(d) ≡ Var(DPP_d(X)) = mean((X o d)²) − (mean(X o d))²

         = (1/N) Σ_{i=1..N} ( Σ_{j=1..n} x_{i,j} d_j )² − ( Σ_{j=1..n} X̄_j d_j )²

         = (1/N) Σ_i [ Σ_j x_{i,j}² d_j² + 2 Σ_{j<k} x_{i,j} x_{i,k} d_j d_k ] − [ Σ_j X̄_j² d_j² + 2 Σ_{j<k} X̄_j X̄_k d_j d_k ]

         = Σ_{j=1..n} ( mean(X_j²) − X̄_j² ) d_j² + 2 Σ_{j<k} ( mean(X_j X_k) − X̄_j X̄_k ) d_j d_k,

subject to Σ_{i=1..n} d_i² = 1. Writing A for the n×n matrix with entries a_{i,j} = mean(X_i X_j) − X̄_i X̄_j, this is the quadratic form

    V(d) = dᵀ o A o d = Var(DPP_d(X)).

We can write this separating out the diagonal elements or not:

    V(d) = Σ_j a_{j,j} d_j² + Σ_{j≠k} a_{j,k} d_j d_k = Σ_{i,j} a_{i,j} d_i d_j.

The gradient is ∇V(d) = 2 A o d, whose i-th component is 2 a_{i,i} d_i + 2 Σ_{j≠i} a_{i,j} d_j. Starting from any unit vector d0, one can hill-climb to locally maximize the variance V as follows: d1 ≡ ∇V(d0), d2 ≡ ∇V(d1), ..., renormalizing each iterate to a unit vector.

Ubhaya Theorem 1: ∃ k ∈ {1, ..., n} such that hill-climbing from d = e_k takes V to its global maximum.

Ubhaya Theorem 2: Let d = e_k where a_{k,k} is a maximal diagonal element of A; then hill-climbing from d = e_k takes V to its global maximum.

How do we use this theory?

For DPP-gap based Clustering, we start from the e_k whose diagonal entry a_{k,k} is maximal and hill-climb to a d that gives us the global maximum variance. Heuristically, higher variance means more prominent gaps.

For DPP-gap based Classification, we can start with the table of the C Training Set Class Means: X ≡ (X1, ..., Xn) with rows M1, M2, ..., MC, where Mk ≡ the mean vector of class k. Then X̄_i = mean_{k=1..C}(M_{k,i}) and mean(X_i X_j) = mean_{k=1..C}(M_{k,i} M_{k,j}), where M_{k,i} denotes the i-th component of Mk. These computations are O(C) (the number of classes) and are therefore essentially instantaneous. Once we have the matrix A, we apply UT-2 (Ubhaya Theorem 2) and hill-climb to obtain a d that maximizes the variance of the class means.
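To make the hill-climb concrete, here is a minimal NumPy sketch (the helper names covariance_matrix, dpp_variance, hill_climb_max_variance and the toy data are assumptions for illustration, not the FAUST code) that builds A from a table X, starts at the e_k whose diagonal entry a_{k,k} is maximal per Ubhaya Theorem 2, and iterates d ← unit(∇V(d)) = unit(2 A o d):

```python
# A minimal sketch of the variance hill-climb described above (assumed names,
# not the FAUST implementation).
import numpy as np

def covariance_matrix(X):
    """A with a[j,k] = mean(Xj*Xk) - Xbar_j * Xbar_k over the columns of X."""
    X = np.asarray(X, dtype=float)
    col_means = X.mean(axis=0)
    return (X.T @ X) / X.shape[0] - np.outer(col_means, col_means)

def dpp_variance(X, d):
    """V(d) = Var(DPP_d(X)): population variance of the projections X o d."""
    return (X @ d).var()

def hill_climb_max_variance(A, max_iter=1000, tol=1e-12):
    """Hill-climb d1 = unit(grad V(d0)), d2 = unit(grad V(d1)), ... starting
    from d0 = e_k where a_kk is a maximal diagonal element (Ubhaya Theorem 2)."""
    d = np.zeros(A.shape[0])
    d[np.argmax(np.diag(A))] = 1.0                 # d0 = e_k
    for _ in range(max_iter):
        g = 2.0 * (A @ d)                          # gradient of V(d) = d^T A d
        g /= np.linalg.norm(g)                     # renormalize to the unit sphere
        if np.linalg.norm(g - d) < tol:
            break
        d = g
    return d

# Clustering use: find the unit d with the maximum projection variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))       # toy table X(X1..X4)
A = covariance_matrix(X)
d = hill_climb_max_variance(A)
print("variance of X o d:", dpp_variance(X, d))
print("largest eigenvalue of A:", np.linalg.eigvalsh(A)[-1])  # should match
```

Because each step replaces d with a renormalized multiple of A o d, the iteration is power iteration on A, so the returned d should agree with the principal eigenvector of the covariance matrix and the printed variance with its largest eigenvalue.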
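For the classification use, the same machinery is applied to the C × n table of class means. The self-contained sketch below (toy labels, toy data, and variable names are illustrative assumptions) builds that table, forms A in O(C) work per entry, and hill-climbs from the Ubhaya Theorem 2 starting vector:

```python
# Classification-use sketch: maximize the variance of the projected class means.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) + np.repeat(np.eye(3) * 4.0, 100, axis=0)  # three shifted classes
y = np.repeat([0, 1, 2], 100)                                            # toy class labels

# Rows Mk of the class-means table (C = 3 rows, n = 3 columns).
M = np.vstack([X[y == c].mean(axis=0) for c in np.unique(y)])

# a[i,j] = mean_k(M[k,i]*M[k,j]) - mean_k(M[k,i])*mean_k(M[k,j]); O(C) per entry.
A = (M.T @ M) / M.shape[0] - np.outer(M.mean(axis=0), M.mean(axis=0))

# Ubhaya Theorem 2 start: d = e_k with a_kk maximal, then hill-climb d <- unit(2 A o d).
d = np.eye(A.shape[0])[np.argmax(np.diag(A))]
for _ in range(200):
    d = 2.0 * (A @ d)
    d /= np.linalg.norm(d)

print("unit d maximizing the variance of the class means:", d)
print("variance of M o d:", (M @ d).var())
```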