Computational Intelligence: Methods and Applications. Lecture 5: EDA and linear transformations. Włodzisław Duch, Dept. of Informatics, UMK. Google: W Duch.

Chernoff faces. Humans have specialized brain areas for face recognition. For d < 20 dimensions, each feature can be represented by changing some element of a schematic face. Interesting applets are available online, e.g. the "Chernoff park" demo.

Fish view. Other shapes may also be used to visualize data, for example fish.

Ring visualization (SunBurst). Shows a tree-like hierarchical representation in the form of rings.

Other EDA techniques. The NIST Engineering Statistics Handbook has a chapter on exploratory data analysis (EDA). Unfortunately many visualization programs are written for X-Windows only, or in Fortran, S, or R. Sonification: data converted to sounds! Example: the sound of EEG data (Java Voice). Think about potential applications!

CI approach to visualization. Scatterograms: project all data onto two features. Find more interesting directions to create projections. Linear projections: Principal Component Analysis, Discriminant Component Analysis, Projection Pursuit – searching for "interesting" projections. Non-linear methods – more advanced; some will appear later. Statistical methods: multidimensional scaling. Neural methods: competitive learning, Self-Organizing Maps. Kernel methods, principal curves and surfaces. Information-theoretic methods.

Distances in feature spaces. Data vectors in d dimensions: X^T = (X_1, ..., X_d), Y^T = (Y_1, ..., Y_d). A distance, or metric function, is a two-argument function D(X, Y) that satisfies: D(X, Y) ≥ 0, with D(X, Y) = 0 only for X = Y; symmetry, D(X, Y) = D(Y, X); and the triangle inequality, D(X, Z) ≤ D(X, Y) + D(Y, Z). Distance functions measure (dis)similarity. Popular distance functions: the Euclidean distance (L2 norm), D(X, Y) = (Σ_i (X_i − Y_i)^2)^{1/2}, and the Manhattan (city-block) distance (L1 norm), D(X, Y) = Σ_i |X_i − Y_i|.
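A minimal NumPy sketch of the two distance functions just listed; the vectors are made-up example values, not data from this lecture.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 0.0, 3.5])

# Euclidean (L2) distance: square root of the sum of squared differences
d_euclid = np.sqrt(np.sum((X - Y) ** 2))    # same as np.linalg.norm(X - Y)

# Manhattan (L1, city-block) distance: sum of absolute differences
d_manhattan = np.sum(np.abs(X - Y))         # same as np.linalg.norm(X - Y, ord=1)

print(d_euclid, d_manhattan)                # 2.2912..., 3.5
```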

Two metric functions. Equidistant points in 2D: in the Euclidean case they form a circle (a sphere in 3D); in the Manhattan case they form a square. Points at identical distance from two points X and Y: imagine this in 10 dimensions! In the Manhattan case all points in a whole shaded region have the same distance to X and to Y, while in the Euclidean case such points form only a line. The Euclidean metric is isotropic; the Manhattan metric is not.
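A small numeric check of this claim with made-up points X = (0, 0) and Y = (4, 4), chosen so that the region x1 ≥ 4, x2 ≤ 0 plays the role of the shaded area: several different points there have equal Manhattan distances to X and Y, while their Euclidean distances differ.

```python
import numpy as np

X = np.array([0.0, 0.0])
Y = np.array([4.0, 4.0])

# Three different points from the region x1 >= 4, x2 <= 0
for P in [np.array([4.5, -3.0]), np.array([7.0, -0.5]), np.array([5.0, -4.0])]:
    d1_to_X = np.abs(P - X).sum()       # Manhattan distance to X
    d1_to_Y = np.abs(P - Y).sum()       # Manhattan distance to Y
    d2_to_X = np.linalg.norm(P - X)     # Euclidean distance to X
    d2_to_Y = np.linalg.norm(P - Y)     # Euclidean distance to Y
    print(d1_to_X == d1_to_Y, np.isclose(d2_to_X, d2_to_Y))   # True False, for every P
```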

Linear transformations. 2D vectors X lying in a unit circle with mean (1,1) are transformed as Y = A·X, where A is a 2×2 matrix. The shape and the mean of the data distribution are changed: scaling (diagonal a_ii elements), rotation (off-diagonal elements), mirror reflection. Distances between vectors are not invariant: ||Y_1 − Y_2|| ≠ ||X_1 − X_2||.
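A short sketch illustrating the last statement; the matrix A and the vectors below are arbitrary examples, not those from the lecture figure.

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.0, 1.0]])          # diagonal entries scale, off-diagonal entry shears

X1 = np.array([1.0, 1.0])
X2 = np.array([2.0, 0.0])

Y1, Y2 = A @ X1, A @ X2             # linear transformation Y = A·X

print(np.linalg.norm(X1 - X2))      # distance in the original space: 1.414...
print(np.linalg.norm(Y1 - Y2))      # distance after the transformation: 1.802...
```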

Invariant distances. Euclidean distance is not invariant to linear transformations Y = A·X; in particular, scaling of units has a strong influence on distances. How should scalings and rotations be selected for the simplest description of the data? Orthonormal matrices, A^T A = I, induce rigid rotations that leave Euclidean distances unchanged. Achieving full invariance therefore requires standardization of the data (for scaling invariance) and the use of the covariance matrix: the Mahalanobis metric replaces A^T A by the inverse of the covariance matrix.
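A minimal sketch of the Mahalanobis idea, assuming a small randomly generated data sample (not lecture data) from which the covariance matrix is estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                             [1.0, 0.5]])   # correlated, differently scaled features

C = np.cov(data, rowvar=False)       # estimated covariance matrix
C_inv = np.linalg.inv(C)

X, Y = data[0], data[1]
diff = X - Y
d_mahalanobis = np.sqrt(diff @ C_inv @ diff)   # sqrt of (X-Y)^T C^{-1} (X-Y)
d_euclidean = np.linalg.norm(diff)

print(d_euclidean, d_mahalanobis)    # generally quite different values
```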

Data standardization. For each data vector X^(j)T = (X_1^(j), ..., X_d^(j)), j = 1..n, calculate the mean and standard deviation of every feature, where n is the number of vectors and d their dimension. The mean of feature i is the average over the rows of the data table, mean_i = (1/n) Σ_j X_i^(j); together these form the vector of mean feature values.

Standard deviation. Calculate the standard deviation of every feature: the variance is the square of the standard deviation (std), i.e. the sum of all squared deviations from the mean value divided by n − 1, σ_i^2 = (1/(n−1)) Σ_j (X_i^(j) − mean_i)^2. Why n − 1 and not n? If the true mean were known the divisor would be n, but when the mean is estimated from the same data, the formula with n − 1 converges to the true variance. Transform X => Z to obtain standardized data vectors: Z_i^(j) = (X_i^(j) − mean_i) / σ_i.
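A minimal sketch of the whole standardization step described on the last two slides, using made-up values (n = 4 vectors, d = 2 features).

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [3.0, 220.0],
              [4.0, 160.0]])         # n = 4 vectors (rows), d = 2 features (columns)

mean = X.mean(axis=0)                # vector of mean feature values
std = X.std(axis=0, ddof=1)          # ddof=1 gives the n-1 divisor discussed above

Z = (X - mean) / std                 # standardized data vectors

print(Z.mean(axis=0))                # ~0 for every feature
print(Z.std(axis=0, ddof=1))         # 1 for every feature
```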

Standardized data. Standardized data have zero mean and unit variance. Standardize the data after making any data transformation. Effect: the data become invariant to scaling only; for diagonal transformations, distances after standardization are invariant, since all features are expressed in identical units. Note: this does not mean that all data models will perform better! How can data be made invariant to arbitrary linear transformations?
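A short sketch of the scaling-invariance claim: after standardization, distances do not change when each feature is multiplied by an arbitrary positive factor (e.g. a change of units). The data and scale factors below are made up.

```python
import numpy as np

def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                    # 50 vectors, 3 features
X_rescaled = X * np.array([1000.0, 0.01, 5.0])  # diagonal transformation, e.g. a change of units

Z, Z_rescaled = standardize(X), standardize(X_rescaled)

# The distance between any two standardized vectors is the same in both versions.
print(np.linalg.norm(Z[0] - Z[1]))
print(np.linalg.norm(Z_rescaled[0] - Z_rescaled[1]))
```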

Std example. Before and after standardization, the mean and std of each feature are shown using a colored bar; minimum and maximum values may extend outside the bar. Some features (e.g. yellow) have large values, some (e.g. gray) have small values; this may depend on the units used to measure them. Standardized data all have mean 0 and σ = 1, so the contributions of different features to similarity or distance calculations are comparable.