Slide 1 EE3J2 Data Mining EE3J2 Data Mining Lecture 9 Data Analysis Martin Russell.

Presentation transcript:

Slide 1 EE3J2 Data Mining EE3J2 Data Mining Lecture 9 Data Analysis Martin Russell

Slide 2 EE3J2 Data Mining Objectives
- To review basic data analysis
- To review the notions of mean, variance and covariance
- To explain Principal Components Analysis (PCA)

Slide 3 EE3J2 Data Mining Example from speech processing
Plot of high-frequency energy vs low-frequency energy for 25 ms speech segments, sampled every 10 ms

Slide 4 EE3J2 Data Mining Basic statistics
[Figure: scatter plot annotated with the sample mean, the sample variances of 'x' and 'y', and the minimum and maximum values of 'x' and 'y']

Slide 5 EE3J2 Data Mining Basic statistics
- Denote samples by $X = x_1, x_2, \ldots, x_T$, where $x_t = (x_{t,1}, x_{t,2}, \ldots, x_{t,N})$
- The sample mean $\mu(X)$ is given by: $\mu(X) = \frac{1}{T} \sum_{t=1}^{T} x_t$
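As an illustration of the sample mean, here is a minimal Python sketch. The function name `sample_mean` and the four 2-dimensional data points are hypothetical and are not taken from the lecture; they stand in for the (low-frequency energy, high-frequency energy) pairs of the speech example.

```python
def sample_mean(samples):
    """Component-wise mean of T vectors, each of dimension N."""
    T = len(samples)
    N = len(samples[0])
    return [sum(x[n] for x in samples) / T for n in range(N)]

# Hypothetical data: T = 4 samples, N = 2 components each.
X = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (4.0, 7.0)]

mu = sample_mean(X)
print(mu)  # prints [2.5, 4.0]
```

The mean is a vector with one entry per component, so it marks a single point in the same space as the data.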

Slide 6 EE3J2 Data Mining More basic statistics
- The sample variance of X is given by: [equation not preserved in the transcript]
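The slide's own formula does not survive in this transcript, so the following is only a sketch of one common definition. It assumes the variance is computed separately for each component and normalised by T (some texts divide by T - 1 instead). The function name and the data are hypothetical.

```python
def sample_variance(samples):
    """Per-component variance, assuming normalisation by T (not T - 1)."""
    T = len(samples)
    N = len(samples[0])
    mu = [sum(x[n] for x in samples) / T for n in range(N)]
    return [sum((x[n] - mu[n]) ** 2 for x in samples) / T for n in range(N)]

# Hypothetical data: T = 4 samples, N = 2 components each.
X = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (4.0, 7.0)]

var = sample_variance(X)
print(var)  # prints [1.25, 3.5]
```

Here the second component is more spread out than the first, which is what the larger variance reports.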

Slide 7 EE3J2 Data Mining Covariance
- As the x value increases, the y value also increases
- This is (positive) covariance
- If y decreases as x increases, the result is negative covariance

Slide 8 EE3J2 Data Mining Definition of covariance
- The covariance between the m-th and n-th components of the sample data is defined by: [equation not preserved in the transcript]
- In practice it is useful to subtract the mean $\mu(X)$ from each of the data points $x_t$. The sample mean is then 0 and: [equation not preserved in the transcript]
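Since the slide's equations are missing here, the sketch below shows one standard way to compute the covariance between every pair of components. It subtracts the mean first, as the slide suggests, and assumes normalisation by T. The function name `covariance_matrix` and the data are hypothetical.

```python
def covariance_matrix(samples):
    """N x N matrix whose (m, n) entry is the covariance of components m and n.

    Assumes normalisation by T; the mean is subtracted from every sample first.
    """
    T = len(samples)
    N = len(samples[0])
    mu = [sum(x[n] for x in samples) / T for n in range(N)]
    centred = [[x[n] - mu[n] for n in range(N)] for x in samples]
    return [[sum(c[m] * c[n] for c in centred) / T for n in range(N)]
            for m in range(N)]

# Hypothetical data in which y tends to increase with x.
X = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (4.0, 7.0)]

C = covariance_matrix(X)
print(C)  # prints [[1.25, 1.75], [1.75, 3.5]]
```

The diagonal entries are the per-component variances, and the positive off-diagonal entry reflects that y tends to rise with x in this data.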

Slide 9 EE3J2 Data Mining Data with mean subtracted
[Figure: scatter plot of the mean-subtracted data; implies positive covariance]

Slide 10 EE3J2 Data Mining Sample data rotated through 2  Implies negative covariance

Slide 11 EE3J2 Data Mining Data with covariance removed

Slide 12 EE3J2 Data Mining Principal Components Analysis
- PCA is the technique which is used to diagonalise the sample covariance matrix
- The first step is to write the covariance matrix $C$ in the form: $C = U D U^{T}$, where $D$ is diagonal and $U$ is a matrix corresponding to a rotation
- This can be done using SVD (see lecture 8) or eigenvalue decomposition
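For the 2-dimensional case, the decomposition can be sketched without a linear algebra library, because a symmetric 2 x 2 matrix is diagonalised by a single rotation. The code below is an illustration, not the lecture's method: the symbol C for the covariance matrix, the closed-form angle, the helper names and the example matrix are all assumptions. In higher dimensions one would use an SVD or eigenvalue routine, as the slide says.

```python
import math

def diagonalise_2x2(C):
    """Return (U, D, theta) with C = U D U^T for a symmetric 2 x 2 matrix C."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    # Rotation angle that makes the off-diagonal term of U^T C U vanish.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]  # columns are e1 and e2
    # Diagonal entries are e1^T C e1 and e2^T C e2.
    D = [[a * c * c + 2.0 * b * c * s + d * s * s, 0.0],
         [0.0, a * s * s - 2.0 * b * c * s + d * c * c]]
    return U, D, theta

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Hypothetical covariance matrix with positive covariance.
C = [[1.25, 1.75], [1.75, 3.5]]
U, D, theta = diagonalise_2x2(C)

# Rebuilding U D U^T should give back C, up to rounding error.
rebuilt = matmul(matmul(U, D), transpose(U))
print(D[0][0], D[1][1])
print(rebuilt)
```

With this choice of angle, the first diagonal entry of D is the larger of the two, so the first column of U points along the direction of greatest variance.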

Slide 13 EE3J2 Data Mining PCA continued
- $U$ implements a rotation through an angle
- $e_1$ is the first column of $U$; $d_{11}$ is the variance in the direction $e_1$
- $e_2$ is the second column of $U$; $d_{22}$ is the variance in the direction $e_2$
[Figure: the directions $e_1$ and $e_2$]
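The claim that the diagonal entries are the variances along the two directions can be checked numerically. The sketch below uses hypothetical 2-dimensional data, assumes normalisation by T, and names the rotation angle `theta`, which is an assumption. It projects the mean-subtracted data onto the two directions and measures the variance and covariance of the result.

```python
import math

# Hypothetical data: T = 4 samples, 2 components each.
X = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (4.0, 7.0)]
T = len(X)

# Subtract the sample mean.
mx = sum(x for x, _ in X) / T
my = sum(y for _, y in X) / T
centred = [(x - mx, y - my) for x, y in X]

# Entries of the sample covariance matrix (normalised by T).
cxx = sum(x * x for x, _ in centred) / T
cyy = sum(y * y for _, y in centred) / T
cxy = sum(x * y for x, y in centred) / T

# Rotation angle and the two directions (the columns of U).
theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
e1 = (math.cos(theta), math.sin(theta))
e2 = (-math.sin(theta), math.cos(theta))

# Coordinates of each point along e1 and e2.
rotated = [(x * e1[0] + y * e1[1], x * e2[0] + y * e2[1]) for x, y in centred]

d11 = sum(p * p for p, _ in rotated) / T   # variance along e1
d22 = sum(q * q for _, q in rotated) / T   # variance along e2
off = sum(p * q for p, q in rotated) / T   # covariance after rotation

print(d11, d22, off)
```

In the rotated coordinates the covariance is zero up to rounding error, which corresponds to the "data with covariance removed" picture, and the total variance `d11 + d22` equals `cxx + cyy`.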

Slide 14 EE3J2 Data Mining Summary
- Basic data analysis
- Means, variance and covariance
- Principal Components Analysis