Data Analysis Lecture 8 Tijl De Bie

Dimensionality reduction How to deal with high-dimensional data? How to visualize it? How to explore it? Dimensionality reduction is one way…

Projections in vector spaces
The inner product w^T x has several meanings:
– w^T x = ||w||*||x||*cos(theta), where theta is the angle between w and x
– For unit-norm w: w^T x is the projection of x on w
– To express hyperplanes: w^T x = b
– To express halfspaces: w^T x > b
All these interpretations are relevant.
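As an illustration (not part of the original slides), a minimal NumPy sketch of these interpretations, using made-up vectors w and x and an arbitrary offset b:

```python
import numpy as np

# Hypothetical example vectors (not from the slides).
w = np.array([3.0, 4.0])
x = np.array([2.0, 1.0])

# w^T x = ||w|| * ||x|| * cos(theta)
dot = w @ x
cos_theta = dot / (np.linalg.norm(w) * np.linalg.norm(x))

# For unit-norm w, w^T x is the (signed) length of the projection of x on w.
w_unit = w / np.linalg.norm(w)
projection_length = w_unit @ x

# Hyperplane w^T x = b and halfspace w^T x > b, for some arbitrary offset b.
b = 1.0
on_hyperplane = np.isclose(w @ x, b)
in_halfspace = (w @ x) > b
```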

Projections in vector spaces [Some drawings…]

Variance of a projection
w^T x = x^T w is the projection of x on w
Let X contain the points x_i^T as its rows
Projection of all points in X:
– Xw = (x_1^T w, x_2^T w, …, x_n^T w)
Variance of the projection on w:
– sum_i (x_i^T w / ||w||)^2 = (w^T X^T X w) / (w^T w)
– Or, if ||w|| = 1, this is: sum_i (x_i^T w)^2 = w^T X^T X w
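A small NumPy sketch of this variance computation, assuming a hypothetical data matrix X with the points as rows (randomly generated here) and an arbitrary direction w:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # n points as rows (assumed centred for now)
w = np.array([1.0, 2.0, 0.5])

# Projection of all points: Xw = (x_1^T w, ..., x_n^T w)
Xw = X @ w

# Variance of the projection on w (up to a 1/n factor, as on the slide):
var_general = (w @ X.T @ X @ w) / (w @ w)     # sum_i (x_i^T w / ||w||)^2
w_unit = w / np.linalg.norm(w)
var_unit = w_unit @ X.T @ X @ w_unit          # sum_i (x_i^T w)^2 for ||w|| = 1

assert np.isclose(var_general, var_unit)
assert np.isclose(var_unit, np.sum((X @ w_unit) ** 2))
```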

Principal Component Analysis
Direction / unit vector w with the largest variance?
– max_w w^T X^T X w subject to w^T w = 1
Lagrangian:
– L(w) = w^T X^T X w - lambda*(w^T w - 1)
Setting the gradient w.r.t. w equal to zero:
– 2*X^T X w = 2*lambda*w
– (X^T X) w = lambda*w
An eigenvalue problem!
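A sketch of solving this eigenvalue problem with NumPy on hypothetical centred data (the data and variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)            # centre the data (see the later slide)

# Stationarity condition: (X^T X) w = lambda w  ->  eigenvalue problem.
C = X.T @ X
eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigh: C is symmetric

# The maximiser of w^T X^T X w subject to w^T w = 1 is the eigenvector
# with the largest eigenvalue; the maximum itself equals that eigenvalue.
w_best = eigenvectors[:, np.argmax(eigenvalues)]
max_variance = eigenvalues.max()
```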

Principal Component Analysis Find w as the dominant eigenvector of X^T X! Then we can project the data onto this w. No other projection has a larger variance. This projection is the best 1-D representation of the data.

Principal Component Analysis The best 1-D representation is given by the projection on the dominant eigenvector. The second-best w: the second eigenvector, and so on…
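Following the same idea, a hedged sketch of projecting hypothetical data onto the top k eigenvectors to obtain the best k-dimensional representation (k = 2 is chosen arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)
order = np.argsort(eigenvalues)[::-1]   # sort directions by decreasing eigenvalue

k = 2                                   # keep the first k principal directions
W = eigenvectors[:, order[:k]]          # columns: dominant, second, ... eigenvector
X_reduced = X @ W                       # best k-dimensional representation (n x k)
```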

Technical but important… I haven't mentioned:
– The data should be centred
– That is: the mean of each of the features should be 0
– If that is not the case: subtract from each feature its mean (centring)
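A one-step sketch of this centring operation on a hypothetical data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(200, 5))   # data whose features are not zero-mean

X_centred = X - X.mean(axis=0)           # subtract from each feature its mean
assert np.allclose(X_centred.mean(axis=0), 0.0)
```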

Clustering
Another way to make sense of high-dimensional data
Find coherent groups in the data: points that are
– close to one another within a cluster, but
– distant from points in other clusters

Distances between points
Distance between two points: ||x_i - x_j||
Can we assign the points to K different clusters,
– each of which is coherent, and
– distant from the others?
Define the clusters by means of cluster centres m_k, with k = 1, 2, …, K
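A small sketch of computing the distances ||x_i - m_k|| between hypothetical points and cluster centres using NumPy broadcasting:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))     # 10 hypothetical points
M = rng.normal(size=(3, 2))      # K = 3 hypothetical cluster centres m_k

# Distance from every point x_i to every centre m_k: ||x_i - m_k||
distances = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)   # shape (10, 3)
```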

K-means cost function
Ideal clustering:
– ||x_i - m_{k(i)}|| small for every x_i, where m_{k(i)} is its cluster centre
– i.e. sum_i ||x_i - m_{k(i)}||^2 small
Unfortunately this is hard to minimise: it requires the simultaneous optimisation of
– k(i) (which cluster centre for which point)
– m_k (where the cluster centres are)
Iterative strategy!
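A sketch of this cost, sum_i ||x_i - m_{k(i)}||^2, as a NumPy helper (the function name and argument layout are my own, not from the slides):

```python
import numpy as np

def kmeans_cost(X, M, assignment):
    """K-means cost: sum_i ||x_i - m_{k(i)}||^2, for points X (n x d),
    centres M (K x d) and an integer assignment array k(i) of length n."""
    return np.sum(np.linalg.norm(X - M[assignment], axis=1) ** 2)
```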

K-means clustering
Iteratively optimise the centres and the cluster assignments
K-means algorithm:
– Start with a random choice of K centres m_k
– Set k(i) = argmin_k ||x_i - m_k||^2
– Set m_k = mean({x_i : k(i) = k})
– Repeat the last two steps
Do this for many different random starts, and pick the best result (the one with the lowest cost)
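Putting the pieces together, a self-contained sketch of the algorithm with random restarts; it follows the slide's two alternating steps, but the stopping rule (a fixed number of iterations) and all names are my own choices:

```python
import numpy as np

def kmeans(X, K, n_iter=100, rng=None):
    """One run of K-means: alternate the assignment and update steps."""
    rng = np.random.default_rng(rng)
    # Start with a random choice of K centres (here: K random data points).
    M = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: k(i) = argmin_k ||x_i - m_k||^2
        assignment = np.argmin(
            np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2), axis=1)
        # Update step: m_k = mean({x_i : k(i) = k})
        for k in range(K):
            if np.any(assignment == k):
                M[k] = X[assignment == k].mean(axis=0)
    cost = np.sum(np.linalg.norm(X - M[assignment], axis=1) ** 2)
    return M, assignment, cost

def kmeans_restarts(X, K, n_starts=10, rng=0):
    """Run K-means from several random starts; keep the lowest-cost result."""
    rng = np.random.default_rng(rng)
    runs = [kmeans(X, K, rng=rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])

# Example usage on hypothetical data:
# X = np.random.default_rng(0).normal(size=(300, 2))
# centres, labels, cost = kmeans_restarts(X, K=3)
```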