Unsupervised Learning

Presentation transcript:

Unsupervised Learning (STT592-002: Intro. to Statistical Learning, Chapter 10)
Disclaimer: this presentation is adapted from IOM 530: Intro. to Statistical Learning.

Outline
Principal Component Analysis (PCA)
What is Clustering?
K-Means Clustering
Hierarchical Clustering

Supervised vs. Unsupervised Learning
Supervised Learning: both X and Y are known.
Unsupervised Learning: only X is known.

Overview of Unsupervised Learning
Focus on two particular types of unsupervised learning:
Principal Components Analysis (PCA): a tool used for data visualization or data pre-processing (dimension reduction) before supervised techniques are applied.
Clustering: a broad class of methods for discovering unknown subgroups in data.

Challenge of Unsupervised Learning
Unsupervised learning is more challenging: there is no simple goal for the analysis, such as predicting a response, and no classification error or MSE to optimize. It is more a matter of exploratory data analysis. It is also hard to assess the results of unsupervised learning, as we do not have any ground truth.

Examples of Unsupervised Learning
Eg: A cancer researcher might assay gene expression levels in 100 patients with breast cancer, and look for subgroups among the breast cancer samples, or among the genes, in order to obtain a better understanding of the disease.
Eg: An online shopping site might identify groups of shoppers with similar browsing and purchase histories, as well as items of interest within each group. An individual shopper can then preferentially be shown the items they are likely to be interested in, based on the purchase histories of similar shoppers.
Eg: A search engine might choose which search results to display to a particular individual based on the click histories of other individuals with similar search patterns.

Principal Component Analysis (PCA)
Review Chap. 6.3, pages 231-233.

PCA
Idea: given a large set of correlated variables, principal components allow us to summarize this set with a smaller # of representative variables that capture most of the original variability.
Recall: PCA serves for
Dimension reduction: data pre-processing before supervised techniques are applied
Lossy data compression
Feature extraction
A tool for data visualization

PCA
Sources:
https://www.google.com/search?q=PCA&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjx--iywp3UAhXESiYKHZGxAx0Q_AUICygC&biw=1229&bih=665
https://www.microsoft.com/en-us/research/people/cmbishop/?from=http%3A%2F%2Fresearch.microsoft.com%2F~cmbishop%2Fprml

PCA
Two commonly used definitions of PCA:
The orthogonal projection of the data onto a lower-dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized.
Equivalently, the linear projection that minimizes the average projection cost, defined as the mean squared distance between the data points and their projections.
Source: https://www.microsoft.com/en-us/research/people/cmbishop/?from=http%3A%2F%2Fresearch.microsoft.com%2F~cmbishop%2Fprml

USArrests Example: Violent Crime Rates by US State
This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percent of the population living in urban areas. A data frame with 50 observations on 4 variables (X1-X4):
Murder (X1): Murder arrests (per 100,000)
Assault (X2): Assault arrests (per 100,000)
UrbanPop (X3): Percent urban population
Rape (X4): Rape arrests (per 100,000)
Q: Summarize the set (X1, X2, X3, X4) with a smaller # of representative variables that capture most of the original variability.
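A minimal R sketch of this summary, using the built-in USArrests data and the base prcomp function (the object name pr.out is just an illustrative choice):

# PCA on the four USArrests variables, standardized to mean 0 and SD 1
pr.out <- prcomp(USArrests, scale = TRUE)
summary(pr.out)    # standard deviation and proportion of variance for each PC
pr.out$rotation    # loading vectors (one column per principal component)
head(pr.out$x)     # principal component scores for the first few states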

PCA (figures)

PC loadings and scores
PC loading vectors: the directions in feature space along which the data vary the most.
PC scores: the projections of the data along these directions.
Comparing the output of PCA and LDA:
PCA: PC1, PC2, ..., up to min(n-1, p) components, where p is the # of variables of X.
LDA: LD1, LD2, ..., LD_(c-1), where c is the # of classes of Y.
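For concreteness, assuming the pr.out fit from the earlier sketch, the loadings and scores can be inspected directly; each loading vector has unit length and the loading vectors are mutually orthogonal:

phi <- pr.out$rotation      # p x p matrix; column m is the m-th loading vector
z   <- pr.out$x             # n x p matrix; column m holds the scores on PC m
colSums(phi^2)              # each loading vector has squared length 1
round(t(phi) %*% phi, 10)   # loading vectors are orthonormal (identity matrix)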

PCA calculation (in the biplot, the top/right axes give the scale for the loadings)
Standardize each variable to mean 0 and SD 1; a PC score is then the sum of the standardized values weighted by the corresponding loading vector.
## PC scores for California (standardized Murder, Assault, UrbanPop, Rape):
temp <- c(0.2782682, 1.262814, 1.758923, 2.06782)
## Score on PC1: multiply by the PC1 loadings and sum
pr.out_x1 <- sum(temp * c(0.5358995, 0.5831836, 0.2781909, 0.5434321))
pr.out_x1   ## 2.498613
## Score on PC2: multiply by the PC2 loadings and sum
pr.out_x2 <- sum(temp * c(-0.4181809, -0.1879856, 0.8728062, 0.1673186))
pr.out_x2   ## 1.5272427
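As a rough cross-check (assuming the pr.out fit shown earlier), the same scores can be read off the prcomp output; the signs may be flipped, since loading vectors are only unique up to sign:

scale(USArrests)["California", ]   # the standardized values used as temp above
pr.out$x["California", 1:2]        # PC1 and PC2 scores, about 2.50 and 1.53 (up to sign)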

PCA Biplot
The figure represents both the principal component scores and the loading vectors in a single biplot display.
States with large positive scores on the 1st component, such as California, Nevada, and Florida, have high crime rates, while states like North Dakota, with negative scores on the first component, have low crime rates.
California also has a high score on the 2nd component, indicating a high level of urbanization, while the opposite is true for states like Mississippi. States close to zero on both components, such as Indiana, have approximately average levels of both crime and urbanization.
1st Component: Serious Crime; 2nd Component: Level of Urbanization.

PCA Biplot
A biplot uses points to represent the scores of the observations on the principal components, and it uses vectors to represent the coefficients of the variables on the principal components.
Interpreting vectors: vectors point away from the origin in some direction. A vector points in the direction that has the highest squared multiple correlation with the principal components. The length of the vector is proportional to the squared multiple correlation between the fitted values for the variable and the variable itself.
1st Component: Serious Crime; 2nd Component: Level of Urbanization.
Source: http://forrest.psych.unc.edu/research/vista-frames/help/lecturenotes/lecture13/biplot.html
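A biplot like the one described here can be drawn directly from the prcomp fit sketched earlier (scale = 0 keeps the arrows on the loading scale shown on the top/right axes):

biplot(pr.out, scale = 0)   # points = state scores, arrows = variable loadings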

More on PCA
Scaling of variables: in general, we should scale the data before performing PCA. However, scaling may be unnecessary if the variables are already measured in the same units (e.g., gene expression data).
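A quick way to see why scaling matters for USArrests (the variables are on very different scales, so Assault would dominate an unscaled PCA):

apply(USArrests, 2, mean)   # means differ widely across the four variables
apply(USArrests, 2, var)    # Assault's variance dwarfs the others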

More on PCA
Uniqueness of the Principal Components: each principal component loading vector is unique, up to a sign flip. Flipping the sign has no effect, as the direction does not change.
The Proportion of Variance Explained (PVE): the PVE of the m-th principal component is the variance of the m-th PC scores divided by the total variance in the (standardized) data.
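A minimal sketch of computing the PVE from the prcomp fit (sdev holds the standard deviations of the PCs):

pr.var <- pr.out$sdev^2       # variance explained by each PC
pve <- pr.var / sum(pr.var)   # proportion of variance explained
pve                           # roughly 0.62, 0.25, 0.09, 0.04 for USArrests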

More on PCA
Deciding How Many Principal Components to Use:
An n × p data matrix X has min(n − 1, p) distinct PCs.
Goal: choose the smallest # of PCs that explains a sizable amount of the variation in the data.
Use the scree plot.

More on PCA
Scree Plot: find the elbow in the scree plot.
From Figure 10.4, one might conclude that a fair amount of variance is explained by the first two PCs, and that there is an elbow after the 2nd component. After all, the 3rd principal component explains less than 10% of the variance in the data, and the 4th principal component explains less than half of that, so it is essentially worthless.
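Assuming the pve vector from the previous slide, a scree plot and its cumulative version can be drawn as follows:

plot(pve, xlab = "Principal Component",
     ylab = "Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")   # look for the elbow
plot(cumsum(pve), xlab = "Principal Component",
     ylab = "Cumulative Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")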

Another Interpretation of Principal Components
Principal components provide low-dimensional linear surfaces that are closest to the observations.

Another Interpretation of Principal Components
The first principal component loading vector has a very special property: it is the line in p-dimensional space that is closest to the n observations (using average squared Euclidean distance as a measure of closeness).

Another Interpretation of Principal Components
The first two principal components of a data set span the plane that is closest to the n observations, in terms of average squared Euclidean distance.
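A small numerical illustration of this "closest plane" view (assuming the scaled USArrests data and the pr.out fit): the scores on the first two PCs times the transposed loadings give the best rank-2 approximation to the standardized data in terms of average squared Euclidean distance.

Z2   <- pr.out$x[, 1:2]          # scores on PC1 and PC2
phi2 <- pr.out$rotation[, 1:2]   # first two loading vectors
X_hat <- Z2 %*% t(phi2)          # projection of each state onto the PC1-PC2 plane
head(scale(USArrests) - X_hat)   # residuals left after the rank-2 approximation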
