Data analysis Lecture 10 Tijl De Bie

Let’s do some real data analysis
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
A biologist comes to you and says: “I have some data on breast cancer here; if you analyse it, I will win the Nobel prize.”
How to start?

Let’s do some real data analysis
Real data is messy: missing values…
→ Infer them as the mean of the corresponding feature (this is a basic technique for ‘imputation’).
[MATLAB intermezzo]
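A minimal MATLAB sketch of what that imputation intermezzo could look like, assuming the data has been loaded into a matrix X (rows are samples, columns are features) with NaN marking the missing values; the variable names are illustrative, not taken from the lecture:

colMeans = mean(X, 1, 'omitnan');   % per-feature means, ignoring missing entries
for j = 1:size(X, 2)
    missing = isnan(X(:, j));       % locate the missing values in feature j
    X(missing, j) = colMeans(j);    % fill them in with that feature's mean
end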

Let’s do some real data analysis
What now? Let’s visualize the data!
How? The data is 9-dimensional!
→ Principal Component Analysis (PCA)
[MATLAB intermezzo]
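One way the PCA intermezzo might go, sketched under the same assumptions (X is the imputed data matrix; the centring step uses implicit expansion, so it needs MATLAB R2016b or later):

Xc = X - mean(X, 1);                    % centre: subtract each feature's mean
[V, D] = eig(cov(Xc));                  % eigen-decomposition of the covariance matrix
[~, order] = sort(diag(D), 'descend');  % rank components by captured variance
W = V(:, order(1:2));                   % the two leading principal directions
Z = Xc * W;                             % project the 9-dimensional data onto them
scatter(Z(:, 1), Z(:, 2));              % 2-D picture of the dataset
xlabel('PC 1'); ylabel('PC 2');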

Mathematical intermezzo: PCA
Two views:
- Variance maximization
- Error minimization
Both are solved using an eigenvalue problem.
Do not forget to centre the data (subtract from each feature its mean over the dataset).
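Spelling out the variance-maximization view (a standard derivation, added here for reference): with Xc the centred data matrix, find the unit-norm direction w that maximizes the variance of the projection,

max_{||w|| = 1} w^T Σ w,   where Σ = (1/n) Xc^T Xc.

Setting the gradient of the Lagrangian w^T Σ w − λ(w^T w − 1) to zero yields Σ w = λ w: the optimum is the eigenvector of the covariance Σ with the largest eigenvalue, and that eigenvalue equals the variance captured.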

Looks interesting… Could we perhaps predict the label from the data?
I.e., find a rule that says when a cancer is benign and when it is malignant (important for therapy and more!)
→ Classification!
[MATLAB intermezzo]

Mathematical intermezzo: LSR/FDA
Least Squares Regression (LSR):
- Solved by means of a system of linear equations Xw = y (approximately)
- Misfit: ||Xw − y||², which (up to a factor 1/n) is the mean squared error
Fisher Discriminant Analysis (FDA): the same thing, if the labels y are −1/+1.
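A hedged MATLAB sketch of the least-squares classifier the slide describes, assuming y holds the −1/+1 labels (benign vs malignant) and X the imputed feature matrix; the intercept column and variable names are additions for illustration:

Xb = [X, ones(size(X, 1), 1)];   % append a constant column to fit an intercept
w = Xb \ y;                      % backslash solves min ||Xb*w - y||^2
yhat = sign(Xb * w);             % threshold at zero to classify
trainAcc = mean(yhat == y);      % training accuracy (an optimistic estimate)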

Could there be more?
Perhaps there are more than 2 clusters? Cancers requiring different treatments?
Let’s cluster the data!
- 2 clusters? (Benign vs malignant?)
- More clusters? (Other cancer types?)
[MATLAB intermezzo]
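A possible shape for that final intermezzo, assuming the Statistics and Machine Learning Toolbox is available (for kmeans, gscatter and crosstab) and reusing the centred data Xc and PCA projection Z from the earlier sketches:

idx2 = kmeans(Xc, 2);              % two clusters: benign vs malignant?
idx3 = kmeans(Xc, 3);              % or more clusters: other cancer types?
gscatter(Z(:, 1), Z(:, 2), idx2);  % colour the PCA plot by the 2-cluster result
crosstab(idx2, y)                  % cross-tabulate clusters against the true labels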