1
Data analysis Lecture 10 Tijl De Bie
2
Let’s do some real data analysis
A biologist comes to you and says: “I have some data on breast cancer here. If you analyse it, I will win the Nobel prize.” How do we start?
3
Let’s do some real data analysis
Real data is messy: missing values… Infer them as the mean of the corresponding feature (a basic technique for ‘imputation’). [MATLAB intermezzo]
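The MATLAB demo is not part of this transcript. As a rough stand-in, here is a minimal Python/NumPy sketch of mean imputation. It assumes missing entries are encoded as NaN; the function name `impute_mean` and the toy matrix are made up for illustration and are not the lecture's breast cancer data.

```python
import numpy as np

def impute_mean(X):
    """Replace each NaN by the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))     # positions of the missing values
    X[rows, cols] = col_means[cols]        # fill each gap with its column mean
    return X

# Toy matrix (illustrative only): 4 samples, 2 features, 2 missing entries.
X_toy = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, np.nan],
                  [5.0, 30.0]])
X_filled = impute_mean(X_toy)
```

Mean imputation leaves each feature's mean unchanged but shrinks its variance, which is one reason it is only a basic technique.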
4
Let’s do some real data analysis
What now? Let’s visualize the data! But how? It is 9-dimensional! Use Principal Component Analysis (PCA). [MATLAB intermezzo]
5
Mathematical intermezzo: PCA
Two views: variance maximization and error minimization. Both are solved using an eigenvalue problem. Do not forget to centre the data (subtract from each feature its mean in the dataset).
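One way to sketch the eigenvalue route in Python/NumPy: centre the data, form the sample covariance matrix, and project onto the eigenvectors with the largest eigenvalues. The function name `pca`, the choice of k = 2 components, and the random 9-dimensional toy data are assumptions for illustration, not the lecture's MATLAB code or dataset.

```python
import numpy as np

def pca(X, k=2):
    """Project X onto its k leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre: subtract each feature's mean
    C = Xc.T @ Xc / (Xc.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # symmetric eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    W = eigvecs[:, order]                     # principal directions (columns)
    return Xc @ W, W, eigvals[order]

# Toy data (illustrative only): 100 samples, 9 features with unequal spread.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 9)) * np.array([5, 3, 1, 1, 1, 1, 1, 1, 1])
Z, W, variances = pca(X_toy, k=2)
```

The two columns of `Z` can be drawn as a scatter plot; each eigenvalue in `variances` is the variance of the data along the corresponding direction, which is the variance-maximization view.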
6
Looks interesting… Could we perhaps predict the label from the data?
That is, find a rule that says when a cancer is benign and when it is malignant (important for therapy and more). This is classification! [MATLAB intermezzo]
7
Mathematical intermezzo: LSR/FDA
Least Squares Regression (LSR): solved by means of a system of linear equations, Xw = y (approximately). Misfit: ||Xw - y||^2, the sum of squared errors (equal to the mean squared error up to a constant factor). Fisher Discriminant Analysis (FDA): the same thing, if the labels y are -1/+1.
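A minimal Python/NumPy sketch of least squares used as a classifier with -1/+1 labels follows. The appended bias column, the threshold at zero, the helper names `fit_lsr` and `predict_label`, and the toy points are all assumptions for illustration; the slide states only the system Xw = y.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so the rule can have an offset (an assumption)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X, np.ones(len(X))])

def fit_lsr(X, y):
    """Return the w minimizing ||Xw - y||^2, with a bias term included."""
    w, *_ = np.linalg.lstsq(add_bias(X), np.asarray(y, dtype=float), rcond=None)
    return w

def predict_label(X, w):
    """Classify by the sign of the fitted linear function."""
    return np.where(add_bias(X) @ w >= 0, 1, -1)

# Toy data (illustrative only): two well-separated groups labelled -1 and +1.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
w = fit_lsr(X_toy, y_toy)
pred = predict_label(X_toy, w)
```

`np.linalg.lstsq` solves the approximate system directly, which is usually numerically safer than forming and inverting X'X by hand.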
8
Could there be more? Perhaps there are more than 2 clusters?
Cancers requiring different treatments? Let’s cluster the data! Two clusters? (Benign vs malignant?) More clusters? (Other cancer types?) [MATLAB intermezzo]
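The slide does not say which clustering algorithm the demo uses. As one common choice, here is a minimal Python/NumPy sketch of k-means (Lloyd's algorithm). The function name `kmeans`, its parameters, and the toy points are assumptions for illustration only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centre assignment and centre update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres as k distinct data points chosen at random.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centre, shape (n, k).
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its points; keep it if the cluster is empty.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break                              # assignments have stabilised
        centres = new_centres
    return labels, centres

# Toy data (illustrative only): two well-separated groups of three points.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centres = kmeans(X_toy, k=2)
```

Running it again with k = 3 or more is one simple way to explore the "more clusters?" question, though k-means will always return k groups whether or not the data support them.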