Data analysis Lecture 10 Tijl De Bie.


1 Data analysis Lecture 10 Tijl De Bie

2 Let’s do some real data analysis
A biologist comes to you and says: “I have some data on breast cancer here, if you analyse it, I will win the Nobel prize” How to start??

3 Let’s do some real data analysis
Real data is messy: it has missing values. Infer each one as the mean of the corresponding feature (a basic ‘imputation’ technique). [MATLAB intermezzo]
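The MATLAB intermezzo itself is not part of this transcript. As a stand-in, here is a minimal Python/NumPy sketch of mean imputation, assuming missing values are encoded as NaN and that rows are samples and columns are features. The function name `impute_column_means` and the toy matrix are illustrative, not from the lecture.

```python
import numpy as np

def impute_column_means(X):
    """Replace each NaN by the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))     # positions of the missing entries
    X[rows, cols] = col_means[cols]        # fill each gap with its column mean
    return X

# Toy data (assumption): 3 samples, 2 features, two missing entries.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
X_filled = impute_column_means(X)
```

Mean imputation keeps each feature's mean unchanged but shrinks its variance, which is one reason it counts as a basic technique only.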

4 Let’s do some real data analysis
What now?? Let’s visualize the data! How?? It is 9-dimensional! Use Principal Component Analysis (PCA). [MATLAB intermezzo]

5 Mathematical intermezzo: PCA
Two views: variance maximization; error minimization. Solved using an eigenvalue problem. Do not forget to centre the data (subtract from each feature its mean in the dataset).
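A possible sketch of the eigenvalue route in Python/NumPy, assuming the data matrix has samples as rows and that the eigenvalue problem is posed on the sample covariance matrix. The function name `pca` and the synthetic 9-dimensional data are assumptions for illustration, not the lecture's code or dataset.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto the top principal components of its covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre: subtract each feature's mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]            # directions of largest variance
    return Xc @ components, components, eigvals[order]

# Synthetic stand-in for a 9-dimensional dataset (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9)) * np.array([5, 3, 1, 1, 1, 1, 1, 1, 1])
Z, W, var = pca(X, n_components=2)            # Z is the 2-D view to plot
```

The variance of each column of `Z` equals the corresponding eigenvalue, which is the variance-maximization view; the same directions also minimize the squared reconstruction error.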

6 Looks interesting… Could we perhaps predict the label from the data?
I.e., find a rule that says when a cancer is benign and when it’s malignant (important for therapy and more!) Classification! [MATLAB intermezzo]

7 Mathematical intermezzo: LSR/FDA
Least Squares Regression (LSR): solved by means of a system of linear equations, Xw = y (approximately). Misfit: ||Xw - y||^2, the squared error (the mean squared error up to a factor 1/n). Fisher Discriminant Analysis (FDA): the same thing, if the labels y are -1/1.
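A minimal Python/NumPy sketch of least squares used as a classifier with -1/1 labels. The appended constant column (a bias term), the sign threshold at zero, the function names, and the two synthetic Gaussian classes are all assumptions made for illustration; the slide only states Xw = y.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return the w minimizing ||Xb w - y||^2, where Xb is X plus a bias column."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # bias column (assumption)
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)     # least-squares solution
    return w

def predict_labels(X, w):
    """Classify by the sign of the fitted linear function."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)

# Synthetic two-class data (assumption): labels are -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=2.0, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

w = fit_least_squares(X, y)
accuracy = np.mean(predict_labels(X, w) == y)      # training accuracy
```

Centring the features and labels beforehand would be an alternative to the explicit bias column.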

8 Could there be more? Perhaps there are more than 2 clusters?
Cancers requiring different treatments? Let’s cluster the data! 2 clusters? (Benign vs malignant?) More clusters? (Other cancer types?) [MATLAB intermezzo]
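The slide does not say which clustering method the intermezzo uses. As one common choice, here is a hedged Python/NumPy sketch of k-means (Lloyd's algorithm); the function name `kmeans`, its parameters, and the two synthetic blobs are assumptions for illustration only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        # Distance from every point to every centre, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points; keep it if it has none.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated synthetic blobs (assumption), 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-5.0, size=(50, 2)),
               rng.normal(loc=5.0, size=(50, 2))])
labels, centers = kmeans(X, k=2)
```

Rerunning with `k=3` or more is one way to probe the "more clusters?" question, although k-means will always return k groups whether or not the data supports them.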

