Diagnosis of multiple cancer types by shrunken centroids of gene expression Course: 550.635 Topics in Bioinformatics Presenter: Ting Yang Teacher: Professor.

1 Diagnosis of multiple cancer types by shrunken centroids of gene expression. Course: 550.635 Topics in Bioinformatics. Presenter: Ting Yang. Teacher: Professor Geman. By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu.

2 Nearest Centroid Classification. Example: small round blue cell tumors of childhood. 63 training samples, 25 test samples. 4 classes: BL (Burkitt lymphoma), EWS (Ewing sarcoma), NB (neuroblastoma), RMS (rhabdomyosarcoma). Figure 1. Nearest centroid classification. Disadvantage
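
As a rough illustration of plain nearest centroid classification (before any shrinkage), the sketch below averages each gene within each class and assigns a new sample to the class with the closest centroid in squared Euclidean distance. The function names and the toy expression values are assumptions made for this example; they do not come from the slides or from the tumor data set.

```python
def class_centroids(X, y):
    """Return {label: centroid}; each centroid is the per-gene mean of that class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def nearest_centroid(x, centroids):
    """Return the label whose centroid has the smallest squared distance to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy data (made up): 4 samples, 2 genes, 2 classes.
X = [[1.0, 0.0], [1.2, 0.2], [-1.0, 2.0], [-0.8, 2.2]]
y = ["EWS", "EWS", "BL", "BL"]

centroids = class_centroids(X, y)
predicted = nearest_centroid([0.9, 0.1], centroids)
```

Every gene contributes to the distance here, whether or not it carries any class information.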

3

4 Nearest Shrunken Centroids. A modification of the nearest centroid method. Idea: first standardize the class centroids by the within-class standard deviation for each gene, then shrink each class centroid toward the overall centroid.

5 Details: $\bar{x}_{ik}$: mean expression value in class k for gene i. $\bar{x}_i$: ith component of the overall centroid. $s_i$: pooled within-class standard deviation for gene i.
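
The formulas on this slide were images and did not survive in the transcript. As a hedged reconstruction from the standard definitions of these three quantities, they can be written as below. The notation is an assumption rather than something taken from the slide: $x_{ij}$ is the expression of gene i in sample j, $C_k$ is the set of $n_k$ samples in class k, n is the total number of samples, and K is the number of classes.

```latex
\bar{x}_{ik} = \frac{1}{n_k} \sum_{j \in C_k} x_{ij}, \qquad
\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}, \qquad
s_i^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{j \in C_k} \left( x_{ij} - \bar{x}_{ik} \right)^2
```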

6 The statistic $d_{ik}$ measures the difference between the mean expression of gene i in class k and its mean over all classes combined, standardized by the within-class standard deviation. Idea: a gene that discriminates one class from the rest will have a statistic of large absolute value.
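
The formula for this statistic is missing from the transcript, so the following is only a sketch under stated assumptions. It computes a standardized difference of the form (class mean minus overall mean) divided by m_k * (s_i + s0), where s_i is the pooled within-class standard deviation. The constant m_k is assumed here to be sqrt(1/n_k - 1/n), a form used in later write-ups of the method (versions with a plus sign also appear in print), and s0 is an optional positive offset that defaults to 0. All names and the toy data are hypothetical.

```python
import math


def standardized_differences(X, y, s0=0.0):
    """Return {label: [d_ik for each gene i]} for samples X with class labels y."""
    n, p = len(X), len(X[0])
    labels = sorted(set(y))
    rows = {k: [r for r, lab in zip(X, y) if lab == k] for k in labels}

    # Overall centroid and per-class centroids, gene by gene.
    overall = [sum(r[i] for r in X) / n for i in range(p)]
    cent = {k: [sum(r[i] for r in rows[k]) / len(rows[k]) for i in range(p)]
            for k in labels}

    # Pooled within-class standard deviation for each gene.
    s = []
    for i in range(p):
        ss = sum((r[i] - cent[k][i]) ** 2 for k in labels for r in rows[k])
        s.append(math.sqrt(ss / (n - len(labels))))

    d = {}
    for k in labels:
        # Assumption: m_k = sqrt(1/n_k - 1/n); the slide's own constant is not recoverable.
        m_k = math.sqrt(1.0 / len(rows[k]) - 1.0 / n)
        d[k] = [(cent[k][i] - overall[i]) / (m_k * (s[i] + s0)) for i in range(p)]
    return d


# Toy data (made up): gene 0 separates the classes, gene 1 does not.
X = [[2.0, 1.0], [4.0, 3.0], [8.0, 3.0], [10.0, 1.0]]
y = ["A", "A", "B", "B"]
d = standardized_differences(X, y)
```

In this toy example the statistic for gene 0 is large in magnitude and of opposite sign in the two classes, while the statistic for gene 1 is exactly zero.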

7 Shrink each statistic toward zero to eliminate the genes that do not provide sufficient information. ‘De-noising’ step
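
One common way to carry out this kind of shrinkage is soft thresholding: reduce the absolute value of each statistic by a threshold and set it to zero if the result would be negative. The sketch below assumes that form; the threshold name `delta` and the sample values are made up for illustration.

```python
def soft_threshold(d, delta):
    """Move d toward zero by delta; values with |d| <= delta become exactly 0."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude


# Hypothetical statistics for four genes in one class.
d_k = [4.24, -0.60, 1.10, -2.50]
delta = 1.0

shrunken = [soft_threshold(v, delta) for v in d_k]

# Genes whose shrunken statistic is nonzero remain "active" for this class.
active = [i for i, v in enumerate(shrunken) if v != 0.0]
```

A gene whose shrunken statistic is zero in every class has the same centroid value in all classes, so it no longer influences which class is nearest.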

8 Choosing the amount of shrinkage. The shrinkage amount is allowed to vary over a wide range and is chosen by 10-fold cross-validation (the value with the smallest error rate is selected). Divide the set of samples at random into 10 roughly equal-size parts, with the classes distributed proportionally among the 10 parts. Fit the model on 90% of the samples and then predict the class labels of the remaining 10% (the test samples). Repeat 10 times and add the errors together to obtain the overall error. Figure 2. Figure 1.
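
The cross-validation procedure described on this slide can be sketched as follows. This is a minimal outline, not the presenter's code: `stratified_folds`, `cv_errors`, and the `fit_predict` callback are hypothetical names, and `majority_rule` is a stand-in classifier used only so the example runs; a real run would plug in a shrunken centroid classifier that uses `delta`.

```python
import random
from collections import defaultdict


def stratified_folds(y, n_folds=10, seed=0):
    """Split sample indices into folds with classes spread proportionally."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:  # deal indices round-robin across the folds
            folds[position % n_folds].append(idx)
            position += 1
    return folds


def cv_errors(X, y, deltas, fit_predict, n_folds=10, seed=0):
    """Return {delta: number of misclassified samples summed over all folds}."""
    errors = {delta: 0 for delta in deltas}
    for held_out in stratified_folds(y, n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        for delta in deltas:
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in held_out], delta)
            errors[delta] += sum(p != y[i] for p, i in zip(preds, held_out))
    return errors


def majority_rule(train_X, train_y, test_X, delta):
    # Stand-in classifier: ignores delta and predicts the most common training label.
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)


# Toy labels (made up): 30 samples of class A, 20 of class B.
y = ["A"] * 30 + ["B"] * 20
X = [[0.0] for _ in y]
folds = stratified_folds(y)
errors = cv_errors(X, y, [0.0, 1.0], majority_rule)
```

With a real classifier, the chosen shrinkage amount would be the `delta` with the smallest entry in `errors`.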

9

10

11 More Figures Figure 3 Figure 4

12

13 Classification. A new sample is classified by comparing its expression profile with each shrunken centroid over the 43 active genes. Distance function: standardized squared distance to each shrunken centroid, with class prior information included.
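
The distance function itself is not shown in the transcript. A hedged sketch of one standard form is a squared distance to each shrunken centroid, with each gene scaled by its pooled standard deviation (plus an optional offset s0), minus twice the log of the class prior; the sample goes to the class with the smallest score. The function names and all numbers below are assumptions for illustration.

```python
import math


def discriminant_score(x, shrunken_centroid, s, prior, s0=0.0):
    """Standardized squared distance to one centroid, corrected by the class prior."""
    dist = sum(((xi - ci) / (si + s0)) ** 2
               for xi, ci, si in zip(x, shrunken_centroid, s))
    return dist - 2.0 * math.log(prior)


def classify(x, shrunken_centroids, s, priors, s0=0.0):
    """Return (best_label, scores); the best label has the smallest score."""
    scores = {k: discriminant_score(x, c, s, priors[k], s0)
              for k, c in shrunken_centroids.items()}
    return min(scores, key=scores.get), scores


# Toy setup (made up): two classes, two genes; gene 1 is identical in both centroids.
shrunken_centroids = {"A": [1.0, 0.0], "B": [-1.0, 0.0]}
s = [1.0, 2.0]
priors = {"A": 0.5, "B": 0.5}

label, scores = classify([0.5, 3.0], shrunken_centroids, s, priors)
```

Gene 1 adds the same amount to both scores, which is why only the active genes matter when comparing classes.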

14 Statistical details: t-statistic. Estimates of the class probabilities (Figure 5).
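
The slide does not show how the class probabilities are estimated. One standard construction, assumed here by analogy with Gaussian discriminant analysis, turns per-class discriminant scores into probabilities proportional to exp(-score / 2). The function name and the scores below are hypothetical.

```python
import math


def class_probabilities(scores):
    """Map {label: discriminant score} to {label: estimated probability}."""
    best = min(scores.values())
    # Subtract the smallest score first so the exponentials cannot underflow to all zeros.
    weights = {k: math.exp(-0.5 * (v - best)) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical scores for one sample; a lower score means a closer class.
scores = {"A": 3.886, "B": 5.886}
probs = class_probabilities(scores)
```

A score gap of 2 between two classes corresponds to odds of about e to 1 in favor of the closer class.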

15

