Class Prediction and Discovery Using Gene Expression Data. Donna K. Slonim, Pablo Tamayo, Jill P. Mesirov, Todd R. Golub, Eric S. Lander. Presenter: 이인희.


1 Class Prediction and Discovery Using Gene Expression Data. Donna K. Slonim, Pablo Tamayo, Jill P. Mesirov, Todd R. Golub, Eric S. Lander. Presenter: 이인희

2 Introduction
- Gene expression: the process of transcribing a DNA sequence into RNA for protein production.
- Gene expression level: the approximate number of copies of a gene's RNA in a cell.
  - Correlates with the amount of the corresponding protein made.
  - May provide additional information for improving cancer classification and diagnosis.
- Class discovery: dividing samples into groups with similar behavior or properties.
- Class prediction: given a set of known classes, determining the correct class for a new patient.

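3 Definitions
- Data: expression levels of m genes measured in n samples.
- Class vector c: the class label of each of the n samples.
- Gene expression vector g: one gene's expression level in each of the n samples.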

4 Method for Choosing Correlated Genes
- Metric for gene selection
  - A predictive gene's typical expression in one class must be quite different from its typical expression in the other.
  - Variation of expression within each class must be as small as possible.
  - Correlation metric: P(g, c)
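The two requirements above (well-separated class means, small within-class variation) can be sketched as a signal-to-noise ratio. This is a minimal sketch that assumes P(g, c) takes the form (mu1 - mu2) / (sigma1 + sigma2), where mu and sigma are the mean and standard deviation of the gene's expression in each class. The function name `correlation_metric`, the use of the sample standard deviation, and the example values are illustrative assumptions, not taken from the slides.

```python
from statistics import mean, stdev


def correlation_metric(expression, classes):
    """P(g, c) for one gene, assuming the signal-to-noise form
    (mu1 - mu2) / (sigma1 + sigma2). Classes are labeled 1 and 2."""
    class1 = [e for e, c in zip(expression, classes) if c == 1]
    class2 = [e for e, c in zip(expression, classes) if c == 2]
    mu1, mu2 = mean(class1), mean(class2)
    # Assumption: sample standard deviation; needs >= 2 samples per class.
    sigma1, sigma2 = stdev(class1), stdev(class2)
    return (mu1 - mu2) / (sigma1 + sigma2)


# Hypothetical expression levels for one gene in six samples.
gene = [10.0, 12.0, 11.0, 2.0, 3.0, 4.0]
labels = [1, 1, 1, 2, 2, 2]
print(correlation_metric(gene, labels))  # 4.0
```

A large positive value marks a gene that is high in class 1, and a large negative value marks a gene that is high in class 2.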

5 Neighborhood analysis
- Asks whether any genes are likely to be predictors of a given class distinction.
- Determines whether the neighborhood around c holds more gene expression vectors than we would expect to see by chance (that is, around random permutations of c).
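The comparison against random permutations of c can be sketched as a small permutation test. This sketch makes several assumptions that are not in the slides: the neighborhood is defined as the genes with P(g, c) at or above a fixed threshold, P(g, c) uses the signal-to-noise form, only one direction (high in class 1) is counted, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean, stdev


def p_metric(expression, classes):
    """Assumed signal-to-noise form of P(g, c)."""
    class1 = [e for e, c in zip(expression, classes) if c == 1]
    class2 = [e for e, c in zip(expression, classes) if c == 2]
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


def neighborhood_size(genes, classes, threshold):
    """Number of genes whose P(g, c) is at or above the threshold."""
    return sum(1 for g in genes if p_metric(g, classes) >= threshold)


def permuted_sizes(genes, classes, threshold, n_perm=200, seed=0):
    """Neighborhood sizes around random permutations of the class vector."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_perm):
        shuffled = list(classes)
        rng.shuffle(shuffled)
        sizes.append(neighborhood_size(genes, shuffled, threshold))
    return sizes


# Synthetic data: 5 genes that separate the classes, 45 that do not.
rng = random.Random(1)
classes = [1] * 6 + [2] * 6
informative = [[rng.gauss(10, 1) for _ in range(6)]
               + [rng.gauss(2, 1) for _ in range(6)] for _ in range(5)]
noise = [[rng.gauss(6, 1) for _ in range(12)] for _ in range(45)]
genes = informative + noise

observed = neighborhood_size(genes, classes, threshold=1.0)
chance = permuted_sizes(genes, classes, threshold=1.0)
# Fraction of permutations with a neighborhood at least as large as observed.
p_value = sum(1 for s in chance if s >= observed) / len(chance)
print(observed, mean(chance), p_value)
```

If the observed neighborhood is much larger than those around permuted class vectors, the class distinction is unlikely to be a chance pattern.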

6

7 Choosing a prediction set S
- Could simply choose the top k genes by the absolute value of P(g, c).
- Choose the top k1 genes (highly expressed in class 1) and the bottom k2 genes (highly expressed in class 2).
- Optimal size of the prediction set
  - Trade-off: more genes add information and robustness, but also add noise.
  - Vary |S| under the constraint that k1 and k2 are roughly equal.
- This prediction method is not highly sensitive to the exact number of genes used.
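Selecting the top k1 and bottom k2 genes amounts to sorting by P(g, c) and taking both ends of the ranking. A minimal sketch, assuming the scores are already computed and stored in a dictionary; the function name `choose_prediction_set`, the gene names, and the scores are hypothetical.

```python
def choose_prediction_set(scores, k1, k2):
    """Return (top k1 genes, bottom k2 genes) ranked by P(g, c).

    scores maps a gene name to its P(g, c) value. Assumes
    k1 + k2 <= len(scores) so the two groups do not overlap.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:k1]                       # highly expressed in class 1
    bottom = ranked[-k2:] if k2 > 0 else []  # highly expressed in class 2
    return top, bottom


# Hypothetical P(g, c) values for six genes.
scores = {"geneA": 2.1, "geneB": -1.8, "geneC": 0.3,
          "geneD": 1.4, "geneE": -0.9, "geneF": -2.5}
top, bottom = choose_prediction_set(scores, k1=2, k2=2)
print(top)     # ['geneA', 'geneD']
print(bottom)  # ['geneB', 'geneF']
```

Taking both ends separately keeps the two classes roughly equally represented, which choosing by absolute value alone does not guarantee.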

8 Prediction by Weighted Voting
- Each gene casts a weighted vote
  - V = weight(g) * distance(x, b)
  - g: each gene in S; x: new sample in the test set
  - b: the 'decision boundary'
  - weight(g) = P(g, c)
- Trade-off of reliability vs. utility
  - PS ('prediction strength')
  - In this paper, the PS threshold was 0.3; predictions with PS below the threshold are reported as 'no call'.
  - Error rate, 'no call' rate
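The voting scheme can be sketched as follows. Two details are assumptions, since the slide does not spell them out: distance(x, b) is taken as the signed difference between the sample's expression and the gene's decision boundary, and PS is taken as (V_win - V_lose) / (V_win + V_lose), the margin of victory relative to the total votes. The function name `predict` and all numbers are hypothetical.

```python
def predict(sample, predictor, ps_threshold=0.3):
    """Weighted-voting prediction for one sample.

    sample maps gene -> expression level.
    predictor maps gene -> (weight, boundary), with weight = P(g, c).
    Returns (predicted class or None for 'no call', prediction strength).
    """
    v1 = v2 = 0.0
    for gene, (weight, boundary) in predictor.items():
        vote = weight * (sample[gene] - boundary)
        if vote > 0:
            v1 += vote    # vote for class 1
        else:
            v2 += -vote   # vote for class 2
    v_win, v_lose = max(v1, v2), min(v1, v2)
    total = v_win + v_lose
    # Assumed form of prediction strength: relative margin of victory.
    ps = (v_win - v_lose) / total if total else 0.0
    if ps < ps_threshold:
        return None, ps
    return (1 if v1 > v2 else 2), ps


# Hypothetical two-gene predictor: geneA is high in class 1, geneB in class 2.
predictor = {"geneA": (2.0, 5.0), "geneB": (-1.0, 4.0)}

confident = predict({"geneA": 8.0, "geneB": 3.0}, predictor)
uncertain = predict({"geneA": 5.5, "geneB": 4.8}, predictor)
print(confident)  # class 1 with PS 1.0: both genes vote the same way
print(uncertain)  # 'no call': the votes nearly cancel, so PS is below 0.3
```

Raising the threshold lowers the error rate at the cost of a higher 'no call' rate, which is the reliability vs. utility trade-off named on the slide.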

9 Application: Classifying Patient Samples
- Training set: 38 leukemia samples (11 AML, 27 ALL)
- Test set: 34 samples (14 AML, 20 ALL)
- ALL/AML distinction

10
- About 700 genes above the 1% level in each direction
- Arbitrarily chose to use a 50-gene predictor
  - 36 correct predictions out of 38 training samples.
  - 29 correct predictions out of 34 test samples.

11 Application: Verifying Proposed Classes
- One needs to show that the class distinctions discovered are real and biologically interesting.
- Validate the clusters by testing predictability.
  - If the clusters reflect true structure, the distinction should be predictable in additional samples.
  - Examine prediction strengths in cross-validation.
  - Test whether the distribution of prediction strengths for a given class distinction is significantly higher than we would expect for a random class distinction.
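The cross-validation step can be sketched as a leave-one-out loop: each sample is held out in turn, a predictor is built from the rest, and the held-out sample is predicted. The loop below is generic; the single-gene midpoint classifier plugged into it is only a stand-in chosen for brevity, not the weighted-voting predictor from the talk. All names and data are hypothetical.

```python
from statistics import mean


def leave_one_out(samples, labels, fit, predict):
    """Predict each sample from a model trained on all the others."""
    results = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        results.append(predict(model, samples[i]))
    return results


def fit_midpoint(xs, ys):
    """Stand-in model: the two class means of a single gene."""
    mu1 = mean([x for x, y in zip(xs, ys) if y == 1])
    mu2 = mean([x for x, y in zip(xs, ys) if y == 2])
    return mu1, mu2


def predict_midpoint(model, x):
    """Assign x to the class on its side of the midpoint boundary."""
    mu1, mu2 = model
    boundary = (mu1 + mu2) / 2
    return 1 if (x - boundary) * (mu1 - mu2) > 0 else 2


# Hypothetical expression of one gene in six samples.
xs = [9.0, 10.0, 11.0, 2.0, 3.0, 4.0]
ys = [1, 1, 1, 2, 2, 2]
predictions = leave_one_out(xs, ys, fit_midpoint, predict_midpoint)
print(predictions)  # [1, 1, 1, 2, 2, 2]
```

To examine prediction strengths rather than class calls, `predict` would return the PS for each held-out sample, and the resulting distribution would be compared with the one obtained for random class distinctions.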

12 Discussion and Conclusion
- Similar methods can be used to predict any trait or characteristic at the transcriptional level.
- Future work
  - Cases where no single biological pathway is responsible for all the cases in either class.
  - Cases where there are multiple classes.

