Diagnosis of multiple cancer types by shrunken centroids of gene expression Course: 550.635 Topics in Bioinformatics Presenter: Ting Yang Teacher: Professor.

Presentation transcript:

Diagnosis of multiple cancer types by shrunken centroids of gene expression
By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu
Course: Topics in Bioinformatics
Presenter: Ting Yang
Teacher: Professor Geman

Nearest Centroid Classification
Example: small round blue cell tumors (SRBCT) of childhood.
63 training samples, 25 test samples.
4 classes: BL, EWS, NB, RMS.
Figure 1: nearest centroid classification.
Disadvantage: every gene contributes to the distance, including genes that carry no class information.
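The nearest centroid rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming plain squared Euclidean distance; the function names and the toy expression values are made up for the example and are not taken from the SRBCT data.

```python
import numpy as np

def fit_centroids(X, y):
    """Return the class labels and one centroid (mean profile) per class.

    X has shape (n_samples, n_genes); y holds one class label per sample.
    """
    classes = np.unique(y)
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids

def predict_nearest_centroid(X_new, classes, centroids):
    """Assign each row of X_new to the class with the closest centroid."""
    # Squared Euclidean distance from every sample to every centroid.
    diffs = X_new[:, None, :] - centroids[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    return classes[dist2.argmin(axis=1)]

# Toy data (hypothetical): 2 classes, 3 genes.
X_train = np.array([[1.0, 0.0, 5.0],
                    [1.2, 0.1, 5.1],
                    [4.0, 3.0, 5.0],
                    [4.2, 3.1, 4.9]])
y_train = np.array(["BL", "BL", "EWS", "EWS"])

classes, centroids = fit_centroids(X_train, y_train)
pred = predict_nearest_centroid(np.array([[1.1, 0.2, 5.0]]), classes, centroids)
print(pred)
```

Note that the third gene is nearly identical in both classes, yet it still enters every distance; that is the kind of gene the shrinkage step is meant to drop.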

Nearest Shrunken Centroids
A modification of the nearest centroid method.
Idea: first standardize each class centroid by the within-class standard deviation for each gene, then shrink each class centroid toward the overall centroid.

Details:
$\bar{x}_{ik}$: mean expression value in class k for gene i.
$\bar{x}_i$: ith component of the overall centroid.
$s_i$: pooled within-class standard deviation for gene i.

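The statistic $d_{ik}$ measures the difference between the mean expression of gene i in class k and the mean expression of gene i in all classes combined, in units of the pooled standard deviation.
Idea: a gene that discriminates one class from the rest will have a statistic of large absolute value.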
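Written out, the statistic takes roughly the following form. This is a reconstruction in the notation commonly used for the method, not a copy of the slide: the symbols $s_0$ (a small positive constant that guards against genes with tiny variance), $m_k$, $n_k$, $n$, $K$ and $C_k$ (the set of samples in class k) are assumptions, and the sign inside $m_k$ in particular should be checked against the paper.

```latex
d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \,(s_i + s_0)},
\qquad
s_i^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{j \in C_k} \left( x_{ij} - \bar{x}_{ik} \right)^2,
\qquad
m_k = \sqrt{\frac{1}{n_k} + \frac{1}{n}}
```

With this scaling, $d_{ik}$ behaves like a t-statistic for gene i comparing class k against the overall centroid.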

Shrink each $d_{ik}$ toward zero by soft thresholding to eliminate the genes that do not provide sufficient information. This is the ‘de-noising’ step.
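A small sketch of the shrinkage step, assuming it is done by soft thresholding: each statistic is moved toward zero by a fixed amount and set to zero if it would cross it. The name `delta` for the shrinkage amount and the numbers in `d` are hypothetical.

```python
import numpy as np

def soft_threshold(d, delta):
    """Reduce |d| by delta, setting values that would cross zero to zero."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

# Rows are genes, columns are classes (made-up statistics).
d = np.array([[ 2.5, -0.3],
              [-1.2,  0.4],
              [ 0.1, -0.2]])

d_shrunk = soft_threshold(d, delta=0.5)

# A gene stays "active" only if it is nonzero for at least one class.
active = np.any(d_shrunk != 0.0, axis=1)
print(d_shrunk)
print(active)
```

Here the third gene is zero in every class after shrinkage, so it no longer influences which class a sample is assigned to.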

Choosing the amount of shrinkage
The shrinkage amount is allowed to vary over a wide range; 10-fold cross-validation picks the value with the smallest error rate.
Divide the set of samples at random into 10 parts of equal size, with the classes distributed proportionally among the 10 parts.
Fit the model on 90% of the samples, then predict the class labels of the remaining 10% (the test samples).
Repeat 10 times, once per part, and add the errors together to get the overall error.
Figure 2, Figure 1
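The cross-validation procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `stratified_folds`, `cv_errors` and `toy_fit_predict` are hypothetical names, and the toy classifier is a simplified stand-in that shrinks raw centroid offsets without the standardization by the pooled standard deviation.

```python
import numpy as np

def stratified_folds(y, n_folds=10, seed=0):
    """Give each sample a fold id so every class is spread evenly over folds."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for k in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == k))
        fold[idx] = np.arange(len(idx)) % n_folds
    return fold

def cv_errors(X, y, deltas, fit_predict, n_folds=10, seed=0):
    """Total number of misclassified held-out samples for each delta."""
    fold = stratified_folds(y, n_folds, seed)
    errors = np.zeros(len(deltas), dtype=int)
    for f in range(n_folds):
        test = fold == f
        if not test.any():
            continue
        for j, delta in enumerate(deltas):
            pred = fit_predict(X[~test], y[~test], X[test], delta)
            errors[j] += int(np.sum(pred != y[test]))
    return errors

def toy_fit_predict(X_tr, y_tr, X_te, delta):
    """Simplified stand-in classifier: soft-threshold the centroid offsets."""
    classes = np.unique(y_tr)
    overall = X_tr.mean(axis=0)
    offsets = np.vstack([X_tr[y_tr == k].mean(axis=0) - overall for k in classes])
    shrunk = np.sign(offsets) * np.maximum(np.abs(offsets) - delta, 0.0)
    cents = overall + shrunk
    dist2 = ((X_te[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[dist2.argmin(axis=1)]

# Synthetic data: 20 samples per class, 5 genes, only gene 0 is informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
X[y == 1, 0] += 10.0

deltas = [0.0, 0.5, 100.0]
errors = cv_errors(X, y, deltas, toy_fit_predict)
best_delta = deltas[int(np.argmin(errors))]
print(errors, best_delta)
```

With the largest `delta` every centroid collapses onto the overall centroid, so the classifier can no longer separate the classes; this is the far right end of the error curve the slide's figures refer to.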

More Figures: Figure 3, Figure 4

Classification
A new sample is classified by comparing its expression profile with each shrunken centroid, over those 43 active genes.
Distance function: a standardized squared distance that also includes prior information about the class frequencies.
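One way to write the distance function is sketched below. It assumes the score is the squared distance to each shrunken centroid, scaled per gene by the pooled standard deviation plus a constant `s0`, minus twice the log prior; the function name, argument names and numbers are hypothetical.

```python
import numpy as np

def discriminant_scores(x_new, shrunken_centroids, s, s0, priors):
    """Score of one sample against each class; the smallest score wins.

    shrunken_centroids has shape (n_classes, n_genes); s holds the pooled
    within-class standard deviation of each gene; priors sums to 1.
    """
    scale = (s + s0) ** 2
    dist2 = (((x_new - shrunken_centroids) ** 2) / scale).sum(axis=1)
    # Subtracting 2*log(prior) favors classes that are more common a priori.
    return dist2 - 2.0 * np.log(priors)

# Two classes, two genes; gene 1 has the same shrunken centroid in both classes.
shrunken_centroids = np.array([[1.0, 0.0],
                               [3.0, 0.0]])
s = np.array([1.0, 2.0])
priors = np.array([0.5, 0.5])

scores = discriminant_scores(np.array([1.5, 4.0]), shrunken_centroids,
                             s, 0.0, priors)
predicted_class = int(np.argmin(scores))
print(scores, predicted_class)
```

The second gene adds the same amount to both scores, so it cannot change the prediction; this is why only the active genes matter in the comparison.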

Statistical details: t-statistic.
Estimates of the class probabilities (Figure 5).
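Class probability estimates can be derived from per-class scores. The sketch below assumes a Gaussian-style form in which each probability is proportional to exp(-score / 2); the function name and the example scores are hypothetical.

```python
import numpy as np

def class_probabilities(scores):
    """Turn discriminant scores (smaller is better) into probabilities."""
    z = -0.5 * np.asarray(scores, dtype=float)
    z -= z.max()          # shift for numerical stability; ratios are unchanged
    w = np.exp(z)
    return w / w.sum()

probs = class_probabilities([4.0, 6.0, 10.0])
print(probs)
```

A sample whose largest probability is well below 1 sits close to more than one shrunken centroid, which is one way to flag uncertain diagnoses.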