Techniques for Analysing Microarrays: Which genes are involved in ovarian and prostate cancer?



Common Questions
(1) Which genes are “up” or “down” in different conditions?
  - Cancer patient versus normal
  - Non-invasive cancer versus invasive cancer
(2) Which genes can differentiate between cancer sub-types?
(3) Which genes relate to the survival of the patient?
(4) Which genes may be in the same pathway as a gene of interest?

EOS chips
- Use Affymetrix GeneChip technology: 25-mer probes, 8 probes in a probe set
- 59,000 probe sets, ~46,000 gene clusters (all human expressed sequences known at the time)
- Distributions of all chips normalised to each other (gamma distribution)
- Single measure of intensity for each probe set (Tukey’s trimean)
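Tukey’s trimean is a weighted average of the quartiles, (Q1 + 2 × median + Q3) / 4, so a single aberrant probe barely moves it. A minimal sketch, assuming the trimean is taken over the intensities of one probe set and that Q1 and Q3 are Tukey’s hinges (other quartile definitions give slightly different values); the intensities are hypothetical.

```python
from statistics import median


def tukey_trimean(values):
    """Tukey's trimean, (Q1 + 2*median + Q3) / 4, with Tukey's hinges as Q1 and Q3.

    Hinges are the medians of the lower and upper halves of the sorted data;
    when the count is odd, the overall median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("a probe set needs at least one intensity")
    half = (n + 1) // 2
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    return (lower_hinge + 2 * median(xs) + upper_hinge) / 4


# Hypothetical intensities for the 8 probes of one probe set; one probe is an outlier.
probe_intensities = [120, 135, 128, 140, 131, 125, 138, 900]
print(tukey_trimean(probe_intensities))  # 132.875 (the plain mean would be about 227)
```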

[Figure: variance versus mean. Data after “normalisation” (linear scale): variance increases with the mean. After the “fix” (add a constant, then log2): variance on the log scale versus mean.]
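The “fix” works because the noise on intensities is roughly multiplicative: on the linear scale a gene’s variance grows with its level, whereas on the log scale a fixed percentage error becomes a fixed additive error. A sketch of the transform, assuming a hypothetical offset of 1.0 (the slide does not give the constant) and two made-up genes with the same 10% noise at different expression levels.

```python
import math
from statistics import variance


def log2_with_offset(intensities, offset=1.0):
    """Add a constant, then take log2.

    The offset keeps small or zero intensities finite; 1.0 is a hypothetical
    default, not the value used on the slide.
    """
    shifted = [v + offset for v in intensities]
    if any(s <= 0 for s in shifted):
        raise ValueError("offset must make every intensity positive")
    return [math.log2(s) for s in shifted]


# Two hypothetical genes with the same 10% multiplicative noise at different levels.
low_gene = [90, 100, 110]
high_gene = [900, 1000, 1100]

# Linear scale: the high gene's variance is 100 times larger.
print(variance(low_gene), variance(high_gene))
# Log scale (offset 0 here, since all values are positive): the variances match.
print(variance(log2_with_offset(low_gene, 0)), variance(log2_with_offset(high_gene, 0)))
```

With a nonzero offset the log-scale variances are no longer exactly equal, but the effect is small for intensities far above the offset.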

Which genes are differentially expressed between ovarian cancer and normal ovaries?
- 6 normal ovaries
- 38 ovarian cancers
  o 3 mucinous
  o 5 endometrioid
  o 30 serous

Statistical techniques
- Ranked t-statistics (unequal variance)
- Quantile-quantile plots against the normal distribution
- Westfall and Young permutation test
S. Dudoit, Y.H. Yang, M.J. Callow and T.P. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. August 2000.
[Figure: ratios of cancer/normal]

t statistic
The t-statistic gets more extreme as:
- the difference in means increases
- the standard deviation of each of the two samples decreases
- the size of the samples increases
[Figure: t-statistics ranked, from -ve through 0 to +ve]
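The unequal-variance (Welch) statistic is t = (mean_cancer - mean_normal) / sqrt(s_cancer² / n_cancer + s_normal² / n_normal), which makes each of the three effects above visible: the difference in means is the numerator, and the standard deviations and sample sizes set the denominator. A minimal ranking sketch; the labels "OvCa" and "N" follow the permutation slide, while the gene names and log2 values are hypothetical.

```python
from statistics import mean, variance


def welch_t(cancer, normal):
    """Unequal-variance (Welch) two-sample t-statistic.

    t = (mean_c - mean_n) / sqrt(s_c^2 / n_c + s_n^2 / n_n): larger when the means
    differ more, when the standard deviations shrink, and when the samples grow.
    """
    se = (variance(cancer) / len(cancer) + variance(normal) / len(normal)) ** 0.5
    return (mean(cancer) - mean(normal)) / se


def rank_genes(expression, labels):
    """Sort genes from most positive to most negative t (cancer minus normal)."""
    ranked = []
    for gene, values in expression.items():
        cancer = [v for v, lab in zip(values, labels) if lab == "OvCa"]
        normal = [v for v, lab in zip(values, labels) if lab == "N"]
        ranked.append((gene, welch_t(cancer, normal)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical log2 intensities for three genes in 3 cancers and 3 normals.
labels = ["OvCa", "OvCa", "OvCa", "N", "N", "N"]
expression = {
    "gene_up": [9.1, 9.4, 8.8, 6.0, 6.3, 5.9],
    "gene_flat": [7.0, 7.2, 6.9, 7.1, 6.9, 7.0],
    "gene_down": [4.2, 4.0, 4.5, 6.8, 7.1, 6.6],
}
for gene, t in rank_genes(expression, labels):
    print(f"{gene:10s} t = {t:+.2f}")
```

A gene with zero variance in both groups makes the denominator zero; real pipelines usually filter or regularise such genes first.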

Quantile-quantile plot (R: library(sma) or library(base))
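A QQ plot pairs each sorted t-statistic with the quantile a normal distribution would put at the same rank; genes far off the straight line in the tails are the candidates. A sketch of the coordinates only (no plotting), assuming plotting positions (i - 0.5) / n; R’s qqnorm uses a slightly different rule for n ≤ 10. The t values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """(theoretical, observed) pairs for a normal QQ plot.

    Sorted observations are matched to standard-normal quantiles at plotting
    positions (i - 0.5) / n; R's qqnorm uses a slightly different rule for n <= 10.
    """
    observed = sorted(values)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical t-statistics for five genes; the last lies far out in the tail.
for theo, obs in normal_qq_points([0.3, -1.1, 0.8, -0.2, 6.5]):
    print(f"{theo:+.2f}  {obs:+.2f}")
```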

Westfall and Young permutation (tpWY program)
- 6 normal ovaries, 38 ovarian cancers
- Randomise labels (OvCa, N)
- Compute t-stats
- 100,000 iterations
- Unadjusted p value: proportion of iterations where the permuted t-statistic is at least as extreme as the observed one
- Adjusted p value: p value adjusted for multiple testing
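The adjustment in Dudoit et al. is the Westfall and Young step-down maxT procedure: in each permutation, every gene is compared with the largest permuted |t| among genes ranked at or below it, which controls the family-wise error rate. A small sketch, assuming two-sided |t|, random (not exhaustive) label shuffles, and 1,000 permutations instead of the slide’s 100,000 to keep it quick; the tpWY program’s own options are not reproduced, and the data are hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Unequal-variance two-sample t-statistic."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def t_stats(matrix, labels):
    """One t per gene (row); labels[k] is True for a cancer sample, False for normal."""
    stats = []
    for row in matrix:
        cancer = [v for v, is_cancer in zip(row, labels) if is_cancer]
        normal = [v for v, is_cancer in zip(row, labels) if not is_cancer]
        stats.append(welch_t(cancer, normal))
    return stats


def maxt_adjusted_p(matrix, labels, n_perm=1000, seed=0):
    """Westfall & Young step-down maxT adjusted p values (two-sided, |t|)."""
    rng = random.Random(seed)
    observed = [abs(t) for t in t_stats(matrix, labels)]
    # Genes from most to least significant observed |t|.
    order = sorted(range(len(observed)), key=lambda g: observed[g], reverse=True)
    counts = [0] * len(order)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomise the cancer/normal labels
        perm = [abs(t) for t in t_stats(matrix, shuffled)]
        # Successive maxima, walking from the least significant gene upwards.
        running_max = float("-inf")
        for rank in range(len(order) - 1, -1, -1):
            running_max = max(running_max, perm[order[rank]])
            if running_max >= observed[order[rank]]:
                counts[rank] += 1
    # Enforce monotonicity: adjusted p never decreases down the ranking.
    adjusted = [0.0] * len(order)
    current = 0.0
    for rank, gene in enumerate(order):
        current = max(current, counts[rank] / n_perm)
        adjusted[gene] = current
    return adjusted


# Tiny hypothetical example: 4 cancers, 4 normals, two genes.
labels = [True] * 4 + [False] * 4
matrix = [
    [10, 11, 12, 13, 1, 2, 3, 4],  # clearly higher in cancer
    [1, 5, 2, 4, 3, 2, 5, 1],      # no real difference
]
print(maxt_adjusted_p(matrix, labels))
```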

How many genes were “statistically” significant?
Ovarian cancer > normal (candidates for antibody therapy?)
  - 110 candidates (adjusted p < 0.01)
  - 181 candidates (adjusted p < 0.05)
Ovarian cancer < normal (candidates for tumor suppressor genes?)
  - 7 candidates (adjusted p < 0.01)
  - 15 candidates (adjusted p < 0.05)

[Figure: genes high in cancer (Excel table)]

[Figure: genes low in cancer (Excel table)]
How can we deal with:
(a) biological variation?
(b) more than one cause for cancer?

Which genes are differentially expressed between non-invasive and invasive ovarian cancer?
No. samples | Non-invasive | Invasive
Mucinous | 5 | 4
Endometrioid | 1 | 7
Serous | 2 | 33
Now: ranked t-stats, qq plots
Future: model all variables together

Assume equal variance for t-stats?
[Figure: ratio of variances, S² non-invasive (n=5) / S² invasive (n=4), against theoretical quantiles of the F distribution, e.g. mucinous cancer]
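If both groups share one variance and the data are normal, each gene’s variance ratio follows an F distribution with (n1 - 1, n2 - 1) degrees of freedom, F(4, 3) for the mucinous groups, so a straight QQ line supports a pooled (equal-variance) t. A sketch under those assumptions: the Python standard library has no F quantile function, so the theoretical quantiles are approximated by simulation (F as a ratio of scaled chi-squares, each drawn as a Gamma variate); the pooled t is Student’s classical equal-variance statistic.

```python
import random
from statistics import mean, variance


def variance_ratio(non_invasive, invasive):
    """S^2_non-invasive / S^2_invasive for one gene."""
    return variance(non_invasive) / variance(invasive)


def simulated_f_quantiles(df1, df2, probs, n_draws=100_000, seed=0):
    """Approximate F(df1, df2) quantiles by simulation.

    F = (chi2_df1 / df1) / (chi2_df2 / df2), and chi2_k is Gamma(shape=k/2, scale=2).
    """
    rng = random.Random(seed)
    draws = sorted(
        (rng.gammavariate(df1 / 2, 2) / df1) / (rng.gammavariate(df2 / 2, 2) / df2)
        for _ in range(n_draws)
    )
    return [draws[min(int(p * n_draws), n_draws - 1)] for p in probs]


def f_qq_points(ratios, df1, df2):
    """Pair sorted observed variance ratios with simulated F quantiles."""
    observed = sorted(ratios)
    n = len(observed)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return list(zip(simulated_f_quantiles(df1, df2, probs), observed))


def pooled_t(a, b):
    """Student's t with a pooled variance, i.e. assuming equal variances."""
    n_a, n_b = len(a), len(b)
    sp2 = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / n_a + 1 / n_b)) ** 0.5


# Hypothetical variance ratios for four genes, compared with F(4, 3).
for theo, obs in f_qq_points([0.5, 2.0, 1.0, 8.0], df1=4, df2=3):
    print(f"{theo:.2f}  {obs:.2f}")
```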

What to do when n=2? Assume equal variance? Error model?

Limitations of the Westfall & Young permutation method
No. samples | Non-invasive | Invasive | No. permutations
Mucinous | 5 | 4 | 126
Endometrioid | 1 | 7 | ---
Serous | | |
Not enough power when sample sizes are small?
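The 126 in the table is the number of distinct ways to relabel 9 mucinous samples as 5 non-invasive and 4 invasive, C(9, 5). Because the observed labelling is always one of them, no permutation p value can fall below 1/126 ≈ 0.0079, and the multiple-testing adjustment can only raise it, which is one concrete reason for the low power. A minimal sketch of that arithmetic.

```python
from math import comb


def distinct_relabellings(n_non_invasive, n_invasive):
    """Number of distinct ways to place the non-invasive labels among all samples."""
    return comb(n_non_invasive + n_invasive, n_non_invasive)


# Mucinous group from the table: 5 non-invasive, 4 invasive.
b = distinct_relabellings(5, 4)
smallest_p = 1 / b  # the observed labelling is always one of the b relabellings
print(b, round(smallest_p, 4))  # 126 0.0079
```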

[Figure: mucinous, non-invasive versus invasive; plotted with R library(base)]

Which genes relate to the prognosis of patients with prostate cancer?
- Methods: R survival package and SAS
- 72 patients with prostate cancer
- Treatment: radical prostatectomy
- 17 relapsed (relapse: PSA rise > 0.4 ng/ml)

Cox Proportional Hazards Model
h(t | gene, PSA) = h0(t) × exp(b1 × gene + b2 × PSA)
- Baseline hazard h0(t): independent of gene expression or PSA
- Exponential term: involves gene and PSA, independent of time
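Because the baseline hazard cancels out of the partial likelihood, the coefficient can be estimated without modelling h0(t) at all. A from-scratch sketch of that fit by Newton-Raphson, assuming a single covariate for brevity (the slide’s model also includes PSA) and the Breslow handling of tied times; R’s coxph defaults to the Efron method, so results on tied data can differ slightly. All data are hypothetical.

```python
import math


def cox_fit_1d(times, events, x, max_iter=50, tol=1e-10):
    """Maximise the Cox partial likelihood for one covariate by Newton-Raphson.

    Tied event times use the Breslow approximation (each relapse is compared with
    everyone still at risk). Returns (beta, standard error from the information).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, (t_i, had_event) in enumerate(zip(times, events)):
            if not had_event:
                continue  # censored patients contribute only through risk sets
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0  # observed minus risk-weighted mean covariate
            info += s2 / s0 - (s1 / s0) ** 2  # risk-weighted variance of the covariate
        if info <= 0:
            raise ValueError("covariate does not vary within risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)


# Hypothetical disease-free months, relapse flags (1 = relapsed) and expression.
months = [5, 8, 12, 20, 24, 30, 36, 40]
relapse_flags = [1, 1, 0, 1, 0, 0, 1, 0]
expression = [6.2, 7.9, 6.8, 7.1, 8.8, 8.1, 7.5, 9.0]
beta, se = cox_fit_1d(months, relapse_flags, expression)
print(f"beta {beta:.2f} (SE {se:.2f}), hazard ratio per unit {math.exp(beta):.2f}")
```

If every relapse had the most extreme covariate value in its risk set, the estimate would diverge; the R survival package warns about this case.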

[Figure: probe sets A and B; relapsed patients indicated]

[Figure: survival curves for probe set B, gene + PSA model. High (≥ 25th percentile) versus low (< 25th percentile); y-axis S(t), x-axis time (disease-free months).]
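Survival curves like these are usually Kaplan-Meier (product-limit) estimates drawn separately for each group. A sketch assuming patients are split at the 25th percentile of some per-patient value (which quantity the slide split on is not stated; a hypothetical score is used here) and that Python’s default “exclusive” quantile method is acceptable; R’s default percentile rule can move the cut-off slightly.

```python
from statistics import quantiles


def split_at_25th_percentile(scores):
    """True for 'high' (>= 25th percentile), False for 'low' (< 25th percentile)."""
    cutoff = quantiles(scores, n=4)[0]  # default 'exclusive' method
    return [s >= cutoff for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate: [(t, S(t))] at each distinct relapse time.

    events[k] is 1 if patient k relapsed at times[k], 0 if censored then.
    """
    surv = 1.0
    curve = []
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        relapses = sum(1 for u, d in zip(times, events) if u == t and d)
        surv *= 1 - relapses / at_risk
        curve.append((t, surv))
    return curve


def curves_by_group(times, events, scores):
    """Separate Kaplan-Meier curves for the high and low groups."""
    high = split_at_25th_percentile(scores)
    curves = {}
    for name, flag in (("high", True), ("low", False)):
        idx = [k for k, h in enumerate(high) if h == flag]
        curves[name] = kaplan_meier([times[k] for k in idx], [events[k] for k in idx])
    return curves


# Hypothetical disease-free months, relapse flags and per-patient scores.
print(curves_by_group([5, 3, 8, 2, 7, 6, 4, 1], [1, 0, 1, 1, 0, 1, 0, 1], [1, 2, 3, 4, 5, 6, 7, 8]))
```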

Probe set | Hazard ratio (95% CI) | Unadjusted p value*
A | 0.26 (0.12 to 0.54) |
B | 0.32 (0.16 to 0.67) |
* False discovery rate for the top 50 candidates is 20% (SAM)
Hazard ratio: 75th / 25th percentile
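A hazard ratio “75th / 25th percentile” compares two patients whose covariate sits at those percentiles. Assuming the covariate enters the Cox model linearly with coefficient beta, that ratio is exp(beta × (Q3 - Q1)), and a Wald interval on beta carries over the same way. A sketch with a hypothetical coefficient, standard error and expression values, and Python’s default quartile method.

```python
import math
from statistics import quantiles


def percentile_hazard_ratio(beta, se, covariate_values, z=1.96):
    """Hazard ratio for the 75th versus the 25th percentile of a covariate.

    Assumes a linear Cox term, so HR = exp(beta * (Q3 - Q1)); the interval is a
    Wald interval on beta scaled by the same distance.
    """
    q1, _, q3 = quantiles(covariate_values, n=4)
    spread = q3 - q1
    hr = math.exp(beta * spread)
    lower = math.exp((beta - z * se) * spread)
    upper = math.exp((beta + z * se) * spread)
    return hr, lower, upper


# Hypothetical coefficient, standard error and log2 expression values.
hr, lo, hi = percentile_hazard_ratio(-0.5, 0.15, [6.1, 6.8, 7.0, 7.4, 7.9, 8.3, 8.8, 9.5])
print(f"HR {hr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A ratio below 1, as for probe sets A and B, means patients at the 75th percentile have the lower hazard of relapse.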

Summary
(1) Which genes are “up” or “down” in different conditions?
  - ranked t-statistics
  - qq plots (normal distribution)
  - Westfall & Young permutations (multiple testing)
(2) Which genes relate to the survival of the patient?
  - Cox proportional hazards
  - SAM multiple testing

Acknowledgements
- Garvan: Sue Henshall, Rob Sutherland, Patricia Vanden Bergh
- EOS: Jordan Hiller, Daniel Afar, Kurt Gish, David Mack
- Royal Hospital for Women: Nigel Hacker
- ANU/John Curtin: John Maindonald, Yvonne Pittelkow
- Walter and Eliza Hall Institute: Terry Speed, Natalie Thorne
- University of Queensland: Jessica Marr