Distinguishing active from non-active genes: DNA hybridization

Presentation transcript:

Distinguishing active from non-active genes
Main principle: DNA hybridization
- DNA hybridizes due to base pairing using H-bonds
- A/T and C/G and A/U pairs are possible
RNA:   AUGCAUGCUGCUAGCUACGUAUGCAUGCUGCUAGCUACGU
cDNA:  TACGTACGACGATCGATGCATACGTACGACGATCGATGCA
Probe: GCTACGTATGCAT
Mix the probe with the cDNA: the probe will find the complementary DNA sequence and bind to it.
TACGTACGACGATCGATGCATACGTACGACGATCGATGCA
             GCTACGTATGCAT
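The probe search on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `find_binding_sites` is hypothetical, and the code follows the slide's simplified position-by-position pairing rather than modelling strand orientation.

```python
# Watson-Crick pairs for DNA, as listed on the slide (A/T and C/G).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_binding_sites(target, probe):
    """Return every offset in `target` where `probe` pairs base by base."""
    # The stretch of target that the probe can bind is the probe's complement.
    wanted = "".join(COMPLEMENT[base] for base in probe)
    sites = []
    start = target.find(wanted)
    while start != -1:
        sites.append(start)
        start = target.find(wanted, start + 1)
    return sites

cdna = "TACGTACGACGATCGATGCATACGTACGACGATCGATGCA"
probe = "GCTACGTATGCAT"
print(find_binding_sites(cdna, probe))
```

The offset returned is the number of spaces needed to line the probe up under the cDNA, as in the alignment on the slide.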

Expression microarray:

Statistical analysis of microarrays: an introduction
Why do replication of arrays?
control / treatment

Biological Replication

Technical Replication RNA mixed probe pool

Dye Swap Design
What type of replication?

Background subtraction

Transformation using logarithmic values
- Red and green signals are the same: log2(1/1) = log2(1) = 0 (by definition)
- Red signal is twice the green signal: log2(2/1) = log2(2) = 1 (because 2^1 = 2)
- Red signal is half the green signal: log2(1/2) = log2(1) - log2(2) = 0 - 1 = -1

Using logarithmic values
Normal scale: unequal arrow distances for the same-fold up- or down-regulation.
Logarithmic scale: log2(2) = 1 and log2(0.5) = -1. Equal arrow distances, same absolute values for the same-fold up- or down-regulation.
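A quick way to check the symmetry of the logarithmic scale is to compute the three cases from the slides directly. The helper name `log2_ratio` is only illustrative.

```python
import math

def log2_ratio(red, green):
    """Return log2(red / green) for one spot."""
    return math.log2(red / green)

# Equal signals, red twice green, red half of green.
for red, green in [(1, 1), (2, 1), (1, 2)]:
    print(f"R/G = {red}/{green} -> log2 ratio = {log2_ratio(red, green):+.1f}")
```

A 2-fold increase and a 2-fold decrease land at the same distance from 0, which is the point of the transformation.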

Graphing all array values: the MA plot
M: the greater the distance from 0, the further the R/G ratio is from 1 (the redder or greener the spot on the microarray).
A: the greater the value, the brighter the spot overall.

Using logarithmic values
Two values are used in microarray analysis:
M = log2 ratio of red value to green value, log2(R/G)
A = overall spot intensity
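As a sketch, M and A can be computed per spot as follows. The slide only describes A as "overall spot intensity"; defining it as the average of the two log2 intensities is the usual convention and is an assumption here, as is the function name `ma_values`.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(red) - math.log2(green)          # log2(R/G)
    a = 0.5 * (math.log2(red) + math.log2(green))  # assumed: mean log2 intensity
    return m, a

# Hypothetical intensities: red is four times green.
print(ma_values(8, 2))
```

Plotting M against A for every spot gives the MA plot described above.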

The Dye-swap
Why? To account for dye bias: Cy5, the red dye, fluoresces brighter than Cy3, the green dye. This is unfortunate but impossible to change, because the two dyes differ in chemical structure.

Normalizing
Why? A mathematical way to account for the systematic error due to dye intensity differences.
Example: gene X is 2-fold up-regulated by drought stress.
- R/G: 2.0 for gene X (drought/normal)
- G/R: should be 2.0 as well after swapping the dyes and RNA samples, but let's say it is 1.9 for gene X (drought/normal)

Normalizing, cont'd
Bottom line: M_i is the average of the 2 dye-swap array slides for each spot i.
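Using the drought example from the previous slide, the averaging step might look like this. It is a sketch under one assumption not spelled out in the transcript: both ratios are first expressed in the same orientation (drought/normal) and averaged on the log2 scale. The function name is hypothetical.

```python
import math

def dye_swap_average(ratio_slide1, ratio_slide2):
    """Average two drought/normal ratios from a dye-swap pair on the log2 scale."""
    return 0.5 * (math.log2(ratio_slide1) + math.log2(ratio_slide2))

# Slide 1: R/G = 2.0; slide 2 (dyes swapped): G/R = 1.9. Both are drought/normal.
m_gene_x = dye_swap_average(2.0, 1.9)
print(f"M for gene X = {m_gene_x:.3f}")
```

Because the dye bias pushes the two slides in opposite directions, averaging the pair cancels most of it.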

How do you analyze replicated results?
- Mean (average)
- Median (value in the middle)
- Standard deviation (spread around the average)
x = each data point, x̄ = average, n = number of data points
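The three summaries are available in Python's standard `statistics` module. The six M values below are made up for illustration, and using the sample standard deviation (dividing by n - 1) is an assumption, since the slide's formula is not in the transcript.

```python
import statistics

# Hypothetical M values for one gene across six replicate arrays.
m_values = [0.5, 1.2, -0.3, 2.1, 0.8, 1.5]

mean_m = statistics.mean(m_values)      # average
median_m = statistics.median(m_values)  # value in the middle
sd_m = statistics.stdev(m_values)       # sample SD, spread around the average

print(f"mean = {mean_m:.3f}, median = {median_m:.3f}, SD = {sd_m:.3f}")
```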

Is a gene differentially expressed?
In other words: is the log ratio M = 0 (that is, R/G = 1) or not?
The test statistic uses x̄ = average of the n samples and s = SD.
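The formula itself is not in the transcript. Assuming it is the standard one-sample t-statistic for testing M = 0, t = x̄ / (s / sqrt(n)), a minimal sketch is shown below. The function name and the sample data are hypothetical.

```python
import math
import statistics

def one_sample_t(values):
    """One-sample t-statistic for the null hypothesis that the true mean is 0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return mean / (sd / math.sqrt(n))

# Hypothetical M values from replicate arrays.
print(one_sample_t([0.5, 1.2, -0.3, 2.1, 0.8, 1.5]))
```

The p-value is then looked up in a t-distribution with n - 1 degrees of freedom.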

Example: six observations of the same gene (N = 6), SD = 1.28.
Null hypothesis: treatment and control show equal gene expression (M = 0).
Look up the p-value for the calculated t-statistic. Here, 9.21% of the distribution lies in the red shaded area, so p = 0.09.
Because p is above 0.05, the null hypothesis is not rejected: treatment and control are NOT shown to be different (M = 0). (See next slide, too.)

The null hypothesis

Bonferroni Correction
Assume you do a stats test for more than one gene. Each time you accept α = 0.05 (5%) uncertainty. That means you accept a false positive 5% of the time for each gene.
If you accept the same error for two genes, it is 1 - (1 - 0.05)^2 = 0.0975, about 0.1 (10% uncertainty). You accept that in about 10% of cases at least one of the 2 genes is a false positive.
For an array with n = 1000 genes, this means 1 - (1 - 0.05)^1000, which is greater than 0.9999. In more than 99.99% of cases you WILL make an error in at least one gene.
Assume 1000 genes and a desired Bonferroni-corrected error of 10%: use only those genes with a p-value below 0.10/1000 = 0.0001.
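The arithmetic on this slide can be reproduced with two small helpers. The function names are illustrative, and the error-rate formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(desired_error, n_tests):
    """Per-gene p-value cutoff that keeps the overall error at desired_error."""
    return desired_error / n_tests

print(familywise_error_rate(0.05, 2))     # two genes
print(familywise_error_rate(0.05, 1000))  # whole array
print(bonferroni_threshold(0.10, 1000))   # cutoff for 1000 genes at 10%
```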

False Discovery Rate (FDR) Correction
Why use FDR? It can be used instead of Bonferroni.
How?
- Sort all p-values from low to high.
- Decide on your desired FDR (e.g. 5%).
- Rank the genes (here: 1-6).
- Calculate 0.05 * (i/N), where i = rank (here 1-6) and N = total number of genes (here 6).
- If the p-value is less than 0.05 * (i/N), then it is a significant gene.
Here: the gene at rank 1 is compared with 0.05 * (1/6) = 0.0083. Under the threshold? YES, significant. The gene at rank 2 is compared with 0.05 * (2/6) = 0.0167. Under the threshold? NO, not significant.
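The ranking steps above can be sketched as follows. Note that this implements the standard Benjamini-Hochberg step-up rule (every gene up to the highest passing rank is called significant, using "less than or equal"), which is slightly more permissive than the per-rank check on the slide. The p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans (input order): True where the gene is significant."""
    n = len(p_values)
    # Indices of the genes, sorted by p-value from low to high.
    order = sorted(range(n), key=lambda idx: p_values[idx])
    # Find the highest rank i whose p-value is at or below fdr * (i / N).
    last_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / n:
            last_rank = rank
    # Every gene ranked at or above that position is called significant.
    significant = [False] * n
    for rank, idx in enumerate(order, start=1):
        if rank <= last_rank:
            significant[idx] = True
    return significant

# Hypothetical p-values for six genes, in array order.
print(benjamini_hochberg([0.2, 0.001, 0.03, 0.02, 0.5, 0.04]))
```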