Published by Edgar Whitehead. Modified over 9 years ago.
1
The following slides have been adapted from http://www.tm4.org/ to be presented at the Follow-up course on Microarray Data Analysis (Nov 20-24 2006, PICB Shanghai) by Peter Serocka
2
MIcroarray Data Analysis System (version 2.19)
Wei Liang
October 2004
3
Microarray Data Flow
[Figure: data flow diagram. Labels: Printer, Scanner, .tiff Image File, Image Analysis, Raw Gene Expression Data, Gene Annotation, Normalization / Filtering, Normalized Data with Gene Annotation, Expression Analysis, Interpretation of Analysis Results, Data Entry / Management, Database MAD, Database AGED, Database Others…]
4
MIDAS is a Normalization and Filtering tool for microarray data analysis!
5
Serves as a data pre-processor for clustering analysis (MeV).
6
Why Normalization and Filtering?
[Figure: experiment workflow. Labels: Sample1 mRNA, Sample2 mRNA, RT, Cy3-cDNA, Cy5-cDNA, cDNA array, Cy3 intensity, Cy5 intensity, .tiff Image Files, Raw Data File]
Systematic experimental error:
- Wavelength dependent
- Intensity dependent
- Uneven hybridization
- Gel print-tip variations
- Background variations
- Image processing algorithm-dependent
7
Why Normalization and Filtering?
The hypothesis underlying microarray analysis is that the measured intensities for each arrayed gene represent its relative expression level.
We use these intensities to identify biologically relevant patterns of expression by comparing measured levels between states on a gene-by-gene basis.
However, before the levels can be appropriately compared, one generally performs a number of transformations on the data to eliminate questionable or low quality data, to adjust the measured intensities to facilitate comparisons, and to select those genes that are significantly differentially expressed.
8
MIDAS data analysis methods
8 normalization/transformation methods
10 quality control filtering methods
3 significant genes identification methods
Methods listed on the slide:
- Total Intensity normalization
- Invalid-intensity checking
- LOWESS (Locfit) normalization
- Iterative linear regression normalization
- Iterative log mean centering normalization
- Ratio Statistics normalization
- Low intensity filter
- Standard deviation regularization
- Slice analysis (non-statistical)
- In-slide replicates analysis
- Flip-dye consistency checking
- Ratio Statistics confidence interval checking
- Signal/Noise checking
- Cross-file-trim
- Spot QC flag checking
- MA-ANOVA
- Cross-slide replicates t-test (statistical)
- Cross-slide one-class SAM (statistical)
9
Graphical scripting language
10
1. Read input files
2. Define analysis pipeline and set parameters for each analysis module
3. Write output files
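The three steps above can be sketched as a chain of parameterised modules applied in order. This is only an illustration of the idea, not MIDAS code: the `Spot` record, the module names, and the choice to rescale the Cy5 channel in the total-intensity step are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory record for one spot; a real run would read .mev files.
Spot = Dict[str, float]
Module = Callable[[List[Spot]], List[Spot]]


def low_intensity_filter(min_intensity: float) -> Module:
    """Build a module that drops spots with Cy3 or Cy5 below the cutoff."""
    def run(spots: List[Spot]) -> List[Spot]:
        return [s for s in spots
                if s["cy3"] >= min_intensity and s["cy5"] >= min_intensity]
    return run


def total_intensity_normalization() -> Module:
    """Build a module that rescales Cy5 so both channels have equal totals.

    Rescaling Cy5 (rather than Cy3) is an assumption of this sketch.
    """
    def run(spots: List[Spot]) -> List[Spot]:
        total_cy3 = sum(s["cy3"] for s in spots)
        total_cy5 = sum(s["cy5"] for s in spots)
        factor = total_cy3 / total_cy5
        return [{"cy3": s["cy3"], "cy5": s["cy5"] * factor} for s in spots]
    return run


def run_pipeline(spots: List[Spot], modules: List[Module]) -> List[Spot]:
    """Apply each configured module to the output of the previous one."""
    for module in modules:
        spots = module(spots)
    return spots
```

A pipeline is then just an ordered list, for example `run_pipeline(spots, [low_intensity_filter(10.0), total_intensity_normalization()])`, with the parameters fixed when each module is built.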
12
Sample data
Pair #   1st file name       2nd file name
1        NFE005d0001.mev     NFE005d00020.mev
2        NFE005d0002.mev     NFE005d00021.mev
3        NFE005d0003.mev     NFE005d00022.mev
4        NFE005d0004.mev     NFE005d00023.mev
5        NFE005d0005.mev     NFE005d00024.mev
6        NFE005d0006.mev     NFE005d00025.mev
7        NFE005d0007.mev     NFE005d00026.mev
9        NFE005d0008.mev     NFE005d00027.mev
10       NFE005d0009.mev     NFE005d00028.mev
11       NFE005d00010.mev    NFE005d00029.mev
12       NFE005d00011.mev    NFE005d00030.mev
13       NFE005d00012.mev    NFE005d00031.mev
14       NFE005d00013.mev    NFE005d00032.mev
15       NFE005d00014.mev    NFE005d00033.mev
16       NFE005d00015.mev    NFE005d00034.mev
17       NFE005d00016.mev    NFE005d00035.mev
18       NFE005d00017.mev    NFE005d00036.mev
19       NFE005d00018.mev    NFE005d00037.mev
20       NFE005d00019.mev    NFE005d00038.mev
13
LOWESS (Locfit) normalization
R-I plot: logRatio vs. logIntensityProduct
[Figure: R-I plot (panel A), SD = 0.346]
Observations:
1. Tilted tails at the low intensity end and the high intensity end
2. Mean not centered at 0 (intensity dependent)
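As a rough sketch of how the R-I plot coordinates and the quoted SD could be computed, the code below uses log10 of the intensity product and log2 of the ratio, the bases the deck names for the LOWESS fit. The function names are hypothetical, and using the sample standard deviation is an assumption.

```python
import math
import statistics


def ri_coordinates(cy3: float, cy5: float) -> tuple:
    """Return (log10(Cy3*Cy5), log2(Cy5/Cy3)) for one spot."""
    return math.log10(cy3 * cy5), math.log2(cy5 / cy3)


def log_ratio_sd(cy3_values, cy5_values) -> float:
    """Sample standard deviation of the log2 ratios over all spots."""
    ratios = [ri_coordinates(g, r)[1] for g, r in zip(cy3_values, cy5_values)]
    return statistics.stdev(ratios)
```

A spot with equal intensities in both channels lands on the line logRatio = 0, which is where the bulk of the data should sit once the intensity-dependent bias is removed.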
14
LOWESS (Locfit) normalization
[Figure: R-I plot (panel A), SD = 0.346, with Gene X marked; its log ratio is split into an Exp factor and a Bio factor]
If a gene is equally expressed in the Cy3 and Cy5 channels, log2(Cy5/Cy3) = 0.
Two factors contribute to the apparent up-regulation of gene X:
1. Biological factors (the ones we are interested in)
2. Experimental factors, e.g. different sensitivity to the red and green lasers (not of interest; we want to remove them)
15
LOWESS (Locfit) normalization
[Figure: R-I plot (panel A), SD = 0.346, with Gene X, Exp factor and Bio factor marked]
We need a way to extract the experimental factors.
Approach:
- Assume that similar experimental factors apply to genes that lie close to each other in the logProd-logRatio plot.
- Predict the Exp factor from a group of locally neighboring data points, which is equivalent to a curve fitting problem.
16
- Local linear regression model
- Tri-cube weight function
- Least squares
Estimated values of log2(Cy5/Cy3) as a function of log10(Cy3*Cy5)
[Figure: fitted curve on the R-I plot (panel A), SD = 0.346]
17
LOWESS (Locfit) normalization
Use the estimated curve y(x_i) to correct the raw data.
[Figure: R-I plot (panel A), SD = 0.346; for Gene X, y(x_i) is the Exp factor and the remainder is the Bio factor]
$\log_2(R_i'/G_i') = \log_2(R_i/G_i) - y(x_i)$
$\log_2(R_i'/G_i') = \log_2(R_i/G_i) - \log_2 2^{y(x_i)}$
$\log_2(R_i'/G_i') = \log_2\left(\frac{R_i}{G_i} \cdot \frac{1}{2^{y(x_i)}}\right)$
$R_i' = R_i, \quad G_i' = G_i \cdot 2^{y(x_i)}$
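The fit-and-correct procedure can be sketched in plain Python as follows. This is a minimal illustration, not the Locfit implementation used by MIDAS: the smoothing fraction `frac`, the nearest-neighbour bandwidth rule, and the mapping R = Cy5, G = Cy3 are assumptions of the sketch.

```python
import math


def tricube(u: float) -> float:
    """Tri-cube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, frac=0.33):
    """Estimate y(x_i) at every x_i by weighted local linear least squares."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # points per local window
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (assumed rule).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(cy3, cy5, frac=0.33):
    """Apply R' = R, G' = G * 2^y(x) with R = Cy5 and G = Cy3."""
    x = [math.log10(g * r) for g, r in zip(cy3, cy5)]
    y = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    curve = lowess_fit(x, y, frac)
    new_cy3 = [g * 2.0 ** f for g, f in zip(cy3, curve)]
    return new_cy3, list(cy5)
```

Only the green channel is rescaled, which matches the last line of the derivation; subtracting y(x_i) from the log ratio directly would give the same corrected ratios.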
18
LOWESS (Locfit) normalization
[Figure: R-I plots before and after correction. SD = 0.346; panel B, LOWESS-corrected R-I plot: SD = 0.338]
19
Standard deviation regularization
Assumption: within each block and each slide, spots should have the same spread of log2(Cy5/Cy3) values.
SD-Reg scales the (Cy3, Cy5) intensity pair of each spot so that the spot set within each block or slide has the same standard deviation as the other blocks or slides.
20
Standard deviation regularization
Let $a_{ij}$ be the raw log ratio for the $j$-th spot in the $i$-th block (or slide), and let $a'_{ij}$ be the scaled log ratio for the same spot,
where $N_i$ denotes the number of genes in the $i$-th block (or slide), $M$ denotes the number of blocks (or slides), and $\bar{a}_i$ denotes the log ratio mean of the $i$-th block (or slide).
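The scaling formulas themselves are not preserved in this transcript. One common way to give every block the same spread, assumed here purely for illustration, is to divide each block's log ratios by the ratio of that block's standard deviation to the geometric mean of all block standard deviations. The function name and the use of the population variance are assumptions as well.

```python
import math


def sd_regularize(blocks):
    """Rescale log ratios so every block (or slide) has the same SD.

    blocks: list of M lists; blocks[i][j] is a_ij, the raw log ratio of
    spot j in block i. Returns the scaled values a'_ij in the same layout.
    Assumes every block has a non-zero spread.
    """
    sds = []
    for ratios in blocks:
        mean = sum(ratios) / len(ratios)             # block mean
        var = sum((a - mean) ** 2 for a in ratios) / len(ratios)
        sds.append(math.sqrt(var))
    # Geometric mean of the M block SDs: the common target spread.
    target = math.exp(sum(math.log(s) for s in sds) / len(sds))
    return [[a * target / s for a in ratios] for ratios, s in zip(blocks, sds)]
```

For example, two blocks with SDs of 1 and 4 are both rescaled to an SD of 2, the geometric mean of the two.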
21
Standard deviation regularization
22
Flip dye replicates consistency filter
The intensities in the file pair are flipped, i.e. R1/G1 ~ G2/R2, or R1 ~ G2 and G1 ~ R2.
[Figure: paired files with columns G1, R1 and G2, R2 for Gene1 to Gene8]
Flip dye experiments help reduce random error.
23
Flip dye replicates consistency filter
1. Calculate expression levels for all genes in the flip-dye pair.
2. Filter out genes with inconsistent expression levels between the flip-dye replicates.
3. For genes that pass the consistency check, take the geometric mean of the corresponding intensities from the replicate pair.
How is consistency measured between replicates?
24
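Flip dye replicates consistency filter
[Table: File 1 with columns G1, R1 and File 2 with columns G2, R2, one row per gene]
100% consistency: R1/G1 = G2/R2, i.e. (R1 · R2) / (G1 · G2) = 1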
25
Flip dye replicates consistency filter
SD cut vs. Threshold cut
- SD cut: regardless of dataset, always cuts the same percentage for the same …
- Threshold cut: the percentage cut depends on the specified log-ratio consistency range, e.g. -1 < log2(ratio) < 1, equivalently 1/2 < ratio < 2
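A minimal sketch of the two cuts and the merge step is given below. It assumes the consistency measure is log2((R1·R2)/(G1·G2)), which is 0 when R1/G1 = G2/R2, and that the SD cut is taken around the mean of that measure; both are assumptions, as are the function names and default cutoffs.

```python
import math
import statistics


def log_consistency(r1, g1, r2, g2):
    """log2((R1*R2)/(G1*G2)); 0 means a perfectly consistent flip-dye pair."""
    return math.log2((r1 * r2) / (g1 * g2))


def threshold_cut(pairs, limit=1.0):
    """Keep pairs with -limit < log consistency < limit (a fixed range)."""
    return [p for p in pairs if abs(log_consistency(*p)) < limit]


def sd_cut(pairs, n_sd=2.0):
    """Keep pairs within n_sd standard deviations of the mean log consistency."""
    values = [log_consistency(*p) for p in pairs]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [p for p, v in zip(pairs, values) if abs(v - mean) <= n_sd * sd]


def merge_pair(r1, g1, r2, g2):
    """Geometric mean of corresponding intensities: R1 with G2, G1 with R2."""
    return math.sqrt(r1 * g2), math.sqrt(g1 * r2)
```

Each pair is a tuple `(R1, G1, R2, G2)`. The threshold cut removes a data-dependent fraction of genes, while the SD cut adapts its range to the spread of each dataset.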
27
Slice Analysis filter
Remove genes whose z-scores fall outside the range of interest.
29
Slice Analysis filter
1. Define a slice window.
2. Slide the window along the log(IntensityProduct) axis.
3. Calculate logRatioMean and logRatioSD of the data points within each slice window.
4. Calculate the Z-score of each data point: Z-score = (logRatio - logRatioMean) / logRatioSD
5. Trim data with Z-scores outside the range of interest.
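The steps above can be sketched as follows. The window here is centered on each data point and has a fixed half-width on the log(IntensityProduct) axis; that choice, the default values, and the function names are assumptions of this sketch rather than the MIDAS defaults.

```python
import statistics


def slice_z_scores(log_prod, log_ratio, half_width=0.25):
    """Z-score of each point relative to the points in its slice window."""
    scores = []
    for x0, y0 in zip(log_prod, log_ratio):
        # Log ratios of all points whose intensity product is in the window.
        window = [y for x, y in zip(log_prod, log_ratio)
                  if abs(x - x0) <= half_width]
        if len(window) < 2:
            scores.append(0.0)      # too few points to estimate a spread
            continue
        mean = statistics.mean(window)
        sd = statistics.stdev(window)
        scores.append((y0 - mean) / sd if sd > 0 else 0.0)
    return scores


def slice_filter(log_prod, log_ratio, z_limit=2.0, half_width=0.25):
    """Return the indices of the points whose |Z-score| is within z_limit."""
    scores = slice_z_scores(log_prod, log_ratio, half_width)
    return [i for i, z in enumerate(scores) if abs(z) <= z_limit]
```

Because the mean and SD are recomputed inside every window, the cut follows the local spread of the data instead of applying one global log-ratio cutoff.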
30
Slice Analysis filter
31
Analysis packaging myAnalysis.prj
32
MIDAS graphing
33
- R-I plot (.prc)
- Box plot (.box)
- FlipDye Diagnostic plot (.rrc)
- Intensity plot (.ity, .lty)
- Z-score Distribution plot (.his)
- SAM plot (.sam)
34
MIDAS data viewer
35
Statistically significant gene identification methods
Two methods are implemented in this release of MIDAS:
- Cross-slide replicates one-class t-test
- Cross-slide replicates one-class SAM
36
SAM (Significance Analysis of Microarrays)
A statistical technique for finding significant genes in a set of microarray experiments.
Reference: Tusher, V.G., R. Tibshirani and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA 98: 5116-5121.
Designs:
- two-class unpaired
- two-class paired
- multi-class unpaired
- censored survival
- one-class (available in this release)
37
SAM (Significance Analysis of Microarrays)
One-class SAM: identify genes whose mean expression across experiments differs from a user-specified mean.
- Assign a score (d) to each gene based on its change in expression relative to the standard deviation of repeated measurements for that gene.
- Genes with scores greater than a threshold (Δ) are deemed potentially significant.
- For these potentially significant genes, estimate the proportion likely to have been identified by chance, the False Discovery Rate (FDR).
- The goal is to pick a set of differentially expressed genes with an FDR acceptable to the user.
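The score step can be sketched as below, using the usual SAM form d = (mean - target) / (s + s0), where s is the standard error of the gene's replicate measurements and s0 is a small positive constant. The function name, the data layout, and the default `s0` are hypothetical, and the FDR estimation step is not shown.

```python
import math
import statistics


def one_class_d_scores(expression, target_mean=0.0, s0=0.1):
    """Return {gene: d} for replicate log ratios measured across slides.

    expression: {gene: [log ratio on slide 1, slide 2, ...]} with at least
    two replicates per gene. s0 is an illustrative constant, not a MIDAS
    default; it keeps genes with tiny variance from getting huge scores.
    """
    scores = {}
    for gene, values in expression.items():
        n = len(values)
        standard_error = statistics.stdev(values) / math.sqrt(n)
        scores[gene] = (statistics.mean(values) - target_mean) / (standard_error + s0)
    return scores
```

Genes would then be ranked by d and compared against the chosen threshold, with the threshold adjusted until the estimated FDR is acceptable.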
38
SAM (Significance Analysis of Microarrays)
[Figure: SAM plot. Labels: Δ adjustment, FDR, positively significant genes]
39
Automated report generation
41
TM4 MIDAS web page
http://www.tigr.org/software/tm4/midas.html
http://www.tm4.org/midas.html