Published by Marshall Adrian Cross. Modified over 9 years ago.
1
Microarray data analysis
Ana Deckmann, Carla Judice, Jorge Lepikson, Jorge Mondego, Leandra Scarpari, Marcelo Falsarella Carazzolle, Michelle Servais, Tais Herig
2
Summary
- Statistics background
- Introduction to microarrays
- Pre-processing microarray data
- Statistical analysis
- Applications at the LGE
- Gene Chip
3
Statistics background
- measurement = truth + error
- error = bias + variance
Error model: bias is handled by normalization; variance is handled by experimental replicates (technical and biological) and statistics.
Bias describes a systematic tendency of the measurement. Example: the dyes Cy3 and Cy5 do not have the same efficiency.
Variance is often normally distributed. Examples: instrumentation imperfections and biological variation.
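The error model can be illustrated with a small simulation. This is a minimal sketch under assumed numbers: the bias of 0.4 and noise standard deviation of 0.3 (on a log-ratio scale) are hypothetical, and all 2000 simulated genes are assumed to be truly non-differentially expressed. It shows that the bias survives averaging but can be estimated from the bulk of the data and subtracted, while the variance shrinks when replicates are averaged.

```python
import random
import statistics

random.seed(42)

DYE_BIAS = 0.4   # hypothetical systematic dye effect (bias), the same for every spot
NOISE_SD = 0.3   # hypothetical random error (variance), different for every spot

# 2000 hypothetical genes, none truly differentially expressed (true log-ratio = 0)
truth = [0.0] * 2000
measured = [t + DYE_BIAS + random.gauss(0.0, NOISE_SD) for t in truth]

# Bias moves every value in the same direction, so averaging cannot remove it;
# it is estimated from the bulk of the data and subtracted (normalization).
estimated_bias = statistics.median(measured)
normalized = [m - estimated_bias for m in measured]

# Variance is reduced by replication: the mean of k replicates has
# roughly NOISE_SD / sqrt(k) spread.
def replicate_mean(k):
    return statistics.mean([random.gauss(0.0, NOISE_SD) for _ in range(k)])

spread_single = statistics.stdev([replicate_mean(1) for _ in range(2000)])
spread_four = statistics.stdev([replicate_mean(4) for _ in range(2000)])
print(round(estimated_bias, 3), round(spread_single, 3), round(spread_four, 3))
```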
4
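- Standard deviation
Mean: $\mathrm{mean}(x) = \frac{1}{K}\sum_{i=1}^{K} x_i$
Standard deviation: $\mathrm{sd}(x) = \sqrt{\frac{1}{K-1}\sum_{i=1}^{K} \left(x_i - \mathrm{mean}(x)\right)^2}$
Gaussian function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$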
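The formulas above translate directly into code. This sketch assumes the sample standard deviation (dividing by K - 1); the data values are illustrative only.

```python
import math

def mean(x):
    """Arithmetic mean: (x_1 + ... + x_K) / K."""
    return sum(x) / len(x)

def std_dev(x):
    """Sample standard deviation (divides by K - 1)."""
    m = mean(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / (len(x) - 1))

def gaussian(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data), std_dev(data), gaussian(0.0, 0.0, 1.0))
```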
5
- Mean vs median:
Assume data with one outlier: x = (8, 85, 7, 9, 5, 4, 13, 6, 8)
- The mean of all x's, i.e. (x_1 + x_2 + ... + x_K)/K, is affected by the outlier: mean(x) = 16.11 (7.5 without the outlier 85).
- The median of all x's, i.e. the middle value of the ordered x's, is not affected (as long as fewer than 50% of the values are outliers): x ordered = (4, 5, 6, 7, 8, 8, 9, 13, 85), median(x) = 8.0.
Use the median instead of the mean if you expect artifacts. (If there are many measurements and the errors are symmetrically distributed, the median gives approximately the same result as the mean without outliers.)
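The example above can be checked with Python's standard `statistics` module. The outlier is removed by value only to reproduce the "without the outlier" figure.

```python
import statistics

x = [8, 85, 7, 9, 5, 4, 13, 6, 8]
without_outlier = [v for v in x if v != 85]   # drop the single outlier

print(round(statistics.mean(x), 2))          # mean pulled up by the outlier
print(statistics.mean(without_outlier))      # mean of the remaining eight values
print(statistics.median(x))                  # median barely notices the outlier
```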
6
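- Quantiles
The p-quantile is the value below which a fraction p (or percent) of the data points fall. For a finite data set, one common convention takes the smallest observed value with at least a fraction p of the points at or below it. For example, the 0.3 (30%) quantile is the point at which about 30% of the data fall at or below and 70% fall above.
Example, p = 30%: x = (0, 10, 40, 25, 15, 50, 70, 60); ordered values: (0, 10, 15, 25, 40, 50, 60, 70)
Quantile(x; 30%) = 15 (the points 0, 10 and 15 lie at or below it)
1st quartile = 10; 3rd quartile = 50
Median = (25 + 40)/2 = 32.5 (with an even number of points, the two middle values are averaged)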
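A minimal sketch of the "smallest value with at least a fraction p at or below it" convention (the inverse of the empirical distribution function, which is type 1 in R's `quantile`). Other conventions interpolate between neighbouring points and give slightly different quartiles for the same data. The function name is hypothetical.

```python
import math
import statistics

def quantile_inverted_cdf(values, p):
    """Smallest observed value with at least a fraction p of the data at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered)))   # how many points must lie at or below it
    return ordered[k - 1]

x = [0, 10, 40, 25, 15, 50, 70, 60]
print(quantile_inverted_cdf(x, 0.30))   # 30% quantile
print(quantile_inverted_cdf(x, 0.25))   # 1st quartile
print(quantile_inverted_cdf(x, 0.75))   # 3rd quartile
print(statistics.median(x))             # averages the two middle values
```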
7
Introduction to microarrays
- Three different microarray technologies:
  - Spotted cDNA microarrays (500 to 2500 bp probes)
  - Spotted oligonucleotide microarrays (30 to 70 bp probes)
  - Affymetrix chips (25 bp probes)
- Can be used for: differential gene expression studies, gene co-regulation studies, gene function identification studies, time-course studies, dose-response studies, clinical diagnosis, …
8
Two color architecture
9
Codelink architecture (one color)
Probes: 30-mers, 90% located within 550 bases of the 3' end
Targets: 10 µg biotinylated cRNA
10
Scanning
[Figure: excitation by red and green lasers (higher frequency, more energy; lower frequency, less energy); the emission images of the two channels are overlaid]
11
Ludwig scanner
[Figure: array grid layout. Source: Scarpari, Leandra, 2006, PhD thesis]
Ludwig flags:
(0) Int <= Back
(1) Irregular spot
(3) Spot ok
(4) Saturated
12
Codelink scanner
Codelink flags:
(L) near background
(C) contaminated
(S) saturated
(M) masked
(G) good
13
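LGE scanner
[Figure: array layout]
LGE-defined flags:
(0) Spot ok
(1) Spot saturated
(2) Int/Back <= 1.05
(3) Area <= 110 or 50 (9x9 or 11x11)
Defined intensity:
$\mathrm{Int}_{Cy3} = \mathrm{Area}_{Cy3} \cdot \left[\mathrm{median}(\mathrm{Int}_{Cy3}) - \mathrm{median}(\mathrm{Bkgd}_{Cy3})\right]$
$\mathrm{Int}_{Cy5} = \mathrm{Area}_{Cy5} \cdot \left[\mathrm{median}(\mathrm{Int}_{Cy5}) - \mathrm{median}(\mathrm{Bkgd}_{Cy5})\right]$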
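A sketch of the defined intensity and the LGE flags, under several assumptions not stated on the slide. A spot is treated as saturated when any pixel reaches a hypothetical 16-bit maximum (65535). Int/Back is taken as the ratio of the median spot pixel to the median background pixel. The flag rules are checked in the order shown. `min_area` stands for the 110 or 50 area threshold. All function names are hypothetical.

```python
import statistics

SATURATION_LEVEL = 65535  # hypothetical: maximum pixel value of a 16-bit scanner image

def integrated_intensity(area, spot_pixels, background_pixels):
    """Area * (median spot intensity - median background intensity)."""
    return area * (statistics.median(spot_pixels) - statistics.median(background_pixels))

def lge_flag(area, spot_pixels, background_pixels, min_area):
    """LGE-style flag: 0 ok, 1 saturated, 2 Int/Back <= 1.05, 3 area too small.

    Checking the rules in this order is an assumption."""
    if max(spot_pixels) >= SATURATION_LEVEL:
        return 1                                    # saturated spot
    back = statistics.median(background_pixels)
    if back > 0 and statistics.median(spot_pixels) / back <= 1.05:
        return 2                                    # signal barely above background
    if area <= min_area:
        return 3                                    # spot too small
    return 0                                        # spot ok

spot = [1200, 1250, 1280, 1300]
background = [100, 105, 110]
print(integrated_intensity(81, spot, background), lge_flag(81, spot, background, min_area=50))
```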
14
Example spot: Cy3 = 3329280; Cy5 = 2251624; r = 0.68 (fold = -1.48)
Integrated intensity = (target median - background median) × area
[Figure: pixels inside the spot (target) vs. pixels outside the spot (background)]
15
Example spots:
- Cy3 = 222824; Cy5 = 15488; r = 0.0695; fold = -14.4; flag = 0
- Cy3 = 481536; Cy5 = 676000; r = fold = 1.40; flag = 0
- Cy3 = 293664; Cy5 = 485368; r = 1.65; flag = 0
- Cy3 = 6400; Cy5 = -3584; NA (signal:noise <= 1); flag = 2
- Cy3 = 8767720; Cy5 = 1349296; r = 0.15; fold = -6.5; flag = 1
16
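Pre-processing microarray data
- Bioconductor repository (http://www.bioconductor.org/)
- Log intensities
[Figure: R vs G scatter plot with the line R = G; log2 R vs log2 G scatter plot with the line log2 R = log2 G]
Most genes have low expression levels. What happens here? On the raw scale these points crowd into the low-intensity corner of the plot; on the log2 scale they spread out.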
17
M vs A plot
Transformed data {(M, A)_i}:
$M = \log_2 R - \log_2 G$ (difference)
$A = \frac{1}{2}\left(\log_2 R + \log_2 G\right)$ (average)
Non-differentially expressed genes now lie along the horizontal line M = 0 ($\log_2 R - \log_2 G = 0$, i.e. R = G); up-regulated genes lie above it and down-regulated genes below it.
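The M and A transform for paired red and green intensities, as a minimal sketch. The intensity values are made up: a spot with equal signals gives M = 0, and a fourfold difference gives M = ±2.

```python
import math

def ma_transform(red, green):
    """Return (M, A) lists: M = log2 R - log2 G, A = (log2 R + log2 G) / 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

red = [1000.0, 4000.0, 250.0]
green = [1000.0, 1000.0, 1000.0]
M, A = ma_transform(red, green)
print(M, A)
```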
18
Density plot
log2 R = red channel signal; log2 G = green channel signal
19
Print-tip box plot (print-tip groups 1 to 16)
20
Normalization within slides
Expectation: most genes are not differentially expressed, i.e. most of the data points should lie around M = 0.
21
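Median normalization: sets the median of the log intensity ratios (M) to zero by subtracting it from every M value.
Lowess normalization: global lowess normalization, which fits a lowess curve of M against A using all spots on the slide and subtracts it.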
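Median normalization is a single subtraction. The sketch below assumes the input is a plain list of M values for one slide. In Bioconductor, limma's `normalizeWithinArrays` offers this as `method="median"`.

```python
import statistics

def median_normalize(m_values):
    """Shift the log-ratios M so that their median becomes 0."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]

M = [0.3, 0.5, 0.4, 2.1, 0.35]        # hypothetical log-ratios from one slide
M_norm = median_normalize(M)
print(M_norm)
```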
22
Print-tip normalization: print-tip group lowess normalization
$X^*_{ij} = \frac{X_{ij} - \mathrm{median}(\mathrm{GRID}_j)}{\mathrm{sd}(\mathrm{GRID}_j)}$
Scaled print-tip: scaled print-tip group lowess normalization
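The per-grid formula $X^*_{ij}$ can be sketched as follows. Here `i` indexes spots and `j` indexes print-tip grids. The sketch assumes each grid holds at least two spots (required for a standard deviation). It implements only the centering-and-scaling formula shown, not the lowess fit per print-tip group.

```python
import statistics
from collections import defaultdict

def print_tip_scale(values, grids):
    """X*_ij = (X_ij - median(GRID_j)) / sd(GRID_j), computed separately for each grid j."""
    by_grid = defaultdict(list)
    for v, g in zip(values, grids):
        by_grid[g].append(v)
    # median and sample standard deviation of each print-tip grid
    grid_stats = {g: (statistics.median(vs), statistics.stdev(vs)) for g, vs in by_grid.items()}
    return [(v - grid_stats[g][0]) / grid_stats[g][1] for v, g in zip(values, grids)]

values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
grids = [1, 1, 1, 2, 2, 2]     # hypothetical print-tip grid of each spot
print(print_tip_scale(values, grids))
```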
23
Normalization across slides
- Quantile
[Figure: QQ plot; mean between 8 slides]
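Quantile normalization across slides, as a minimal sketch. It assumes one equal-length list per slide. Each value is replaced by the mean, across slides, of the values with the same rank, so that every slide ends up with the same distribution. Ties are broken by position, which is a simplification. In Bioconductor, limma's `normalizeBetweenArrays` offers this as `method="quantile"`.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists (one list per slide)."""
    n = len(arrays[0])
    # indices of each slide ordered from smallest to largest value
    orders = [sorted(range(n), key=lambda i: arr[i]) for arr in arrays]
    # reference distribution: mean of the k-th smallest values across slides
    sorted_arrays = [sorted(arr) for arr in arrays]
    reference = [sum(col[k] for col in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]      # give each spot the reference value of its rank
        result.append(out)
    return result

slides = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(slides))
```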
24
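- LOWESS (applied to one-color microarrays)
Transformed data {(M, A)_i}, where Int_1 and Int_2 are the intensities of the same spot on two arrays:
$M = \log_2(\mathrm{Int}_1) - \log_2(\mathrm{Int}_2)$; $A = \frac{1}{2}\left[\log_2(\mathrm{Int}_1) + \log_2(\mathrm{Int}_2)\right]$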
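A simplified lowess normalization of two one-color arrays. This is a sketch, not the full lowess algorithm: it fits a locally weighted straight line of M against A with tricube weights, but skips the robustness iterations. The neighbourhood fraction `frac=0.4` is an arbitrary choice. Splitting the fitted trend evenly between the two arrays is an assumption. For real data, statsmodels' `lowess` or limma's loess-based functions are complete implementations. The usage data are synthetic, with a hypothetical linear intensity-dependent trend that the fit should remove.

```python
import math

def lowess_fit_at(xs, ys, x0, frac):
    """Locally weighted linear fit evaluated at x0 (tricube weights, no robustness steps)."""
    k = min(len(xs), max(3, math.ceil(frac * len(xs))))   # neighbourhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12   # local bandwidth
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    xm = sum(wi * x for wi, x in zip(w, xs)) / sw
    ym = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ym + slope * (x0 - xm)

def lowess_normalize_pair(int1, int2, frac=0.4):
    """Remove the trend of M = log2(Int1) - log2(Int2) versus A from both arrays."""
    l1 = [math.log2(v) for v in int1]
    l2 = [math.log2(v) for v in int2]
    m = [p - q for p, q in zip(l1, l2)]
    a = [(p + q) / 2 for p, q in zip(l1, l2)]
    trend = [lowess_fit_at(a, m, a0, frac) for a0 in a]
    # split the correction evenly between the two arrays (an assumption)
    new1 = [v - t / 2 for v, t in zip(l1, trend)]
    new2 = [v + t / 2 for v, t in zip(l2, trend)]
    return new1, new2

# synthetic spots with a hypothetical linear intensity-dependent bias in M
A_true = [6 + 0.5 * i for i in range(20)]
M_true = [0.5 + 0.1 * a for a in A_true]
int1 = [2 ** (a + mm / 2) for a, mm in zip(A_true, M_true)]
int2 = [2 ** (a - mm / 2) for a, mm in zip(A_true, M_true)]
n1, n2 = lowess_normalize_pair(int1, int2)
residual = [x - y for x, y in zip(n1, n2)]
print(max(abs(r) for r in residual))
```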
25
Statistical analysis
- T-statistic test
The T-statistic down-weights the average when the deviation is large, and vice versa:
$T = \frac{\mathrm{mean}(x)}{SE(x)}$, where $SE(x) = \mathrm{std.dev}(x)/\sqrt{N}$ (standard error of the mean)
The blue gene has a lower T-value than the red gene.
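A sketch of the one-sample T-statistic on replicate log-ratios, assuming the null hypothesis is a mean log-ratio of 0. The two genes are hypothetical and share the same mean of 1.0. The one with the larger spread gets the lower T-value, as with the blue gene in the slide. P-values would additionally require the t distribution with N - 1 degrees of freedom, which is not computed here.

```python
import math
import statistics

def t_statistic(x):
    """One-sample T = mean(x) / (std.dev(x) / sqrt(N))."""
    se = statistics.stdev(x) / math.sqrt(len(x))   # standard error of the mean
    return statistics.mean(x) / se

red_gene = [0.9, 1.0, 1.1]    # hypothetical replicates, small spread
blue_gene = [0.2, 1.0, 1.8]   # hypothetical replicates, same mean, large spread
print(t_statistic(red_gene), t_statistic(blue_gene))
```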
26
Top table and volcano plot
Fold change = ratio if ratio >= 1; -1/ratio if ratio < 1
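The signed fold-change rule, applied to the Cy3/Cy5 integrated intensities of the example spots earlier in the slides, with r = Cy5/Cy3. As a simplifying assumption, the sketch reports NA whenever a signal is not positive. The slides instead mark that spot NA because its signal:noise is <= 1.

```python
def fold_change(ratio):
    """Signed fold change: ratio if ratio >= 1, otherwise -1/ratio."""
    return ratio if ratio >= 1 else -1.0 / ratio

def spot_fold(cy3, cy5):
    """Fold change of r = Cy5 / Cy3; None (NA) when either signal is not positive."""
    if cy3 <= 0 or cy5 <= 0:
        return None
    return fold_change(cy5 / cy3)

# (Cy3, Cy5) integrated intensities of the example spots
examples = [(222824, 15488), (481536, 676000), (293664, 485368), (6400, -3584), (8767720, 1349296)]
for cy3, cy5 in examples:
    fc = spot_fold(cy3, cy5)
    print(cy3, cy5, "NA" if fc is None else round(fc, 2))
```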
27
Cluster data analysis
28
Missing values (Bioinformatics (2001) vol 17, n. 6, 520-525)
Gene expression microarray experiments can generate data sets with multiple missing expression values. Accurate estimation of missing values is important for efficient data analysis.
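One common approach is k-nearest-neighbour imputation. This sketch is a simplified, unweighted variant, not necessarily the exact method of the cited paper. A missing value (`None`) is filled with the average of that column over the k genes closest to it. Closeness is measured as the root-mean-square difference over the columns both genes have observed. The matrix and k = 2 are illustrative.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest genes that observe that column."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            nearest = sorted(candidates, key=lambda c: c[0])[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled

genes = [
    [1.0, 2.0, None],   # gene with a missing value in the last array
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
print(knn_impute(genes))
```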
29
Applications at the LGE
- Codelink (Ana Deckmann)
- There is a Bioconductor package for Codelink data.
- Pipeline used:
  - Read Codelink file
  - Normalize between slides: method LOWESS (BMC Bioinformatics 2005, 6:309)
  - Background corrected
  - Bad spots excluded: flags C, S, M, X and I
  - Clustering and data analyses
  - Replicate validation: at least the flags GG x GG, GG x LL, LL x GG
  - Statistical analyses: fold change >= 2, p-value <= 0.05
30
LOWESS
32
- Ludwig (Leandra Scarpari)
- Reformat the file from ScanArray (Ludwig) to ScanAlyze format so that it is compatible with the Bioconductor packages (aroma and limma).
- Pipeline used:
  - Background corrected
  - Reformat file
  - Read ScanAlyze file
  - Normalize across slides: method quantile
  - Clustering and data analyses
  - Results were compatible with the Ludwig analyses
  - Bad spots excluded: flags 0, 1, 2 and 4
  - Normalize within arrays: method lowess (Nucleic Acids Research, 2002, Vol 30, No 4)
  - Replicate validation: at least flag = 3 in 2 internal replicates for each array
  - Statistical analyses: fold change >= 2, p-value <= 0.05
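The spot-exclusion, replicate-validation and selection steps of this pipeline can be sketched as follows. The data layout is hypothetical: a list of (flag, log-ratio) pairs for one gene's internal replicates on one array. "Fold change >= 2" is read as |fold change| >= 2, an assumption made so that down-regulated genes (negative signed fold) also qualify. The example calls reuse values from the result tables.

```python
EXCLUDED_FLAGS = {0, 1, 2, 4}   # Ludwig flags excluded; only flag 3 ("spot ok") is kept

def validated(replicates, min_ok=2):
    """Keep a gene only if at least min_ok internal replicates survive the flag filter."""
    ok = [m for flag, m in replicates if flag not in EXCLUDED_FLAGS]
    return ok if len(ok) >= min_ok else None

def is_candidate(fold, p_value):
    """Assumption: |signed fold change| >= 2 and p-value <= 0.05."""
    return abs(fold) >= 2 and p_value <= 0.05

print(validated([(3, 0.5), (3, 0.7), (4, 2.0)]))   # two good replicates remain
print(validated([(3, 0.5), (1, 0.6)]))             # only one good replicate: rejected
print(is_candidate(-3.00, 0.00017), is_candidate(1.21, 0.515352))
```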
33
LOWESS
34
QUANTILE
36
- LGE (two color)
- Reformat the file from the LGE scanner to ScanAlyze format so that it is compatible with the Bioconductor packages (aroma and limma).
- Pipeline used:
  - Background corrected
  - Reformat file
  - Read ScanAlyze file
  - Normalize within arrays: method lowess
  - Normalize across slides: method quantile
  - Data analyses
  - Bad spots excluded: flag 2 (ratio Int/Back < XX)
  - Replicate validation: at least flag 3 in 2 internal replicates for each array
  - Statistical analyses: fold change >= 2, p-value <= 0.05
37
LOWESS + QUANTILE
38
- LGE (one color)
- Reformat the file from the LGE scanner to ScanAlyze format so that it is compatible with the Bioconductor packages (aroma and limma).
- Pipeline used:
  - Background corrected
  - Reformat file
  - Read ScanAlyze file
  - Normalize within arrays: method median
  - Normalize across slides: method quantile
  - Clustering and data analyses
  - Bad spots excluded: flag 2 (ratio Int/Back < XX)
  - Replicate validation: at least flag 3 in 2 internal replicates for each array
  - Statistical analyses: fold change >= 2, p-value <= 0.05
39
MEDIAN + QUANTILE
40
More highly expressed in Op0d (sample: p-value; fold change; identity; organism)
Cutoff/background 0.05:
- G1.i10: 6.93E-07; 5.66; gnl|Amel_1.1|Contig6992 2e-13; Apis mellifera
- F1.j10: 2.59E-06; 4.05; unknown; Apis mellifera
- D1.i10: 7.70E-05; 3.08; no hits (low quality)
Cutoff/background 0.01:
- B1.a2: 0.515352; 1.21; Dunce 2e-39; Drosophila melanogaster
More highly expressed in Op5d (sample: p-value; fold change; identity; organism)
Cutoff/background 0.05:
- H4.b2: 0.00017; -3.00; gnl|Amel_1.1|Contig4902 2e-55; Apis mellifera
- B3.i3: 0.000992; -2.35; gnl|Amel_1.1|Contig896 1e-09; Apis mellifera
- H2.d2: 0.001343; -2.16; gnl|Amel_1.1|Contig10843 1e-16; Apis mellifera
Cutoff/background 0.01:
- H4.h3: 0.015089; -2.80; Groucho 1.6e-14; Anopheles gambiae
41
Gene Chip
46
End
47
Comparison of normalization methods for Codelink Bioarray data
Differences between pairs of arrays in the technical replicates: (1) array 1 vs array 4; (2) array 4 vs array 5
(BMC Bioinformatics 2005, 6:309)
48
- Within-slide normalization
[Figure: before and after print-tip normalization; panels: no normalization, print-tip, scaled print-tip]
(Nucleic Acids Research, 2002, Vol 30, No 4)