Normalization for cDNA Microarray Data


1 Normalization for cDNA Microarray Data
Originally by Yee Hwa Yang, Sandrine Dudoit, Percy Luu and Terry Speed. Ho Kim

2 Normalization
The process of removing systematic variation, such as: physical properties of the dyes; efficiency of dye incorporation; experimental variability in probe coupling and processing procedures; scanner settings.

3 Normalization issues
Within-slide: what genes to use; location; scale. Paired-slides (dye swap): self-normalization. Between slides.

4 Within-Slide Normalization
Normalization balances red and green intensities. Imbalances can be caused by: different incorporation of dyes; different amounts of mRNA; different scanning parameters. In practice, we usually need to increase the red intensity a bit to balance the green.

5 Methods?
Global normalization: log2(R/G) -> log2(R/G) - c = log2(R/(kG)), where c = log2(k). Standard practice (in most software): c is a constant such that the normalized log-ratios have zero mean or median. Our preference: c is a function of overall spot intensity and print-tip-group.
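The global method above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `global_median_normalize` and the intensities are made up, and the median is used as the constant c.

```python
import math
import statistics

def global_median_normalize(red, green):
    """Subtract one constant c (the median log-ratio) from every log2(R/G)."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    c = statistics.median(log_ratios)
    normalized = [m - c for m in log_ratios]
    return normalized, c

# Hypothetical spot intensities (not data from the experiment in the slides).
red = [100.0, 200.0, 400.0, 800.0, 1600.0]
green = [200.0, 400.0, 800.0, 1600.0, 800.0]

normalized, c = global_median_normalize(red, green)
k = 2 ** c  # subtracting c is the same as rescaling green by k

# The same values computed as log2(R / (kG)).
rescaled = [math.log2(r / (k * g)) for r, g in zip(red, green)]
print(c, k, normalized)
```

Here most spots have red at half the green intensity, so c is negative and the correction raises the red log-ratios, matching the remark that red usually needs a small boost.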

6 What genes to use?
All genes on the array: appropriate when only a small proportion of genes are expected to be differentially expressed; symmetry between up- and down-regulated genes is also assumed. Constantly expressed genes (housekeeping), e.g. beta-actin. Controls: spiked controls (e.g. synthetic DNA sequences, plant genes), which should have equal red and green intensities; genomic DNA titration series. Other sets of genes.

7 Experiment
mRNA samples: R = Apo A1 KO mouse liver (KO #8), G = control (all C57Bl/6). Probes: ~6,000 cDNAs, including 200 related to lipid metabolism.

8 M vs. A
M = log2(R / G): log intensity ratio
A = log2(R*G) / 2: mean log-intensity
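As a small illustration of these two definitions, the sketch below computes M and A for each spot. The function name `ma_values` and the intensities are hypothetical.

```python
import math

def ma_values(red, green):
    """Return per-spot M = log2(R/G) and A = log2(R*G)/2."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

# Two hypothetical spots: one with red four times green, one balanced.
m, a = ma_values([1024.0, 256.0], [256.0, 256.0])
print(m, a)
```

Plotting M against A rotates the usual log R vs. log G scatter by 45 degrees, so intensity-dependent dye bias shows up as curvature around M = 0.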

9 Normalization - Median
Assumption: changes are roughly symmetric. First panel: smoothed densities of log2 G and log2 R. Second panel: M vs. A plot with the median set to zero.

10 Nonparametric Smoothing (1)
Consider an X-Y plot. Draw a regression line that requires no parametric assumptions. The regression line need not be linear; it is determined entirely by the data. Two components of smoothing: Kernel function: how the weighted mean is calculated. Bandwidth: the width of the window (span), which determines the smoothness of the regression line; wider means smoother.

11 Nonparametric Smoothing (2)
Uniform Kernel

12 Nonparametric Smoothing (3)
Triangular Kernel

13 Nonparametric Smoothing (4)
Normal Kernel
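The three kernels can be compared with a minimal kernel-weighted mean. This is only a sketch: the function names are made up, the data are synthetic, and the normal kernel omits its normalizing constant because it cancels in the weighted mean.

```python
import math

def uniform_kernel(u):
    # Equal weight inside the window, zero outside.
    return 1.0 if abs(u) <= 1.0 else 0.0

def triangular_kernel(u):
    # Weight falls linearly from 1 at the centre to 0 at the window edge.
    return max(0.0, 1.0 - abs(u))

def normal_kernel(u):
    # Gaussian-shaped weight; the constant factor cancels in the weighted mean.
    return math.exp(-0.5 * u * u)

def kernel_smooth(x, y, x0, bandwidth, kernel):
    """Weighted mean of y near x0, with weights kernel((x - x0) / bandwidth)."""
    weights = [kernel((xi - x0) / bandwidth) for xi in x]
    total = sum(weights)
    if total == 0.0:
        return float("nan")  # no points fall inside the window
    return sum(w * yi for w, yi in zip(weights, y)) / total

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0]

narrow = kernel_smooth(x, y, 2.0, 1.0, uniform_kernel)   # uses x = 1, 2, 3
wide = kernel_smooth(x, y, 2.0, 2.0, uniform_kernel)     # uses all five points
tri = kernel_smooth(x, y, 2.0, 2.0, triangular_kernel)   # down-weights far points
gauss = kernel_smooth(x, y, 2.0, 1.0, normal_kernel)
print(narrow, wide, tri, gauss)
```

Widening the bandwidth pulls in more distant points, which is why a wider window gives a smoother (and here more biased) estimate.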

14 Nonparametric Smoothing (5)
Default Lowess line : Span=0.5

15 Nonparametric Smoothing (6)
Lowess line : Span=0.2

16 Nonparametric Smoothing (7)
Lowess line : Span=0.1

17 Normalization - lowess
Global lowess. Assumption: changes are roughly symmetric at all intensities.
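Intensity-dependent normalization subtracts a smooth curve c(A) from M. The sketch below uses a simplified stand-in for lowess: a single pass of tricube-weighted local linear fits with no robustness iterations, so it is not equivalent to a full lowess implementation. The function name `local_linear_fit` and the data are hypothetical.

```python
import math

def local_linear_fit(a, m, span=0.5):
    """Fit M on A locally at each A value using tricube weights (no robustness steps)."""
    n = len(a)
    k = max(2, math.ceil(span * n))  # number of neighbours that define the window
    fitted = []
    for a0 in a:
        dist = [abs(ai - a0) for ai in a]
        h = sorted(dist)[k - 1]  # window half-width: distance to the k-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)
        swa = sum(wi * ai for wi, ai in zip(w, a))
        swm = sum(wi * mi for wi, mi in zip(w, m))
        swaa = sum(wi * ai * ai for wi, ai in zip(w, a))
        swam = sum(wi * ai * mi for wi, ai, mi in zip(w, a, m))
        denom = sw * swaa - swa * swa
        if abs(denom) < 1e-12:
            fitted.append(swm / sw)  # degenerate window: fall back to weighted mean
        else:
            slope = (sw * swam - swa * swm) / denom
            intercept = (swm - slope * swa) / sw
            fitted.append(intercept + slope * a0)
    return fitted

# Synthetic spots whose log-ratio drifts linearly with intensity (pure dye bias).
a = [float(v) for v in range(6, 16)]
m = [0.5 * ai - 3.0 for ai in a]

curve = local_linear_fit(a, m, span=0.5)
normalized = [mi - ci for mi, ci in zip(m, curve)]
print(max(abs(v) for v in normalized))
```

Because the synthetic bias is exactly linear in A, the local fits recover it and the normalized log-ratios are essentially zero. On real data the span controls how closely the curve follows the cloud of points.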

18 Normalisation - print-tip-group
Assumption: for every print-tip-group, changes are roughly symmetric at all intensities.

19 M vs. A - after print-tip-group normalization

20 Effects of Location Normalisation
Before normalisation After print-tip-group normalisation

21 Box Plot
The box spans Q1 to Q3 with the median (Q2) marked inside; IQR = Q3 - Q1; points more than 1.5*IQR beyond the box are drawn as outliers.
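The quantities in a box plot can be computed directly. The sketch below assumes Python 3.8+ for `statistics.quantiles` and uses its "inclusive" method; other quartile conventions give slightly different values. The function name `boxplot_summary` and the data are made up.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, IQR and the points lying beyond 1.5*IQR from the box."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical values with one extreme point.
summary = boxplot_summary([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])
print(summary)
```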

22 QQ-plot
Compares a sample distribution with another distribution (e.g. normal). Example: t (df = 9) vs. standard normal.

23 Within print-tip-group box plots for print-tip-group normalized M

24 Taking scale into account
Assumption: all print-tip-groups have the same spread. The true log-ratio is m_ij, where i indexes print-tip-groups and j indexes spots. The observed value is M_ij = a_i m_ij. A robust estimate of a_i is MAD_i = median_j { |M_ij - median_j(M_ij)| }.
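A possible sketch of the scale step: compute the MAD of each print-tip-group, then rescale each group so its MAD equals the geometric mean of all the group MADs (the adjustment named in the summary slide). The function names and the toy groups are hypothetical, and the code assumes every group has a nonzero MAD.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

def scale_normalize(groups):
    """Rescale each group's log-ratios so every group MAD equals the geometric mean MAD.

    groups maps a print-tip-group id to its location-normalized M values.
    """
    mads = {g: mad(values) for g, values in groups.items()}
    geo_mean = math.exp(sum(math.log(x) for x in mads.values()) / len(mads))
    scaled = {
        g: [m * geo_mean / mads[g] for m in values]
        for g, values in groups.items()
    }
    return scaled, mads

# Two hypothetical print-tip-groups; the second is four times as spread out.
groups = {
    1: [-2.0, -1.0, 0.0, 1.0, 2.0],
    2: [-8.0, -4.0, 0.0, 4.0, 8.0],
}
scaled, mads = scale_normalize(groups)
print(mads, mad(scaled[1]), mad(scaled[2]))
```

Dividing by MAD_i alone would also equalize the spreads; multiplying back by the geometric mean keeps the overall scale of the log-ratios roughly unchanged.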

25 Effect of location + scale normalization

26 Effect of location + scale normalization

27 Comparing different normalisation methods

28 Follow-up Experiment
50 distinct clones with the largest absolute t-statistics from the first experiment. 72 other clones. Each clone spotted 8 times. Two hybridizations: Slide 1, treatment (ttt) -> red, control (ctl) -> green. Slide 2, ttt -> green, ctl -> red.

29 Follow-up Experiment

30 Paired-slides: dye swap
Slide 1: M = log2(R/G) - c. Slide 2: M’ = log2(R’/G’) - c’. Combine by subtracting the normalized log-ratios: (M - M’)/2 = [ (log2(R/G) - c) - (log2(R’/G’) - c’) ] / 2 = [ log2(R/G) + log2(G’/R’) ] / 2 = [ log2(RG’/(GR’)) ] / 2, provided c = c’. Assumption: the separate normalizations are the same.
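A toy calculation shows why the dye bias cancels in a dye swap. The numbers are invented: the treatment is truly four times the control, and the red channel reads at half strength on both slides. The function names are hypothetical.

```python
import math

def log_ratio(red, green):
    return math.log2(red / green)

def self_normalized(r1, g1, r2, g2):
    """Half the difference of the raw log-ratios from the two dye-swapped slides."""
    return 0.5 * (log_ratio(r1, g1) - log_ratio(r2, g2))

# Slide 1: treatment in red (400 halved by the dye bias to 200), control in green (100).
# Slide 2: treatment in green (400), control in red (100 halved to 50).
value = self_normalized(200.0, 100.0, 50.0, 400.0)

# Equivalent single-ratio form: log2(R G' / (G R')) / 2.
combined = 0.5 * math.log2((200.0 * 400.0) / (100.0 * 50.0))
print(value, combined)
```

Each slide alone is off by one log2 unit (M = 1 and M’ = -3 instead of 2 and -2), but the bias enters both with the same sign and drops out of the difference, leaving log2(4) = 2.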

31 Verify Assumption

32 Result of Self-Normalization
Plot of (M - M’)/2 vs. (A + A’)/2

33 Summary
Case 1: A few genes that are likely to change. Within-slide: Location: print-tip-group lowess normalization. Scale: for all print-tip-groups, adjust the MAD to equal the geometric mean of the MADs of all print-tip-groups. Between slides (experiments): an extension of within-slide scale normalization (future work).
Case 2: Many genes changing (paired-slides). Self-normalization: take the difference of the two log-ratios. Check using controls or known information.

34 Technical Reports from Terry’s group
/papersindex.html
Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data.
Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments.
Comparison of methods for image analysis on cDNA microarray data.
Normalization for cDNA Microarray Data.
Statistical software R

