
Tom Kepler Santa Fe Institute Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression


1 Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression. Tom Kepler, Santa Fe Institute, kepler@santafe.edu

2

3 Rat mesothelioma cells: control vs. cells treated with KBrO₂

4 Normalization: method to be improved
1. Assume that some genes will not change under the treatment being investigated.
2. Identify these core genes in advance of the experiment.
3. Normalize all genes against these core genes, assuming they do not change.

5 Normalization: new method
1. Assume that some genes will not change under the treatment being investigated.
2. Choose these core genes arbitrarily.
3. Normalize all genes (provisionally) against these core genes, assuming they do not change.
4. Determine which genes do not change under this normalization.
5. Make this set the new core. If it differs from the previous core, go to step 3; otherwise, stop.
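The iteration in steps 1 to 5 can be sketched for the simplest case of one control and one treated array. This is a minimal illustration under stated assumptions, not the author's implementation: the normalization constant is taken as the mean log ratio over the current core, and a gene counts as "unchanged" when its normalized log ratio lies within a hypothetical cutoff `z_cut` standard deviations of zero.

```python
import statistics


def self_consistent_offset(log_control, log_treated, z_cut=2.0, max_iter=100):
    """Iterative core-set normalization for one pair of arrays (a sketch).

    log_control, log_treated: log spot intensities, one entry per gene.
    z_cut: hypothetical threshold; a gene is judged unchanged when its
    normalized log ratio is within z_cut standard deviations of zero.
    Returns (offset, core): the log normalization constant and the final core.
    """
    ratios = [t - c for c, t in zip(log_control, log_treated)]
    core = set(range(len(ratios)))  # step 2: arbitrary initial core (all genes)
    for _ in range(max_iter):
        core_ratios = [ratios[k] for k in core]
        # Step 3: provisional normalization, assuming core genes do not change.
        offset = statistics.fmean(core_ratios)
        spread = statistics.pstdev(core_ratios)
        # Step 4: genes whose normalized log ratio is small are judged unchanged.
        new_core = {k for k, r in enumerate(ratios)
                    if abs(r - offset) <= z_cut * spread}
        if new_core == core:  # step 5: the core reproduces itself, so stop
            return offset, core
        core = new_core
    raise RuntimeError("core set did not converge")
```

Starting from all genes is one arbitrary choice the slide allows; with `z_cut >= 1` the new core can never be empty, because at least one value always lies within one standard deviation of the mean.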

6 Error model: I = spot intensity; [mRNA] = concentration of specific mRNA; c = normalization constant

7 Error model: I = spot intensity; [mRNA] = concentration of specific mRNA; c = normalization constant; lognormal multiplicative error

8 Error model: I = spot intensity; [mRNA] = concentration of specific mRNA; c = normalization constant; lognormal multiplicative error. Indices: i = treatment group; j = replicate within treatment; k = spot (gene)

9 Log-scale model: Y = log spot intensity, written as the sum of the mean log concentration of specific mRNA, the treatment effect (on the concentration of specific mRNA), the normalization constant, and a normal additive error. Indices: i = treatment group; j = replicate within treatment; k = spot (gene)
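Read together, the error-model slides suggest a model of the following form. The slides' own symbols did not survive extraction, so the letters here are assumptions: c for the normalization constant, ε for the lognormal error (assumed to have log-mean zero), ν, μ, τ for the log normalization, mean log concentration and treatment effect, and e for the normal additive error. Which indices each term carries is also inferred from the definitions.

```latex
I_{ijk} = c_{ij}\,[\mathrm{mRNA}]_{ik}\,\varepsilon_{ijk},
\qquad \log \varepsilon_{ijk} \sim N(0, \sigma^2)

Y_{ijk} = \log I_{ijk}
        = \underbrace{\log c_{ij}}_{\nu_{ij}}
        + \underbrace{\log\,[\mathrm{mRNA}]_{ik}}_{\mu_k + \tau_{ik}}
        + \underbrace{\log \varepsilon_{ijk}}_{e_{ijk}}
```

Taking logs turns the multiplicative lognormal error into an additive normal one, which is what makes ordinary least squares applicable.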

10 Model and identifiability constraints; estimate the parameters by ordinary least squares.
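A minimal sketch of the least-squares fit, assuming the log-scale model y[i][j][k] = nu[i][j] + mu[k] + tau[i][k] + error with every array measuring every gene. The constraints below are a hypothetical choice, since the slide's equations are not recoverable: the normalizations sum to zero over arrays, and the treatment effects sum to zero over genes and over treatments. Under these constraints the least-squares estimates reduce to differences of means.

```python
from statistics import fmean


def fit_additive_model(y):
    """OLS fit of y[i][j][k] = nu[i][j] + mu[k] + tau[i][k] + error (a sketch).

    i = treatment group, j = replicate array, k = gene.
    Assumed identifiability constraints (hypothetical choice):
      sum over all arrays (i, j) of nu[i][j] = 0,
      sum over genes k of tau[i][k] = 0 for each treatment i,
      sum over treatments i of tau[i][k] = 0 for each gene k.
    """
    # Mean log intensity of each array, averaged over genes.
    array_means = [[fmean(arr) for arr in group] for group in y]
    grand = fmean(m for group in array_means for m in group)
    nu = [[m - grand for m in group] for group in array_means]
    # Remove the normalization, then average replicates: estimates mu[k] + tau[i][k].
    n_genes = len(y[0][0])
    theta = [[fmean(y[i][j][k] - nu[i][j] for j in range(len(y[i])))
              for k in range(n_genes)] for i in range(len(y))]
    mu = [fmean(row[k] for row in theta) for k in range(n_genes)]
    tau = [[row[k] - mu[k] for k in range(n_genes)] for row in theta]
    return nu, mu, tau
```

The constraint that treatment effects sum to zero over genes is the assumption that the treatment does not shift average expression; the next slide's point is that the data alone cannot check it.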

11 Model and identifiability constraints. But note: the normalization constant and the treatment effect cannot be identified separately.
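The non-identifiability can be written out explicitly. Assuming the log-scale model Y_{ijk} = ν_{ij} + μ_k + τ_{ik} + e_{ijk} (ν: normalization, μ: mean log concentration, τ: treatment effect; symbols assumed), any constant a_i can be moved between the normalization and the treatment effect of group i without changing a single fitted value:

```latex
Y_{ijk} = \nu_{ij} + \mu_k + \tau_{ik} + e_{ijk}
        = (\nu_{ij} + a_i) + \mu_k + (\tau_{ik} - a_i) + e_{ijk}
\quad \text{for every choice of constants } a_i .
```

A constraint such as Σ_k τ_{ik} = 0 (K genes) removes the ambiguity, because the shifted effects then sum to −K a_i, which is zero only for a_i = 0. But it does so by assumption: a treatment that really raises most genes would be absorbed into the normalization.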

12 Self-consistency: the weight w_k, a function of the estimated treatment effects, is small if the kth gene is judged to be changed and close to one if it is judged to be unchanged. The procedure is iterative.
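A minimal sketch of the self-consistent iteration, assuming the log-scale model y[i][j][k] = nu[i][j] + mu[k] + tau[i][k] + error. Each array's normalization is a w-weighted average over genes, which imposes the weighted constraint sum_k w[k] * tau[i][k] = 0, so genes judged changed barely influence it. The weight function (Tukey's biweight on the largest estimated effect of a gene) and the `cutoff` value are assumptions for illustration; the slide only states that w_k is small for changed genes and near one for unchanged ones.

```python
from statistics import fmean


def biweight(d, cutoff):
    """Tukey biweight: 1 at d = 0, falling smoothly to 0 at |d| >= cutoff."""
    u = d / cutoff
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def weighted_fit(y, w):
    """Fit y[i][j][k] = nu[i][j] + mu[k] + tau[i][k] with sum_k w[k]*tau[i][k] = 0."""
    total_w = sum(w)
    if total_w <= 0:
        raise ValueError("all genes have zero weight")
    # Weighted mean log intensity of each array: changed genes count little.
    levels = [[sum(wk * v for wk, v in zip(w, arr)) / total_w for arr in group]
              for group in y]
    grand = fmean(a for group in levels for a in group)
    nu = [[a - grand for a in group] for group in levels]
    n_genes = len(w)
    theta = [[fmean(y[i][j][k] - nu[i][j] for j in range(len(y[i])))
              for k in range(n_genes)] for i in range(len(y))]
    mu = [fmean(row[k] for row in theta) for k in range(n_genes)]
    tau = [[row[k] - mu[k] for k in range(n_genes)] for row in theta]
    return nu, mu, tau


def self_consistent_fit(y, cutoff=0.5, tol=1e-10, max_iter=100):
    """Alternate between fitting the model and reweighting genes until stable."""
    n_genes = len(y[0][0])
    w = [1.0] * n_genes  # start by treating every gene as unchanged
    prev_nu = None
    for _ in range(max_iter):
        nu, mu, tau = weighted_fit(y, w)
        if prev_nu is not None and max(
                abs(a - b) for ga, gb in zip(nu, prev_nu)
                for a, b in zip(ga, gb)) < tol:
            return nu, mu, tau, w
        prev_nu = nu
        # A gene with a large estimated effect in any group gets a small weight.
        w = [biweight(max(abs(row[k]) for row in tau), cutoff)
             for k in range(n_genes)]
    raise RuntimeError("self-consistent fit did not converge")
```

On data where one gene out of ten changes strongly, the first unweighted pass biases every normalization by the changed gene's share; the reweighting then drops that gene and the second pass recovers the normalization exactly.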

13

14

15

16 Failure of Model

17

18 Generalized model: the normalization and the heteroscedasticity function, each specific to an array (i, j), are slowly varying functions of the gene's intensity (indexed by k). Estimate both by local regression.
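One way to write the generalized model, under assumed notation (the slide's symbols were lost): μ_k is the gene's mean log concentration, τ_{ik} the treatment effect, ν_{ij}(·) the intensity-dependent normalization of array (i, j), and σ_{ij}(·) its heteroscedasticity function.

```latex
Y_{ijk} = \mu_k + \tau_{ik} + \nu_{ij}(\mu_k) + \sigma_{ij}(\mu_k)\, e_{ijk},
\qquad e_{ijk} \sim N(0, 1)
```

Making ν and σ smooth functions of intensity, rather than constants, is what lets the model absorb intensity-dependent bias and intensity-dependent noise; local regression estimates each function without assuming a parametric shape.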

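19 Local regression: data

20 Predict the value at x = 50: weight the data points according to their distance from x = 50, then fit a weighted linear regression

21 Predict the whole function similarly, repeating the local fit at each x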
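The local-regression recipe (weight the points near the target x, fit a weighted straight line, read off its value, then repeat across x) can be sketched as follows. The tricube kernel and the fixed bandwidth are assumptions in the style of loess; the slides do not say which kernel or span was used.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, else 0."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def local_linear(xs, ys, x0, bandwidth):
    """Predict y at x0 from a tricube-weighted straight-line fit to nearby points."""
    w = [tricube((x - x0) / bandwidth) for x in xs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no points within the bandwidth of x0")
    # Weighted least-squares line through the weighted centroid.
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def local_regression_curve(xs, ys, grid, bandwidth):
    """Repeat the local fit at every grid point to trace the whole function."""
    return [local_linear(xs, ys, x0, bandwidth) for x0 in grid]
```

Because each fit is linear, the method reproduces any straight line exactly; the bandwidth controls how "slowly varying" the estimated function is allowed to be.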

22

23 Compare to known true function

24

25

26

27 Simulation-based Validation 1. Reproduce observed bias.

28 Simulation-based Validation 2. Reproduce observed heteroscedasticity.

29 Test based on z statistic:
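The slide does not show the statistic itself. A common form, assumed here, divides the difference between the two treatment groups' mean normalized log intensities by its standard error, with the per-replicate standard deviation treated as known (for example, read off a fitted heteroscedasticity function at the gene's intensity).

```python
import math


def z_test(mean_a, mean_b, sigma, n_a, n_b):
    """Two-sample z test for one gene (a sketch under stated assumptions).

    mean_a, mean_b: mean normalized log intensity in treatment groups a and b.
    sigma: per-replicate standard deviation of the log intensity, assumed known.
    n_a, n_b: numbers of replicate arrays in each group.
    Returns (z, two-sided p-value).
    """
    se = sigma * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = (mean_b - mean_a) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) for standard normal Z
    return z, p
```

For two replicates per group and sigma = 1, a log difference of 1.96 gives z = 1.96 and a two-sided p-value of about 0.05.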

30 Choice of significance level: the expected number of false positives is E(false positives) = αN, where α is the significance level and N is the number of genes tested. But the minimum detectable difference increases as α gets smaller.

31
α | E(fp) | min diff | min ratio
0.05 | 250 | 0.916 | 2.5
0.01 | 50 | 1.09 | 3
0.001 | 5 | 1.29 | 3.6
0.0001 | 0.5 | 1.61 | 5
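The table's numbers can be checked directly. The E(fp) column is consistent with E(fp) = αN for N = 5000 genes, and each min ratio equals e raised to the min diff, so the differences appear to be on the natural-log scale. Both readings are inferred from the arithmetic, not stated on the slides; how the min diff values themselves were derived is not recoverable and is not reproduced here.

```python
import math

N_GENES = 5000  # inferred from the table (0.05 * 5000 = 250); not stated on the slide

# (alpha, tabulated minimum detectable difference, assumed natural-log scale)
ROWS = [(0.05, 0.916), (0.01, 1.09), (0.001, 1.29), (0.0001, 1.61)]


def expected_false_positives(alpha, n_genes=N_GENES):
    """E(false positives) = alpha * N, as on the slide (strictly, N counts unchanged genes)."""
    return alpha * n_genes


for alpha, min_diff in ROWS:
    # Converting the log-scale difference back to a fold change.
    print(f"alpha={alpha:g}  E(fp)={expected_false_positives(alpha):g}  "
          f"min ratio={math.exp(min_diff):.2f}")
```

The trade-off is visible in the output: cutting α from 0.05 to 0.0001 reduces the expected false positives from 250 to 0.5, at the cost of raising the smallest detectable fold change from about 2.5 to about 5.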

32 Validation of the method against simulated data, 3. Hypothesis testing: data simulated from the stated model. (Plot labels: proportion of changed spots; “-fold change”; bias.) “Rate false pos.” = mean observed / expected number of false positives.

33 Simulated data from a mis-specified model: multiplicative plus additive noise
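A small simulation illustrates why multiplicative plus additive noise mis-specifies the purely multiplicative model: on the log scale, replicate spots at low concentration scatter far more than at high concentration. The noise form I = c·conc·exp(η) + δ and all parameter values are hypothetical, chosen only to show the effect.

```python
import math
import random
import statistics


def simulate_spot(conc, rng, c=1.0, sigma_mult=0.2, sigma_add=50.0):
    """One spot intensity with multiplicative and additive noise (hypothetical parameters).

    I = c * conc * exp(eta) + delta, eta ~ N(0, sigma_mult^2), delta ~ N(0, sigma_add^2).
    """
    eta = rng.gauss(0.0, sigma_mult)
    delta = rng.gauss(0.0, sigma_add)
    return c * conc * math.exp(eta) + delta


def log_ratio_sd(conc, n=2000, seed=0):
    """Sample SD of log(I1 / I2) for pairs of replicate spots at one concentration."""
    rng = random.Random(seed)
    ratios = []
    while len(ratios) < n:
        i1, i2 = simulate_spot(conc, rng), simulate_spot(conc, rng)
        if i1 > 0 and i2 > 0:  # the log is undefined for non-positive intensities
            ratios.append(math.log(i1 / i2))
    return statistics.stdev(ratios)
```

At high concentration the additive term is negligible and the SD approaches sqrt(2) × 0.2 ≈ 0.28, as the multiplicative model predicts; at low concentration the additive term dominates and the SD is several times larger. This is the kind of intensity-dependent heteroscedasticity that a constant-variance model cannot capture.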

34 Validation of the method against simulated data, 4. Hypothesis testing: data simulated from the “wrong” model (additive + multiplicative noise). (Plot labels: proportion of changed spots; “-fold change”; bias.)

35 Acknowledgments: Lynn Crosby, North Carolina State University; Kevin Morgan, Strategic Toxicological Sciences, GlaxoWellcome

36 Santa Fe Institute, www.santafe.edu: postdoctoral fellowships available (apply before the end of the year). kepler@santafe.edu


Download ppt "Tom Kepler Santa Fe Institute Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression"

Similar presentations


Ads by Google