Published by Jaquan Pass. Modified over 9 years ago.
1
Bias, Variance, and Fit for Three Measures of Expression: AvDiff, Li & Wong’s, and AvLog(PM-BG)
Rafael A. Irizarry, Department of Biostatistics, JHU
(joint work with Bridget Hobbs and Terry Speed, Walter & Eliza Hall Institute of Medical Research)
2
Summary
- Summarize the expression level of a probe set by Average Log2(PM-BG)
- PMs need to be normalized
- Background makes no use of probe-specific MM
- Evaluate and compare through bias, variance and model fit to AvDiff and the Li & Wong algorithm
- Use Gene Logic spike-in and dilution study
- All three expression measures performed well
- AvLog(PM-BG) is arguably the best of the three
3
SD vs. Avg of Defective Probes
4
Normalization at Probe Level
5
Spike-In Experiments
- Add concentrations (0.5 pM to 100 pM) of 11 foreign-species cRNAs to the hybridization mixture
- Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips
- Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips
- The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates)
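To make the "cyclic Latin square" arrangement concrete, here is a minimal sketch that builds one from level indices. The function name is hypothetical, the entries are indices 0..11 rather than the actual concentrations (which the slide does not list), and how rows and columns map to chips and transcripts is an assumption left to the reader.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square of level indices 0..n-1.

    Each row is the previous row shifted by one position, so every
    level appears exactly once in every row and every column.
    """
    return [[(row + col) % n for col in range(n)] for row in range(n)]


# 12 levels, matching the 12x12 design mentioned on the slide.
square = cyclic_latin_square(12)
for row in square[:3]:
    print(row)
```

Because each level index occurs once per row and once per column, every level is paired with every row and every column exactly once, which is the balance property such a design is chosen for.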
6
Set A: Probe Level Data (12 chips)
7
What Did We Learn?
- Don’t subtract or divide by MM
- Probe effect is additive on log scale
- Take logs
8
Why Remove Background?
9
Background Distribution
10
Average Log2(PM-BG)
- Normalize probe level data
- Compute BG = background mean by estimating the mode of the MM distribution
- Subtract BG from each PM
- If PM-BG < 0 use minimum of positives divided by 2
- Take the average of log2(PM-BG) over the probe set
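The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is taken to be already normalized, the mode is approximated by the midpoint of the fullest histogram bin (the slide does not say how the mode is estimated), the "minimum of positives" is taken within the probe set passed in, and a difference of exactly zero is treated like a negative one. Both function names are hypothetical.

```python
import numpy as np


def histogram_mode(values, bins=100):
    """Crude mode estimate: midpoint of the fullest histogram bin (an assumption)."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    fullest = int(np.argmax(counts))
    return 0.5 * (edges[fullest] + edges[fullest + 1])


def avlog_pm_bg(pm, bg):
    """Average of log2(PM - BG) for one probe set.

    Non-positive differences are replaced by half the smallest
    positive difference before taking logs.
    """
    adjusted = np.asarray(pm, dtype=float) - bg
    positives = adjusted[adjusted > 0]
    if positives.size == 0:
        raise ValueError("no PM value exceeds the background")
    floor = positives.min() / 2.0
    adjusted = np.where(adjusted > 0, adjusted, floor)
    return float(np.log2(adjusted).mean())


# Toy probe set with a background of 100 (made-up numbers).
pm = [108.0, 132.0, 90.0, 100.0]
print(avlog_pm_bg(pm, bg=100.0))
```

In practice `bg` would come from something like `histogram_mode(mm_values)` over the MM intensities of a chip. A kernel density estimate would be a smoother alternative to the histogram.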
11
Expression after Normalization
12
Expression Level Comparison
13
Spike-In B

Probe Set   Conc 1   Conc 2   Rank
BioB-5      100      0.5      1
BioB-3      0.5      25.0     2
BioC-5      2.0      75.0     4
BioB-M      1.0      37.5     4
BioDn-3     1.5      50.0     5
DapX-3      35.7     3.0      6
CreX-3      50.0     5.0      7
CreX-5      12.5     2.0      8
BioC-3      25.0     100      9
DapX-5      5.0      1.5      10
DapX-M      3.0      1.0      11

Later we consider 23 different combinations of concentrations
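The Rank column appears to order the spiked-in probe sets by the size of the concentration change between the two chips. That reading is an assumption, since the slide does not state it. A short sketch that reproduces the ordering from the listed concentrations, using the absolute log2 fold change as the sort key:

```python
import math

# (Conc 1, Conc 2) for each spiked-in probe set, as listed on the slide.
spike_in_b = {
    "BioB-5": (100.0, 0.5),
    "BioB-3": (0.5, 25.0),
    "BioC-5": (2.0, 75.0),
    "BioB-M": (1.0, 37.5),
    "BioDn-3": (1.5, 50.0),
    "DapX-3": (35.7, 3.0),
    "CreX-3": (50.0, 5.0),
    "CreX-5": (12.5, 2.0),
    "BioC-3": (25.0, 100.0),
    "DapX-5": (5.0, 1.5),
    "DapX-M": (3.0, 1.0),
}


def abs_log2_fold(conc1, conc2):
    """Size of the change between the two chips, ignoring direction."""
    return abs(math.log2(conc2 / conc1))


# Largest change first; ties keep their listed order.
ordered = sorted(
    spike_in_b,
    key=lambda name: abs_log2_fold(*spike_in_b[name]),
    reverse=True,
)
for position, name in enumerate(ordered, start=1):
    print(position, name, round(abs_log2_fold(*spike_in_b[name]), 2))
```

BioC-5 and BioB-M both change by a factor of 37.5, so they tie, which would explain why both carry rank 4 in the table.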
14
Differential Expression
18
Observed Ranks

Gene      AvDiff   MAS 5.0   Li&Wong   AvLog(PM-BG)
BioB-5    6        2         1         1
BioB-3    16       1         3         2
BioC-5    74       6         2         5
BioB-M    30       3         7         3
BioDn-3   44       5         6         4
DapX-3    239      24                  7
CreX-3    333      73        36        9
CreX-5    3276     33        3128      8
BioC-3    2709     8572      681       6431
DapX-5    2709     102       12203     10
DapX-M    165      19        13        6
Top 15    1        5         6         10
19
Observed vs True Ratio
20
Dilution Experiment
- cRNA hybridized to human chip (HGU95) in range of proportions and dilutions
- Dilution series begins at 1.25 μg cRNA per GeneChip array, and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 μg per array
- 5 replicate chips were used at each dilution
- Normalize just within each set of 5 replicates
- For each probe set compute expression, average and SD over replicates, and fit a line to log expression vs. log concentration
- Regression line should have slope 1 and high R²
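The line fit described above can be sketched as an ordinary least-squares regression of log expression on log concentration. The sketch assumes expression values are supplied on the raw scale and logs are taken in base 2; the function name and the toy expression values are hypothetical, while the six cRNA amounts are those listed on the slide.

```python
import numpy as np


def slope_and_r2(concentration, expression):
    """Fit log2(expression) = intercept + slope * log2(concentration).

    Returns the fitted slope and the R^2 of the fit.
    """
    x = np.log2(np.asarray(concentration, dtype=float))
    y = np.log2(np.asarray(expression, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(slope), float(1.0 - ss_res / ss_tot)


# cRNA amounts per array from the dilution series, in micrograms.
amounts = np.array([1.25, 2.5, 5.0, 7.5, 10.0, 20.0])
# Toy expression values exactly proportional to the amount (made up).
expression = 40.0 * amounts
print(slope_and_r2(amounts, expression))
```

An expression measure that tracks concentration proportionally gives a slope near 1 and R² near 1 here, which is the benchmark named on the slide.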
21
Dilution Experiment Data
22
Expression and SD
23
Slope Estimates and R²
24
Model check
- Compute observed SD of 5 replicate expression estimates
- Compute RMS of 5 nominal SDs
- Compare by taking the log ratio
- Closeness of observed and nominal SD taken as a measure of goodness of fit of the model
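As a rough illustration of this check, the sketch below compares the sample SD of replicate estimates with the root mean square of their model-based (nominal) SDs. The base of the logarithm and the use of the n-1 denominator for the observed SD are assumptions, since the slide specifies neither. The function name and example numbers are hypothetical.

```python
import numpy as np


def sd_log_ratio(estimates, nominal_sds):
    """log2 of (observed SD of the replicates) / (RMS of the nominal SDs).

    Values near 0 mean the model's stated uncertainty matches the
    variability actually seen across replicates.
    """
    observed = np.std(np.asarray(estimates, dtype=float), ddof=1)
    nominal = np.sqrt(np.mean(np.square(np.asarray(nominal_sds, dtype=float))))
    return float(np.log2(observed / nominal))


# Five replicate expression estimates and their nominal SDs (made up).
estimates = [7.9, 8.1, 8.0, 8.2, 7.8]
nominal_sds = [0.15, 0.16, 0.14, 0.17, 0.15]
print(sd_log_ratio(estimates, nominal_sds))
```

A positive value means the replicates vary more than the model claims, and a negative value means they vary less.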
25
Observed vs. Model SE
27
Conclusion
- Take logs
- PMs need to be normalized
- Using global background improves on use of probe-specific MM
- Gene Logic spike-in and dilution studies show all three expression measures performed very well
- AvLog(PM-BG) is arguably the best in terms of bias, variance and model fit
- Future: better BG; robust/resistant summaries
28
Acknowledgements
- Gene Brown’s group at Wyeth/Genetics Institute, and Uwe Scherf’s Genomics Research & Development Group at Gene Logic, for generating the spike-in and dilution data
- Gene Logic for permission to use these data
- Francois Collin (Gene Logic)
- Ben Bolstad (UC Berkeley)
- Magnus Åstrand (Astra Zeneca Mölndal)