Estimating Signal with Next Generation Affymetrix Software Earl Hubbell, Ph.D. Principal Statistician, Applied Research.


1 Estimating Signal with Next Generation Affymetrix Software Earl Hubbell, Ph.D. Principal Statistician, Applied Research

2 Quick Review of AvgDiff
- Operates on PM-MM
- Removes largest & smallest values
- Removes >3 standard deviation values
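The three steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MAS 4.0 implementation: the function name `avg_diff`, the order of the two trimming steps, and the choice to compute the standard deviation after dropping the extremes are all assumptions, since the slide lists the steps without those details.

```python
from statistics import mean, stdev

def avg_diff(pm, mm, n_sd=3.0):
    """Trimmed mean of PM-MM differences (hypothetical sketch of AvgDiff)."""
    diffs = sorted(p - m for p, m in zip(pm, mm))
    if len(diffs) <= 3:
        # Too few probe pairs to trim; fall back to a plain mean.
        return mean(diffs)
    # Step 1: drop the single largest and single smallest difference.
    trimmed = diffs[1:-1]
    mu, sd = mean(trimmed), stdev(trimmed)
    # Step 2: drop differences more than n_sd standard deviations from the mean.
    kept = [d for d in trimmed if sd == 0 or abs(d - mu) <= n_sd * sd]
    return mean(kept)

# One bright outlier pair is removed by the trimming step.
print(avg_diff([100, 120, 110, 5000, 105], [50, 60, 55, 40, 52]))  # 56
# When MM exceeds PM for most pairs, the result is negative.
print(avg_diff([10, 12, 11, 9], [20, 25, 22, 30]))  # -12
```

The second call shows why a statistic built on raw PM-MM differences can return a negative value, which cannot be log-transformed.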

3 Areas for improvement
- AvgDiff Minimally Robust against Minority Probes
- Negative Values Impossible for Concentration or Intensity
- Negative Values Indicate Bias Is Larger than True Effect
- Incompatible with Standard Log-Transformation

4 Desirable Properties
- Robust against minority probes
- Doesn’t yield unphysical results for signal
- Reasonable predictor of concentration

5 A simple model for intensity
- PM Intensity = Real Signal + Stray Signal
- Real, Stray, PM all non-negative
- log(Real) = log(Affinity) + log(Concentration) + e (multiplicative error model)
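Written out in symbols, the model on this slide looks as follows. The symbols R (real signal), S (stray signal), A (affinity), C (concentration) and ε (the error term "e") are shorthand introduced here, and natural logarithms are assumed; another base changes only the base of the exponential.

```latex
\begin{aligned}
\mathrm{PM} &= R + S, \qquad R \ge 0,\; S \ge 0,\\
\log R &= \log A + \log C + \varepsilon
\;\Longrightarrow\; R = A \cdot C \cdot e^{\varepsilon}.
\end{aligned}
```

Exponentiating the second line shows why the error is called multiplicative: the noise scales the product of affinity and concentration rather than adding to it.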

6 Making an estimate of signal
- observe PM
- adjust PM for stray signal
- value = statistic(adjusted PM)
AvgDiff (MAS 4.0)
- PM
- Stray Estimate = MM
- Super-Olympic-Scoring on PM-MM (mean-like statistic)
Signal (MAS 5.0)
- PM
- Stray Estimate = CT [best of two estimates]
- Tukey Biweight on log(PM-CT) (median-like)

7 Handling stray signal
- PM intensities have stray signal component (intensity not due to real signal)
- Many MM have similar stray signal to PM
- But some MM are not useful for estimation of stray signal
- Anomalous MM values can be handled with imputation

8 At zero concentration PM has non-zero intensity. As concentration increases, intensity increases.

9 Some mismatches don’t tell us about stray signal

10 Model-violating MM values censor real signal information - Impute typical stray signal for such PM probes

11 Removal of stray signal estimate leaves positive values

12 Signal calculation (equations)
- Signal = Tukey biweight (log(Adjusted PM))
- Stray = MM (if physically possible) or
- log(Stray) = log(PM)-log(Stray proportion) (if impossible)
- Stray proportion = max(SB, positive)
- SB = Tukey biweight (log(PM)-log(MM)) (“typical” log-ratio)
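The equations on this slide can be sketched in Python as below. Several details are assumptions, not taken from the slides: the one-step form of the Tukey biweight with tuning constants `c=5` and `eps=1e-4`, base-2 logarithms, the value of the small positive floor (`floor=0.03`), and the reading that SB, being a log-ratio, is compared with the floor on the log scale. The function names are hypothetical, and the full MAS 5.0 rule for the stray estimate may include refinements that this simplified version omits.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that downweights outliers."""
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Bisquare weight; points far from the median get weight zero.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def signal(pm, mm, floor=0.03):
    """Signal for one probe set from positive PM and MM intensities."""
    # SB: the "typical" log-ratio of PM to MM across the probe set.
    sb = tukey_biweight([math.log2(p) - math.log2(m) for p, m in zip(pm, mm)])
    log_prop = max(sb, floor)  # keep the stray proportion strictly positive
    adjusted = []
    for p, m in zip(pm, mm):
        # Use MM when it is physically possible (MM < PM); otherwise impute
        # a typical stray signal from the PM value itself.
        stray = m if m < p else p / 2 ** log_prop
        adjusted.append(p - stray)
    # Every adjusted value is positive, so the log is always defined.
    return 2 ** tukey_biweight([math.log2(a) for a in adjusted])

pm = [200, 220, 210, 190, 205]
mm = [50, 60, 300, 45, 55]  # the third MM exceeds its PM
print(signal(pm, mm))
```

In this example the third probe pair has MM > PM, so its stray signal is imputed from the typical log-ratio of the other pairs instead of being subtracted directly.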

13 Is signal a reasonable predictor of concentration?
- Near linear behavior
- Stabilized variance

14 Average Signal for 12 human spiked transcripts (3x replicate)

15 Signal is near-linear and has stabilized variance in the middle range of concentrations

16 Resistance to outliers
- Introduce 10% artificial outliers to check robustness
- Nonparametric correlation to handle both log-scale and linear-scale data
- Verify data against known spike concentration
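A check of this kind could be set up roughly as follows. The slide does not say which nonparametric correlation was used or how the outliers were generated, so Spearman's rank correlation, the multiplication by a fixed `factor`, and the helper names `ranks`, `spearman` and `inject_outliers` are all assumptions made for illustration.

```python
import math
import random

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def inject_outliers(values, fraction=0.10, factor=10.0, seed=0):
    """Multiply a random fraction of the values by a fixed factor."""
    rng = random.Random(seed)
    n_out = round(fraction * len(values))
    hit = set(rng.sample(range(len(values)), n_out))
    return [v * factor if i in hit else v for i, v in enumerate(values)]

conc = [2 ** k for k in range(10)]   # hypothetical spike concentrations
sig = [3 * c + 5 for c in conc]      # hypothetical clean estimates
print(spearman(conc, sig))
print(spearman(conc, inject_outliers(sig)))
```

Because the correlation depends only on ranks, it gives the same value whether an estimate is reported on the linear scale or the log scale, which allows a linear-scale statistic and a log-scale statistic to be compared against the same known concentrations.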

17 Superior performance against outliers

18 MAS 5.0 more robust against outliers in biological samples [Figure: probe set 1535_at from Hu95A; panels: Adrenal, Kidney, Pancreas]

19 Summary
- MAS 5.0 Signal is a reasonable predictor of concentration
- Tukey biweight resists outliers
- AvgDiff insufficiently robust in biological samples
- Log-scale transformation now possible
- Continued algorithm development underway...

20 Acknowledgements
Wei-Min Liu, Fred Christians, Tom Ryder, Suzanne Dee, Steve Smeekens, Paul Kaplan, Rui Mei, Teresa Webster, Xiaojun Di, Ming-hsiu Ho, Jyoti Baid, Chris Harrington, Tarif Awad

