
A Routine Approach to Quality Control Peter Haberl 19. 11. 2001.


1 A Routine Approach to Quality Control Peter Haberl 19. 11. 2001

2 Content
- The GDE Controller: Workflow, Gradients, Distortions, Local defects, Condensing
- Playing with negative AvgDiff values

3 ... is part of the GD Expressionist™ system. [Diagram labels: feature data (.CEL files); Upload server; file types .ABS, .REL, .CEL; GD CoBi™ Database (DB); GD Expressionist™ Controller; GD Expressionist™ Analyst] GDE Controller: Workflow

4 ... extends the conventional data flow. GDE Controller: Workflow

5 [Screenshot labels: login; options and thresholds; available chip layouts (.CDF files); available experiments (.CEL files)] GDE Controller: Workflow

6 The Controller is about ...
... detection
- of location-dependent systematic effects (gradients)
- of intensity-dependent systematic effects (distortions)
- of local defects
... correction
- of global gradients
- of global distortions
... condensing
- constructing expression values using different algorithms
GDE Controller: Workflow

7 Gradients: incomplete washing? thermal effects? ...? GDE Controller: Gradients

8 Idea *) (single-chip version):
- divide the chip into 4 x 4 sectors (as for the background determination)
- look at the feature distribution in each sector, in particular at the mode (maximum position) and the width
[Plot axes: ln(counts) vs. ln(intensity)]
*) developed in discussions with H. Seidel (Schering, Berlin)
GDE Controller: Gradients

9 In an iterative process, transform the intensities I(x,y) → I’(x,y) = a(x,y) I(x,y) + b(x,y) such that the sector histograms become aligned. [Figure panels: scale factor a(x,y) in first step; offset b(x,y) in first step; all sector histograms after first step; all sector histograms after third step] GDE Controller: Gradients

10 It was later decided to perform only a multiplicative correction, I(x,y) → I’(x,y) = a(x,y) I(x,y), for two reasons: - practical application showed that the scale factor is the dominant effect; - the observable AvgDiff is insensitive to the offset b(x,y). A basic assumption of the ‘single-chip’ version is that the distribution of bright and dark features is random. If this assumption is violated (e.g. for the yeast chip), the ‘single-chip’ version encounters problems. The ‘multi-chip’ version compares the sector histograms not among themselves, but to the sector histograms of a ‘reference chip’. (This is of course only possible if enough ‘similar’ chips are available.) GDE Controller: Gradients
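The single-chip multiplicative correction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Controller's implementation: the median of ln(intensity) stands in for the histogram mode, a(x,y) is taken as constant within each sector (no smoothing between sectors), a single pass replaces the iterative process, and the function names are hypothetical.

```python
import math
from statistics import median

def sector_bounds(size, n_sectors):
    """Split range(size) into n_sectors contiguous (start, stop) blocks."""
    edges = [round(k * size / n_sectors) for k in range(n_sectors + 1)]
    return list(zip(edges[:-1], edges[1:]))

def gradient_scale_map(chip, n_sectors=4):
    """Return a(x,y) as a 2-D list: one factor per sector (assumption: piecewise constant).

    The sector centre is the median of ln(intensity), used here as a
    stand-in for the mode of the sector histogram. Intensities must be > 0.
    """
    n_rows, n_cols = len(chip), len(chip[0])
    global_centre = median(math.log(v) for row in chip for v in row)
    a = [[1.0] * n_cols for _ in range(n_rows)]
    for r0, r1 in sector_bounds(n_rows, n_sectors):
        for c0, c1 in sector_bounds(n_cols, n_sectors):
            sector_centre = median(
                math.log(chip[r][c]) for r in range(r0, r1) for c in range(c0, c1)
            )
            # Shift the sector histogram in log space onto the global centre.
            factor = math.exp(global_centre - sector_centre)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    a[r][c] = factor
    return a

def apply_scale_map(chip, a):
    """Apply I'(x,y) = a(x,y) * I(x,y)."""
    return [[a_rc * v for a_rc, v in zip(a_row, row)] for a_row, row in zip(a, chip)]
```

A multi-chip variant would compare each sector to the same sector of a reference chip instead of to the chip's own global centre.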

11 Result of Gradient Correction: [Figure panels: ‘heat map’ of the scale factor a(x,y); original; corrected] GDE Controller: Gradients

12 Further example of Gradient Correction: [Figure panels: ‘heat map’ of the scale factor; original; corrected] GDE Controller: Gradients

13 Distortions: A log-log plot of coding (i.e. PM and MM) features can show a nonlinear relationship when compared to the features of a ‘reference chip’. One of the reasons can be that chips from different chip lots are combined into a series. (Again, the reference chip can only be constructed if enough ‘similar’ chips are available.) GDE Controller: Distortions

14 Idea:
- divide the reference signal region into stripes containing the same number of points (red lines)
- in each stripe, determine the median of experiment signals (or, equivalently, the point of maximum density)
- force this median line to be the diagonal of the new point cloud; this determines the (intensity-dependent) transformation
[Plot axes: reference vs. experiment]
GDE Controller: Distortions
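The stripe-and-median idea above might look roughly like this in code. It is a sketch under assumptions that the slides do not state: the work is done in log space, each stripe is positioned at its median reference value, the transformation is interpolated linearly between stripe medians (and extrapolated linearly beyond them), and the stripe medians are strictly increasing. The function names and the default of 10 stripes are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import median

def median_line(reference, experiment, n_stripes=10):
    """Return sorted knots (ln experiment median, ln reference median), one per stripe.

    Stripes are cut along the reference axis so that each holds the same
    number of points.
    """
    pairs = sorted((math.log(r), math.log(e)) for r, e in zip(reference, experiment))
    size = len(pairs) / n_stripes
    knots = []
    for k in range(n_stripes):
        stripe = pairs[round(k * size):round((k + 1) * size)]
        knots.append((median(e for _, e in stripe), median(r for r, _ in stripe)))
    return sorted(knots)

def correct_distortion(value, knots):
    """Map one experiment intensity so that the median line falls on the diagonal.

    Assumes strictly increasing knot positions (a monotone distortion).
    """
    x = math.log(value)
    xs = [k[0] for k in knots]
    # Pick the surrounding knot pair; clamp so the end segments extrapolate.
    i = min(max(bisect_right(xs, x), 1), len(knots) - 1)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return math.exp(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
```

For a pure power-law distortion (a straight line in the log-log plot), this mapping recovers the reference intensities.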

15 Result of Distortion Correction: [Figure annotation: ‘impossible to correct’] GDE Controller: Distortions

16 Reference chip: Both gradient and distortion detection/correction require the concept of a reference chip, which serves as a ‘virtual standard’ for a given experiment set.
- the experiment set should be homogeneous: chips from the same production lot, probes from the same tissue
- a small number of differentially expressed genes doesn’t change the characteristic pattern
- the reference chip is computed featurewise (as mean or median)
- the chips have to be made comparable, for instance with a global logarithmic-mean normalization
[Diagram labels: normalized set; reference chip]
GDE Controller: Reference Chip
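A featurewise reference chip could be computed along these lines. This is a minimal sketch: "global logarithmic-mean normalization" is interpreted here as rescaling each chip so that the mean of ln(intensity) is zero, which is an assumption, and chips are represented as flat lists of positive feature intensities in identical feature order. The function names are hypothetical.

```python
import math
from statistics import median

def log_mean_normalize(chip):
    """Rescale a chip so that the mean of ln(intensity) becomes zero (assumed convention)."""
    log_mean = sum(math.log(v) for v in chip) / len(chip)
    scale = math.exp(-log_mean)
    return [v * scale for v in chip]

def reference_chip(chips):
    """Featurewise median over the normalized set of chips."""
    normalized = [log_mean_normalize(chip) for chip in chips]
    return [median(features) for features in zip(*normalized)]
```

Using the median rather than the mean makes the reference less sensitive to a few outlying chips; the slide allows either.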

17 Local defects: There are local defects which are already visible in a global chip view: [Figure] Aim: Can we reliably detect smaller local defects, if possible automatically? View of outlier locations: [Figure] GDE Controller: Local Defects

18 Idea:
- construct a ‘ratio chip’ by dividing each feature by its counterpart on the reference chip
- for visualisation purposes, show in green features which are brighter, in red features which are darker, and in black features that don’t change
- local defects should show up as speckles of homogeneous color, with diameters of at least several features
[Diagram: experiment features x_ij divided featurewise by reference features y_ij give the ratio chip x_ij / y_ij]
GDE Controller: Local Defects
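The ratio chip and its colour coding can be illustrated with a few lines of code. The fold threshold that separates "changed" from "unchanged" features is a hypothetical parameter; the slides do not give one, and the function names are likewise illustrative.

```python
def ratio_chip(experiment, reference):
    """Divide each experiment feature x_ij by its reference counterpart y_ij."""
    return [[x / y for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(experiment, reference)]

def colour(ratio, fold=1.5):
    """Green for brighter, red for darker, black for unchanged features.

    `fold` is an assumed threshold, not a value taken from the slides.
    """
    if ratio > fold:
        return "green"
    if ratio < 1.0 / fold:
        return "red"
    return "black"
```

A defect detector would then look for connected patches of one colour that span several features, rather than for isolated coloured features.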

19 [Figure labels: differential regulation; actual defects] GDE Controller: Local Defects

20 This method can identify defects which would be hard to find ... GDE Controller: Local Defects

21 ... or invisible, even in a zoomed view: GDE Controller: Local Defects

22 For old (row-wise spotted) chips, there is the danger that differentially expressed genes are detected as chip artefacts. Application of pattern search algorithms can solve this problem. [Figure label: differential regulation] GDE Controller: Local Defects

23 Further example of a local defect: GDE Controller: Local Defects

24 Defects can have a certain spatial extension: GDE Controller: Local Defects

25 Most frequent structures: GDE Controller: Local Defects

26 ... and others: GDE Controller: Local Defects

27 An interactive chip viewer allows the user to - view identified mask areas - zoom and find out which genes are affected by masking - manually edit the masked areas GDE Controller: Local Defects

28 - reporting - export to database, into analysis software or as .CEL files - choose between different condensing algorithms: MAS4, MAS5, GeneData ( = trimmed mean of log(PM) ) GDE Controller: Workflow
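The "trimmed mean of log(PM)" condensing could be sketched as below. The trim fraction, the use of the natural logarithm and the choice to return the value on the log scale are all assumptions; the slide only names the method, and the function names are hypothetical.

```python
import math

def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction (trim < 0.5)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def condense_log_pm(pm_intensities, trim=0.2):
    """Expression value of one probe set: trimmed mean of ln(PM), on the log scale."""
    return trimmed_mean([math.log(pm) for pm in pm_intensities], trim)
```

Trimming makes the expression value robust against a single saturated or masked PM feature within the probe set.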

29 log-log plot [panels: replicates; large differential expression]: - correlation of large values is visible - only positive values can be displayed Playing with negative AvgDiff values

30 linear-linear plot [panels: replicates; large differential expression]: - negative values can be displayed - poor resolution for small values - large values appear scattered Playing with negative AvgDiff values

31 ‘cube-root’ plot, y = AvgDiff^(1/3) [panel: replicates]: - damping at large values - ‘zero density regions’ (artefact) - display of positive and negative values Playing with negative AvgDiff values

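32 ‘lin-log’ transformation: y = sign(x)*ln( 1 + |x| )
- damping of high values
- interpolates smoothly between linear (for small values) and logarithmic (for large values) behaviour
\[
\operatorname{sign}(x)\,\ln(1+|x|) =
\begin{cases}
x \mp \dfrac{x^2}{2} + O(x^3), & |x| \ll 1 \\[4pt]
\pm \ln|x| + \dfrac{1}{x} + O\!\left(\dfrac{1}{x^2}\right), & |x| \gg 1
\end{cases}
\]
(upper signs for x > 0, lower signs for x < 0)
[Plot: the transformation together with its limiting curves y = x and y = ln(x)]
Playing with negative AvgDiff values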

33 ‘lin-log’ plot [panels: replicates; large differential expression]: A good choice is x = AvgDiff / Target, i.e. the target intensity sets the scale. Lines of constant factors are shown in blue (2), red (5) and green (10). Playing with negative AvgDiff values
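The ‘lin-log’ transformation y = sign(x)*ln(1 + |x|) with x = AvgDiff / Target is short enough to write out directly. The function names below are illustrative only.

```python
import math

def lin_log(x):
    """y = sign(x) * ln(1 + |x|): linear near zero, logarithmic for large |x|."""
    return math.copysign(math.log1p(abs(x)), x)

def lin_log_scaled(avg_diff, target):
    """Apply the transformation to x = AvgDiff / Target, so Target sets the scale."""
    return lin_log(avg_diff / target)
```

Because the function is odd, a point and its mirror image (AvgDiff and -AvgDiff) land symmetrically about the origin, which is what allows positive and negative values to share one plot.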

34 The ‘lin-log’ plots allow us to look at positive and negative AvgDiff values simultaneously. But why would we want to look at the negatives at all? Consider the following ‘experiment’: Construct faked .CEL files, where all PM-MM pairs are interchanged, and condense them with the old Affymetrix algorithm (ignoring AbsCall). Amusing observation: If one ignores that the scale factor becomes negative (MAS doesn’t: “Failed to analyze due to invalid Scale Factor”), the old (MAS4) algorithm would be invariant under PM ↔ MM! SF = Target / TrimmedMean(AvgDiff) Playing with negative AvgDiff values
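The invariance under interchanging PM and MM can be checked numerically with a deliberately simplified model. In this sketch AvgDiff is just the plain mean of PM - MM per probe set (the real MAS4 algorithm is more involved), and the symmetric 2% trim and the target value are assumptions; only SF = Target / TrimmedMean(AvgDiff) is taken from the slide.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction (assumed symmetric)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def avg_diff(pm, mm):
    """Simplified AvgDiff: plain mean of PM - MM over one probe set."""
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

def scaled_avg_diffs(probe_sets, target=100.0):
    """Scale every AvgDiff with SF = Target / TrimmedMean(AvgDiff)."""
    diffs = [avg_diff(pm, mm) for pm, mm in probe_sets]
    sf = target / trimmed_mean(diffs)
    return sf, [sf * d for d in diffs]

# Three made-up probe sets as (PM list, MM list); the values are illustrative only.
probe_sets = [
    ([120.0, 300.0, 80.0], [100.0, 150.0, 90.0]),
    ([50.0, 60.0, 70.0], [65.0, 75.0, 60.0]),
    ([900.0, 1100.0, 1000.0], [200.0, 250.0, 300.0]),
]
swapped = [(mm, pm) for pm, mm in probe_sets]

sf, scaled = scaled_avg_diffs(probe_sets)
sf_swapped, scaled_swapped = scaled_avg_diffs(swapped)
```

Swapping flips the sign of every AvgDiff and therefore of the scale factor, so the two sign changes cancel in the scaled values: `sf_swapped` equals `-sf`, while `scaled_swapped` matches `scaled` up to rounding.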

35 Original data: the ‘three-tissue dataset’: 3 groups with 6 replicates each. [Figure panels: within replicate groups; across replicate groups] Perfect group separation. Playing with negative AvgDiff values

36 PM ↔ MM data: These are log-log plots of negative AvgDiffs. The good correlation at high values indicates that these numbers are reproducible. The difference between replicate groups is not so obvious, but ... Playing with negative AvgDiff values

37 ... clustering again results in a complete group separation: Take-home message: The mismatches carry information which can be measured reproducibly and can be used (at least) for pattern comparisons. Playing with negative AvgDiff values

