Published by Fay Hawkins. Modified over 8 years ago.
1
A Routine Approach to Quality Control
Peter Haberl, 19.11.2001
2
Content
The GDE Controller
- Workflow
- Gradients
- Distortions
- Local defects
- Condensing
Playing with negative AvgDiff values
3
GDE Controller: Workflow
... is part of the GD Expressionist™ system
[Diagram labels: feature data (.CEL files); Upload server; file types .ABS, .REL, .CEL; GD CoBi™ Database (DB); GD Expressionist™ Controller; GD Expressionist™ Analyst]
4
GDE Controller: Workflow
... extends the conventional data flow
5
GDE Controller: Workflow
- login
- options and thresholds
- available chip layouts (.CDF files)
- available experiments (.CEL files)
6
GDE Controller: Workflow
The Controller is about...
... detection
- of location-dependent systematic effects (gradients)
- of intensity-dependent systematic effects (distortions)
- of local defects
... correction
- of global gradients
- of global distortions
... condensing
- constructing expression values using different algorithms
7
GDE Controller: Gradients
Gradients: incomplete washing? thermal effects? ...?
8
GDE Controller: Gradients
Idea *) (single-chip version):
- divide the chip into 4 x 4 sectors (as for the background determination)
- look at the feature distribution in each sector, in particular at the mode (maximum position) and the width
[Plot: sector histogram, ln(counts) versus ln(intensity)]
*) developed in discussions with H. Seidel (Schering, Berlin)
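The sector statistics described on this slide can be sketched as follows. This is a minimal illustration, not the Controller's implementation: the function name `sector_stats`, the bin width, and the use of the standard deviation as "width" are assumptions, since the slides do not say how the histogram is binned or how the width is measured. The chip is assumed to have at least 4 features per side and only positive intensities.

```python
import math
from collections import Counter

def sector_stats(intensity, n_sectors=4, bin_width=0.1):
    """Per-sector (mode, width) of the ln(intensity) distribution.

    intensity: 2D list of positive feature intensities (rows of the chip).
    bin_width and the width definition (standard deviation) are assumptions.
    """
    n_rows, n_cols = len(intensity), len(intensity[0])
    logs = [[[] for _ in range(n_sectors)] for _ in range(n_sectors)]
    for y, row in enumerate(intensity):
        for x, value in enumerate(row):
            # Integer arithmetic maps each feature to one of n_sectors x n_sectors sectors.
            logs[y * n_sectors // n_rows][x * n_sectors // n_cols].append(math.log(value))
    stats = []
    for sector_row in logs:
        stats_row = []
        for values in sector_row:
            # Histogram of ln(intensity); the mode is the centre of the fullest bin
            # (ties are resolved in favour of the lower bin).
            bins = Counter(math.floor(v / bin_width) for v in values)
            fullest = max(bins, key=lambda b: (bins[b], -b))
            mode = (fullest + 0.5) * bin_width
            mean = sum(values) / len(values)
            width = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            stats_row.append((mode, width))
        stats.append(stats_row)
    return stats
```

A gradient would then show up as a systematic drift of the mode from one side of the 4 x 4 grid to the other.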
9
GDE Controller: Gradients
In an iterative process, transform the intensities, $I(x,y) \to I'(x,y) = a(x,y)\,I(x,y) + b(x,y)$, such that the sector histograms become aligned.
[Figures: scale factor a(x,y) in first step; offset b(x,y) in first step; all sector histograms after first step; all sector histograms after third step]
10
GDE Controller: Gradients
It was later decided to perform only a multiplicative correction, $I(x,y) \to I'(x,y) = a(x,y)\,I(x,y)$, for two reasons:
- practical application showed that the scale factor is the dominant effect;
- the observable AvgDiff is insensitive to the offset b(x,y).
A basic assumption of the ‘single-chip’ version is that bright and dark features are distributed randomly across the chip. If this assumption is violated (e.g. for the yeast chip), the ‘single-chip’ version encounters problems.
The ‘multi-chip’ version compares the sector histograms not among themselves, but to the sector histograms of a ‘reference chip’. (This is of course only possible if enough ‘similar’ chips are available.)
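A single pass of the multiplicative, single-chip correction might look like the sketch below. Several things here are assumptions rather than details from the slides: a(x,y) is taken as constant within each sector (the real scale factor is presumably smooth), each sector's mode of ln(I) is aligned to the whole-chip mode, and the helper names `log_mode` and `multiplicative_correction` are hypothetical. The actual procedure is iterative; only one step is shown.

```python
import math
from collections import Counter

def log_mode(values, bin_width=0.1):
    """Centre of the fullest histogram bin of ln(value); ties go to the lower bin."""
    bins = Counter(math.floor(math.log(v) / bin_width) for v in values)
    fullest = max(bins, key=lambda b: (bins[b], -b))
    return (fullest + 0.5) * bin_width

def multiplicative_correction(intensity, n_sectors=4):
    """Return (corrected, scale) with I'(x,y) = a(x,y) * I(x,y).

    scale maps a sector index (row, column) to its factor a. Each factor is
    chosen so that the sector's mode of ln(I) lands on the whole-chip mode.
    """
    n_rows, n_cols = len(intensity), len(intensity[0])

    def sector_of(y, x):
        return (y * n_sectors // n_rows, x * n_sectors // n_cols)

    members = {}
    for y, row in enumerate(intensity):
        for x, value in enumerate(row):
            members.setdefault(sector_of(y, x), []).append(value)
    target = log_mode([v for row in intensity for v in row])
    # A shift in ln(I) corresponds to a multiplicative factor in I.
    scale = {s: math.exp(target - log_mode(vals)) for s, vals in members.items()}
    corrected = [[scale[sector_of(y, x)] * v for x, v in enumerate(row)]
                 for y, row in enumerate(intensity)]
    return corrected, scale
```

Because AvgDiff is built from PM minus MM differences, a common additive offset on neighbouring features cancels, which is consistent with dropping b(x,y) here.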
11
GDE Controller: Gradients
Result of Gradient Correction:
[Figures: ‘heat map’ of the scale factor a(x,y); original chip; corrected chip]
12
GDE Controller: Gradients
Further example of Gradient Correction:
[Figures: ‘heat map’ of the scale factor; original chip; corrected chip]
13
GDE Controller: Distortions
Distortions: A log-log plot of coding (i.e. PM and MM) features can show a nonlinear relationship when compared to the features of a ‘reference chip’. One possible reason is that chips from different chip lots are combined into one series. (Again, the reference chip can only be constructed if enough ‘similar’ chips are available.)
14
GDE Controller: Distortions
Idea:
- divide the reference signal region into stripes containing the same number of points (red lines)
- in each stripe, determine the median of experiment signals (or, equivalently, the point of maximum density)
- force this median line to be the diagonal of the new point cloud; this determines the (intensity-dependent) transformation
[Plot: experiment versus reference]
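The stripe-and-median idea can be sketched as below. The slides do not specify how the transformation is interpolated between stripes, so this example assumes piecewise-linear interpolation in ln space with linear extension beyond the outermost stripes, and it assumes the stripe medians of the experiment signal are strictly increasing. The function names `distortion_map` and `correct` and the default of 10 stripes are hypothetical.

```python
import math
from statistics import median

def distortion_map(reference, experiment, n_stripes=10):
    """Knots (experiment median, reference median) in ln space, one per stripe.

    Stripes are cut along the reference signal so that each holds the same
    number of points (the last stripe takes any remainder).
    """
    pairs = sorted(zip(reference, experiment))
    size = len(pairs) // n_stripes
    knots = []
    for i in range(n_stripes):
        stop = (i + 1) * size if i < n_stripes - 1 else len(pairs)
        stripe = pairs[i * size:stop]
        ref_med = median(math.log(r) for r, _ in stripe)
        exp_med = median(math.log(e) for _, e in stripe)
        knots.append((exp_med, ref_med))
    return sorted(knots)

def correct(value, knots):
    """Map one experiment intensity so that the median line becomes the diagonal."""
    x = math.log(value)
    # Find the segment containing x; values outside the knots use the end segments.
    k = 0
    while k < len(knots) - 2 and x > knots[k + 1][0]:
        k += 1
    (x0, y0), (x1, y1) = knots[k], knots[k + 1]
    return math.exp(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
```

For example, if an experiment responds as 10 times the square root of the reference, `correct` maps each experiment value back onto the corresponding reference intensity.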
15
GDE Controller: Distortions
Result of Distortion Correction:
[Figure: distortion correction results, with the annotation ‘impossible to correct’]
16
GDE Controller: Reference Chip
Both gradient and distortion detection/correction require the concept of a
Reference chip: serves as a ‘virtual standard’ for a given experiment set
- the experiment set should be homogeneous:
  - chips from the same production lot
  - probes from the same tissue
- a small number of differentially expressed genes doesn’t change the characteristic pattern
- the reference chip is computed featurewise (as mean or median)
- the chips have to be made comparable, for instance with a global logarithmic-mean normalization
[Diagram: normalized set → reference chip]
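A reference chip built this way could be sketched as follows. The example uses the featurewise median and interprets "global logarithmic-mean normalization" as rescaling every chip so that all chips share the same mean of ln(intensity); that reading, and the function names, are assumptions. Each chip is given as a flat list of positive feature intensities in the same feature order.

```python
import math
from statistics import median

def log_mean_normalize(chips):
    """Scale each chip so that all chips share the same mean of ln(intensity)."""
    log_means = [sum(math.log(v) for v in chip) / len(chip) for chip in chips]
    target = sum(log_means) / len(log_means)
    return [[v * math.exp(target - m) for v in chip]
            for chip, m in zip(chips, log_means)]

def reference_chip(chips):
    """Featurewise median over the normalized experiment set."""
    normalized = log_mean_normalize(chips)
    return [median(feature) for feature in zip(*normalized)]
```

The median is used here because a few differentially expressed genes or local defects on single chips then have little influence on the reference.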
17
GDE Controller: Local Defects
Local defects: There are local defects which are already visible in a global chip view.
[Figures: global chip view; view of outlier locations]
Aim: Can we reliably detect smaller local defects, if possible automatically?
18
GDE Controller: Local Defects
Idea:
- construct a ‘ratio chip’ by dividing each feature by its counterpart on the reference chip
- for visualisation purposes, show in
  - green: features which are brighter
  - red: features which are darker
  - black: features that don’t change
- local defects should show up as speckles of homogeneous color, with diameters of at least several features

experiment:
      0    1    2
0    x00  x01  x02
1    x10  x11  x12

reference:
      0    1    2
0    y00  y01  y02
1    y10  y11  y12

ratio chip:
      0        1        2
0    x00/y00  x01/y01  x02/y02
1    x10/y10  x11/y11  x12/y12
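The ratio chip and its colour coding can be sketched in a few lines. The tolerance that separates "brighter", "darker" and "unchanged" is not given on the slide; the factor 1.5 below is a hypothetical value, as are the function names.

```python
def ratio_chip(experiment, reference):
    """Divide each feature x[i][j] by its counterpart y[i][j] on the reference chip."""
    return [[x / y for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(experiment, reference)]

def colour(ratio, tolerance=1.5):
    """Colour for display; tolerance is an assumed threshold, not from the slides."""
    if ratio > tolerance:
        return "green"   # brighter than the reference
    if ratio < 1.0 / tolerance:
        return "red"     # darker than the reference
    return "black"       # essentially unchanged
```

A local defect would then appear as a connected patch of neighbouring features with the same colour.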
19
GDE Controller: Local Defects
[Figure: ratio chip with regions annotated ‘differential regulation’ and ‘actual defects’]
20
GDE Controller: Local Defects
This method can identify defects which would be hard to find...
21
GDE Controller: Local Defects
... or invisible, even in a zoomed view:
22
GDE Controller: Local Defects
For old (row-wise spotted) chips, there is the danger that differentially expressed genes are detected as chip artefacts.
Applying pattern search algorithms can solve this problem.
[Figure annotation: differential regulation]
23
GDE Controller: Local Defects
Further example of a local defect:
24
GDE Controller: Local Defects
Defects can have a certain spatial extent:
25
GDE Controller: Local Defects
Most frequent structures:
26
GDE Controller: Local Defects
... and others:
27
GDE Controller: Local Defects
An interactive chip viewer lets the user
- view identified mask areas
- zoom and find out which genes are affected by masking
- manually edit the masked areas
28
GDE Controller: Workflow
- reporting
- export to database, into analysis software or as .CEL files
- choose between different condensing algorithms: MAS4, MAS5, GeneData ( = trimmed mean of log(PM) )
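The GeneData condensing option is described only as a trimmed mean of log(PM). A minimal sketch of that idea for one probe set is shown below; the trim fraction of 20% per side, the natural logarithm, and the function name are assumptions, since the slide gives none of these details.

```python
import math

def trimmed_mean_log_pm(pm_values, trim_fraction=0.2):
    """Trimmed mean of ln(PM) for one probe set.

    trim_fraction is the share of values dropped at EACH end (assumed value,
    must be below 0.5); PM intensities must be positive.
    """
    logs = sorted(math.log(v) for v in pm_values)
    k = int(len(logs) * trim_fraction)
    kept = logs[k:len(logs) - k]
    return sum(kept) / len(kept)
```

Trimming makes the expression value robust against a single saturated or defective PM feature, which would otherwise dominate a plain mean.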
29
Playing with negative AvgDiff values
log-log plot:
- correlation of large values is visible
- only positive values can be displayed
[Figure panels: replicates; large differential expression]
30
Playing with negative AvgDiff values
linear-linear plot:
- negative values can be displayed
- poor resolution for small values
- large values appear scattered
[Figure panels: replicates; large differential expression]
31
Playing with negative AvgDiff values
‘cube-root’ plot: $y = \sqrt[3]{\mathrm{AvgDiff}}$
- display of positive and negative values
- damping at large values
- ‘zero density regions’ (artefact)
[Figure panel: replicates]
32
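Playing with negative AvgDiff values
‘lin-log’ transformation: $y = \operatorname{sign}(x)\,\ln(1 + |x|)$
- damping of high values
- interpolates smoothly between linear (for small values) and logarithmic (for large values) behaviour

$$
\operatorname{sign}(x)\,\ln(1 + |x|) =
\begin{cases}
x \mp \dfrac{x^2}{2} + O(|x|^3), & |x| \ll 1 \\[1ex]
\pm \ln(|x|) + \dfrac{1}{x} + O\!\left(\dfrac{1}{x^2}\right), & |x| \gg 1
\end{cases}
$$

(upper sign for x > 0, lower sign for x < 0)
[Plot: the transformation compared with y = x and y = ln(x)]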
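The ‘lin-log’ transformation is short enough to write out directly. The sketch below uses `math.log1p` for accuracy near zero; the helper `lin_log_avgdiff`, which rescales by a target intensity before transforming, is a hypothetical convenience wrapper and not something named in the slides.

```python
import math

def lin_log(x):
    """sign(x) * ln(1 + |x|): linear near zero, logarithmic for large |x|."""
    return math.copysign(math.log1p(abs(x)), x)

def lin_log_avgdiff(avg_diff, target):
    """Apply lin_log to AvgDiff measured in units of the target intensity."""
    return lin_log(avg_diff / target)
```

Unlike a plain logarithm, the function is defined for zero and for negative arguments and is antisymmetric, so positive and negative AvgDiff values can share one axis.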
33
Playing with negative AvgDiff values
‘lin-log’ plot:
- A good choice is x = AvgDiff / Target, i.e. the target intensity sets the scale
- Lines of constant factors are shown in blue (2), red (5) and green (10)
[Figure panels: replicates; large differential expression]
34
Playing with negative AvgDiff values
The ‘lin-log’ plots make it possible to look at positive and negative AvgDiff values simultaneously. But why would we want to look at the negatives at all?
Consider the following ‘experiment’: Construct faked .CEL files, where all PM-MM pairs are interchanged, and condense them with the old Affymetrix algorithm (ignoring AbsCall).
Amusing observation: If one ignores that the scale factor becomes negative (MAS doesn’t: “Failed to analyze due to invalid Scale Factor”), the old (MAS4) algorithm would be invariant under PM ↔ MM!
$\mathrm{SF} = \mathrm{Target} / \mathrm{TrimmedMean}(\mathrm{AvgDiff})$
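The invariance can be checked numerically with a deliberately simplified model. In the sketch below, AvgDiff is just the plain mean of PM minus MM over a probe set (the real MAS4 algorithm also excludes outlier pairs), the trimmed mean drops the same assumed fraction at both ends, and all function names and the target of 100 are hypothetical. Swapping PM and MM flips the sign of every AvgDiff and therefore of SF, so the product SF * AvgDiff is unchanged.

```python
def avg_diff(pairs):
    """Simplified AvgDiff: mean of PM - MM over the (PM, MM) pairs of a probe set."""
    return sum(pm - mm for pm, mm in pairs) / len(pairs)

def trimmed_mean(values, trim_fraction=0.02):
    """Mean after dropping trim_fraction of the values at each end (assumed 2%)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def scaled_avg_diffs(probe_sets, target=100.0):
    """Return (SF, scaled AvgDiffs) with SF = Target / TrimmedMean(AvgDiff)."""
    raw = [avg_diff(pairs) for pairs in probe_sets]
    scale_factor = target / trimmed_mean(raw)
    return scale_factor, [scale_factor * v for v in raw]

# Three toy probe sets, then the same data with PM and MM interchanged.
probe_sets = [[(120.0, 100.0), (150.0, 90.0)],
              [(80.0, 100.0), (70.0, 95.0)],
              [(300.0, 100.0), (280.0, 120.0)]]
swapped = [[(mm, pm) for pm, mm in pairs] for pairs in probe_sets]
sf, values = scaled_avg_diffs(probe_sets)
sf_swapped, values_swapped = scaled_avg_diffs(swapped)
```

Here `sf_swapped` equals `-sf`, while `values_swapped` matches `values` up to rounding. The symmetric trimming matters: an asymmetric trim would break the exact sign flip of the trimmed mean.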
35
Playing with negative AvgDiff values
Original data: the ‘three-tissue-dataset’: 3 groups with 6 replicates each
[Figure panels: within replicate groups; across replicate groups]
perfect group separation:
[Figure: clustering result]
36
Playing with negative AvgDiff values
PM ↔ MM data: These are log-log plots of negative AvgDiffs. The good correlation at high values indicates that these numbers are reproducible. The difference between replicate groups is not so obvious, but...
37
Playing with negative AvgDiff values
... clustering again results in a complete group separation:
Take-home message: The mismatches carry information which can be measured reproducibly and can be used (at least) for pattern comparisons.