A Routine Approach to Quality Control
Peter Haberl
19.11.2001

Content
- The GDE Controller
  - Workflow
  - Gradients
  - Distortions
  - Local defects
  - Condensing
- Playing with negative AvgDiff values

GDE Controller: Workflow

The GDE Controller is part of the GD Expressionist™ system.
[Diagram: feature data (.CEL files); Upload server; GD CoBi™ database (.ABS, .REL, .CEL); GD Expressionist™ Controller; GD Expressionist™ Analyst]

The GDE Controller extends the conventional data flow.

[Screenshot labels: login; options and thresholds; available chip layouts (.CDF files); available experiments (.CEL files)]

The Controller is about
- detection
  - of location-dependent systematic effects (gradients),
  - of intensity-dependent systematic effects (distortions),
  - of local defects;
- correction
  - of global gradients,
  - of global distortions;
- condensing: constructing expression values using different algorithms.

GDE Controller: Gradients

Gradients: incomplete washing? Thermal effects? Something else?

Idea (single-chip version, developed in discussions with H. Seidel, Schering, Berlin):
- divide the chip into 4 × 4 sectors (as for the background determination);
- look at the feature distribution in each sector, in particular at its mode (position of the maximum) and its width.
[Figure: sector histograms, ln(counts) vs. ln(intensity)]
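A minimal sketch of the per-sector statistics, assuming the chip is given as a 2-D list of positive intensities. The 50-bin histogram on a common binning and the use of the standard deviation of ln(intensity) as the "width" are illustrative choices, not taken from the slides; `sector_stats` is a hypothetical name.

```python
import math
from collections import Counter


def sector_stats(intensities, n_sectors=4, n_bins=50):
    """Mode and width of the ln(intensity) distribution per chip sector.

    intensities: 2-D list of positive feature intensities (rows of the chip).
    Returns {(sector_row, sector_col): (mode, width)}, where `mode` is the centre
    of the most populated histogram bin and `width` is the standard deviation
    of ln(intensity) within the sector (an assumed width measure).
    """
    n_rows, n_cols = len(intensities), len(intensities[0])
    logs = [[math.log(v) for v in row] for row in intensities]
    lo = min(min(row) for row in logs)
    hi = max(max(row) for row in logs)
    bin_w = (hi - lo) / n_bins or 1.0  # avoid zero-width bins on a flat chip

    # Collect the log intensities of each of the n_sectors x n_sectors sectors.
    sectors = {}
    for y in range(n_rows):
        for x in range(n_cols):
            key = (y * n_sectors // n_rows, x * n_sectors // n_cols)
            sectors.setdefault(key, []).append(logs[y][x])

    stats = {}
    for key, values in sectors.items():
        # Histogram on the common binning; the mode is the fullest bin's centre.
        counts = Counter(min(int((v - lo) / bin_w), n_bins - 1) for v in values)
        peak_bin = counts.most_common(1)[0][0]
        mode = lo + (peak_bin + 0.5) * bin_w
        mean = sum(values) / len(values)
        width = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[key] = (mode, width)
    return stats
```

Sectors whose modes differ markedly from the others point to a gradient across the chip.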

In an iterative process, transform the intensities

$$I(x,y) \;\to\; I'(x,y) = a(x,y)\,I(x,y) + b(x,y)$$

such that the sector histograms become aligned.
[Figures: scale factor a(x,y) in the first step; offset b(x,y) in the first step; all sector histograms after the first step; all sector histograms after the third step]

It was later decided to perform only a multiplicative correction,

$$I(x,y) \;\to\; I'(x,y) = a(x,y)\,I(x,y),$$

for two reasons:
- practical application showed that the scale factor is the dominant effect;
- the observable AvgDiff is insensitive to the offset b(x,y): the PM and MM features of a probe pair lie next to each other, so a slowly varying offset cancels in the difference PM − MM.

A basic assumption of the ‘single-chip’ version is that bright and dark features are randomly distributed over the chip. If this assumption is violated (e.g. for the yeast chip), the ‘single-chip’ version runs into problems. The ‘multi-chip’ version therefore compares the sector histograms not with each other but with the sector histograms of a ‘reference chip’. (This is, of course, only possible if enough ‘similar’ chips are available.)
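The sketch below illustrates the multiplicative correction in both variants under simplifying assumptions: the sector "mode" is replaced by the median of ln(intensity), and a(x,y) is piecewise constant over the 4 × 4 sectors and computed in a single pass, whereas the slides describe an iterative process with a smooth a(x,y). All function names are hypothetical.

```python
import math
import statistics


def sector_of(y, x, n_rows, n_cols, n_sectors):
    """Sector index (row, col) of feature (y, x) on an n_sectors x n_sectors grid."""
    return (y * n_sectors // n_rows, x * n_sectors // n_cols)


def sector_log_medians(chip, n_sectors=4):
    """Median of ln(intensity) per sector (stand-in for the histogram mode)."""
    n_rows, n_cols = len(chip), len(chip[0])
    groups = {}
    for y in range(n_rows):
        for x in range(n_cols):
            key = sector_of(y, x, n_rows, n_cols, n_sectors)
            groups.setdefault(key, []).append(math.log(chip[y][x]))
    return {key: statistics.median(vals) for key, vals in groups.items()}


def gradient_correct(chip, reference=None, n_sectors=4):
    """Multiplicative correction I'(x, y) = a(x, y) * I(x, y).

    Single-chip (reference is None): every sector is aligned to the median
    sector location of the same chip. Multi-chip: every sector is aligned to
    the corresponding sector of the reference chip.
    Returns (corrected chip, {sector: a}).
    """
    loc = sector_log_medians(chip, n_sectors)
    if reference is None:
        common = statistics.median(list(loc.values()))
        target = {key: common for key in loc}
    else:
        target = sector_log_medians(reference, n_sectors)
    # A shift in log space is a scale factor in linear space.
    a = {key: math.exp(target[key] - loc[key]) for key in loc}
    n_rows, n_cols = len(chip), len(chip[0])
    corrected = [[a[sector_of(y, x, n_rows, n_cols, n_sectors)] * chip[y][x]
                  for x in range(n_cols)]
                 for y in range(n_rows)]
    return corrected, a
```

Calling `gradient_correct(chip)` gives the single-chip variant; `gradient_correct(chip, reference=ref_chip)` gives the multi-chip variant.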

Result of gradient correction:
[Figure: heat map of the scale factor a(x,y); panels: original, corrected]

Further example of gradient correction:
[Figure: heat map of the scale factor; panels: original, corrected]

GDE Controller: Distortions

Distortions: a log-log plot of the coding (i.e. PM and MM) features can show a nonlinear relationship when compared to the features of a ‘reference chip’. One reason can be that chips from different chip lots are combined into a series. (Again, the reference chip can only be constructed if enough ‘similar’ chips are available.)

Idea:
- divide the reference signal range into stripes containing the same number of points (red lines);
- in each stripe, determine the median of the experiment signals (or, alternatively, the point of maximum density);
- force this median line to be the diagonal of the new point cloud; this determines the (intensity-dependent) transformation.
[Figure: scatter plot of experiment vs. reference signals]
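A rough sketch of the stripe-median idea, working on log signals. It assumes paired feature lists, 10 stripes and a monotone median line, and it uses linear interpolation between stripe medians (with a constant shift beyond the outermost stripes). These choices and the name `distortion_correct` are assumptions, not the Controller's implementation.

```python
import math
import statistics
from bisect import bisect_left


def distortion_correct(reference, experiment, n_stripes=10):
    """Intensity-dependent correction of `experiment` against `reference`.

    reference, experiment: paired lists of positive feature intensities.
    Returns the corrected experiment intensities in the original order.
    """
    pairs = sorted(zip((math.log(r) for r in reference),
                       (math.log(e) for e in experiment)))
    size = len(pairs) // n_stripes
    knots = []  # (median experiment log-signal, median reference log-signal)
    for k in range(n_stripes):
        end = (k + 1) * size if k < n_stripes - 1 else len(pairs)
        stripe = pairs[k * size:end]  # equal-count stripe along the reference axis
        knots.append((statistics.median(e for _, e in stripe),
                      statistics.median(r for r, _ in stripe)))
    knots.sort()  # assumes the median line is monotone
    xs = [e for e, _ in knots]
    ys = [r for _, r in knots]

    def transform(v):
        # Piecewise-linear map sending the median line onto the diagonal;
        # outside the knot range, shift by the offset of the nearest knot.
        if v <= xs[0]:
            return v + ys[0] - xs[0]
        if v >= xs[-1]:
            return v + ys[-1] - xs[-1]
        i = bisect_left(xs, v)
        t = (v - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return [math.exp(transform(math.log(e))) for e in experiment]
```

For a chip whose signals follow experiment = reference^1.2, the corrected values fall back onto the diagonal within the range covered by the stripes.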

Result of distortion correction:
[Figure; one case is labelled ‘impossible to correct’]

GDE Controller: Reference Chip

Both gradient and distortion detection/correction require the concept of a reference chip, which serves as a ‘virtual standard’ for a given experiment set:
- the experiment set should be homogeneous:
  - chips from the same production lot;
  - probes from the same tissue;
  - only a small number of differentially expressed genes, which does not change the characteristic pattern;
- the chips have to be made comparable, for instance with a global logarithmic-mean normalization;
- the reference chip is computed featurewise (as mean or median) from the normalized set.
[Figure: normalized set; reference chip]
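A possible sketch of the reference-chip construction, assuming chips are flat lists of positive intensities in the same feature order: each chip is rescaled so that its mean ln(intensity) equals the average over all chips, and the reference is then taken featurewise as the median (or mean). The name `reference_chip` is hypothetical.

```python
import math
import statistics


def reference_chip(chips, use_median=True):
    """Build a featurewise reference ('virtual standard') from similar chips.

    chips: list of chips, each a flat list of positive intensities in the same
    feature order. Each chip is first rescaled so that its mean ln(intensity)
    equals the average over all chips (global log-mean normalization).
    Returns (normalized chips, reference chip).
    """
    log_means = [statistics.fmean(math.log(v) for v in chip) for chip in chips]
    common = statistics.fmean(log_means)
    normalized = [[v * math.exp(common - m) for v in chip]
                  for chip, m in zip(chips, log_means)]
    # Combine the normalized chips feature by feature.
    combine = statistics.median if use_median else statistics.fmean
    reference = [combine(values) for values in zip(*normalized)]
    return normalized, reference
```

The median is the more robust choice when a few chips in the set carry local defects.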

GDE Controller: Local Defects

Local defects: some local defects are already visible in a global chip view.
[Figure: view of outlier locations]
Aim: can we reliably detect smaller local defects, if possible automatically?

Idea:
- construct a ‘ratio chip’ by dividing each feature of the experiment by its counterpart on the reference chip;
- for visualisation purposes, show features which are brighter in green, features which are darker in red, and features that don't change in black;
- local defects should show up as speckles of homogeneous color, with diameters of at least several features.

With experiment features $x_{ij}$ and reference features $y_{ij}$, the ratio chip is $r_{ij} = x_{ij}/y_{ij}$, e.g. for a 2 × 3 patch:

$$R = \begin{pmatrix} x_{00}/y_{00} & x_{01}/y_{01} & x_{02}/y_{02} \\ x_{10}/y_{10} & x_{11}/y_{11} & x_{12}/y_{12} \end{pmatrix}$$
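The ratio-chip idea can be sketched as follows. The two-fold color threshold, 4-neighbour connectivity and the minimum speckle size of 4 features are assumed values chosen for illustration; the slides do not specify how speckles are detected, and the function names are hypothetical.

```python
from collections import deque


def ratio_chip(experiment, reference):
    """Featurewise ratio x_ij / y_ij of experiment and reference chip."""
    return [[x / y for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(experiment, reference)]


def color(ratio, threshold=2.0):
    """Green = brighter, red = darker, black = (roughly) unchanged."""
    if ratio >= threshold:
        return "green"
    if ratio <= 1.0 / threshold:
        return "red"
    return "black"


def find_speckles(ratios, threshold=2.0, min_size=4):
    """Connected regions (4-neighbourhood) of one color with >= min_size features."""
    n_rows, n_cols = len(ratios), len(ratios[0])
    colors = [[color(r, threshold) for r in row] for row in ratios]
    seen, speckles = set(), []
    for y0 in range(n_rows):
        for x0 in range(n_cols):
            c = colors[y0][x0]
            if c == "black" or (y0, x0) in seen:
                continue
            # Breadth-first flood fill over neighbours of the same color.
            region, queue = [], deque([(y0, x0)])
            seen.add((y0, x0))
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and (ny, nx) not in seen and colors[ny][nx] == c):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(region) >= min_size:
                speckles.append((c, sorted(region)))
    return speckles
```

Isolated bright or dark features (single outliers) are discarded by `min_size`, so only extended, homogeneously colored areas are reported.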

[Figure: ratio chip; annotations: differential regulation, actual defects]

This method can identify defects which would be hard to find...

... or which are invisible even in a zoomed view.

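For old (row-wise spotted) chips, there is a danger that differentially expressed genes are detected as chip artefacts: the probe pairs of a gene lie next to each other in a row, so a strongly regulated gene appears as a coherent streak in the ratio chip. Applying pattern search algorithms can solve this problem.
[Figure annotation: differential regulation]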
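One simple way to express such a pattern check, purely as an illustration and not the pattern search used in the Controller: a flagged region is attributed to differential regulation when most of its features belong to a single probe set. The `probe_set_of` layout map and the 80% cut-off are hypothetical.

```python
from collections import Counter


def classify_region(region, probe_set_of, min_fraction=0.8):
    """Heuristic: a flagged region dominated by one probe set is treated as
    differential regulation of that gene, otherwise as a chip defect.

    region: list of (y, x) feature positions; probe_set_of: {(y, x): probe set id}.
    Returns (label, probe set id or None).
    """
    ids = [probe_set_of.get(pos) for pos in region]
    top_id, count = Counter(ids).most_common(1)[0]
    if top_id is not None and count / len(region) >= min_fraction:
        return "differential regulation", top_id
    return "defect", None
```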

Further example of a local defect: [Figure]

Defects can extend over larger areas: [Figure]

Most frequent structures: [Figure]

... and others: [Figure]

An interactive chip viewer allows the user to:
- view identified mask areas;
- zoom in and find out which genes are affected by masking;
- manually edit the masked areas.
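Finding out which genes are affected by a mask amounts to a lookup from feature positions to probe sets. A minimal sketch, assuming a hypothetical `probe_set_of` dictionary built from the chip layout (.CDF file); `genes_affected` is likewise a hypothetical name.

```python
from collections import Counter


def genes_affected(mask, probe_set_of):
    """Fraction of each probe set's features that fall inside the mask.

    mask: iterable of masked (y, x) positions; probe_set_of: {(y, x): probe set id}.
    Positions outside the layout (e.g. control features) are ignored.
    """
    totals = Counter(probe_set_of.values())
    masked = Counter(probe_set_of[pos] for pos in mask if pos in probe_set_of)
    return {ps: n / totals[ps] for ps, n in sorted(masked.items())}
```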

GDE Controller: Workflow (continued)
- reporting;
- export to the database, into analysis software, or as .CEL files;
- choice between different condensing algorithms: MAS4, MAS5, GeneData (= trimmed mean of log(PM)).
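A minimal sketch of the GeneData condensing rule as stated on the slide (trimmed mean of log(PM)). The natural logarithm and the 10% trimming fraction at each end are assumptions, since the slide gives neither; `trimmed_mean_log_pm` is a hypothetical name.

```python
import math


def trimmed_mean_log_pm(pm, trim=0.1):
    """Condense the PM intensities of one probe set into one expression value:
    the mean of ln(PM) after dropping the `trim` fraction at each end."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    logs = sorted(math.log(v) for v in pm)
    k = int(trim * len(logs))
    kept = logs[k:len(logs) - k]
    return sum(kept) / len(kept)
```

Trimming makes the value robust against single saturated or defective probes, and using only PM avoids the negative values that PM − MM differences can produce.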

Playing with negative AvgDiff values

Log-log plot [panels: replicates; large differential expression]:
- correlation of large values is visible;
- only positive values can be displayed.

Linear-linear plot [panels: replicates; large differential expression]:
- negative values can be displayed;
- poor resolution for small values;
- large values appear scattered.

‘Cube-root’ plot, $y = \sqrt[3]{\mathrm{AvgDiff}}$ [panel: replicates]:
- damping at large values;
- display of positive and negative values;
- ‘zero-density regions’ near zero (an artefact of the transformation).

‘Lin-log’ transformation:

$$y = \operatorname{sign}(x)\,\ln(1+|x|) = \begin{cases} x - \dfrac{x\,|x|}{2} + O(x^3), & |x| < 1, \\[4pt] \operatorname{sign}(x)\left(\ln|x| + \dfrac{1}{|x|} + O\!\left(x^{-2}\right)\right), & |x| \gg 1. \end{cases}$$

- damping of high values;
- interpolates smoothly between linear behaviour for small values ($y \approx x$) and logarithmic behaviour for large values ($y \approx \pm\ln|x|$).

‘Lin-log’ plot [panels: replicates; large differential expression]. A good choice is $x = \mathrm{AvgDiff}/\mathrm{Target}$, i.e. the target intensity sets the scale. Lines of constant factor are shown in blue (2), red (5) and green (10).
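The lin-log transformation, the Target scaling and the constant-factor lines can be sketched as follows. The function names are hypothetical, and the Target value of 1500 used in the check is an arbitrary illustrative number.

```python
import math


def linlog(x):
    """y = sign(x) * ln(1 + |x|): about x for small |x|, about sign(x) ln|x| for large |x|."""
    return math.copysign(math.log1p(abs(x)), x)


def linlog_coords(avg_diff, target):
    """Lin-log coordinate of an AvgDiff value, scaled by the target intensity."""
    return linlog(avg_diff / target)


def constant_factor_line(factor, target, avg_diffs):
    """Points of the line 'experiment = factor * reference' in lin-log coordinates."""
    return [(linlog_coords(a, target), linlog_coords(factor * a, target))
            for a in avg_diffs]
```

For example, `constant_factor_line(2, target, values)` traces the factor-2 (blue) line. It is straight only in the logarithmic region and bends towards the diagonal near zero, where the transformation is linear.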

Consider the following ‘experiment’: construct fake .CEL files in which all PM-MM pairs are interchanged, and condense them with the old Affymetrix algorithm (ignoring the AbsCall). Amusing observation: if one ignores that the scale factor

$$\mathrm{SF} = \frac{\mathrm{Target}}{\mathrm{TrimmedMean}(\mathrm{AvgDiff})}$$

becomes negative (MAS doesn't: “Failed to analyze due to invalid Scale Factor”), the old (MAS4) algorithm would be invariant under PM ↔ MM! Interchanging PM and MM flips the sign of every AvgDiff, hence of the trimmed mean and of SF, so the scaled values SF · AvgDiff stay the same.

The ‘lin-log’ plots allow us to look at positive and negative AvgDiff values simultaneously. But why would we want to look at the negatives at all?
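The invariance argument can be checked numerically with a simplified model: AvgDiff is taken as the plain mean of PM − MM (MAS4's exclusion of outlier pairs is omitted), and the 2% trimming fraction and Target = 1500 are assumed values. Function names are hypothetical.

```python
import statistics


def avg_diff(pm, mm):
    """Simplified AvgDiff: mean of PM - MM over the probe pairs of one probe set
    (MAS4's exclusion of outlier pairs is omitted here)."""
    return statistics.fmean(p - m for p, m in zip(pm, mm))


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the `trim` fraction of values at each end."""
    ordered = sorted(values)
    k = int(trim * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def scaled_avg_diffs(probe_sets, target=1500.0, trim=0.02):
    """Scale AvgDiffs with SF = Target / TrimmedMean(AvgDiff).

    probe_sets: list of (pm_list, mm_list). SF is allowed to become negative,
    which MAS itself refuses. Returns (scaled AvgDiffs, SF).
    """
    diffs = [avg_diff(pm, mm) for pm, mm in probe_sets]
    sf = target / trimmed_mean(diffs, trim)
    return [sf * d for d in diffs], sf
```

Passing `[(mm, pm) for pm, mm in probe_sets]` simulates the faked .CEL files: SF changes sign, but the scaled values are unchanged.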

Original data: the ‘three-tissue dataset’, 3 groups with 6 replicates each.
[Figure: plots within replicate groups and across replicate groups; clustering gives perfect group separation]

PM ↔ MM data: these are log-log plots of the negative AvgDiffs. The good correlation at high values indicates that these numbers are reproducible. The difference between replicate groups is not so obvious, but...

... clustering again results in a complete group separation.

Take-home message: the mismatches carry information which can be measured reproducibly and can be used (at least) for pattern comparisons.