Lecture 3 From Images to Data


A lot of this is not so relevant now, but I think it's good basic knowledge.

Recap: How microarrays are used to measure gene expression
Basic idea: measure the activity level of a gene (its expression level) in a particular cell at a particular time by measuring the concentration of that gene's mRNA transcript in the cell's total RNA.
- Immobilize DNA probes (oligos, cDNA) onto glass.
- Hybridize labelled target mRNA (in reality its cDNA equivalent) with the probes on the glass.
- Measure how much binds to each probe (i.e. forms double-stranded DNA).
(In two-channel arrays, equal amounts of two differently labelled target cDNAs are hybridized to the probes.) Recall: the dyes are chosen to have different excitation and emission spectra. Scanners typically excite Cy3 with a 532 nm laser and Cy5 with a 635 nm laser; their emission peaks lie near 570 nm and 670 nm respectively.

What is measured in microarrays? We measure the amount of labelled target cDNA bound to each immobilized probe by exciting the labelling molecules (the dye) with a laser and collecting and counting the emitted photons. In practice the entire glass slide or chip is scanned (excitation, emission, counting of emitted photons), and the result is a digital image. This image then needs to be processed to locate the probes and assign an intensity measurement to each of them. There can be from hundreds to millions of different probes on each slide or chip.

The AFFY Chip

Affymetrix GeneChips®: probes are 25 bp oligonucleotide sequences, and a probe set is the set of probes corresponding to a particular gene or EST. In the past there have been 20 probes per probe set on human chips and 16 on mouse chips, while there are 11 on the Human GeneChip® HG-U133A. Most genes or ESTs are represented by one probe set, but quite a few have more than one.

DATA AND NOTATION
$PM_{ijg}$, $MM_{ijg}$: intensity of the perfect match and mismatch probe in cell j for gene g on chip i, where
$i = 1, \ldots, n$: from one to hundreds of chips;
$j = 1, \ldots, J$: from 16 to 20 probe pairs;
$g = 1, \ldots, G$: from 8,000 to 35,000 probe sets.
From these we compute the SIGNAL (expression measure).

DATA AND NOTATION (figure: layout of Probe Set 1 and Probe Set 2 on the chip, with a probe cell and a probe pair labelled)

Expression Value Calculation (Signal)
The signal represents the amount of transcript in solution. In brief, it is calculated as follows:
- Cell intensities are preprocessed for global background.
- An idealized mismatch value is calculated and subtracted to adjust the PM intensity.
- The adjusted PM intensities are log transformed to stabilize the variance.
- Tukey's biweight estimator is used to provide a robust mean of the signal.
- The signal is output as the antilog of the mean signal value.
- Finally, the signal is scaled to generate normalized data.
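The final scaling step is not spelled out on the slide. A minimal sketch, assuming the common MAS5-style convention of scaling each chip so that the 2% trimmed mean of its probe-set signals equals a target value; the helper `scale_signals` and the target of 500 are assumptions for illustration:

```r
# Hypothetical helper: scale one chip's probe-set signals so that their
# trimmed mean (dropping the lowest and highest 2%) equals a target value.
# The target of 500 and the 2% trim are assumptions, not taken from the slides.
scale_signals <- function(signal, target = 500, trim = 0.02) {
  tm <- mean(signal, trim = trim)   # robust chip-wide average signal
  sf <- target / tm                 # scaling factor for this chip
  list(scaled = signal * sf, sf = sf)
}

set.seed(1)
raw <- 2^rnorm(1000, mean = 8, sd = 2)   # simulated signals for one chip
out <- scale_signals(raw)
mean(out$scaled, trim = 0.02)             # should be 500, up to rounding
```

Because scaling by a positive factor preserves the order of the values, the trimmed mean scales by the same factor, so one multiplication is enough per chip.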

DETECTION ALGORITHM
Background: the average of the lowest 2% of the cell intensities is subtracted.
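A minimal sketch of this global background step, using a single background value for the whole chip as the slide states. The function name `background_correct` is hypothetical, and flooring corrected values at 0 is an added assumption:

```r
# Global background correction: estimate the background as the mean of the
# lowest 2% of cell intensities and subtract it from every cell.
# Flooring at 0 is an assumption added here to avoid negative intensities.
background_correct <- function(intensity, frac = 0.02) {
  k <- max(1, ceiling(frac * length(intensity)))  # number of cells in the lowest 2%
  bg <- mean(sort(intensity)[seq_len(k)])         # average of the lowest 2%
  list(bg = bg, corrected = pmax(intensity - bg, 0))
}

x <- c(1:99, 5000)   # 100 toy intensities
res <- background_correct(x)
res$bg               # 1.5, the mean of the two lowest values
```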

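Algorithm
First calculate the discrimination score R, which measures the ability of each probe pair to detect its intended target:
$R = (PM - MM)/(PM + MM)$
R near 1 means PM ≫ MM; R near or below 0 means PM ≤ MM.
Define a threshold τ (default 0.015) against which R is compared for each probe pair when deciding whether the target is "present".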
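The discrimination score is a simple vectorized formula. A small sketch with made-up PM/MM values (the function name `discrimination_score` is hypothetical):

```r
# Discrimination score for each probe pair: R = (PM - MM) / (PM + MM).
# R near 1 means PM >> MM; R near or below 0 means PM <= MM.
discrimination_score <- function(pm, mm) (pm - mm) / (pm + mm)

pm <- c(1000, 500, 300)
mm <- c(10, 500, 600)
R <- discrimination_score(pm, mm)
round(R, 3)    # about 0.980, 0.000, -0.333
R > 0.015      # compared with the default tau: TRUE FALSE FALSE
```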

Algorithm contd.
Calculate R − τ for each probe pair, rank the probe pairs by the absolute values |R − τ|, and apply the one-sided Wilcoxon signed rank test (non-parametric) to generate the detection p-value.
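The detection p-value for one probe set can be sketched with base R's `wilcox.test()`, used here as a stand-in for Affymetrix's own implementation. The helper name `detection_pvalue` and the toy intensities are hypothetical:

```r
# Detection p-value for one probe set: compute R - tau for each probe pair
# and apply a one-sided Wilcoxon signed rank test
# (H1: the median of R - tau is greater than 0).
detection_pvalue <- function(pm, mm, tau = 0.015) {
  R <- (pm - mm) / (pm + mm)
  wilcox.test(R - tau, alternative = "greater")$p.value
}

pm <- c(900, 850, 1200, 700, 650, 1100)
mm <- c(100, 120, 90, 650, 80, 300)
p_present <- detection_pvalue(pm, mm)   # all 6 pairs have R > tau: 1/64
p_absent  <- detection_pvalue(mm, pm)   # PM and MM swapped: all negative, 1
c(p_present, p_absent)
```

With six probe pairs and no ties, `wilcox.test()` uses the exact null distribution, so the smallest possible one-sided p-value is $1/2^6$.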

Wilcoxon Signed Rank Test
Tests whether the median θ of a distribution is >, <, or ≠ θ0. It is the non-parametric equivalent of the one-sample mean problem.
Model: $y_i = \theta + e_i$.
Procedure for the "greater than θ0" alternative:
- Subtract θ0 from the $y_i$: $z_i = y_i - \theta_0$.
- Calculate the absolute values $|z_i|$ and define $\psi_i = 1$ if $z_i > 0$ and 0 otherwise.
- Rank the absolute values; call the ranks $R_i$.
- Test statistic, the sum of the positive ranks: $S = \sum_{i=1}^{n} \psi_i R_i$.
- Find the corresponding p-value $P(S \ge s)$, where s is the observed value, and reject $H_0$ if the p-value is small.
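The steps above translate directly into a few lines of R. This sketch (the function name `signed_rank_S` and the data are hypothetical) computes S by hand and cross-checks it against the statistic V reported by base R's `wilcox.test()`:

```r
# Signed rank statistic S computed by hand, following the steps above.
signed_rank_S <- function(y, theta0 = 0) {
  z <- y - theta0            # z_i = y_i - theta0
  psi <- as.numeric(z > 0)   # psi_i = 1 if z_i is positive, 0 otherwise
  r <- rank(abs(z))          # ranks R_i of |z_i| (ties get average ranks)
  sum(psi * r)               # S = sum of the positive ranks
}

y <- c(2.1, -0.4, 3.3, 1.7, -1.2, 0.9)
S <- signed_rank_S(y)
S                                                          # 17 = 5 + 6 + 4 + 2
V <- unname(wilcox.test(y, alternative = "greater")$statistic)
V                                                          # also 17
```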

Calculating p-values
Logic: let's find the distribution of the test statistic under H0. If there are n observations, the total number of possible sign configurations for the ranks is $2^n$, each equally likely under H0. For n = 8 there are 256 possible outcomes:
- All positive: S = 1 + 2 + … + 8 = 36.
- One negative (8 options): S = 35, …, 28.
- Two negative (28 options): S = 33, …, 21.
- And so on.
All we need are the extreme outcomes, to see how extreme our test statistic is.
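The enumeration argument can be checked directly for n = 8 by listing all 256 sign configurations. This is a sketch for illustration; the cross-check uses base R's `psignrank()`, the signed-rank distribution function in the stats package:

```r
# Exact null distribution of S for n = 8, obtained by enumerating all 2^n
# equally likely sign configurations of the ranks 1..n.
n <- 8
ranks <- 1:n
signs <- as.matrix(expand.grid(rep(list(0:1), n)))  # 1 = positive, 0 = negative
S_null <- as.vector(signs %*% ranks)                # sum of positive ranks

length(S_null)                             # 256 configurations
sum(S_null == 36); sum(S_null == 35)       # one way each
range(S_null[rowSums(signs) == n - 2])     # two negatives: 21 to 33

# Exact one-sided p-value for an observed s = 35: P(S >= 35) = 2/256
p_enum <- mean(S_null >= 35)
p_builtin <- psignrank(34, n, lower.tail = FALSE)   # P(S > 34), same thing
c(p_enum, p_builtin)
```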

Discrimination Score [R] (figure: R for each probe pair, plotted with the PM and MM intensities and the threshold τ marked)
Increasing τ reduces false positives but also reduces the number of present calls.

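Detection Call
The detection call is based on p-value cutoffs: α1 and α2 provide the boundaries for the Present (P), Marginal (M) and Absent (A) calls. Defaults: α1 = 0.04, α2 = 0.06.
p < α1: P; p > α2: A; otherwise: M.
(Figure: the p-value axis from 0.00 to 1.00, split at α1 = 0.04 and α2 = 0.06 into P, M and A regions.)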
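The call rule is a two-threshold lookup. A sketch with the default cutoffs (the function name `detection_call` is hypothetical):

```r
# Map detection p-values to Present / Marginal / Absent calls
# using the default cutoffs alpha1 = 0.04 and alpha2 = 0.06.
detection_call <- function(p, alpha1 = 0.04, alpha2 = 0.06) {
  ifelse(p < alpha1, "P", ifelse(p > alpha2, "A", "M"))
}

calls <- detection_call(c(0.0078, 0.05, 0.3, 0.04))
calls   # "P" "M" "A" "M"
```

Note that a p-value exactly equal to α1 or α2 falls in the Marginal band under this rule.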

Example (using τ = 0.15)
PM        MM        R       Z = R - 0.15   |Z|    Rank   Positive
61215.0   283.3     .992     .842          .842   5.5    1
39000.8   40252.0   -.02    -.170          .170   1      0
61246.0   239.0     .992     .842          .842   5.5    1
60345.0   286.0     .991     .841          .841   4      1
59293.0   190.8     .994     .844          .844   8      1
54310.5   6314.0    .792     .642          .642   2      1
50324.8   265.0     .990     .840          .840   3      1
62199.3   218.0     .993     .843          .843   7      1
Only the rank-1 probe pair is negative, so s = 36 - 1 = 35 and the p-value is P(S ≥ 35) = 2/256 ≈ 0.008 < α1 = 0.04: PRESENT.
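The example can be reproduced with base R's `wilcox.test()`. This is a sketch: it uses the unrounded R values, so there are no ties and the ranks are integers rather than the 5.5 shown in the table, but the statistic and p-value are the same:

```r
# Reproduce the worked example (tau = 0.15, as on the slide).
pm <- c(61215.0, 39000.8, 61246.0, 60345.0, 59293.0, 54310.5, 50324.8, 62199.3)
mm <- c(283.3, 40252.0, 239.0, 286.0, 190.8, 6314.0, 265.0, 218.0)
R <- (pm - mm) / (pm + mm)
z <- R - 0.15
res <- wilcox.test(z, alternative = "greater")   # exact: n = 8, no ties
res$statistic   # V = 35
res$p.value     # 2/256 = 0.0078125, below alpha1 = 0.04, so Present
```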

SIGNAL
Calculated using the one-step Tukey's biweight estimate: a robust weighted mean that is insensitive to outliers.
One-step Tukey's biweight algorithm: let the data be $x_i$ and let m be their median. Calculate the median absolute deviation $MAD = \mathrm{median}_i |x_i - m|$ and
$u_i = (x_i - m)/(c \cdot MAD + \varepsilon)$,
where c is the tuning constant (set at 5) and ε = 0.0001 (so that we don't divide by zero).

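Tukey Biweight
The weights are calculated as $w(u) = (1 - u^2)^2$ if $|u| \le 1$, and $w(u) = 0$ otherwise.
The Tukey biweight is then the weighted mean $T_{bi}(x) = \sum_i w(u_i)\, x_i / \sum_i w(u_i)$.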
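A direct R translation of the one-step biweight as defined above. The function name `tukey_biweight` and the toy data are hypothetical; note that R's built-in `mad()` rescales by 1.4826 by default, so the unscaled MAD is computed by hand:

```r
# One-step Tukey biweight: m = median, MAD = median |x - m|,
# u = (x - m) / (c * MAD + eps), w = (1 - u^2)^2 for |u| <= 1 and 0 otherwise,
# estimate = sum(w * x) / sum(w).
tukey_biweight <- function(x, c_tune = 5, eps = 1e-4) {
  m <- median(x)
  mad_x <- median(abs(x - m))               # unscaled MAD
  u <- (x - m) / (c_tune * mad_x + eps)
  w <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)  # outliers with |u| > 1 get weight 0
  sum(w * x) / sum(w)
}

x <- c(10, 11, 9, 10.5, 9.5, 100)   # one gross outlier
mean(x)             # 25, pulled up by the outlier
tukey_biweight(x)   # close to 10
```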

Comments on Biweight
It is generally used with multiple iterations, but here we use just one iteration of it. It is reputed to be a very robust estimator.

Signal Calculation
Signal = $T_{bi}\{\log_2(PM_j - IM_j)\}$, where IM is the idealized mismatch, which is never greater than PM.
If MM < PM, then IM = MM. If MM ≥ PM, IM is derived from the specific background SB of the probe set:
$SB = T_{bi}(\log_2 PM_j - \log_2 MM_j)$

Signal Calculation
What is the idealized mismatch (IM)? According to Affymetrix, the MM probe is included to provide a value that captures most of the cross-hybridization and stray signal affecting the PM probe. However, it also contains a portion of the true signal. If MM is less than PM, it can be used directly; if not, we calculate the IM. To do so, we first calculate the specific background (SB) of the probe set from all of its probe pairs:

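Specific Background Calculations
- Calculate $y_j = \log_2(PM_j / MM_j)$ for each probe pair j.
- Find the median $m = \mathrm{median}(y)$.
- Calculate $MAD = \mathrm{median}_j |y_j - m|$.
- Define $u_j = (y_j - m)/(c \cdot MAD + \varepsilon)$ with c = 5 and ε = 0.0001.
- Define $w_j = (1 - u_j^2)^2$ if $|u_j| \le 1$, and 0 otherwise.
- $SB = T_{bi}(y) = \sum_j w_j y_j / \sum_j w_j$.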

Example: Specific Background calculations

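Signal Calculation
First we need to define the idealized mismatch for each probe pair j:
$IM_j = MM_j$ if $MM_j < PM_j$;
$IM_j = PM_j / 2^{SB}$ if $MM_j \ge PM_j$ and $SB > \tau_c$;
$IM_j = PM_j / 2^{\tau_c / (1 + (\tau_c - SB)/\tau_s)}$ if $MM_j \ge PM_j$ and $SB \le \tau_c$.
By default the contrast parameter is $\tau_c = 0.03$ and the scale parameter is $\tau_s = 10$.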

Signal Calculation contd.
$V_j = \max(PM_j - IM_j, \delta)$, where δ is a small positive constant.
Signal = $T_{bi}(\log_2 V_j)$.
Keep in mind that here we are SUBTRACTING IM from PM, not taking a ratio as we did for SB.

Example for Signal Calculation: Signal = 58611.65, detection call P

Program for signal calculation from PM and MM data. Have the data in a csv file called signal1.csv, with columns PM and MM.

setwd("/myRfolder")
hwdata <- read.table("signal1.csv", header = TRUE, sep = ",", na.strings = " ")

# SB (specific background) calculation
p1 <- hwdata$PM
m1 <- hwdata$MM
y <- log2(p1 / m1)
md <- median(y)
z <- abs(y - md)
mz <- median(z)                            # MAD
u <- (y - md) / (5 * mz + .0001)           # c = 5, e = 0.0001
w <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)   # biweight weights
sb <- sum(y * w) / sum(w)                  # one-step Tukey biweight of y
sb

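# Signal calculation begins
# Idealized mismatch: MM where MM < PM; otherwise derived from SB using the
# contrast tau tc = 0.03 and scale tau ts = 10 (default values)
tc <- 0.03
ts <- 10
im_alt <- if (sb > tc) p1 / 2^sb else p1 / 2^(tc / (1 + (tc - sb) / ts))
IM <- ifelse(m1 < p1, m1, im_alt)
d <- 2^(-20)                               # small positive floor (assumed value)
ys <- log2(pmax(p1 - IM, d))               # V = max(PM - IM, d)
ms <- median(ys)
zs <- abs(ys - ms)
ss <- median(zs)
us <- (ys - ms) / (5 * ss + .0001)
ws <- ifelse(abs(us) <= 1, (1 - us^2)^2, 0)
tbs <- sum(ws * ys) / sum(ws)              # one-step Tukey biweight of log2(V)
sgnl <- 2^tbs                              # signal = antilog of the biweight mean
sgnl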

AFFY: SINGLE CHIP EXPRESSION

AFFY: TREATMENT COMPARISON: TWO CHIPS