GeneChips and Microarray Expression Data
  David Paoletti
The Problem
  Determine gene expression (activity)
  What proteins are being produced by a group of cells?
The Assumption
  The RNA present in the cell determines what proteins are being produced
  Efficiency
The Why
  Understanding
  Toxicology
  Drug design
  Evaluation
  Specificity
  Response
What is a GeneChip?
  1.28 x 1.28 cm glass wafer
  500,000 features
  24 x 24 µm probe site
  25-mer oligo, complementary
  PM: perfect match
  MM: mismatch
  2.5 M copies
The Solution
The Gains
  Speed
  Possibility
  Sensitivity
  Reproducibility
The Process
  Cells
  -> Poly-A RNA
  -> cDNA
  -> IVT: biotin-labeled antisense cRNA
  -> Fragment (heat, Mg2+): labeled fragments
  -> Hybridize
  -> Wash/stain
  -> Scan
Hybridization and Staining
  GeneChip + biotin-labeled cRNA -> hybridized array
  Hybridized array + SAPE (streptavidin-phycoerythrin) -> stained array
Specialized Equipment
How Features Are Chosen
  Gene sequence, 5’ to 3’
  Multiple oligo probes (25-mers)
  Perfect Match
  Mismatch
Feature Values
  Example grid of intensity values for one feature:
     83  112   96   32
     47  382  165   87
     55  246  140   93
    104  552  187   65
  Remove outermost rows and columns
  Find 75th percentile of remaining values
  This value is taken as representative of this feature
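The feature-value steps above can be sketched in a few lines of Python. This is a minimal illustration, not the vendor's implementation: the function names `percentile` and `feature_value` are hypothetical, and the percentile uses linear interpolation between ranks, which is an assumption because the slide does not say which percentile definition is used.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks (assumed definition)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty set is undefined")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def feature_value(grid, pct=75):
    """Drop the outermost rows and columns, then take the given percentile."""
    inner = [row[1:-1] for row in grid[1:-1]]
    remaining = [value for row in inner for value in row]
    return percentile(remaining, pct)


# The example grid from the slide.
grid = [
    [83, 112, 96, 32],
    [47, 382, 165, 87],
    [55, 246, 140, 93],
    [104, 552, 187, 65],
]

print(feature_value(grid))  # 280.0 under the interpolation assumed here
```

For the slide's grid, trimming leaves 382, 165, 246 and 140. A different percentile convention (for example nearest rank) would give a different representative value from the same four numbers.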
Background Noise Removal
  The array is divided into 16 equal sectors
  For each sector:
    Find the lowest 2% of the feature intensities
    Average these
    Subtract this average from the intensity value of all features in the sector
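A rough sketch of the sector-based background subtraction described above, under stated assumptions: the array is a rectangular list of rows, the 16 sectors form a 4 x 4 grid, and at least one feature is always used when 2% of a sector rounds down to zero. The name `subtract_background` and its parameters are hypothetical.

```python
def subtract_background(grid, sectors_per_side=4, fraction=0.02):
    """Subtract, per sector, the mean of the lowest `fraction` of intensities."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = [list(row) for row in grid]
    for si in range(sectors_per_side):
        r0 = si * n_rows // sectors_per_side
        r1 = (si + 1) * n_rows // sectors_per_side
        for sj in range(sectors_per_side):
            c0 = sj * n_cols // sectors_per_side
            c1 = (sj + 1) * n_cols // sectors_per_side
            values = sorted(
                grid[r][c] for r in range(r0, r1) for c in range(c0, c1)
            )
            if not values:
                continue  # array smaller than the sector grid
            # Assumption: use at least one value when 2% rounds to zero.
            k = max(1, int(len(values) * fraction))
            background = sum(values[:k]) / k
            for r in range(r0, r1):
                for c in range(c0, c1):
                    result[r][c] = grid[r][c] - background
    return result


# Tiny 8 x 8 example: each sector is 2 x 2, so the background is the sector minimum.
example = [[100 + r * 8 + c for c in range(8)] for r in range(8)]
corrected = subtract_background(example)
print(corrected[0][:2], corrected[7][7])
```

On a real array each sector holds thousands of features, so the lowest 2% is a sizeable sample rather than a single value as in this toy grid.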
Noise Calculation
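Average Difference Intensity
  For a given gene:
    For each probe pair for the given gene, calculate the difference PM - MM
    Calculate the mean μ and standard deviation σ of this set
    If abs((PM - MM) - μ) > 3σ, delete the pair from the set
    The remaining set is the "pairs in avg"
    The average difference is the mean of PM - MM over the remaining pairs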
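The outlier-trimmed average can be sketched as follows. Assumptions to note: the population standard deviation (`statistics.pstdev`) is used because the slide does not say whether σ is the population or sample value, trimming is done in a single pass, and the function name `average_difference` is hypothetical.

```python
from statistics import mean, pstdev


def average_difference(probe_pairs, n_sigma=3.0):
    """Mean of PM - MM after dropping pairs more than n_sigma deviations from the mean.

    Returns (average difference, number of pairs kept).
    """
    diffs = [pm - mm for pm, mm in probe_pairs]
    mu = mean(diffs)
    sigma = pstdev(diffs)  # assumption: population standard deviation
    kept = [d for d in diffs if abs(d - mu) <= n_sigma * sigma]
    return mean(kept), len(kept)


# Fifteen well-behaved pairs plus one extreme outlier (made-up values).
pairs = [(300, 200)] * 15 + [(5200, 200)]
avg_diff, pairs_in_avg = average_difference(pairs)
print(avg_diff, pairs_in_avg)
```

With only a handful of probe pairs a single outlier can never exceed 3σ of a set that includes it, so the rule only bites when a gene has a reasonably large probe set, as in this 16-pair example.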
Positive & Negative Probe Pairs
  Positive: PM - MM ≥ SDT and PM/MM ≥ SRT (if both true, mark as positive)
  Negative: MM - PM ≥ SDT and MM/PM ≥ SRT (if both true, mark as negative)
  SDT = Q · STDmult
  By default, SRT = 1.5, STDmult = 2.0 (low density), 4.0 (high density)
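As an illustration of the two-threshold test, here is a small Python sketch. The comparison operators are assumed to be "greater than or equal", the label "neutral" for pairs that pass neither test is a name introduced here, and the function names are hypothetical. The ratio tests are written as multiplications so that a zero intensity does not cause a division error.

```python
def difference_threshold(q, std_mult=2.0):
    """SDT = Q * STDmult, with Q the noise value for the array."""
    return q * std_mult


def classify_probe_pair(pm, mm, sdt, srt=1.5):
    """Label one probe pair as 'positive', 'negative' or 'neutral' (assumed label)."""
    # PM/MM >= SRT is rewritten as PM >= SRT * MM (valid for positive intensities).
    if pm - mm >= sdt and pm >= srt * mm:
        return "positive"
    if mm - pm >= sdt and mm >= srt * pm:
        return "negative"
    return "neutral"


sdt = difference_threshold(q=10.0, std_mult=2.0)  # made-up noise value
labels = [
    classify_probe_pair(pm, mm, sdt)
    for pm, mm in [(300, 100), (100, 300), (110, 100), (1000, 900)]
]
print(labels)  # ['positive', 'negative', 'neutral', 'neutral']
```

The last pair shows why both tests are needed: its difference clears SDT easily, but the ratio 1000/900 is well under 1.5, so it is not counted as positive.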
Voting Methods for Absolute Call
  Positive/negative ratio: PNR = #pos / #neg
  Positive fraction: PF = #pos / #used
  Log average ratio (LA)
Decision Matrix
  Thresholds separating the Absent, Marginal and Present calls:
          Absent | Marginal    Marginal | Present
    PNR        3.00                  4.00
    PF         0.33                  0.43
    LA         0.90                  1.30
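The three voting metrics and the decision-matrix thresholds can be combined in a short sketch. Several things here are assumptions rather than content from the slides: the log average ratio is taken as 10 times the mean of log10(PM/MM) over the pairs used, values exactly on a threshold are treated as Marginal, and PNR is reported as infinity when there are no negative pairs. The slides do not say how the three votes are merged into one absolute call, so the sketch stops at per-metric votes. All names are hypothetical.

```python
import math

# (Absent/Marginal boundary, Marginal/Present boundary) from the decision matrix.
THRESHOLDS = {"PNR": (3.00, 4.00), "PF": (0.33, 0.43), "LA": (0.90, 1.30)}


def voting_metrics(n_pos, n_neg, probe_pairs):
    """Compute PNR, PF and LA for one gene from its used probe pairs."""
    n_used = len(probe_pairs)
    pnr = n_pos / n_neg if n_neg else math.inf
    pf = n_pos / n_used
    # Assumed definition: LA = 10 * mean(log10(PM / MM)).
    la = 10 * sum(math.log10(pm / mm) for pm, mm in probe_pairs) / n_used
    return {"PNR": pnr, "PF": pf, "LA": la}


def vote(metric, value):
    """Map one metric value to Absent, Marginal or Present."""
    low, high = THRESHOLDS[metric]
    if value < low:
        return "Absent"
    if value > high:
        return "Present"
    return "Marginal"


# Made-up gene: 16 used pairs, 12 positive, 2 negative, PM twice MM throughout.
metrics = voting_metrics(n_pos=12, n_neg=2, probe_pairs=[(200, 100)] * 16)
votes = {name: vote(name, value) for name, value in metrics.items()}
print(votes)
```

In this example all three metrics vote Present: PNR is 6.0, PF is 0.75, and LA is about 3.0.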
Average Difference and Absolute Call
  Which of these do you base a decision on, for whether a gene is being expressed?
    Use the absolute call for the decision
    Use the average difference to compare those which are present
Conclusions
  Incredible amalgam of biological and computational processes
  Allows analyses that would not be performed otherwise
  Already of proven worth
References
  Moore, S. K. Making chips to probe genes. IEEE Spectrum, March 2001, 54-60.
  Affymetrix. GeneChip Gene Expression Algorithm Training, Part I: Absolute Analysis.
  Berberich, S., and McGorry, M. GeneChip protocols. Wright State University.