Using Web-Based Tools for Microarray Analysis Michael Elgart
Outline: Introduction to microarrays (why use them and what to expect from their results): What are they? Why use them? What types are there? Low-level analysis: background correction, normalization, quality control. Significance analysis. Annotations. Functional analysis: Gene Ontology, promoter analysis.
What is a microarray? A tool for analyzing gene expression that consists of a small membrane or glass slide containing samples of thousands of genes arranged in a regular pattern.
The Boom of Microarray Technology: Number of Publications with Affymetrix Chips [Figure: number of publications (y-axis, 0 to 1200) by year (x-axis, 1991 to 2003)]
What’s the Point? Large-scale (genome-wide) screening. Eliminate the bias of pre-selecting candidate genes. Test multiple hypotheses simultaneously. Generate new hypotheses by identifying novel genes associated with the experiment. Identify novel relationships and patterns among genes.
GEO: Public Database Example
What are DNA microarrays? Microarrays are a method of scanning the genome based on a well-known property of nucleic acids (hybridization): complementary strands of DNA/RNA will find each other in solution.
Types of DNA Microarray Experiments. Some types of experiments that can be done: measure changes in gene expression (RNA hybridizes to DNA); identify genomic gains and losses (genomic DNA hybridizes to DNA); identify mutations in DNA (PCR product hybridizes to DNA).
Expression Microarray Basics. Two parts: Probes, the single-stranded DNA molecules on the solid surface; and Targets, the single-stranded labeled population from your experimental source.
Microarray Overview Probe
Probe deposition on array: contact printing, ink-jet spraying, on-chip synthesis.
Pin Spotting of DNA Arrays. Can be automated or manual. Relatively cheap, but may result in QC issues with spots. ~$10 per 100-probe array.
Under the microscope
Ink jet spraying
Ink jet sprayed spots on a chip
Affymetrix: we will be dealing mainly with this type today, so here is a little more detail.
On chip synthesis Lithography
Set of probes that identifies a transcript = ProbeSet
Affymetrix: Gene Expression Arrays
Array                     Transcripts/Genes
Arabidopsis Genome        24,000
C. elegans Genome         22,500
Drosophila Genome         18,500
E. coli Genome            20,366
Human Genome U133 Plus    47,000
Mouse Genome              39,000
Yeast Genome              5,841 (S. cerevisiae) and 5,031 (S. pombe)
Rat Genome                30,000
Zebrafish                 14,900
Plasmodium/Anopheles      4,300 (P. falciparum) and 14,900 (A. gambiae)
Barley                    25,500
Soybean                   37,500 + 23,300 pathogen
Grape                     15,700
Canine                    21,700
Bovine                    23,000
B. subtilis               5,000
S. aureus                 3,300 ORFs
Xenopus                   14,400
Spots on an Affymetrix chip printed using photolithography
DNA Deposition on Array (2 µm). Taken from Duggan et al., Nature Genetics 21:10
RNA Quality and Quantity [Figure labels: 28S rRNA, 18S rRNA, degraded sample]
Hybridization = expression level. The amount of hybridization of RNA to a fragment of DNA representing any gene can be measured if the RNA is labeled with a dye. The intensity of hybridization is a surrogate measure of the expression level of the gene represented by that DNA fragment.
Hybridization and Washing of DNA Microarrays. This remains one of the most poorly controlled steps in the process. Long oligonucleotide probes were designed to standardize the Tms across the slide. However, efficiency and specificity will still vary.
Slide Scanning. Selectable lasers. Emission filters with a range of 500-700 nm. 5 micron resolution. The goal is to generate images of the arrays that are used as input for quantitation algorithms.
Usually the 75th percentile
Do not use MM data! MAS (3, 4, 5…) is NOT GOOD. Use RMA!
Fortunately (?) you don’t do this. The result:
[INTENSITY]
NumberCells=4691556
X Y MEAN STDV NPIXELS
0 0 30022.0 4025.9 9
1 0 507.0 48.5 9
2 0 30116.0 4500.7 9
3 0 602.0 97.3 9
4 0 339.0 36.3 9
5 0 491.0 59.1 9
6 0 29208.0 3090.8 9
7 0 877.0 126.0 9
8 0 28683.0 4069.2 9
9 0 645.0 63.6 9
10 0 28536.0 3462.7 9
11 0 473.0 100.5 9
12 0 29509.0 4287.0 9
13 0 667.0 83.2 9
[CEL]
Version=3
[HEADER]
Cols=2166
Rows=2166
TotalX=2166
TotalY=2166
OffsetX=0
OffsetY=0
GridCornerUL=623 408
GridCornerUR=16090 586
GridCornerLR=15932 15984
GridCornerLL=464 15807
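To make the file excerpt above less opaque, here is a minimal Python sketch that reads the [INTENSITY] rows of a text-format CEL excerpt into tuples. The function name `parse_intensity_rows` and the `CEL_EXCERPT` string are illustrative assumptions, not part of any Affymetrix tool; in practice an analysis package reads these files for you.

```python
# First rows of the [INTENSITY] section, copied from the slide.
CEL_EXCERPT = """\
[INTENSITY]
NumberCells=4691556
X Y MEAN STDV NPIXELS
0 0 30022.0 4025.9 9
1 0 507.0 48.5 9
2 0 30116.0 4500.7 9
3 0 602.0 97.3 9
"""


def parse_intensity_rows(text):
    """Return (x, y, mean, stdv, npixels) tuples from the [INTENSITY] section."""
    rows = []
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new section header starts; only [INTENSITY] is of interest.
            in_section = (line == "[INTENSITY]")
            continue
        if not in_section:
            continue
        fields = line.split()
        # Data rows have five fields and start with an integer X coordinate;
        # this skips "NumberCells=..." and the column-name line.
        if len(fields) == 5 and fields[0].isdigit():
            x, y, mean, stdv, npixels = fields
            rows.append((int(x), int(y), float(mean), float(stdv), int(npixels)))
    return rows


rows = parse_intensity_rows(CEL_EXCERPT)
print(len(rows), "cells parsed; first cell:", rows[0])
```

Each row is one cell (feature) on the chip: its X/Y position, the mean and standard deviation of its pixel intensities, and the number of pixels used.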
So can we just use the data now? Not quite…
Sources of Microarray Data Variability. Biological variability in the population: no good solution here… At the experimental level, there is variability between preparations and labelling of the sample, variability between hybridisations of the same sample to different arrays, and variability between the signal on replicate features on the same array. [Diagram labels: true gene expression of individual; variability between individuals; variability between sample preparations; variability between arrays and hybridisations; variability between replicate features; measured gene expression.] Expression values in 2 replicates will be different! Can we handle it?
Normalization. Deals with the fact that the results from identical experiments on two identical microarrays will never be exactly the same. In addition to unavoidable random errors, there are also systematic differences caused by: different incorporation efficiencies of dyes (for example, green markers are stronger than red ones, measured as stronger illumination, creating a bias between experiments done with green and red markers); different amounts of mRNA in the tested sample, causing different measured expression levels; differences in experimenter or protocol; different scanning parameters; differences between chips created in different production batches.
Quantile Normalization: intensity distributions are adjusted to be equivalent. Scaling to a target intensity sets the mean signal intensity to the defined value (500). [Figure: histograms of number of probes (y-axis) against probe intensity (x-axis)]
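The two adjustments named on this slide can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the code used by any particular tool: the function names are made up for this example, the input is assumed to be a probes-by-arrays matrix, and tied values are ordered arbitrarily rather than averaged.

```python
import numpy as np


def quantile_normalize(x):
    """Give every array (column) the same intensity distribution."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                    # sort order within each array
    sorted_x = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    reference = sorted_x.mean(axis=1)                # mean intensity at each rank
    out = np.empty_like(x)
    # Put the reference value for rank k where each array's k-th smallest probe was.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


def scale_to_target(x, target=500.0):
    """Rescale each array so that its mean intensity equals the target value."""
    x = np.asarray(x, dtype=float)
    return x * (target / x.mean(axis=0))


# Toy matrix: 4 probes (rows) measured on 3 arrays (columns).
data = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 4.0, 6.0],
                 [4.0, 2.0, 8.0]])
normalized = quantile_normalize(data)
scaled = scale_to_target(data, target=500.0)
print(normalized)
print(scaled.mean(axis=0))
```

Scaling only matches the means of the arrays, while quantile normalization forces the whole distributions to coincide; the latter is the normalization step used in RMA.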
Background Correction: different GC content of probes, location-on-chip effects, etc. All of this needs to be compensated for. The algorithm to do it is RMA.
Correct Experimental Design. Tree representation of replicate experiments: the first level is biological replicates, followed by two independent mRNA extractions (technical replicates). In each microarray experiment, each gene (each probe or probe set) is really a separate experiment in its own right. [Diagram: Experiment; Biological Replicates: Replicate 1, Replicate 2; Technical Replicates: Extract 1, Extract 2.] “We need normalization to be able to look at the biological differences between samples and not technical ones” Elgart M.
Reproducibility. How big is the difference when the same sample is hybridized twice on the same type of array? If we look at technical replicates, what do we expect to see?
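One way to put numbers on this question is to compare two technical replicates on the log2 scale. The sketch below uses simulated intensities (not real chip data); the function name `replicate_agreement`, the noise level, and the 2-fold cutoff are assumptions chosen for illustration only.

```python
import numpy as np


def replicate_agreement(rep1, rep2, fold=2.0):
    """Return (Pearson r on log2 intensities, fraction of probes within `fold`)."""
    a = np.log2(np.asarray(rep1, dtype=float))
    b = np.log2(np.asarray(rep2, dtype=float))
    r = np.corrcoef(a, b)[0, 1]
    # A probe "agrees" if the two replicates differ by less than the fold cutoff.
    within = np.mean(np.abs(a - b) < np.log2(fold))
    return r, within


# Simulated data: one "true" log2 signal per probe plus independent
# technical noise on each hybridization (all values are made up).
rng = np.random.default_rng(0)
true_log2 = rng.normal(8.0, 2.0, size=5000)
rep1 = 2.0 ** (true_log2 + rng.normal(0.0, 0.2, size=5000))
rep2 = 2.0 ** (true_log2 + rng.normal(0.0, 0.2, size=5000))

r, within = replicate_agreement(rep1, rep2)
print(f"correlation = {r:.3f}, within 2-fold = {within:.1%}")
```

For good technical replicates, the correlation should be high and few probes should differ by more than 2-fold; probes that do differ that much between replicates are a warning against trusting a single 2-fold change.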
Summary Statistics [Figure panels: correlation (>2x differential only); % agreement on all probes, using only the top 10,000 brightest probes; % agreement on 2x differential; red = in replicates]
Set of probes that identifies a transcript = ProbeSet If all 10 probes give high signal in Treatment and low in Control then all’s well. But what if only 6 of 10 are “positive”? How do we decide whether this gene is expressed?
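RMA answers the "6 of 10" question by not voting probe by probe: it fits the log2 intensities of a probe set as probe effect plus array effect using Tukey's median polish, which is robust to a minority of misbehaving probes. Below is a simplified sketch of that summarization step only; the function name and the toy numbers are assumptions for illustration, and this is not the Bioconductor implementation.

```python
import numpy as np


def median_polish_summary(log2_pm, n_iter=10, tol=1e-6):
    """Summarize one probe set (rows = probes, columns = arrays).

    Fits log2 intensity = overall + probe effect + array effect + residual
    and returns one expression estimate per array (overall + array effect).
    """
    resid = np.array(log2_pm, dtype=float)
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])   # probe effects
    col_eff = np.zeros(resid.shape[1])   # array effects
    for _ in range(n_iter):
        # Sweep out row (probe) medians.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row_eff += row_med
        delta = np.median(col_eff)
        col_eff -= delta
        overall += delta
        # Sweep out column (array) medians.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col_eff += col_med
        delta = np.median(row_eff)
        row_eff -= delta
        overall += delta
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return overall + col_eff


# Toy probe set: 10 probes with different affinities, measured on a
# Control and a Treatment array. Only 6 of the 10 probes respond (+3 log2 units).
affinity = np.linspace(-1.0, 1.0, 10)
response = np.array([3.0] * 6 + [0.0] * 4)
control = 7.0 + affinity
treatment = 7.0 + affinity + response
estimates = median_polish_summary(np.column_stack([control, treatment]))
log2_change = estimates[1] - estimates[0]
print("estimated log2 change (Treatment - Control):", round(log2_change, 3))
```

In this toy case the majority of probes respond, so the summarized probe set shows the change; had only a minority responded, the medians would have ignored them.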
Is this a “hands-on” thing? Yes. Example: