Published by Millicent Jordan. Modified over 9 years ago.
1
Gene Expression Microarrays Microarray Normalization Stat 115 2012
2
Outline
Gene expression microarrays
–Differential expression
–Spotted cDNA and oligonucleotide arrays
Microarray normalization methods
–Median scaling, Lowess, and Qnorm
–MA plots
Microarray databases
3
Central Dogma of Molecular Biology
[Diagram: DNA (DNA replication) → RNA by transcription → protein by translation; the protein, folded with function, shapes physiology. Reverse transcription runs from RNA back to DNA.]
4
Imagine a Chef
[Figure labels: Restaurant Dinner; Home Lunch]
Certain recipes are used to make certain dishes
5
Each Cell Is Like a Chef
6
[Figure labels: Infant Skin; Adult Liver; Glucose, Oxygen, Amino Acid; Fat, Alcohol, Nicotine; Healthy Skin Cell State; Disease Liver Cell State]
Certain genes are expressed to make certain proteins
7
Differential Expression
Understand the transcription level of gene(s) under different conditions
–Cell types (brain vs. liver)
–Developmental (fetal vs. adult)
–Response to stimulus (rich vs. poor media)
–Gene activity (wild type vs. mutant)
–Disease states (healthy vs. diseased)
8
High Throughput Measures of Gene Expression
Measure gene expression: a rough proxy for the protein level and cell state
High throughput: measure the mRNA level of all the genes in the genome together
Like checking what the chef is making in many different situations
Different microarrays:
–Spotted cDNA microarrays
–Oligonucleotide arrays
9
Microarrays
Grow cells under a certain condition, collect the mRNA population, and label it
A microarray has high-density, sequence-specific probes at a known location for each gene/RNA
The sample hybridizes to the microarray probes by DNA (A-T, G-C) base pairing; non-specific binding is washed off
Measure each sample mRNA level by reading the labeled signal at each probe location
10
Spotted cDNA Arrays
Pat Brown Lab, Stanford University
Robotic spotting of cDNA (mRNA converted back to DNA, no introns)
Several thousand probes / array
One long probe per gene
11
Spotted cDNA Arrays
Competing hybridization
–Control
–Treatment
Detection
–Green: high control
–Red: high treatment
–Yellow: equally high
–Black: equally low
12
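Why Competing Hybridization?
DNA concentration is not the same across probes, and probes are not spotted evenly, so the treatment/control ratio at each spot is more reliable than its absolute intensity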
13
cDNA Microarray Readout
Result often viewed with Excel or WordPad
14
Oligonucleotide Arrays
GeneChip® by Affymetrix
Parallel synthesis of oligonucleotide probes (25-mer) on a slide using photolithographic methods
Millions of probes / microarray
Multiple probes per gene
One-color arrays
15
Affymetrix GeneChip Probes
16
Labeled Samples Hybridize to DNA Probes on GeneChip
17
Shining Laser Light Causes Tagged Fragments to Glow
18
Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)
19
Affymetrix Microarray Image Analysis
Gridding: based on spike-in DNA
Affymetrix GeneChip Operating System (GCOS)
–cel file
X    Y    MEAN   STDV  NPIXELS
701  523  311.0  76.5  16
702  523  48.0   10.5  16
–cdf file
Which probe at (X,Y) corresponds to which probe sequence and targeted transcript
The MM probe is always at (X,Y+1) relative to its PM probe at (X,Y)
20
Array Platform Comparisons
cDNA microarrays:
–Two-color assay, comparative hybridization
–Cheaper ($50-$200 / chip)
–Flexibility of custom-made arrays: do not need the whole sequence
Oligonucleotide GeneChip:
–One-color assay, absolute expression level
–A little more expensive ($200-$500 / chip)
–Automated: better quality control, less variability
–Easier to compare results from different experiments
Many more commercial array platforms
–Agilent, ABI, Amgen, NimbleGen…
–Some use long oligo probes: 30-70 nt
21
Experimental Design Issues
Replicates: always preferred
Biological replicates: repetition of the experiment prior to extracting mRNA
–Multiple cell conditions & individuals
Technical replicates: repetition of experimental conditions after mRNA extraction
–Include reverse transcription, probe labeling, and hybridization
22
Normalization
Try to preserve biological variation and minimize experimental variation, so that different experiments can be compared
Considerations: scale, dye bias, location bias, probe bias, …
Assumption: most genes / probes don’t change between two conditions
Normalization can have a larger effect on the analysis than downstream steps (e.g. group comparisons)
23
Dye Swap in cDNA Microarrays
Cy5, Cy3 dyes do not label equally
–log2(R/G) -> log2(R_TRUTH/G_TRUTH) - c
So swap the dyes in a replicate experiment, ideally
Combine by subtracting the normalized log-ratios (the dye bias cancels when c = c'):
[ (log2(R/G) - c) - (log2(R'/G') - c') ] / 2
= [ log2(R/G) + log2(G'/R') ] / 2
= [ log2(RG'/GR') ] / 2
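The dye-swap average can be sketched in a few lines of Python. This is only an illustration, not the course's code: the function name `dye_swap_log_ratio`, the intensities, and the 1.5x red-channel bias are all made up, and the sketch assumes the dye bias is the same on both arrays.

```python
import math

def dye_swap_log_ratio(r, g, r_swap, g_swap):
    """Average a forward and a dye-swapped hybridization for one probe.

    r, g: red/green intensities with treatment labeled red (forward array).
    r_swap, g_swap: intensities on the swapped array (treatment labeled green).
    """
    forward = math.log2(r / g)            # true log-ratio plus dye bias
    swapped = math.log2(r_swap / g_swap)  # minus true log-ratio plus dye bias
    return (forward - swapped) / 2        # bias cancels if equal on both arrays

# Hypothetical probe: true 4-fold induction, red channel reads 1.5x too bright.
bias = 1.5
forward_r, forward_g = 400 * bias, 100   # treatment in red
swap_r, swap_g = 100 * bias, 400         # treatment in green
estimate = dye_swap_log_ratio(forward_r, forward_g, swap_r, swap_g)
print(estimate)  # close to log2(4) = 2 despite the biased red channel
```

The single forward array alone would report log2(6), about 2.58, so the swap removes the bias without needing to estimate c directly.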
24
Median Scaling
Linear scaling
–Ensure the different arrays have the same median value and the same dynamic range
–X' = (X - c1) * c2
[Figure: plot with axes array1 and array2]
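One way to pick c1 and c2 is sketched below. The slide does not say how "dynamic range" is measured, so using the interquartile range here is an assumption, as are the function name `median_scale` and the toy values.

```python
import statistics

def median_scale(x, reference):
    """Linearly rescale x as (x - c1) * c2 to match the reference array.

    Assumption: 'dynamic range' is taken to be the interquartile range.
    """
    q_x = statistics.quantiles(x, n=4)            # quartiles of x
    q_ref = statistics.quantiles(reference, n=4)  # quartiles of the reference
    c2 = (q_ref[2] - q_ref[0]) / (q_x[2] - q_x[0])  # match the spread
    c1 = statistics.median(x) - statistics.median(reference) / c2  # match the median
    return [(v - c1) * c2 for v in x]

# Toy log-intensities: array2 is array1 stretched by 2 and shifted by 3.
array1 = [6.0, 7.5, 8.0, 9.0, 10.5, 12.0, 13.0]
array2 = [2 * v + 3 for v in array1]
rescaled = median_scale(array2, array1)
print(statistics.median(rescaled), statistics.median(array1))
```

Because the correction is a single straight line, it cannot remove a bias that changes with intensity.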
25
Loess
LOcally WEighted Scatterplot Smoothing
Fit a smooth curve
–Use robust local linear fits
–Effectively applies different scaling factors at different intensity levels
–Y = f(X)
–Transform X to X' = f(X)
–Y and X' are comparable
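A minimal sketch of the local linear fit behind Y = f(X) follows. It uses tricube weights over the nearest fraction of points and, unlike the robust fits mentioned on the slide, skips the outlier down-weighting iterations. The name `loess_fit`, the `frac` parameter, and the toy data are illustrative assumptions.

```python
import math

def loess_fit(x, y, frac=0.5):
    """Return f(x_i) for each x_i using local weighted linear regression."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbors per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weights: near points count most, points beyond h count zero.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

# Toy log-intensities for two arrays measuring the same probes.
x = [float(v) for v in range(6, 16)]
y = [0.9 * v + 1.2 for v in x]
x_adjusted = loess_fit(x, y)  # X' = f(X), now on the scale of Y
```

Each point gets its own local line, which is what lets the correction differ between low and high intensities.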
26
Reference for Normalization
Need to pick one reference sample
–“Middle” chip: median of medians
–Pooled reference RNA sample
–Selection of baseline chip influences the results
Need to pick a subset of genes to estimate the scaling factor or smooth curve
–Housekeeping genes: present at constant levels
–Invariant rank: if a gene is not differentially expressed, its rank in the two arrays (or colors) should be similar
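The invariant-rank idea can be illustrated with a one-pass selection: keep the probes whose rank moves by at most a small amount between the two arrays. The threshold `max_rank_shift`, the function names, and the data are hypothetical; published invariant-set methods are more elaborate than this sketch.

```python
def ranks(values):
    """Rank of each value within its array (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def invariant_set(array1, array2, max_rank_shift=1):
    """Indices of probes whose rank is similar in both arrays."""
    r1, r2 = ranks(array1), ranks(array2)
    return [i for i in range(len(array1)) if abs(r1[i] - r2[i]) <= max_rank_shift]

# Toy data: the probe at index 1 jumps from second-lowest to highest.
array1 = [10, 20, 30, 40, 50, 60]
array2 = [12, 95, 33, 41, 52, 63]
selected = invariant_set(array1, array2)
print(selected)  # [0, 2, 3, 4, 5]
```

Only the selected probes would then be used to estimate the scaling factor or the smooth curve.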
27
Quantile Normalization
[Figure labels: Probes; Experiments; Mean]
Bolstad et al., Bioinformatics 2003
–Currently considered the best normalization method
–Assumes most of the probes/genes don’t change between samples
Calculate the mean for each quantile and reassign each probe the mean of its quantile
No experiment retains its original values, but all experiments end up with exactly the same distribution
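The procedure can be sketched in plain Python as follows. This is a simplified illustration, not the reference implementation from Bolstad et al.: the name `quantile_normalize` and the data are made up, and tied values are simply broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one list of probe values per experiment."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean across experiments of the k-th smallest value.
    quantile_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, low to high
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = quantile_means[k]  # the k-th ranked probe gets the k-th mean
        normalized.append(out)
    return normalized

# Three toy experiments measuring the same three probes.
experiments = [[5, 2, 3], [4, 1, 6], [3, 4, 8]]
normalized = quantile_normalize(experiments)
print(normalized[0])  # probe order preserved, values replaced by quantile means
```

After this step every experiment contains the same set of values, and only the assignment of values to probes differs.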
28
Dilution Series
RNA sample in 5 different concentrations
5 replicates scanned on 5 different scanners
Before and after quantile normalization
29
Normalization Quality Check
MA Plot
Scatter plot of log2(R) vs. log2(G): values should be on the diagonal
MA plot of M = log2(R) - log2(G) vs. A = (log2(R) + log2(G)) / 2: values should scatter around 0
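Computing the M and A coordinates is a direct transcription of the two formulas. The helper name `ma_values` and the intensities are illustrative only.

```python
import math

def ma_values(r, g):
    """Return (M, A) lists for paired red and green intensities."""
    m = [math.log2(ri) - math.log2(gi) for ri, gi in zip(r, g)]        # log-ratio
    a = [(math.log2(ri) + math.log2(gi)) / 2 for ri, gi in zip(r, g)]  # mean log-intensity
    return m, a

# Toy intensities for three probes.
red = [8, 16, 4]
green = [2, 16, 16]
m, a = ma_values(red, green)
print(m)  # [2.0, 0.0, -2.0]
print(a)  # [2.0, 4.0, 3.0]
```

Plotting M against A rotates the log2(R) vs. log2(G) scatter by 45 degrees, so an intensity-dependent bias shows up as a curve away from M = 0.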
30
Before Normalization
Pairwise MA plot for 5 arrays, probe (PM)
31
After Normalization
Pairwise MA plot for 5 arrays, probe (PM)
32
Public Microarray Databases
SMD: Stanford Microarray Database, most Stanford and collaborators’ cDNA arrays
GEO: Gene Expression Omnibus, an NCBI repository for gene expression and hybridization data, growing quickly
Oncomine: Cancer Microarray Database
–Published cancer-related microarrays
–Raw data all processed, nice interface
33
Homework
How many data series are there on GEO with Affymetrix gene expression profiles of
–Human breasts
–Human prostates
–Human brains
–Mouse liver
–Just the numbers
Which series have > 10 samples
–Use the DataSet Browser format
34
Acknowledgment
Terry Speed, Rafael Irizarry & group
Kevin Coombes & Keith Baggerly
Erick Rouchka
Wing Wong & Cheng Li
Mark Reimers
Erin Conlon
Larry Hunter
Zhijin Wu
Wei Li