Published by Marybeth Price. Modified over 9 years ago.
1
Microarray analysis: Quantitation of Gene Expression; Expression Data to Networks. BIO520 Bioinformatics, Jim Lund. Reading: Ch 16
2
Microarray data Image quantitation. Normalization Find genes with significant expression differences Annotation Clustering, pattern analysis, network analysis
3
Sources of Non-Biological Variation Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (Channel is used to refer to a combination of a dye and a slide.) Variation across replicate slides Variation across hybridization conditions Variation in scanning conditions Variation among technicians doing the lab work.
4
Factors which impact on the signal level Amount of mRNA Labeling efficiencies Quality of the RNA Laser/dye combination Detection efficiency of photomultiplier or CCD
5
[Figure panels: HeLa, HepG2]
6
[Figure panels: HeLa, HepG2]
7
M vs. A Plot: A = (Log Green + Log Red) / 2; M = Log Red - Log Green
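The M and A values behind an M vs. A plot can be computed spot by spot. The following is a minimal sketch, assuming base-2 logs (the slide does not state the base) and using made-up intensities; the function name `ma_values` is hypothetical.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    Assumption: logs are base 2, a common choice for two-color arrays.
    """
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)        # M: log ratio of red to green
        a_vals.append((log_r + log_g) / 2)  # A: average log intensity
    return m_vals, a_vals

# Made-up intensities for three spots.
red = [1024.0, 256.0, 64.0]
green = [256.0, 256.0, 256.0]
m, a = ma_values(red, green)
print(m)
print(a)
```

A spot with M = 0 has equal signal in both channels; plotting M against A shows whether the log ratio drifts with overall intensity.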
8
M v A plots of chip pairs: before normalization
9
M v A plots of chip pairs: after quantile normalization
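Quantile normalization forces every chip to share the same intensity distribution. The sketch below is one simple way to do it, not necessarily the procedure used for the plots on this slide; the function name is hypothetical and ties are broken by position rather than averaged.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of equal-length intensity lists (one per chip).

    Each value is replaced by the mean, across chips, of the values that
    hold the same rank. Ties are broken by position (a simplification).
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean of the k-th smallest value across all chips.
    rank_means = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Made-up data: two chips, three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
print(normalized)
```

After normalization both chips contain exactly the same set of values, only assigned to different probes.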
10
Types of normalization: To total signal (linear normalization). LOESS (LOcally WEighted polynomial regreSSion). To “housekeeping genes”. To genomic DNA spots (Research Genetics) or mixed cDNAs. To internal spikes.
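Normalization to total signal is the simplest of these: one channel is multiplied by a single factor so that both channels have the same summed intensity. A minimal sketch with made-up numbers follows; the function name is hypothetical.

```python
def scale_to_total(channel, target_total):
    """Linearly rescale intensities so that they sum to target_total."""
    factor = target_total / sum(channel)
    return [x * factor for x in channel]

# Made-up intensities: the red channel is twice as bright overall.
red = [200.0, 400.0, 400.0]
green = [100.0, 200.0, 200.0]
red_scaled = scale_to_total(red, sum(green))
print(red_scaled)
```

Because a single factor is applied to every spot, this corrects only a global difference between channels; an intensity-dependent bias needs a method such as LOESS.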
11
Microarray analysis Data exploration: expression of gene X? Statistical analysis: which genes show large, reproducible changes? Clustering: grouping genes by expression pattern. Knowledge-based analysis: Are amine synthesis genes involved in this experiment?
12
Fold change: the crudest method of finding differentially expressed genes. Threshold: >2-fold expression change. [Figure panels: HeLa, HepG2]
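A fold-change filter can be sketched in a few lines. The gene names and expression values below are made up, and the function name is hypothetical; the comparison is done on log ratios so that increases and decreases are treated symmetrically.

```python
import math

def fold_change_hits(expr_a, expr_b, min_fold=2.0):
    """Return genes whose expression changes by more than min_fold, up or down."""
    threshold = math.log2(min_fold)
    hits = []
    for gene in expr_a:
        log_ratio = math.log2(expr_b[gene] / expr_a[gene])
        if abs(log_ratio) > threshold:
            hits.append(gene)
    return hits

# Made-up expression values for two cell lines.
hela = {"geneA": 100.0, "geneB": 100.0, "geneC": 400.0}
hepg2 = {"geneA": 120.0, "geneB": 450.0, "geneC": 100.0}
hits = fold_change_hits(hela, hepg2)
print(hits)
```

Note that the filter ignores replicate variability entirely, which is why it is the crudest method.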
13
What do we mean by differentially expressed? Statistically, our gene is different from the other genes. [Figure: distribution of average ratios for all genes (x-axis: log ratio; y-axis: number of genes), compared with the distribution of measurements for the gene of interest (probability of a given value of the ratio)]
14
Finding differentially expressed genes. What affects our certainty that a gene is up- or down-regulated? Number of sample points. Difference in means. Standard deviations of the samples. [Figure: probe signal for Sample A and Sample B]
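All three factors appear in a two-sample t statistic. The sketch below uses Welch's form (unequal variances allowed), which is a choice made here for illustration rather than something the slide specifies; turning t into a p-value needs the t distribution, which is left out.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means over its standard error."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_b) - mean(sample_a)) / standard_error

# Made-up probe signals; the difference in means is 3 in every case.
t_base = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
t_noisy = welch_t([0.0, 2.0, 4.0], [3.0, 5.0, 7.0])          # larger standard deviation
t_more = welch_t([1.0, 2.0, 3.0] * 2, [4.0, 5.0, 6.0] * 2)   # more sample points
print(t_base, t_noisy, t_more)
```

With the same difference in means, noisier samples lower the statistic and additional sample points raise it.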
15
Practical views on statistics. With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns. Sensitivity and selectivity trade off against each other: selecting more true positives WILL also bring in more false positives, while leaving fewer false negatives. False negatives are lost opportunities; false positives cost money and waste time. Even with conservative statistics, a typical set of experiments yields more genes/pathways/patterns than one can sensibly follow up, so use conservative statistics to protect against false positives when designing follow-on experiments.
16
Statistical Tests Student’s t-test –Correct for multiple testing! (Holm-Bonferroni) False discovery rate. Significance Analysis of Microarrays (SAM) –http://www-stat.stanford.edu/~tibs/SAM/ ANOVA Principal components analysis Special methods for periodic patterns in data.
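Two of the corrections named on this slide can be sketched directly from their definitions. Holm-Bonferroni controls the family-wise error rate; the false discovery rate is controlled here with the Benjamini-Hochberg step-up procedure, which is an assumption since the slide does not name a specific FDR method. The p-values are made up.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Holm's step-down procedure. Returns one reject flag per p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):          # rank 0 is the smallest p-value
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                               # stop at the first failure
    return reject

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank                  # largest rank passing its threshold
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject

# Made-up p-values for five genes.
pvals = [0.001, 0.045, 0.025, 0.015, 0.5]
holm = holm_bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print(holm)
print(bh)
```

On these values Holm keeps only the strongest gene while Benjamini-Hochberg keeps three, which illustrates the usual trade: FDR control is less conservative than family-wise control.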
17
Volcano plot: log(fold change) vs. p-value. [Figure axes: log(fold change), p-value]
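The coordinates of a volcano plot are easy to compute from fold changes and p-values. The sketch below assumes base-2 logs for fold change and plots -log10(p) on the vertical axis, a common convention that the slide does not spell out; the input values are made up.

```python
import math

def volcano_points(fold_changes, pvalues):
    """Return (log2 fold change, -log10 p-value) pairs, one per gene."""
    return [(math.log2(fc), -math.log10(p)) for fc, p in zip(fold_changes, pvalues)]

# Made-up values: strongly up, mildly down, unchanged.
points = volcano_points([4.0, 0.5, 1.0], [0.001, 0.01, 1.0])
for x, y in points:
    print(round(x, 3), round(y, 3))
```

Genes of interest sit in the upper left and upper right corners: large change in either direction and a small p-value.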
18
Scatter plot showing genes with significant p-values
19
Pattern finding. In many cases, the patterns of differential expression are the target (as opposed to specific genes) –Clustering or other approaches for pattern identification: find genes which behave similarly across all experiments, or experiments which behave similarly across all genes –Classification: identify genes which best distinguish 2 or more classes. The statistical reliability of the pattern or classifier is still an issue and similar considerations apply, e.g. cluster analysis of random noise will still produce clusters, but they will be meaningless.
20
What is clustering? Group similar objects together. –Genes with similar expression patterns. Objects in the same cluster (group) are more similar to each other than objects in different clusters.
21
Clustering. What is clustering? Similarity/distance metrics. Hierarchical clustering algorithms –Made popular by Stanford, e.g. [Eisen et al. 1998] K-means –Made popular by many groups, e.g. [Tavazoie et al. 1999] Self-organizing map (SOM) –Made popular by Whitehead, e.g. [Tamayo et al. 1999]
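Of the algorithms listed, K-means is the shortest to write down. The sketch below is a plain version of Lloyd's algorithm with Euclidean distance; the starting centers are passed in by hand (real tools usually pick them randomly), and the two-point expression profiles are made up.

```python
def kmeans(profiles, centers, iterations=10):
    """Lloyd's algorithm on tuples of equal length.

    Returns (labels, centers). Initial centers are supplied by the caller,
    an assumption made here to keep the example deterministic.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    labels = []
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in profiles
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # leave an empty cluster where it is
        centers = new_centers
    return labels, centers

# Made-up profiles: two low-expression genes and two high-expression genes.
profiles = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
labels, centers = kmeans(profiles, centers=[(0.0, 0.0), (6.0, 6.0)])
print(labels)
print(centers)
```

The number of clusters has to be chosen in advance, which is the main practical difference from hierarchical clustering.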
22
Typical Tools SAM (Significance Analysis of Microarrays), Stanford GeneSpring Affymetrix GeneChip Operating System (GCOS) Cluster/Treeview R statistics package microarray analysis libraries.
23
How to define similarity? Similarity metric: –A measure of pairwise similarity or dissimilarity –Examples: Correlation coefficient, Euclidean distance. [Figure: raw matrix (n genes × p experiments) converted to a similarity matrix (n genes × n genes), with example genes X and Y marked]
24
Similarity metrics: Euclidean distance, Correlation coefficient. Euclidean clustering = magnitude & direction. Correlation clustering = direction.
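The difference between the two metrics shows up clearly on two genes that rise together but at different magnitudes. The sketch below uses the Pearson correlation coefficient (the slide does not say which correlation is meant) and made-up profiles.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up profiles: same direction of change, tenfold difference in magnitude.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [10.0, 20.0, 30.0, 40.0]
dist = euclidean_distance(gene_x, gene_y)
corr = pearson_correlation(gene_x, gene_y)
print(dist, corr)
```

Correlation treats the two genes as essentially identical, while Euclidean distance places them far apart, so the choice of metric decides whether they end up in the same cluster.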
25
Sporulation-example
27
Self-organizing maps (SOM) [Kohonen 1995] Basic idea: –map high dimensional data onto a 2D grid of nodes –Neighboring nodes are more similar than points far away
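A single SOM training step can be sketched as follows: find the grid node closest to a sample, then pull that node and its grid neighbors toward the sample. The Gaussian neighborhood, the fixed learning rate and radius, and the tiny 2 × 2 grid are all illustrative assumptions; a full implementation repeats this over many samples while shrinking both parameters.

```python
import math

def som_update(nodes, sample, learning_rate=0.5, radius=1.0):
    """One SOM training step (updates nodes in place, returns the best-matching node).

    nodes maps (row, col) grid positions to weight vectors.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Best-matching unit: the node whose weights are closest to the sample.
    bmu = min(nodes, key=lambda pos: sq_dist(nodes[pos], sample))
    for pos, weights in nodes.items():
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        influence = math.exp(-grid_d2 / (2 * radius ** 2))  # 1.0 at the BMU itself
        nodes[pos] = [w + learning_rate * influence * (s - w)
                      for w, s in zip(weights, sample)]
    return bmu

# Made-up 2 x 2 grid of nodes with 2-dimensional weights.
nodes = {
    (0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0],
    (1, 0): [4.0, 4.0], (1, 1): [5.0, 5.0],
}
bmu = som_update(nodes, sample=[1.0, 2.0])
print(bmu, nodes[bmu])
```

Because neighbors move along with the winning node, nearby grid positions end up representing similar expression patterns.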
28
Self-organizing maps (SOM)
29
SOM Clusters
30
Things learned from microarray gene expression experiments: Pathways not known to be involved –Ontology? Novel genes involved in a known pathway. “Like” and “unlike” tissues.
31
Transcription Factors Regulatory Networks Identify co-regulated genes Search for common motifs (transcription factor binding sites) –Evaluate known motifs/factors –Search for new ones. Programs: MEME, etc.
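Evaluating a known motif in a set of co-regulated genes amounts to scanning their promoter sequences on both strands. The sketch below does exact string matching only; the motif, gene names, and sequences are made up, and this is not how MEME works (MEME discovers new motifs statistically).

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def motif_hits(promoters, motif):
    """Return {gene: [0-based start positions]} for exact matches on either strand."""
    rc = reverse_complement(motif)
    width = len(motif)
    hits = {}
    for gene, seq in promoters.items():
        positions = [i for i in range(len(seq) - width + 1)
                     if seq[i:i + width] in (motif, rc)]
        if positions:
            hits[gene] = positions
    return hits

# Made-up promoter fragments for three co-regulated genes.
promoters = {
    "geneA": "AATGACTCGG",   # motif on the forward strand
    "geneB": "CCGAGTCATT",   # motif on the reverse strand
    "geneC": "AAAAAAAAAA",   # no match
}
hits = motif_hits(promoters, "TGACTC")
print(hits)
```

A motif found in most genes of a cluster but rarely elsewhere in the genome is a candidate binding site for a shared regulator.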
32
mRNA-protein Correlation. YPD: should have relevant data –will yeast be typical? Electrophoresis 18:533 –23 proteins on 2D gels –r=0.48 for mRNA vs. protein abundance. Post-transcriptional and post-translational regulation are important!
33
Other microarray formats. Single nucleotide polymorphism (SNP) chips –Oligos with each of 4 nt at each SNP. Chromatin IP chips (ChIP:chip) –Determine transcription factor binding sites –Promoter DNA on the chip. Alternative splicing chips –Long oligos, covering alternatively spliced exons, or all exons. Genome tiling chips.
34
ChIP:chip: Identification of Transcription Factor Binding Sites. Cross-link transcription factors to DNA with formaldehyde. Pull out the transcription factor of interest via immunoprecipitation with an antibody, or by tagging the factor of interest with an isolatable epitope (e.g. GST fusion). Fractionate the DNA associated with the transcription factor, reverse the cross-links, label, and hybridize to an array of promoter DNA. Brown et al. (2001) Nature 409:533-8
35
ChIP:chip Analysis of TF Binding Sites
36
On to Proteomics DNA RNA Protein