Microarray analysis Quantitation of Gene Expression Expression Data to Networks BIO520 Bioinformatics Jim Lund Reading: Ch 16.


1 Microarray analysis Quantitation of Gene Expression Expression Data to Networks BIO520 Bioinformatics Jim Lund Reading: Ch 16

2 Microarray data Image quantitation. Normalization Find genes with significant expression differences Annotation Clustering, pattern analysis, network analysis

3 Sources of Non-Biological Variation Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (Channel is used to refer to a combination of a dye and a slide.) Variation across replicate slides Variation across hybridization conditions Variation in scanning conditions Variation among technicians doing the lab work.

4 Factors which impact on the signal level Amount of mRNA Labeling efficiencies Quality of the RNA Laser/dye combination Detection efficiency of photomultiplier or CCD

5 Hela HepG2

6 Hela HepG2

7 M vs. A Plot A = (Log Green + Log Red) / 2 M = Log Red - Log Green
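
A minimal sketch of the M and A values for a few spots, assuming M is the log ratio (log Red minus log Green) and A is the mean of the two log intensities. The slide does not state the log base; base 2 is an assumption here, and the intensities are made-up numbers.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    log_r = math.log2(red)    # base 2 is an assumption; the slide only says "Log"
    log_g = math.log2(green)
    m = log_r - log_g         # M: log ratio of the two channels
    a = (log_r + log_g) / 2   # A: average log intensity
    return m, a

# Hypothetical (red, green) intensities for three spots
spots = [(800.0, 200.0), (500.0, 500.0), (64.0, 256.0)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"R={red:g} G={green:g} M={m:.2f} A={a:.2f}")
```

A spot with equal signal in both channels has M = 0, so an unbiased chip pair should scatter around the horizontal line M = 0 across the whole range of A.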

8 M v A plots of chip pairs: before normalization

9 M v A plots of chip pairs: after quantile normalization
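
The slides do not show how the quantile normalization was done. One common formulation, sketched here as an assumption, sorts each chip, averages the values at each rank across chips, and gives every gene the average for its rank. Ties are broken arbitrarily in this sketch.

```python
def quantile_normalize(chips):
    """chips: list of equal-length lists, one list of intensities per chip.

    Returns new lists in which every chip has the same set of values
    (the rank-wise means), assigned according to each gene's rank on that chip.
    """
    n_genes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity at each rank, taken across all chips
    rank_means = [
        sum(s[rank] for s in sorted_chips) / len(chips)
        for rank in range(n_genes)
    ]
    normalized = []
    for chip in chips:
        # order[rank] is the index of the gene holding that rank on this chip
        order = sorted(range(n_genes), key=lambda g: chip[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two hypothetical chips measuring the same three genes
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After this step every chip has an identical intensity distribution, which is why the M vs. A clouds tighten around M = 0.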

10 Types of normalization To total signal (linear normalization) LOESS (LOcally WEighted polynomial regreSSion). To “housekeeping genes” To genomic DNA spots (Research Genetics) or mixed cDNAs To internal spikes
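
As a rough illustration of the simplest option, normalization to total signal, the sketch below rescales one channel by a single factor so that both channels have the same total intensity. The function name and the numbers are hypothetical.

```python
def scale_to_total(red, green):
    """Rescale the red channel so its total signal equals the green total."""
    factor = sum(green) / sum(red)   # one global (linear) correction factor
    return [r * factor for r in red]

# Hypothetical intensities: red is uniformly twice as bright as green
red = [100.0, 200.0, 300.0]
green = [50.0, 100.0, 150.0]
print(scale_to_total(red, green))
```

A single factor cannot correct bias that changes with intensity, which is the case the LOESS option addresses.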

11 Microarray analysis Data exploration: expression of gene X? Statistical analysis: which genes show large, reproducible changes? Clustering: grouping genes by expression pattern. Knowledge-based analysis: Are amine synthesis genes involved in this experiment?

12 Fold change: the crudest method of finding differentially expressed genes >2-fold expression change [Scatter plot: Hela vs. HepG2]
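
A minimal sketch of a fold-change filter, assuming one expression value per gene in each sample. The gene names and values are invented for illustration.

```python
import math

def fold_change_hits(expr_a, expr_b, threshold=2.0):
    """Return {gene: log2 fold change} for genes changed more than `threshold`-fold."""
    cutoff = math.log2(threshold)
    hits = {}
    for gene in expr_a:
        log_fc = math.log2(expr_b[gene] / expr_a[gene])
        # Working in log space treats up- and down-regulation symmetrically
        if abs(log_fc) > cutoff:
            hits[gene] = log_fc
    return hits

# Hypothetical expression values
hela = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
hepg2 = {"geneA": 450.0, "geneB": 150.0, "geneC": 20.0}
print(fold_change_hits(hela, hepg2))
```

The filter ignores replicate variability entirely, which is why it is only a crude first pass.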

13 What do we mean by differentially expressed? Statistically, our gene is different from the other genes. [Figure: distribution of average ratios for all genes (number of genes vs. log ratio), compared with the distribution of measurements for the gene of interest (probability of a given value of the ratio).]

14 Finding differentially expressed genes What affects our certainty that a gene is up or down-regulated? Number of sample points Difference in means Standard deviations of sample [Figure: probe signal for Sample A and Sample B.]
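
The three factors listed (number of sample points, difference in means, standard deviations) are exactly the ingredients of a t-statistic. The sketch below uses Welch's unequal-variance form, which is a choice made here and not something the slide specifies. The sample values are invented.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t-statistic for the difference in means (Welch's unequal-variance form)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    # Standard error shrinks with more samples and grows with more spread
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_b - mean_a) / se

tight = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
noisy = welch_t([0.0, 2.0, 4.0], [3.0, 5.0, 7.0])   # same means, wider spread
print(f"tight replicates: t = {tight:.2f}")
print(f"noisy replicates: t = {noisy:.2f}")
```

Both comparisons have the same difference in means, but the noisier replicates give a smaller t and therefore less certainty.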

15 Practical views on statistics With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns. Sensitivity and selectivity are inversely related - e.g. increased selection of true positives WILL result in more false positives and fewer false negatives. False negatives are lost opportunities; false positives cost money and waste time. A typical set of experiments treated with conservative statistics still yields more genes/pathways/patterns than one can sensibly follow - so use conservative statistics to protect against false positives when designing follow-on experiments.

16 Statistical Tests Student’s t-test –Correct for multiple testing! (Holm-Bonferroni) False discovery rate. Significance Analysis of Microarrays (SAM) –http://www-stat.stanford.edu/~tibs/SAM/ ANOVA Principal components analysis Special methods for periodic patterns in data.
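
A sketch of two multiple-testing corrections applied to a list of per-gene p-values. Holm-Bonferroni is the method named on the slide. For the false discovery rate the slide names no procedure, so the Benjamini-Hochberg step-up rule is used here as an assumption. The p-values are invented.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):          # rank 0 is the smallest p-value
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                               # stop at the first failure
    return reject

def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up control of the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank                       # largest rank passing the test
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject

pvals = [0.001, 0.008, 0.025, 0.041, 0.20]      # hypothetical per-gene p-values
print("Holm:", holm_reject(pvals))
print("BH:  ", bh_reject(pvals))
```

On these values the FDR rule accepts one more gene than Holm-Bonferroni, reflecting the trade between false positives and false negatives.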

17 Volcano plot: log(expr) vs p-value [Figure axes: Log(fold change) vs. p-value]
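
A small sketch of how each gene could be placed on a volcano plot, assuming the usual convention of log2 fold change on the x-axis and -log10 of the p-value on the y-axis. The cutoffs are arbitrary illustrative defaults, not values from the slides.

```python
import math

def volcano_point(fold_change, p_value):
    """Map a (fold change, p-value) pair to volcano-plot coordinates."""
    return math.log2(fold_change), -math.log10(p_value)

def is_hit(fold_change, p_value, min_abs_log2_fc=1.0, max_p=0.01):
    """True when a gene is both strongly changed and statistically significant."""
    x, _ = volcano_point(fold_change, p_value)
    return abs(x) >= min_abs_log2_fc and p_value <= max_p

# Hypothetical (fold change, p-value) pairs
for fc, p in [(4.0, 0.001), (1.2, 0.001), (4.0, 0.2)]:
    x, y = volcano_point(fc, p)
    print(f"fc={fc:g} p={p:g} -> x={x:.2f}, y={y:.2f}, hit={is_hit(fc, p)}")
```

Genes of interest sit in the upper left and upper right corners: large change and small p-value.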

18 Scatter plot showing genes with significant p-values

19 Pattern finding In many cases, the patterns of differential expression are the target (as opposed to specific genes) –Clustering or other approaches for pattern identification - find genes which behave similarly across all experiments or experiments which behave similarly across all genes –Classification - identify genes which best distinguish 2 or more classes. The statistical reliability of the pattern or classifier is still an issue and similar considerations apply - e.g. cluster analysis of random noise will produce clusters which will be meaningless….

20 What is clustering? Group similar objects together. –Genes with similar expression patterns. Objects in the same cluster (group) are more similar to each other than objects in different clusters.

21 Clustering What is clustering? Similarity/distance metrics Hierarchical clustering algorithms –Made popular by Stanford, e.g. [Eisen et al. 1998] K-means –Made popular by many groups, e.g. [Tavazoie et al. 1999] Self-organizing map (SOM) –Made popular by Whitehead, e.g. [Tamayo et al. 1999]
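
Of the methods listed, k-means is the shortest to sketch. The version below is a bare-bones illustration using squared Euclidean distance and a naive initialization (the first k profiles), both of which are simplifying assumptions. The expression profiles are invented.

```python
def kmeans(points, k, iterations=20):
    """Cluster expression profiles; returns (assignment, centroids)."""
    centroids = [list(p) for p in points[:k]]   # naive init: first k profiles
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical profiles: four genes measured in three experiments
profiles = [[1.0, 2.0, 3.0], [9.0, 8.0, 7.0], [1.1, 2.1, 2.9], [8.8, 8.1, 7.2]]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

Real analyses usually try several random starts, because the result depends on the initial centroids and on the choice of k.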

22 Typical Tools SAM (Significance Analysis of Microarrays), Stanford GeneSpring Affymetrix GeneChip Operating System (GCOS) Cluster/Treeview R statistics package microarray analysis libraries.

23 How to define similarity? Similarity metric: –A measure of pairwise similarity or dissimilarity –Examples: Correlation coefficient Euclidean distance [Figure: raw matrix (n genes by p experiments) converted to a similarity matrix (n by n) by comparing each pair of genes X and Y.]

24 Similarity metrics Euclidean distance Correlation coefficient Euclidean clustering = magnitude & Direction Correlation clustering = direction
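
The difference between the two metrics can be seen on two invented profiles that rise in the same way but differ tenfold in magnitude. This is a plain-Python sketch using the Pearson correlation coefficient.

```python
import math

def euclidean(x, y):
    """Euclidean distance: sensitive to both magnitude and direction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation: sensitive to the shape (direction) of the profile only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [10.0, 20.0, 30.0, 40.0]   # same shape, ten times the magnitude

print(f"Euclidean distance: {euclidean(gene_x, gene_y):.2f}")
print(f"Correlation:        {pearson(gene_x, gene_y):.2f}")
```

Correlation treats the two genes as essentially identical, while Euclidean distance places them far apart, so the choice of metric decides whether they end up in the same cluster.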

25 Sporulation-example

26

27 Self-organizing maps (SOM) [Kohonen 1995] Basic idea: –map high dimensional data onto a 2D grid of nodes –Neighboring nodes are more similar than points far away
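
A toy sketch of the basic idea, under several assumptions not taken from the slides: a small rectangular grid, a Gaussian neighborhood function, and learning rate and radius that decay linearly over training. All parameter names and default values are illustrative.

```python
import math
import random

def best_node(grid, x):
    """Grid position whose weight vector is closest to x (best-matching unit)."""
    return min(grid, key=lambda n: sum((a - b) ** 2 for a, b in zip(x, grid[n])))

def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every node from a randomly chosen data point
    grid = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1 - epoch / epochs
        lr = lr0 * frac                       # learning rate decays to zero
        radius = max(radius0 * frac, 0.5)     # neighborhood shrinks over time
        for x in data:
            bmu = best_node(grid, x)
            for node, w in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))   # grid neighbors move more
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return grid

# Hypothetical two-dimensional profiles
data = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.85, 0.9]]
som = train_som(data, rows=2, cols=2)
for x in data:
    print(x, "->", best_node(som, x))
```

Because the winning node drags its grid neighbors along with it, nodes that are close on the grid end up with similar weight vectors.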

28 Self-organizing maps (SOM)

29 SOM Clusters

30 Things learned from microarray gene expression experiments Pathways not known to be involved –Ontology? Novel genes involved in a known pathway “like” and “unlike” tissues

31 Transcription Factors Regulatory Networks Identify co-regulated genes Search for common motifs (transcription factor binding sites) –Evaluate known motifs/factors –Search for new ones. Programs: MEME, etc.
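
As a deliberately crude stand-in for the motif search step (this is not how MEME works), the sketch below counts which fixed-length words occur in the most promoters of a co-regulated gene set. The sequences are toy strings written for this example.

```python
from collections import Counter

def shared_kmers(promoters, k=7):
    """Count, for every k-mer, how many promoter sequences contain it."""
    counts = Counter()
    for seq in promoters:
        # A set ensures each k-mer is counted once per sequence
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts

# Toy promoter fragments for three hypothetical co-regulated genes
promoters = ["AATGACTCATT", "CCTGACTCAGG", "GGGTGACTCAC"]
counts = shared_kmers(promoters)
print(counts.most_common(1))
```

Exact word counting misses degenerate sites and ignores background frequencies, which is what dedicated motif-finding programs are designed to handle.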

32 mRNA-protein Correlation YPD: should have relevant data –will yeast be typical? Electrophoresis 18:533 –23 proteins on 2D gels –r=0.48 for mRNA=protein Post transcriptional and post translational regulation important!

33 Other microarray formats Single nucleotide polymorphism (SNP) chips –Oligos with each of 4 nt at each SNP. Chromosomal IP chips (ChIP:chip) –Determine transcription factor binding sites –Promoter DNA on the chip. Alternative splicing chips –Long oligos, covering alternatively spliced exons, or all exons. Genome tiling chips

34 ChIP:chip--Identification of Transcription Factor Binding Sites Cross link transcription factors to DNA with formaldehyde Pull out transcription factor of interest via immunoprecipitation with an antibody or by tagging the factor of interest with an isolatable epitope (e.g. GST fusion). Fractionate the DNA associated with the transcription factor, reverse the cross links, label and hybridize to an array of promoter DNA. Brown et al. (2001) Nature 409:533-8

35 ChIP:chip Analysis of TF Binding Sites

36 On to Proteomics DNA → RNA → Protein

