Microarray analysis: Quantitation of Gene Expression; Expression Data to Networks
BIO520 Bioinformatics, Jim Lund
Reading: Ch 16.

Microarray data
- Image quantitation
- Normalization
- Find genes with significant expression differences
- Annotation
- Clustering, pattern analysis, network analysis

Sources of non-biological variation
- Dye bias: differences in heat and light sensitivity and in the efficiency of dye incorporation
- Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (a “channel” is a combination of a dye and a slide)
- Variation across replicate slides
- Variation across hybridization conditions
- Variation in scanning conditions
- Variation among technicians doing the lab work

Factors that affect the signal level
- Amount of mRNA
- Labeling efficiency
- Quality of the RNA
- Laser/dye combination
- Detection efficiency of the photomultiplier or CCD

[Figure: HeLa vs. HepG2]

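M vs. A plot
- $A = \frac{1}{2}(\log G + \log R)$ (average log intensity)
- $M = \log R - \log G = \log(R/G)$ (log ratio)
where R and G are the red and green channel intensities of a spot.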
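
A minimal Python sketch of the M and A calculation, assuming base-2 logarithms (a common convention; the slide does not fix the base) and positive, background-corrected intensities. The function name `ma_values` and the intensities are hypothetical.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R) - log2(G)        log ratio
    A = (log2(R) + log2(G)) / 2  average log intensity
    Intensities must be positive (assumed background-corrected).
    """
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)
        a_vals.append((log_r + log_g) / 2)
    return m_vals, a_vals

# Hypothetical intensities for three spots
m, a = ma_values([1024.0, 256.0, 2048.0], [512.0, 256.0, 8192.0])
print(m)  # [1.0, 0.0, -2.0]
print(a)  # [9.5, 8.0, 12.0]
```

Plotting M against A for every spot gives the M vs. A plot; without dye or intensity-dependent bias, the cloud should center on M = 0 across the whole intensity range.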

[Figure: M vs. A plots of chip pairs before normalization]

[Figure: M vs. A plots of chip pairs after quantile normalization]
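
Quantile normalization forces every chip to share the same intensity distribution. A minimal pure-Python sketch of the usual rank-averaging procedure; the function name is hypothetical, and ties are simply broken by position.

```python
def quantile_normalize(chips):
    """Quantile-normalize equal-length intensity lists (one list per chip).

    Each value is replaced by the mean, across chips, of the values sharing
    its rank, so every chip ends up with the same distribution.
    Ties are broken by position, a simplification.
    """
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must have the same number of spots")
    # For each chip, spot indices ordered from lowest to highest intensity
    orders = [sorted(range(n), key=chip.__getitem__) for chip in chips]
    # Mean of the k-th smallest value across chips
    rank_means = [
        sum(chip[order[k]] for chip, order in zip(chips, orders)) / len(chips)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        values = [0.0] * n
        for rank, spot in enumerate(order):
            values[spot] = rank_means[rank]
        normalized.append(values)
    return normalized

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```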

Types of normalization
- To total signal (linear normalization)
- LOESS (locally weighted polynomial regression)
- To “housekeeping” genes
- To genomic DNA spots (Research Genetics) or mixed cDNAs
- To internal spikes
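
Normalization to total signal can be sketched as a single scale factor, assuming (as this method does) that most genes are unchanged, so both channels should carry the same total intensity. A LOESS version would instead fit M as a smooth function of A and subtract the fit; that is not shown. The function name is hypothetical.

```python
import math

def total_signal_normalize(red, green):
    """Return log2 ratios after scaling red so its total equals green's total.

    Assumes most genes are unchanged, so the two channels should carry the
    same total signal; any difference is treated as technical (e.g. dye bias).
    """
    scale = sum(green) / sum(red)
    return [math.log2(r * scale / g) for r, g in zip(red, green)]

# Hypothetical spots: red is uniformly twice as bright as green
print(total_signal_normalize([200.0, 400.0, 200.0], [100.0, 200.0, 100.0]))
# [0.0, 0.0, 0.0]
```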

Microarray analysis
- Data exploration: what is the expression of gene X?
- Statistical analysis: which genes show large, reproducible changes?
- Clustering: grouping genes by expression pattern
- Knowledge-based analysis: are amine synthesis genes involved in this experiment?

[Figure: HeLa vs. HepG2]
Fold change: the crudest method of finding differentially expressed genes, e.g., calling every gene with a >2-fold expression change.
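
A sketch of a fold-change filter, assuming expression values are stored in dictionaries keyed by hypothetical gene IDs. Working on the log2 ratio treats a 2-fold increase and a 2-fold decrease symmetrically. Nothing here accounts for replicate variability, which is why fold change is the crudest method.

```python
import math

def fold_change_hits(expr_a, expr_b, min_fold=2.0):
    """Return (gene, log2 ratio) pairs whose change is at least min_fold either way.

    expr_a, expr_b: dicts mapping gene ID -> positive expression level.
    """
    cutoff = math.log2(min_fold)
    hits = []
    for gene, a in expr_a.items():
        log_ratio = math.log2(expr_b[gene] / a)
        if abs(log_ratio) >= cutoff:  # symmetric for up- and down-regulation
            hits.append((gene, log_ratio))
    return hits

# Hypothetical expression levels in two samples
sample_a = {"g1": 100.0, "g2": 100.0, "g3": 100.0}
sample_b = {"g1": 250.0, "g2": 40.0, "g3": 150.0}
print([gene for gene, _ in fold_change_hits(sample_a, sample_b)])  # ['g1', 'g2']
```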

What do we mean by differentially expressed? Statistically, our gene's log ratio lies outside the distribution of log ratios for the other genes.
[Figure: distribution of average log ratios for all genes (number of genes vs. log ratio), overlaid with the distribution of measurements for the gene of interest (probability of a given value of the ratio)]
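
One simple way to express "different from the other genes" is a z-score of each gene's average log ratio against the distribution for all genes. This is an illustrative simplification (it ignores the spread of the gene's own replicate measurements), with hypothetical names and data.

```python
import statistics

def log_ratio_z_scores(avg_log_ratios):
    """Standardize each gene's average log ratio against all genes.

    z = (x - mean) / sd, with the mean and sample standard deviation taken
    over every gene; genes far out in either tail have large |z|.
    """
    values = list(avg_log_ratios.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return {gene: (x - mu) / sd for gene, x in avg_log_ratios.items()}

# Hypothetical average log2 ratios
print(log_ratio_z_scores({"a": 0.0, "b": 0.0, "c": 0.0, "d": 4.0}))
# {'a': -0.5, 'b': -0.5, 'c': -0.5, 'd': 1.5}
```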

Finding differentially expressed genes
What affects our certainty that a gene is up- or down-regulated?
- Number of sample points
- Difference in means
- Standard deviations of the samples
[Figure: probe signal distributions for Sample A and Sample B]
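
All three factors above appear in the t statistic. This sketch uses Welch's unequal-variance form of the two-sample t statistic (swapped in for the pooled Student's version) and computes only the statistic, not a p-value, because the Python standard library has no t distribution. Names and data are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic for a difference in mean probe signal.

    t = (mean_B - mean_A) / sqrt(s_A**2 / n_A + s_B**2 / n_B)
    |t| grows with the difference in means and the number of samples,
    and shrinks as the sample standard deviations grow.
    """
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_b) - statistics.mean(sample_a)) / se

# Hypothetical replicate signals for one probe
tight = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
noisy = welch_t([0.0, 2.0, 4.0], [3.0, 5.0, 7.0])         # same means, larger SD
more = welch_t([1.0, 2.0, 3.0] * 2, [4.0, 5.0, 6.0] * 2)  # same spread, more points
print(round(tight, 2), round(noisy, 2), round(more, 2))   # 3.67 1.84 5.81
```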

Practical views on statistics
- With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns.
- Sensitivity and specificity are inversely related: selecting more true positives WILL also admit more false positives, while yielding fewer false negatives.
- False negatives are lost opportunities; false positives cost money and waste time.
- A typical set of experiments analyzed with conservative statistics still yields more genes/pathways/patterns than one can sensibly follow up, so use conservative statistics to protect against false positives when designing follow-on experiments.

Statistical tests
- Student’s t-test
  - Correct for multiple testing! (Holm-Bonferroni)
  - False discovery rate
- Significance Analysis of Microarrays (SAM): http://www-stat.stanford.edu/~tibs/SAM/
- ANOVA
- Principal components analysis
- Special methods for periodic patterns in data
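
The two multiple-testing corrections named above can be sketched in a few lines, assuming a plain list of per-gene p-values. Holm-Bonferroni controls the family-wise error rate; Benjamini-Hochberg controls the false discovery rate and is less conservative. Function names are hypothetical.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down procedure; True marks a rejected null (significant gene)."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] > alpha / (m - rank):
            break  # stop at the first p-value that fails its threshold
        reject[i] = True
    return reject

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    # Largest rank k with p_(k) <= k * q / m; all genes up to that rank pass
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical per-gene p-values
p = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(p))     # [True, False, False, True]
print(benjamini_hochberg(p))  # [True, True, True, True]
```

With these four p-values, Holm-Bonferroni keeps two genes while Benjamini-Hochberg keeps all four, illustrating the sensitivity trade-off between the two approaches.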

[Figure: volcano plot, log(fold change) vs. p-value]
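
Volcano-plot coordinates are usually x = log2 fold change and y = -log10(p), so the most interesting genes sit in the upper corners. A sketch that computes those coordinates and flags genes passing both cut-offs; the thresholds (2-fold, p ≤ 0.05) and names are assumptions.

```python
import math

def volcano_points(results, min_log2_fc=1.0, max_p=0.05):
    """Return (gene, x, y, flagged) tuples for a volcano plot.

    results: (gene, log2_fold_change, p_value) tuples.
    x = log2 fold change, y = -log10(p); a gene is flagged when it passes
    both the fold-change and the p-value cut-off.
    """
    points = []
    for gene, log2_fc, p in results:
        flagged = abs(log2_fc) >= min_log2_fc and p <= max_p
        points.append((gene, log2_fc, -math.log10(p), flagged))
    return points

# Hypothetical results: large and significant, large but noisy, tiny but significant
for point in volcano_points([("g1", 2.0, 0.001), ("g2", -2.0, 0.2), ("g3", 0.1, 1e-4)]):
    print(point)
```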

[Figure: scatter plot showing genes with significant p-values]

Pattern finding
In many cases, the patterns of differential expression are the target (as opposed to specific genes).
- Clustering or other approaches for pattern identification: find genes that behave similarly across all experiments, or experiments that behave similarly across all genes.
- Classification: identify the genes that best distinguish two or more classes.
The statistical reliability of the pattern or classifier is still an issue, and similar considerations apply; e.g., cluster analysis of random noise will still produce clusters, but they will be meaningless.

What is clustering?
- Group similar objects together, e.g., genes with similar expression patterns.
- Objects in the same cluster (group) are more similar to each other than to objects in different clusters.

Clustering
- What is clustering?
- Similarity/distance metrics
- Hierarchical clustering algorithms: popularized at Stanford [Eisen et al. 1998]
- K-means: popularized by many groups, e.g., [Tavazoie et al. 1999]
- Self-organizing maps (SOM): popularized at the Whitehead Institute [Tamayo et al. 1999]
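
K-means can be sketched with Lloyd's algorithm: assign each gene's profile to the nearest centroid, recompute centroids as cluster means, and repeat until the assignments stop changing. This sketch assumes Euclidean distance and randomly chosen starting centroids; the function name and data are hypothetical, and in general the result can depend on the starting centroids.

```python
import math
import random

def kmeans(profiles, k, iterations=100, seed=0):
    """Cluster expression profiles (equal-length lists) into k groups.

    Returns one cluster label per profile.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical expression profiles (genes x 3 conditions)
profiles = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [5.0, 5.0, 5.0], [5.1, 4.9, 5.0]]
print(kmeans(profiles, k=2))
```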

Typical tools
- SAM (Significance Analysis of Microarrays), Stanford
- GeneSpring
- Affymetrix GeneChip Operating System (GCOS)
- Cluster/TreeView
- R statistics package with microarray analysis libraries

How to define similarity?
- Similarity metric: a measure of pairwise similarity or dissimilarity
- Examples: correlation coefficient, Euclidean distance
[Figure: raw data matrix (n genes × p experiments) converted into an n × n similarity matrix, e.g., comparing genes X and Y]

Similarity metrics
- Euclidean distance: clustering reflects both the magnitude and the direction of expression changes
- Correlation coefficient: clustering reflects direction only
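
A small sketch contrasting the two metrics on hypothetical expression profiles: g2 has the same shape as g1 at ten times the magnitude, and g3 has the opposite shape. Pearson correlation is implemented directly to keep the example self-contained.

```python
import math
import statistics

def euclidean(x, y):
    """Euclidean distance: reflects both magnitude and direction of change."""
    return math.dist(x, y)

def pearson(x, y):
    """Pearson correlation coefficient: reflects only the shape (direction)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

# Hypothetical expression profiles across four conditions
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [10.0, 20.0, 30.0, 40.0]  # same shape as g1, 10x magnitude
g3 = [4.0, 3.0, 2.0, 1.0]      # opposite shape
print(round(euclidean(g1, g2), 2), round(euclidean(g1, g3), 2))  # 49.3 4.47
print(pearson(g1, g2), pearson(g1, g3))                          # 1.0 -1.0
```

Euclidean distance puts g1 closer to g3 (opposite pattern) than to g2, while the correlation coefficient is 1 for g1 and g2 and -1 for g1 and g3.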

[Figure: sporulation example]

Self-organizing maps (SOM) [Kohonen 1995]
Basic idea:
- Map high-dimensional data onto a 2D grid of nodes
- Neighboring nodes are more similar than nodes far apart
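
A minimal SOM training loop in the spirit of Kohonen's scheme: each grid node holds a weight vector, and for each input the best-matching node and its grid neighbors are pulled toward that input, with a learning rate and neighborhood radius that shrink over time. The Gaussian neighborhood, linear decay schedules, parameter defaults, and function names are assumptions for illustration, not the settings of any particular tool.

```python
import math
import random

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small self-organizing map on equal-length vectors.

    Returns a dict mapping each grid node (row, col) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {node: [rng.random() for _ in range(dim)] for node in nodes}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks linearly
        for x in rng.sample(data, len(data)):    # one shuffled pass
            # Best-matching unit: node whose weights are closest to x
            bmu = min(nodes, key=lambda node: math.dist(weights[node], x))
            for node in nodes:
                grid_dist = math.dist(node, bmu)
                if grid_dist <= radius:
                    # Gaussian neighborhood: the BMU moves most, neighbors less
                    h = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                    weights[node] = [w + lr * h * (xi - w)
                                     for w, xi in zip(weights[node], x)]
    return weights

def best_node(weights, x):
    """Grid node whose weight vector is closest to x (the SOM cluster of x)."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

# Hypothetical 2-condition profiles, mapped onto a 2 x 2 grid
profiles = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(profiles, rows=2, cols=2)
print([best_node(som, p) for p in profiles])
```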

[Figure: self-organizing maps (SOM)]

[Figure: SOM clusters]

Things learned from microarray gene expression experiments
- Pathways not previously known to be involved (ontology?)
- Novel genes involved in a known pathway
- “Like” and “unlike” tissues

Transcription factors and regulatory networks
- Identify co-regulated genes
- Search for common motifs (transcription factor binding sites)
  - Evaluate known motifs/factors
  - Search for new ones
- Programs: MEME, etc.
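
A naive sketch of the "search for common motifs" step: report the k-mers that occur in every promoter of a set of co-regulated genes. This is plain word counting, not the probabilistic model MEME uses, and it ignores background frequencies and reverse complements. Promoter sequences and names are hypothetical.

```python
from collections import Counter

def shared_kmers(promoters, k=6, min_fraction=1.0):
    """Return k-mers found in at least min_fraction of the promoter sequences.

    Each k-mer is counted at most once per promoter, so its count is the
    number of promoters that contain it.
    """
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    needed = min_fraction * len(promoters)
    return sorted(kmer for kmer, n in counts.items() if n >= needed)

# Hypothetical promoter fragments from three co-regulated genes
promoters = ["ttACGCGTaa", "gcACGCGTtt", "ACGCGTcccc"]
print(shared_kmers(promoters))  # ['ACGCGT']
```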

mRNA-protein correlation
- YPD should have relevant data. Will yeast be typical?
- Electrophoresis 18:533: 23 proteins on 2D gels; r = 0.48 between mRNA and protein levels
- Post-transcriptional and post-translational regulation are important!

Other microarray formats
- Single nucleotide polymorphism (SNP) chips: oligos carrying each of the 4 nucleotides at each SNP position
- Chromatin IP chips (ChIP:chip): determine transcription factor binding sites; promoter DNA on the chip
- Alternative splicing chips: long oligos covering alternatively spliced exons, or all exons
- Genome tiling chips

ChIP:chip: identification of transcription factor binding sites
- Cross-link transcription factors to DNA with formaldehyde.
- Pull out the transcription factor of interest by immunoprecipitation, either with an antibody or by tagging the factor with an isolatable epitope (e.g., a GST fusion).
- Fractionate the DNA associated with the transcription factor, reverse the cross-links, label it, and hybridize it to an array of promoter DNA.
Brown et al. (2001) Nature 409:533-8

[Figure: ChIP:chip analysis of TF binding sites]

On to proteomics: DNA → RNA → Protein
