
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample.


1 DNA microarray
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample. Microarray data analysis employs established mathematical tools, including:
i) cluster analysis
ii) principal component analysis
iii) other approaches for reducing high-dimensional data to a useful form
Microarray data analysis seeks to answer questions in three main settings:
i) comparison of two conditions (for example, cell lines treated with or without a drug)
ii) comparisons across multiple conditions (for example, gene expression in normal and diseased individuals)
iii) clustering of the data as a function of samples and/or genes

2 Scatter plots
A scatter plot provides one of the most basic ways of analyzing gene expression data from microarray experiments. It compares the gene expression values of two samples. Most data points typically fall on a 45° line, whereas up- or down-regulated genes fall off the line. The scatter plot therefore shows at a glance which genes are most dramatically and differentially regulated in the experiment.
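As a rough illustration of reading a scatter plot numerically, the sketch below flags genes whose ratio between the two samples departs from the 45° line by at least a chosen fold change. The function name `off_diagonal_genes`, the two-fold default cutoff, and the intensities are all hypothetical; the slides do not prescribe a threshold.

```python
import math


def off_diagonal_genes(sample_a, sample_b, fold_threshold=2.0):
    """Return {gene: "up" | "down"} for genes changed by >= fold_threshold.

    sample_a and sample_b map gene IDs to positive expression intensities.
    Genes near the 45-degree line (ratio close to 1) are not reported.
    """
    cutoff = math.log2(fold_threshold)
    flagged = {}
    for gene, a in sample_a.items():
        b = sample_b[gene]
        log_ratio = math.log2(b / a)  # > 0: higher in sample_b; < 0: lower
        if abs(log_ratio) >= cutoff:
            flagged[gene] = "up" if log_ratio > 0 else "down"
    return flagged


# Hypothetical intensities for four genes in two samples.
control = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0, "geneD": 40.0}
treated = {"geneA": 110.0, "geneB": 1000.0, "geneC": 20.0, "geneD": 44.0}
print(off_diagonal_genes(control, treated))
```

Working with the log ratio makes the up and down cutoffs symmetric, which anticipates the argument for logarithmic scales made in the following slides.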

3 Chu et al. studied the developmental program of gene expression during sporulation in the budding yeast Saccharomyces cerevisiae. The data can be downloaded from http://www.dnachip.org. Using Microsoft Excel or any of a variety of other graphics packages, the expression data can be graphed as a scatter plot. The main feature of the scatter plot is the substantial correlation between the expression values in the two conditions being compared. Another feature is the predominance of low-intensity values: most genes are expressed at only a low level, and relatively few genes are expressed at a high level.

4 In panel a of the figure, the spreadsheet columns contain identifiers of yeast open reading frames and data from a time course of gene expression changes in S. cerevisiae during sporulation. Green and red refer to samples from vegetative versus sporulating cells. In panel b, the scatter plot on a linear scale reveals more overall similarities than differences between the data sets. To identify the most dramatically regulated genes, a plot with a logarithmic scale is preferable, for two reasons:
i) it spreads the data from the lower left corner into a more centered distribution that is easier to analyze;
ii) fold regulation of genes is far easier to describe on a logarithmic scale.

5 For example, suppose gene expression values are obtained at times t = 0, 1, 2, 3 and the raw ratio values are 0, 1, 2, 0.5. Between t = 1 and t = 2 the expression increases two-fold, and between t = 1 and t = 3 it decreases two-fold. On a linear scale these equal fold changes look unequal (a change of +1 versus -0.5), whereas on a log2 scale they become +1 and -1, symmetric about zero.
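The arithmetic of this example can be checked in a few lines. This is a minimal sketch; t = 0 is left out because the logarithm of a ratio of 0 is undefined.

```python
import math

# Raw ratio values at t = 1, 2, 3 from the example above.
ratios = {1: 1.0, 2: 2.0, 3: 0.5}
log_ratios = {t: math.log2(r) for t, r in ratios.items()}
print(log_ratios)  # {1: 0.0, 2: 1.0, 3: -1.0}

# Linear scale: the two two-fold changes have different magnitudes.
print(ratios[2] - ratios[1], ratios[3] - ratios[1])  # 1.0 -0.5
# Log2 scale: the same two-fold changes are equal and opposite.
print(log_ratios[2] - log_ratios[1], log_ratios[3] - log_ratios[1])  # 1.0 -1.0
```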

6 Significance Analysis of Microarrays (SAM), http://www-stat.stanford.edu/~tibs/SAM
SAM is a method for finding significantly regulated genes in a microarray experiment. It assigns a score to each gene based on its change in gene expression relative to the standard deviation of repeated measurements. SAM has several useful features:
i) it is convenient to use as a Microsoft Excel plug-in;
ii) it accepts microarray data from experiments using a variety of experimental designs.

7 SAM input data can be in raw or log-transformed format. Each row of the data matrix contains expression values for one gene, and the columns correspond to samples. SAM uses a modified t-statistic for each gene i:

$$d(i) = \frac{\bar{x}_1(i) - \bar{x}_2(i)}{s(i) + s_0}$$

where the numerator is the difference of the means of the gene expression values being compared, s(i) is the standard deviation of repeated measurements for gene i, and s_0 is a small positive constant; adding s_0 to the denominator is what distinguishes this score from an ordinary t-statistic. SAM provides information on the false discovery rate (FDR), the percentage of genes that are expected to be identified by chance. The user can tune a parameter called delta to adjust the false-positive rate: for example, in a typical experiment, for every 100 genes declared significantly regulated according to the test statistic, 10 might be false positives (an FDR of 10%).
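A minimal sketch of SAM's modified t-statistic (its "relative difference") for a single gene, d = (mean_1 - mean_2) / (s + s0), assuming two groups of replicate measurements. Here s is taken as the pooled standard error of the difference in means used for two-class comparisons, and s0 is fixed at an assumed value of 0.1, whereas the SAM program chooses s0 from the data. The function name `sam_relative_difference` and the example values are hypothetical.

```python
import math
from statistics import mean


def sam_relative_difference(group_1, group_2, s0=0.1):
    """SAM-style relative difference d for one gene.

    d = (mean_1 - mean_2) / (s + s0), where s is the pooled standard error
    of the difference in means (as in a two-sample t-test) and s0 is a
    small positive constant. The fixed default s0 is an assumption.
    """
    n1, n2 = len(group_1), len(group_2)
    m1, m2 = mean(group_1), mean(group_2)
    # Pooled sum of squared deviations from each group's own mean.
    ss = sum((x - m1) ** 2 for x in group_1) + sum((x - m2) ** 2 for x in group_2)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)


# Hypothetical log2 expression values for one gene in two conditions.
treated = [2.0, 2.2, 1.8]
control = [0.1, -0.1, 0.0]
print(sam_relative_difference(treated, control))
# A larger s0 shrinks the score of the same gene.
print(sam_relative_difference(treated, control, s0=1.0))
```

The constant s0 keeps genes with very small variance from receiving huge scores: without it, a gene whose replicates happen to be nearly identical would divide a tiny difference by a near-zero standard deviation.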

8 The SAM algorithm calculates a q value for each gene: the lowest false discovery rate at which that gene is called significantly regulated. The genes are ranked according to the test statistic, and the observed scores are plotted against the scores expected by chance. The graph effectively highlights the outlier genes that are most dramatically regulated: arrow 1 marks up-regulated genes and arrow 2 marks down-regulated genes.
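The observed-versus-expected comparison and the delta rule can be sketched as follows. This is a simplified, hypothetical re-implementation for tiny data sets, not the SAM program: expected scores are the average of the sorted scores over every relabelling of the columns (feasible only for a handful of samples), a gene is called when its observed score exceeds its expected score by more than delta (together with every more extreme gene), and the FDR is estimated as the median number of permuted scores beyond the resulting cutoffs divided by the number of genes called. A q value could then be approximated by repeating the call over a range of delta values (not shown). All function names, s0 = 0.1, and the data are assumptions.

```python
import math
from itertools import combinations
from statistics import mean, median


def relative_difference(g1, g2, s0=0.1):
    """SAM-style d = (mean1 - mean2) / (s + s0); s0 = 0.1 is an assumption."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    ss = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)


def scores(data, labels, s0=0.1):
    """d for every gene (row of data); labels mark columns as group 1 or 0."""
    result = []
    for row in data:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g2 = [x for x, lab in zip(row, labels) if lab == 0]
        result.append(relative_difference(g1, g2, s0))
    return result


def sam_like_call(data, labels, delta, s0=0.1):
    """Return (indices of genes called significant, estimated FDR)."""
    n_cols, n1 = len(labels), sum(labels)
    observed = scores(data, labels, s0)
    d_sorted = sorted(observed)

    # Score the genes under every relabelling of the columns.
    perm_scores = []
    for ones in combinations(range(n_cols), n1):
        perm_labels = [1 if c in ones else 0 for c in range(n_cols)]
        perm_scores.append(scores(data, perm_labels, s0))
    # Expected order statistics: mean of the i-th smallest permuted score.
    expected = [mean(col) for col in zip(*(sorted(p) for p in perm_scores))]

    # Scan upward from zero for the first gene whose observed score exceeds
    # its expected score by more than delta; it sets the upper cutoff.
    upper = math.inf
    for d, e in zip(d_sorted, expected):
        if d > 0 and d - e > delta:
            upper = d
            break
    # Mirror-image scan downward from zero for the lower cutoff.
    lower = -math.inf
    for d, e in zip(reversed(d_sorted), reversed(expected)):
        if d < 0 and e - d > delta:
            lower = d
            break

    called = [i for i, d in enumerate(observed) if d >= upper or d <= lower]
    if not called:
        return [], 0.0
    # Genes expected to be called by chance: median, over relabellings, of
    # how many permuted scores fall beyond the same cutoffs.
    false_calls = median(sum(1 for d in p if d >= upper or d <= lower) for p in perm_scores)
    return called, false_calls / len(called)


# Hypothetical log2 data: rows are genes, columns are samples
# (first three columns = treated, last three = control).
data = [
    [5.0, 5.1, 4.9, 0.0, 0.1, -0.1],   # strongly up in treated
    [0.0, 0.1, -0.1, 5.0, 5.1, 4.9],   # strongly down in treated
    [1.0, 0.9, 1.1, 1.0, 1.1, 0.9],    # unchanged
    [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],    # unchanged
]
labels = [1, 1, 1, 0, 0, 0]
print(sam_like_call(data, labels, delta=2.0))  # ([0, 1], 0.0)
```

Raising delta moves the cutoffs outward, so fewer genes are called and the estimated FDR generally falls; this is the trade-off the delta parameter exposes.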

9 Clustering of microarray data
There are several kinds of clustering techniques. The most common form for microarray analysis is hierarchical clustering, in which a sequence of nested partitions is identified, resulting in a tree called a dendrogram. Hierarchical clustering can be performed using agglomerative (bottom-up) or divisive (top-down) approaches.
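A compact agglomerative (bottom-up) sketch: start with every gene in its own cluster and repeatedly merge the two closest clusters, recording the merge height that a dendrogram plots on its y axis. Average linkage with Euclidean distance is an assumption here; the slides do not say which distance or linkage was used, and correlation-based distances are also common for expression data. The function names and profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(a, b, profiles):
    """Mean pairwise distance between the genes of clusters a and b."""
    total = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerative(profiles):
    """Return the merge history as (cluster_a, cluster_b, height) tuples."""
    clusters = [frozenset([gene]) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        merges.append((sorted(a), sorted(b), average_linkage(a, b, profiles)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges


# Hypothetical profiles: four genes measured at three time points.
profiles = {
    "g1": [0.0, 0.0, 0.0],
    "g2": [0.0, 1.0, 0.0],
    "g3": [10.0, 10.0, 10.0],
    "g4": [10.0, 11.0, 10.0],
}
for left, right, height in agglomerative(profiles):
    print(left, right, round(height, 2))
```

A gene that joins the tree only at a large height, such as a possible outlier, shows up as a long vertical branch.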

10 Agglomerative and divisive clustering generally produce similar results. For example, a typical data set of 20 genes measured at three time points can be clustered both ways to produce two clustering trees.
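For the divisive (top-down) direction, one classic approach is DIANA-style splitting, assumed here because the slides do not name a specific divisive algorithm. The gene with the largest average dissimilarity seeds a "splinter group", and genes keep moving to it while they are, on average, closer to the splinter group than to the genes left behind. The sketch splits every cluster recursively until single genes remain (the resulting topology does not depend on the order of splits) and does not record split heights; all names and data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diana_split(genes, profiles):
    """Split one cluster in two using the DIANA splinter-group rule."""
    def avg_dist(g, group):
        # Average distance from g to the other members of group.
        others = [h for h in group if h != g]
        if not others:
            return 0.0
        return sum(euclidean(profiles[g], profiles[h]) for h in others) / len(others)

    remaining = list(genes)
    # Seed the splinter group with the gene most dissimilar to the rest.
    seed = max(remaining, key=lambda g: avg_dist(g, remaining))
    splinter = [seed]
    remaining.remove(seed)
    while len(remaining) > 1:
        # Gain of moving g: how much closer it is to the splinter group
        # than to the genes left behind.
        gains = {g: avg_dist(g, remaining) - avg_dist(g, splinter) for g in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        remaining.remove(best)
    return splinter, remaining


def divisive(genes, profiles):
    """Recursively split until single genes remain; returns nested tuples."""
    if len(genes) == 1:
        return genes[0]
    left, right = diana_split(genes, profiles)
    return (divisive(left, profiles), divisive(right, profiles))


# Hypothetical profiles: four genes measured at three time points.
profiles = {
    "g1": [0.0, 0.0, 0.0],
    "g2": [0.0, 1.0, 0.0],
    "g3": [10.0, 10.0, 10.0],
    "g4": [12.0, 12.0, 12.0],
}
print(divisive(["g1", "g2", "g3", "g4"], profiles))
```

Running an agglomerative sketch and this divisive sketch on the same profiles is a direct way to see where the two trees agree and where individual genes are placed differently.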

11 For each tree, the y axis (height) represents dissimilarity. Genes 8 and 11, which we identified as possible outliers, have branches with large vertical heights. On clustering trees the genes are spaced evenly along the x axis, and the significance of a gene's position depends on the cluster to which it belongs. Note that although the overall topologies are similar, several genes are placed distinctly differently in the agglomerative versus the divisive tree (indicated with arrows). In general, different exploratory techniques may describe the same data with subtle or dramatic differences.

12 One way to gain confidence in a particular tree topology is to independently replicate your experiment. Another way is to examine the clusters for biological significance. If genes 1 and 12 both encoded cytokines, you might have more confidence in the agglomerative result.

13 Treeview clustering (http://rana.stanford.edu/software)

