
DNA microarray
DNA microarray technology allows an individual to rapidly and quantitatively measure the expression levels of thousands of genes in a biological sample. Microarray data analysis employs established mathematical tools to facilitate:
i) cluster analysis;
ii) principal component analysis;
iii) other approaches that reduce high-dimensional data to a useful form.

The main questions that microarray data analysis seeks to answer arise from:
i) comparisons of two conditions (e.g., cell lines treated with or without a drug);
ii) comparisons across multiple conditions (e.g., gene expression in normal versus diseased individuals);
iii) clustering of data as a function of samples and/or genes.

Scatter plots
A scatter plot is one of the most basic ways of analyzing gene expression data from microarray experiments. It compares the expression value of each gene in two samples. Most data points typically fall on a 45° line, where expression is equal in both samples, but genes that are up- or downregulated fall off the line. The scatter plot therefore shows at a glance which genes are most dramatically and differentially regulated in the experiment.
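As a minimal sketch of reading a scatter plot numerically, the Python below flags genes that fall off the 45° line by computing each gene's log2 ratio between two samples; a log2 ratio of 0 means the point lies exactly on the diagonal. The gene names, intensities, and the twofold cutoff are hypothetical choices for illustration.

```python
import math

def off_diagonal_genes(sample_a, sample_b, min_fold=2.0):
    """Return {gene: log2(b / a)} for genes that change at least min_fold-fold.

    Genes on the 45-degree line have a log2 ratio of 0; upregulated genes
    have positive ratios and downregulated genes negative ratios.
    """
    threshold = math.log2(min_fold)
    flagged = {}
    for gene, a in sample_a.items():
        log_ratio = math.log2(sample_b[gene] / a)
        if abs(log_ratio) >= threshold:
            flagged[gene] = log_ratio
    return flagged

# Hypothetical intensities for four genes in two samples.
control = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0, "geneD": 400.0}
treated = {"geneA": 110.0, "geneB": 1000.0, "geneC": 20.0, "geneD": 380.0}

print(off_diagonal_genes(control, treated))  # → {'geneB': 2.0, 'geneC': -2.0}
```

Working with ratios rather than raw distances from the line makes the cutoff independent of overall intensity, so a fourfold change counts the same for a weak or a strong spot.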

Chu et al. studied the developmental program of gene expression during sporulation in the budding yeast Saccharomyces cerevisiae. The data can be downloaded and, using Microsoft Excel or a variety of other graphics packages, graphed as a scatter plot. The main feature of this scatter plot is the substantial correlation between the expression values in the two conditions being compared. Another feature is the predominance of low-intensity values: the majority of genes are expressed at only a low level, and relatively few genes are expressed at a high level.
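A minimal sketch of the two features just described, using made-up intensities rather than the Chu et al. data: a Pearson correlation close to 1 between the two conditions, and most measurements lying at low intensity. The cutoff of 100 intensity units is an arbitrary assumption; the correlation is written out by hand so the sketch runs on any Python 3 version.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def fraction_below(values, cutoff):
    """Fraction of measurements below an intensity cutoff."""
    return sum(v < cutoff for v in values) / len(values)

# Hypothetical intensities for eight genes in two conditions.
vegetative = [50, 60, 55, 70, 65, 80, 900, 1500]
sporulating = [45, 70, 50, 75, 60, 90, 800, 1700]

print(round(pearson(vegetative, sporulating), 3))
print(fraction_below(vegetative, 100))  # → 0.75
```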

In panel (a) of the figure, the spreadsheet columns contain identifiers of yeast open reading frames and data for a time course of changes in gene expression in S. cerevisiae during sporulation. Green and red refer to samples from vegetative versus sporulating cells. In panel (b), the scatter plot on a linear scale reveals more overall similarity than difference between the data sets. To identify the most dramatically regulated genes, a plot on a logarithmic scale is preferable, for two reasons:
i) it spreads the data from the lower left corner into a more centered distribution that is easier to analyze;
ii) it is far easier to describe the fold regulation of genes on a logarithmic scale.

For example, suppose gene expression values are obtained at times t = 0, 1, 2, 3 and the raw ratio values are 0, 1, 2, 0.5. Between t = 1 and t = 2 expression increases twofold, and between t = 1 and t = 3 it decreases twofold. On a linear scale these equal-sized changes look asymmetric (the ratio rises by 1 but falls by only 0.5), whereas in log2 space they become +1 and −1, symmetric about zero.
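The arithmetic of this example can be checked in a few lines of Python. Only the ratios at t = 1, 2, 3 are used, since a ratio of 0 has no logarithm.

```python
import math

# Ratio values at t = 1, 2, 3 from the example above.
ratios = {1: 1.0, 2: 2.0, 3: 0.5}

# Linear-scale change relative to t = 1: the twofold increase and the
# twofold decrease have different magnitudes (+1.0 versus -0.5).
linear_change = {t: r - ratios[1] for t, r in ratios.items() if t != 1}

# Log2-scale values: the same two changes become +1 and -1.
log2_values = {t: math.log2(r) for t, r in ratios.items()}

print(linear_change)  # → {2: 1.0, 3: -0.5}
print(log2_values)    # → {1: 0.0, 2: 1.0, 3: -1.0}
```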

Significance Analysis of Microarrays (SAM)
Significance analysis of microarrays (SAM) is a method for finding significantly regulated genes in a microarray experiment. SAM assigns a score to each gene based on its change in expression relative to the standard deviation of repeated measurements. SAM has several useful features:
i) it is convenient to use as a Microsoft Excel plug-in;
ii) it accepts microarray data from experiments using a variety of experimental designs.
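A sketch of a SAM-style per-gene score, assuming two groups of replicate measurements: the difference between the group means divided by the pooled standard deviation of the repeated measurements plus a small positive constant s0. The value s0 = 0.1 and the sample values are assumptions for illustration; SAM itself estimates s0 from the data.

```python
import math

def sam_score(group1, group2, s0=0.1):
    """SAM-style relative difference d for one gene.

    group1, group2: (log-scale) expression values of the gene in the
    replicate samples of each condition. s0 is a small positive constant
    added to the denominator (0.1 is an arbitrary choice here).
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    # Pooled sum of squared deviations from each group's own mean.
    ss = sum((x - mean1) ** 2 for x in group1) + sum((x - mean2) ** 2 for x in group2)
    # Standard error of the difference in means, as in a two-sample t-test.
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (mean1 - mean2) / (s + s0)

# Hypothetical log2 expression values, three replicates per condition.
treated_reps = [2.0, 2.2, 1.8]
control_reps = [0.0, 0.2, -0.2]
print(round(sam_score(treated_reps, control_reps), 2))

# A gene with a tiny change and tiny variance: s0 keeps its score small,
# whereas the plain t-statistic (s0 = 0) is inflated by the small variance.
flat_a, flat_b = [0.51, 0.50, 0.49], [0.50, 0.49, 0.48]
print(round(sam_score(flat_a, flat_b, s0=0.0), 2), round(sam_score(flat_a, flat_b), 2))
```

With s0 = 0 the score reduces to the ordinary equal-variance two-sample t-statistic; the added constant is what makes it "modified".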

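SAM input data can be in raw or log-transformed format. Each row of the data matrix contains the expression values for one gene, and the columns correspond to samples. SAM uses a modified t-statistic: the numerator is the difference between the mean expression values being compared, and the denominator is the standard deviation of repeated measurements plus a small positive constant $s_0$, which keeps genes with very low variance from receiving inflated scores:

$$d_i = \frac{\bar{x}_1(i) - \bar{x}_2(i)}{s_i + s_0}$$

where $\bar{x}_1(i)$ and $\bar{x}_2(i)$ are the mean expression values of gene $i$ in the two groups and $s_i$ is its standard deviation. SAM also reports the false discovery rate (FDR), the percentage of genes expected to be identified by chance. The user can adjust a parameter called delta to control the false-positive rate: for example, in a typical experiment, of every 100 genes declared significantly regulated according to the test statistic, 10 might be false positives (an FDR of 10%).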
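A simplified sketch of how an FDR can be estimated by permuting sample labels, in the spirit of SAM. Simplifications, all assumptions rather than SAM's exact procedure: a single symmetric cutoff on |d| stands in for the delta-based cutoffs, the median count of genes called in the permuted data is taken as the number of false positives, there is no adjustment for the proportion of truly unchanged genes, and all relabellings are enumerated (feasible only for small designs). The expression matrix is hypothetical.

```python
import itertools
import math
import statistics

def sam_score(group1, group2, s0=0.1):
    """SAM-style relative difference: mean difference / (pooled SD + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)

def gene_scores(matrix, idx1, idx2, s0):
    """Score every gene (row) for one assignment of columns to the two groups."""
    return [sam_score([row[j] for j in idx1], [row[j] for j in idx2], s0) for row in matrix]

def permutation_fdr(matrix, n1, cutoff, s0=0.1):
    """Estimate the FDR of calling every gene with |d| >= cutoff.

    Columns 0..n1-1 form group 1 and the remaining columns form group 2.
    Returns (genes called, estimated false positives, estimated FDR).
    """
    n = len(matrix[0])
    true_labels = (set(range(n1)), set(range(n1, n)))
    observed = gene_scores(matrix, range(n1), range(n1, n), s0)
    called = sum(abs(d) >= cutoff for d in observed)

    false_counts = []
    for idx1 in itertools.combinations(range(n), n1):
        # Skip the real labelling and (for equal group sizes) its mirror image.
        if set(idx1) in true_labels:
            continue
        idx2 = [j for j in range(n) if j not in idx1]
        permuted = gene_scores(matrix, idx1, idx2, s0)
        false_counts.append(sum(abs(d) >= cutoff for d in permuted))

    expected_false = statistics.median(false_counts)
    fdr = min(1.0, expected_false / called) if called else 0.0
    return called, expected_false, fdr

# Hypothetical log2 data: 4 genes x 6 samples (3 per condition).
expression = [
    [2.0, 2.2, 1.8, 0.0, 0.2, -0.2],    # upregulated in group 1
    [-1.0, -1.1, -0.9, 1.0, 1.1, 0.9],  # downregulated in group 1
    [0.1, -0.1, 0.0, 0.0, 0.1, -0.1],   # unchanged
    [0.3, 0.1, 0.2, 0.2, 0.3, 0.1],     # unchanged
]
print(permutation_fdr(expression, n1=3, cutoff=3.0))
```

Lowering the cutoff calls more genes but also lets more permuted scores through, which is the trade-off the delta parameter controls.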

The SAM algorithm calculates a q value for each gene, which is the lowest false discovery rate at which that gene is called significantly regulated. The genes are ranked according to the test statistic, and the observed statistic is plotted against the statistic expected by chance. The graph effectively visualizes the outlier genes that are most dramatically regulated. In the figure, arrow 1 marks upregulated genes and arrow 2 marks downregulated genes.
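The q-value definition above can be sketched directly: estimate the FDR at every cutoff, then give each gene the lowest FDR among the cutoffs at which it is called (every cutoff no larger than its own |d|). The scores are hypothetical, and the FDR estimate uses the same simplifying assumptions as a plain permutation count (median false calls over permutations, symmetric cutoff on |d|), not SAM's exact procedure.

```python
import statistics

def fdr_at(cutoff, observed, permuted):
    """Estimated FDR when every gene with |d| >= cutoff is called significant."""
    called = sum(abs(d) >= cutoff for d in observed)
    if called == 0:
        return 0.0
    false_calls = statistics.median(
        [sum(abs(d) >= cutoff for d in scores) for scores in permuted]
    )
    return min(1.0, false_calls / called)

def q_values(observed, permuted):
    """q-value of each gene: the lowest FDR among cutoffs at which it is called."""
    cutoffs = sorted({abs(d) for d in observed})
    fdr = {c: fdr_at(c, observed, permuted) for c in cutoffs}
    # A gene is called at every cutoff that is <= its own |d|.
    return [min(fdr[c] for c in cutoffs if c <= abs(d)) for d in observed]

# Hypothetical observed scores for five genes, plus scores from two label permutations.
observed = [4.0, -3.0, 2.0, 1.0, -0.5]
permuted = [
    [2.5, -1.5, 0.8, 0.6, 0.2],
    [3.5, 1.2, -0.9, 0.4, 0.3],
]
print([round(q, 3) for q in q_values(observed, permuted)])  # → [0.0, 0.25, 0.333, 0.5, 0.7]
```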

Clustering of Microarray data
There are several kinds of clustering techniques. The most common form for microarray analysis is hierarchical clustering, in which a sequence of nested partitions is identified, resulting in a dendrogram. Hierarchical clustering can be performed using agglomerative (bottom-up) or divisive (top-down) approaches.
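A minimal sketch of agglomerative (bottom-up) clustering: start with every gene in its own cluster and repeatedly merge the two closest clusters, recording the dissimilarity at each merge; these heights are the y-axis values of the dendrogram. Average linkage, Euclidean distance, and the gene profiles are assumptions for illustration; microarray tools often use correlation-based distances instead.

```python
import itertools
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(profiles):
    """Average-linkage agglomerative clustering.

    profiles: {gene: expression profile}. Returns a list of merges
    (left cluster, right cluster, height) in the order they occur.
    """
    clusters = {name: [name] for name in profiles}
    merges = []

    def avg_dist(members1, members2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in members1 for b in members2)
        return total / (len(members1) * len(members2))

    while len(clusters) > 1:
        k1, k2 = min(
            itertools.combinations(clusters, 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        height = avg_dist(clusters[k1], clusters[k2])
        merged = clusters.pop(k1) + clusters.pop(k2)
        clusters[f"({k1},{k2})"] = merged
        merges.append((k1, k2, height))
    return merges

# Hypothetical log ratios for four genes at three time points.
profiles = {
    "g1": (1.0, 2.0, 3.0),
    "g2": (1.1, 2.1, 2.9),
    "g3": (-1.0, -2.0, -3.0),
    "g4": (-1.2, -1.9, -3.1),
}
for left, right, height in agglomerative(profiles):
    print(f"merge {left} + {right} at height {height:.2f}")
```

A gene that only joins the tree at a large height is one whose profile is unlike all the others, which is how outliers show up as long vertical branches.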

Agglomerative and divisive clustering generally produce similar results. To compare them, we can cluster a typical data set of 20 genes measured at three time points with each approach, producing two clustering trees.
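For the divisive (top-down) side, a sketch of one split step in the style of the DIANA algorithm: the gene with the largest average dissimilarity seeds a "splinter" group, and genes keep moving to it while they are, on average, closer to the splinter group than to the rest. A full divisive tree repeats this split on the resulting clusters. The one-dimensional expression levels and absolute-difference dissimilarity are hypothetical simplifications.

```python
def diana_split(items, dist):
    """Split items into (remaining, splinter) using one DIANA-style step."""

    def avg_to(x, group):
        # Average dissimilarity from x to the other members of group.
        others = [y for y in group if y != x]
        return sum(dist(x, y) for y in others) / len(others) if others else 0.0

    remaining = list(items)
    seed = max(remaining, key=lambda x: avg_to(x, remaining))
    remaining.remove(seed)
    splinter = [seed]

    while len(remaining) > 1:
        # Positive gain: x is closer, on average, to the splinter group.
        gains = {
            x: avg_to(x, remaining) - sum(dist(x, y) for y in splinter) / len(splinter)
            for x in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter

# Hypothetical single-value expression levels for five genes.
levels = {"g1": 0.0, "g2": 1.0, "g3": 2.0, "g4": 10.0, "g5": 11.0}
remaining, splinter = diana_split(list(levels), lambda a, b: abs(levels[a] - levels[b]))
print(remaining, splinter)  # → ['g1', 'g2', 'g3'] ['g5', 'g4']
```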

For each tree, the y axis (height) represents dissimilarity. Genes 8 and 11, which we identified as possible outliers, have branches with large vertical heights. On a clustering tree the genes are spaced evenly along the x axis, and the significance of their positions depends on the clusters to which they belong. Note that while the overall topologies are similar, several genes are placed quite differently in the agglomerative versus the divisive tree (indicated with arrows). In general, different exploratory techniques may describe the same data with subtle or dramatic differences.

One way to gain confidence in a particular tree topology is to independently replicate your experiment. Another way is to examine the clusters for biological significance. If genes 1 and 12 both encoded cytokines, you might have more confidence in the agglomerative result.

Treeview clustering (