Extracting binary signals from microarray time-course data

Presentation transcript:

Extracting binary signals from microarray time-course data
Debashis Sahoo 1, David L. Dill 2, Rob Tibshirani 3 and Sylvia K. Plevritis 4
1 Department of Electrical Engineering, 2 Department of Computer Science, 3 Department of Radiology and 4 Department of Health Research and Policy and Department of Statistics
Stanford University
Roli Shrivastava

Introduction
Problem statement
– To identify up- and down-regulated genes
– To identify the time of transition
Experimental technique
– Microarray: tens of thousands of distinct probes on one array perform the equivalent number of genetic tests in parallel
Computational technique
– StepMiner, a tool that extracts biologically meaningful results from large amounts of data

Types of Transitions
1. One-step
2. Two-step
3. Genes for which the one- or two-step patterns do not fit appreciably better than a constant mean value (the null hypothesis)

Fitting a One- or Two-Step Function
F1 statistic: measures how well the one-step model fits the data
F2 statistic: measures how well the two-step model fits the data
F12 statistic: compares the fit of the one-step and two-step models on the same data
P-value: a low P-value indicates a good fit of the model to the data
Procedure
– Calculate the F statistic for the model and data set
– Calculate the P-value
– If P < P threshold, the model fits; if P > P threshold, the model does not fit
– P threshold = 0.05
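The F1 statistic described above can be illustrated with a small least-squares sketch. It assumes the usual regression form F = (SSR / (m - 1)) / (SSE / (n - m)) and assumes m = 3 parameters for a one-step model (two means plus the step position); neither the formula nor the value of m appears on the slide, and the function names are hypothetical.

```python
def fit_one_step(values):
    """Try every step position and return (sse, position, left_mean, right_mean)
    for the one-step function with the smallest squared error."""
    best = None
    for k in range(1, len(values)):          # step lies between index k-1 and k
        left, right = values[:k], values[k:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, k, left_mean, right_mean)
    return best


def f_statistic(values, sse, m):
    """Regression F statistic against a constant-mean null model.
    m is the assumed number of model parameters (3 for one step in this sketch)."""
    n = len(values)
    mean = sum(values) / n
    ss_total = sum((v - mean) ** 2 for v in values)
    ss_regression = ss_total - sse
    if sse == 0:
        return float("inf")                  # perfect fit, no residual error
    return (ss_regression / (m - 1)) / (sse / (n - m))


# Made-up expression profile with a clear step after the fourth time point.
profile = [0.1, -0.1, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
sse, position, left_mean, right_mean = fit_one_step(profile)
f1 = f_statistic(profile, sse, m=3)
```

Turning F into a P-value needs the F distribution; if SciPy is available, `scipy.stats.f.sf(f1, m - 1, n - m)` gives the upper-tail probability that would be compared with the 0.05 threshold.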

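StepMiner Algorithm
If the one-step model fits the data AND fits better than the two-step model: one-step transition
If the two-step model fits the data AND the one-step model does not: two-step transition
If neither the one-step nor the two-step model fits the data: no step transition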
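The decision rule on this slide can be sketched as a small function over the three P-values. This is an interpretation, not the tool's actual code: it assumes that "one-step fits better than two-step" means the F12 test does not find the two-step model significantly better, and the function name and return labels are hypothetical.

```python
def classify_transition(p1, p2, p12, threshold=0.05):
    """Classify a gene from three P-values.

    p1  : one-step model vs. constant mean (from F1)
    p2  : two-step model vs. constant mean (from F2)
    p12 : two-step model vs. one-step model (from F12)
    Assumption: p12 >= threshold means the extra step is not justified.
    """
    one_step_fits = p1 < threshold
    two_step_fits = p2 < threshold
    two_step_better = p12 < threshold

    if one_step_fits and not two_step_better:
        return "one-step"
    if two_step_fits:
        return "two-step"
    return "other"


# Made-up P-values for three genes.
labels = [
    classify_transition(0.001, 0.001, 0.50),  # one step explains the data
    classify_transition(0.200, 0.010, 0.01),  # only the two-step model fits
    classify_transition(0.500, 0.500, 0.50),  # neither model fits
]
```

Checking the one-step case first keeps the simpler model whenever the second step adds nothing significant.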

Comparison of 4 Algorithms
Step height = 5σ. Number of time points = 15.
A total of 2000 random, 2000 one-step and 2000 two-step profiles, with random step positions.

Generation of Simulated Data
Microarray data with 15 non-uniform time points
4000 genes with 2000 one-step and 200 two-step patterns
Gaussian noise was added to the data
A P-value threshold of 0.05 was used
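A minimal sketch of how such step profiles with Gaussian noise could be generated is shown below. It is illustrative only: it treats time points as evenly spaced zero-based indices rather than the non-uniform times of the study, assumes a two-step profile returns to its starting level, and uses hypothetical function names and defaults.

```python
import random


def simulate_one_step(n_points=15, position=5, height=5.0, sigma=1.0, rng=random):
    """Profile that is 0 before `position` and `height` from there on, plus noise."""
    return [(height if t >= position else 0.0) + rng.gauss(0.0, sigma)
            for t in range(n_points)]


def simulate_two_step(n_points=15, first=5, second=9, height=5.0, sigma=1.0, rng=random):
    """Profile that rises at `first` and falls back at `second`, plus noise."""
    return [(height if first <= t < second else 0.0) + rng.gauss(0.0, sigma)
            for t in range(n_points)]


# A seeded generator makes the simulated data set reproducible.
rng = random.Random(0)
one_step_genes = [simulate_one_step(rng=rng) for _ in range(3)]
two_step_genes = [simulate_two_step(rng=rng) for _ in range(3)]
```

With `height=5.0` and `sigma=1.0` the step height is 5σ, the setting used in the algorithm comparison.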

Results of Simulated Data - I
σ is the standard deviation of the noise
Step position is fixed at 5 for one-step patterns
Step positions are fixed at 5 and 9 for two-step patterns
The higher the step, the easier it is to identify

Results of Simulated Data - II
σ is the standard deviation of the noise
Random step positions
Small reduction in accuracy
More matches occur when every constant segment of a curve contains several time points
It is therefore desirable to design experiments with several points before the first interesting transition and after the last interesting transition

Results of Simulated Data - III
Shows sensitivity to the P-value threshold and the number of time points
Random step position and a step height of 5σ
Two-step signals require more time points than one-step signals
Matches increase as the P-value threshold is raised, but at the cost of a higher false discovery rate

Results of Simulated Data - IV
Shows sensitivity to the spacing between steps
With 15 time points, the first step is fixed at position 4
A spacing of at least 3 time points is required when the step height is > 3σ
Steps must be placed at least 3 time points from the end points

Diauxic Shift
In the initial phases of a growing batch culture, yeast prefers to metabolize glucose and produce ethanol even when oxygen is abundant. When the glucose is exhausted, cells undergo a “diauxic shift,” in which they switch abruptly to an oxidative metabolism. This pathway allows the oxidation of the accumulated fermentation products and is highly efficient as a mechanism for generating ATP.
Brauer et al., Mol Biol Cell May; 16(5): 2503–2517

Analysis of Experimental Data
2284 genes with diauxic shift
– 1088 matched a one-step transition
– 267 matched a two-step transition
– 929 did not match either pattern
Figure: fitting functions for 3 genes

Heat Maps
Analysis by Brauer et al.
Same data reanalyzed using StepMiner
The heat map shows two transitions, at 8.25 and 9.25 h

Comparison With Brauer et al.’s Results
The GO annotations and FDR-corrected P-values for the clusters reported in Brauer et al. were recomputed with the latest yeast gene annotations from the Gene Ontology Consortium website.
The table shows the P-values from GO-Term Finder as well as StepMiner.

Table for Comparison

Results of Comparison
The annotations that had the lowest P-values in Brauer et al. also had low P-values in the StepMiner groups.
In most cases, the P-values in the reanalysis are lower than Brauer et al.’s, which implies that grouping by time of change is at least as effective as hierarchical clustering at identifying relevant genes.
GO annotations are obtained fully automatically using StepMiner; it is not necessary to select interesting clusters manually.
The clusters that have no P-values from StepMiner were “less interpretable in terms of diauxic shift”, in the words of Brauer et al.

Comparison of StepMiner to Other Tools
Hierarchical clustering: finds clusters that transition at the same time point
– A manual search is required to find transitions
SAM: finds transitions by looking for significant differences in average expression before and after a specified time point
– However, many of the genes selected by this method do not, in fact, have a transition at the specified time point
EDGE: identifies genes whose expression changes systematically over time and differs significantly from the mean expression over time
– This method does not directly provide the direction and position of a significant change

Hierarchical vs. StepMiner
Cluster that transitions at 3 hours
StepMiner clearly shows other transition times

Comparison of StepMiner to Other Tools - STEM
Provides model profiles and their significance values
But the profiles do not look like step functions and are therefore not helpful for locating transitions

Strengths and Limitations
Strengths
– Easy to understand
– Few parameters
– Transitions can be biologically more interesting
– Very fast: < 15 s for 15 microarrays of genes
– Can deal with missing measurements
– Provides statistical parameters such as P-value and FDR
Limitations
– Binary model
– There can be other cases, e.g., the transition is not a step
– Short and long time courses are not handled well
– Most appropriate for time-course measurements

Post-StepMiner Analysis
Once StepMiner has been run, genes undergoing binary transitions can easily be partitioned into sets based on the number, direction, and timing of transitions.
These sets can be merged at the user’s discretion (e.g., the set of one-step genes that rise at time 3 could be merged with the two-step genes that rise at time 3), or further subdivided.
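The partition-and-merge step described above can be sketched with a dictionary keyed by (number of steps, direction, time). The gene names, the tuple layout and both helper functions are hypothetical and only illustrate the bookkeeping; they are not StepMiner's output format.

```python
from collections import defaultdict


def partition(calls):
    """Group gene names by (number of steps, direction of first step, time of first step)."""
    groups = defaultdict(list)
    for gene, (n_steps, direction, time) in calls.items():
        groups[(n_steps, direction, time)].append(gene)
    return dict(groups)


def merge(groups, keys):
    """Return the union of the gene sets stored under `keys`, sorted by name."""
    merged = []
    for key in keys:
        merged.extend(groups.get(key, []))
    return sorted(merged)


# Made-up transition calls for four genes.
calls = {
    "geneA": (1, "up", 3),
    "geneB": (1, "up", 3),
    "geneC": (2, "up", 3),
    "geneD": (1, "down", 7),
}
groups = partition(calls)

# Merge the one-step and two-step genes that rise at time 3.
rising_at_3 = merge(groups, [(1, "up", 3), (2, "up", 3)])
```

Keeping the full tuple as the key means a set can later be subdivided or merged on any of the three properties without re-running the analysis.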

BACKUP SLIDES

Replication vs. Resolution
For accuracy, it is better to take more frequent measurements than to collect replicates
This comes at a cost in correctly identifying the kind of step