Statistical methods for identifying yeast cell cycle transcription factors Speaker: Chun-hui Cai.

Slides:



Advertisements
Similar presentations
Inferring Quantitative Models of Regulatory Networks From Expression Data Iftach Nachman Hebrew University Aviv Regev Harvard Nir Friedman Hebrew University.
Advertisements

Hypothesis testing Another judgment method of sampling data.
Hypothesis Testing 7.
Statistics Review – Part II Topics: – Hypothesis Testing – Paired Tests – Tests of variability 1.
Computational discovery of gene modules and regulatory networks Ziv Bar-Joseph et al (2003) Presented By: Dan Baluta.
Hypothesis: It is an assumption of population parameter ( mean, proportion, variance) There are two types of hypothesis : 1) Simple hypothesis :A statistical.
Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
Business Statistics for Managerial Decision
A Probabilistic Dynamical Model for Quantitative Inference of the Regulatory Mechanism of Transcription Guido Sanguinetti, Magnus Rattray and Neil D. Lawrence.
From Sequence to Expression: A Probabilistic Framework Eran Segal (Stanford) Joint work with: Yoseph Barash (Hebrew U.) Itamar Simon (Whitehead Inst.)
Open Day 2006 From Expression, Through Annotation, to Function Ohad Manor & Tali Goren.
Work Process Using Enrich Load biological data Check enrichment of crossed data sets Extract statistically significant results Multiple hypothesis correction.
Genome-wide prediction and characterization of interactions between transcription factors in S. cerevisiae Speaker: Chunhui Cai.
Elementary hypothesis testing
MARE 250 Dr. Jason Turner Hypothesis Testing II. To ASSUME is to make an… Four assumptions for t-test hypothesis testing:
10 Hypothesis Testing. 10 Hypothesis Testing Statistical hypothesis testing The expression level of a gene in a given condition is measured several.
Differentially expressed genes
Functional annotation and network reconstruction through cross-platform integration of microarray data X. J. Zhou et al
Elementary hypothesis testing Purpose of hypothesis testing Type of hypotheses Type of errors Critical regions Significant levels Hypothesis vs intervals.
Predicting protein functions from redundancies in large-scale protein interaction networks Speaker: Chun-hui CAI
Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression.
IENG 486 Statistical Quality & Process Control
Chapter 9 Hypothesis Testing.
BCOR 1020 Business Statistics Lecture 20 – April 3, 2008.
Today Concepts underlying inferential statistics
Probability Population:
Confidence Interval A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population.
Statistical Analysis. Purpose of Statistical Analysis Determines whether the results found in an experiment are meaningful. Answers the question: –Does.
Introduction to Hypothesis Testing
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 9 Hypothesis Testing.
Probability Distributions and Test of Hypothesis Ka-Lok Ng Dept. of Bioinformatics Asia University.
Confidence Intervals and Hypothesis Testing - II
Hypothesis Testing.
Section 10.1 ~ t Distribution for Inferences about a Mean Introduction to Probability and Statistics Ms. Young.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses.
BPS - 3rd Ed. Chapter 211 Inference for Regression.
1 Power and Sample Size in Testing One Mean. 2 Type I & Type II Error Type I Error: reject the null hypothesis when it is true. The probability of a Type.
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Chapter 7 Hypothesis testing. §7.1 The basic concepts of hypothesis testing  1 An example Example 7.1 We selected 20 newborns randomly from a region.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Instructor Resource Chapter 5 Copyright © Scott B. Patten, Permission granted for classroom use with Epidemiology for Canadian Students: Principles,
Tests for Random Numbers Dr. Akram Ibrahim Aly Lecture (9)
Maximum Likelihood Estimator of Proportion Let {s 1,s 2,…,s n } be a set of independent outcomes from a Bernoulli experiment with unknown probability.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Statistical Hypotheses & Hypothesis Testing. Statistical Hypotheses There are two types of statistical hypotheses. Null Hypothesis The null hypothesis,
Essential Question:  How do scientists use statistical analyses to draw meaningful conclusions from experimental results?
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
Chap 8-1 Fundamentals of Hypothesis Testing: One-Sample Tests.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb Analysis of (cDNA) Microarray.
BIOL 582 Lecture Set 2 Inferential Statistics, Hypotheses, and Resampling.
Hypothesis Tests. An Hypothesis is a guess about a situation that can be tested, and the test outcome can be either true or false. –The Null Hypothesis.
Chapter 7: Hypothesis Testing. Learning Objectives Describe the process of hypothesis testing Correctly state hypotheses Distinguish between one-tailed.
Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses pt.1.
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
BPS - 5th Ed. Chapter 231 Inference for Regression.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 7 Inferences Concerning Means.
4-1 Statistical Inference Statistical inference is to make decisions or draw conclusions about a population using the information contained in a sample.
Hypothesis Testing: Hypotheses
CONCEPTS OF HYPOTHESIS TESTING
Volume 4, Issue 1, Pages (July 2013)
Volume 106, Issue 6, Pages (September 2001)
Predicting Gene Expression from Sequence
Presentation transcript:

Statistical methods for identifying yeast cell cycle transcription factors Speaker: Chun-hui Cai

Knowing transcription factors involved in the yeast cell cycle is helpful for understanding the regulation of yeast cell cycle genes. –Predicting individual cell cycle TFs and synergistic TF pairs.

The essential idea is that if a TF is a cell cycle regulator, then genes regulated and not regulated by it should, on average, have significantly different expression levels during one or more phases of the cell cycle.

Data collection and processing The target genes of TFs are collected from four TF databases and ChIP-chip data. The expression data of yeast genes are gathered from the microarray data of Spellman.

Method for identifying cell cycle TFs Step 1: –The known or putative target genes of each known TF are collected from four TF databases and Chip-chip data Step 2: –A gene is said to be potentially bound or regulated by TF if there is evidence from the four databases, or if its p value in the TF ChIP-chip experiment (pc) is smaller than a certain low threshold (e.g., pc 0.8). For TF α, we generate two gene groups, G α and G -α : a gene belongs to group G α if it is potentially regulated by TF α but to group G -α if it is not regulated by TF α.

Step 3: –TF α is said to be a cell cycle regulator if (i) there exists at least one phase of the cell cycle where the expression levels of genes in G α are significantly different from the expression levels in one or more of the other phases, and (ii) the expression levels of genes in G α are on average significantly higher (or lower) than those of genes in G -α in at least one phase of the cell cycle. Five cell cycle phases: G1, S, SG2, G2M, and MG1. Method for identifying cell cycle TFs

To examine the two statistical criteria in Step 3 – Kolmogorov- Smirnov (KS) test –A nonparametric test to determine whether two distributions differs significantly. –Calculates the maximum vertical deviation (D) between empirical distribution functions of the two groups to determine whether the two data sets are drawn from the same distribution. –x: the average expression level of a gene over all experiment; f i (x): the density function of x for the genes in group i; Fi(x): the corresponding (cumulative) distribution function. For groups i and j, if the statistic D is significantly large, we infer the two groups of genes are from two different distributions and are expressed differently. Method for identifying cell cycle TFs

For a given TF, we perform10 KS tests for 10 possible pairs of the five phases to examine the first criterion in Step 3. If at least one test has a p value lower than 0.01, we proceed to the next test. Otherwise, the TF will not be considered further. A similar method is applied to examine whether the expression levels of genes in G α are on average significantly higher (or lower) than those of genes in G -α in at least one phase of the cell cycle. –For each phase, we test H0: FG α = FG -α vs. H1: FG α != FG -α by using the two-sided KS test, where F denotes the distribution function of the expression levels of genes in a specific group. If H0 is rejected, we compare the statistics D_ = supx[FG α (x) - FG -α (x)] and D+ = supx[FG -α (x) - FG α (x)] of the two distributions to determine whether FG α stFG -α (i.e., D_>D+). FG α < stFG -α means that the expression levels in group G α are stochastically greater than their corresponding expression levels in G -α group. Method for identifying cell cycle TFs

Method for identifying synergistic TF pairs Step 1: –Before testing whether two TFs (say α and β) interact, we study whether they are associated in the same gene more often than random expectation. That is, let N1 be the total number of genes in G α, N2 be the total number of genes in G β, N12 be the number of genes in G α,β, and N be the total number of genes in the yeast genome and calculate whether N12/N is greater than the random expectation (N1/N)X(N2/N). If the variables are independent, then the joint probability of N12/N should be equal to the product of the two marginal probabilities, (N1N)X(N2N). If there exists a positive association, then N12/N will be greater than (N1/N)X(N2/N).

Method for identifying synergistic TF pairs Step 2: –For each pair with significant association, we test whether there exists at least one phase of the cell cycle where the expression levels of genes in G α,β are significantly different from the expression levels in one or more of the other phases. Step 3: –We generate four groups of genes, G α,β, G α,-β, G - α,β, and G -α,-β, and collect their expression data under different cell cycle phases. For most pairs, we set the threshold (pc) as 0.8 to define a gene not regulated by a specific TF.

Method for identifying synergistic TF pairs Step 4: –O ijk = μ + μ i + μ j + μ ij + ε ijk where i = α or -α; j = β or -β; k represents gene k in group G α,β, G α,-β, G -α,β, or G -α,-β ; O ijk is the observed average expression level of gene k in group G α,β, G α,-β, G -α,β, or G -α,-β is the overall mean; μ i is the mean in group G α or G -α ; μ j is the mean in group G β or G -β ; μ ij is the interaction effect to be estimated; and ε ijk is the random noise. If the estimate of the term μ αβ is significantly positive (negative), then there is positive (negative) interaction between TFs α and β. –Note that when a TF (say β) does not have a DNA-binding domain, we assume that its single effect does not exist and modify the model as O ijk = μ + μ i + μ ij + ε ijk

Results Correction for multiple tests – false discovery rate (q value) is used for a balance between the numbers of true and false positives. –From the p and q values of a test, three confidence levels are defined: confident (p<10 -5 and q<10 -5 ), plausible (p<0.001 and q<0.001), and doubtful (p<0.01 and q<0.01).

The average expression levels of the 30 inferred cell- cycle TFs (blue) and the average expression levels of their known or predicted target genes (red).

The synergy TF relations in Tables 2 and 10. Eighty TF interaction pairs from 39 TFs are obtained. A TF (or a synergistic pair) is given in red if it is a cell cycle TF with literature support. A synergistic pair is given in green if TFs α and β have physical interactions or are consistent with predictions of other studies.

Discussion In the analysis of individual TFs and of synergistic pairs, 50 TFs are considered confident or plausible as individual cell cycle TFs or as synergistic TF pairs. Twenty-four of these 50 regulators are indeed known cell cycle regulators. Some other TFs have been identified to be regulators in other pathways, such as BAS1 in biosynthesis pathway, FHL1 in rRNA processing, GAL4 in galactose-induction, MIG1 and MIG2 in glucose repression, and MSN2 and MSN4 in stress conditions.

G1: SKN7 and STB1 contribute directly to G1-specific transcription. ASH1 serves to repress the late G1-specific transcription of HO and prevents mating-type switching. RME1 acts as an activator of at least one late G1-specific transcript, CLN2. ACE2 and SWI5 are thought to be cooperative activators at the G1 phase. SWI4, SWI6, and MBP1 are involved at the G1 phase, and SWI4-SWI6 (SBF) and MBP1-SWI6 (MBF) activate late G1 and early S genes. S: YOX1 transcription peaks in the late G1 and represses the early cell cycle box (bound by MCM1) activity until the M phase. For HIR1-HIR2 (or HIR1-HIR3 or HIR2-HIR3), they function as transcriptional corepressors to regulate histone gene transcription in the yeast cell cycle, but they must be transiently inactivated at the G1/S phase boundary for the transcription of histone genes to be derepressed. Therefore, HIR1, HIR2, and HIR3 are indeed corepressors, but genes regulated by them are derepressed in the S and G2 phases

M&G2: NDD1 is the critical activator of a group of M phase- specific transcripts. This M phase-specific transcription requires FKH1/FKH2 as well as MCM1 and NDD1. Our methods indicate that FKH2, NDD1, MCM1, FKH1-MCM1, FKH2-MCM1, and MCM1- NDD1 function in the M phase. Other studies suggested that MCM1, together with FKH1 or FKH2, recruits the NDD1 protein in late G2, and that FKH1, NDD1, and FKH2-MCM1 function in the G2 phase. M/G1: the known TFs include YOX1, DIG1, TEC1, and STE12. DIG1-STE12 is involved in the regulation of mating-specific genes. Kato et al. (30) suggested that SWI4, YOX1, SWI5, DIG1, and STE12 function during this phase. Our methods indicate that SWI4, SWI5, YOX1, TEC1, DIG1, and STE12 (the synergistic pair DIG1-STE12) function at this phase.

Some TFs are present at many phases in the cell cycle. For example, let us consider FKH2. Genes in G FKH2 have a higher expression level than those in G -FKH2 in the S, G2, and M phases but have a lower expression level than those inG FKH2 in the M/G1 and G1 phases. Genes inG FKH2 are expressed in the G1 phase in two patterns: genes in G FKH2,NDD1 have a lower expression level than genes in G -FKH2,-NDD1, whereas genes in G FKH2,SWI6 have a higher expression level than genes in G -FKH2,-SWI6. A TF can act as both a repressor and an activator of gene expression by cooperating with different TFs, making it difficult to confirm in individual analysis.