CIS: Compound Importance Sampling for Binding Site p-value Estimation. The Hebrew University, Jerusalem, Israel. Yoseph Barash, Gal Elidan, Tommy Kaplan, Nir Friedman.

Presentation transcript:

CIS: Compound Importance Sampling for Binding Site p-value Estimation. Yoseph Barash, Gal Elidan, Tommy Kaplan, Nir Friedman. The Hebrew University, Jerusalem, Israel.

2 Detecting Target Genes. Does the promoter of a gene contain a binding site? Probabilistic framework: assign a log-odds score to each subsequence of length k (e.g. ACGTACGT, positions 1, 2, …, k), where p[i,c] is the probability of letter c at position i.
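The score formula itself did not survive in the transcript. A common form of a PSSM log-odds score is sketched below. The background letter distribution p_0[c] is an assumption introduced here; the slide only names p[i,c].

```latex
\mathrm{Score}(x_1 \dots x_k) \;=\; \sum_{i=1}^{k} \log \frac{p[i, x_i]}{p_0[x_i]}
```

Under this form, a positive score means the subsequence is more likely under the motif model than under the background.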

3 Detecting target genes (2). [Figure: candidate binding sites, each marked "?".]

4 p-value of Scores. [Figure: distribution of scores, with a score S marked. Axes: Score vs. Prob.]

5 Detecting target genes (3). The p-value of a score is universal and interpretable, and it controls the false positive error rate. Bonferroni-corrected p-value: 0.01. [Figure: axes score vs. p-value.]
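As a rough illustration of why a Bonferroni correction pushes the required per-site p-value so low, the sketch below divides an overall level by the number of tests. The count of 100,000 candidate sites is a hypothetical number chosen for illustration and is not taken from the slides.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at most alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical numbers: 100,000 candidate sites tested at an overall level of 0.01.
per_site_cutoff = bonferroni_threshold(0.01, 100_000)
print(per_site_cutoff)
```

The more candidate positions are scanned, the smaller the per-site cutoff becomes, which is what makes accurate estimation of very low p-values necessary.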

6 p-value Estimation. Problem 1: naïve enumeration is infeasible, since #seq = 4^k. Instead, estimate the p-value by sampling from P_0 and collecting the sampled scores s_1 … s_n. [Figure: Prob vs. Score, with S* marked.]
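A minimal sketch of the naïve sampling estimate follows, assuming P_0 draws each letter independently from a background distribution and the score is a PSSM log-odds score. The function names and the toy 3-position PSSM are hypothetical and serve only as illustration.

```python
import math
import random


def log_odds_score(seq, pssm, background):
    """Sum over positions of log(p[i,c] / background[c])."""
    return sum(math.log(pssm[i][c] / background[c]) for i, c in enumerate(seq))


def naive_p_value(pssm, background, s_star, n, rng):
    """Fraction of n background samples whose score is at least s_star."""
    letters = list(background)
    weights = [background[c] for c in letters]
    k = len(pssm)
    hits = 0
    for _ in range(n):
        seq = rng.choices(letters, weights=weights, k=k)
        if log_odds_score(seq, pssm, background) >= s_star:
            hits += 1
    return hits / n


# Toy example (hypothetical values): uniform background, 3-position motif.
background = {c: 0.25 for c in "ACGT"}
pssm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1} for _ in range(3)]
print(naive_p_value(pssm, background, 1.0, 10_000, random.Random(0)))
```

The estimate is a plain count, so its resolution is limited to 1/n. A p-value far below 1/n will usually come out as exactly zero.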

7 p-value Estimation. Problem 2: multiple hypothesis testing requires low p-values (10^-7). Sampling from P_0 needs ~10^7 attempts to get a sample with so low a p-value. [Figure: Prob vs. Score, with S* marked.]

8 Importance Sampling Approach. 1. Cheat: sample from Q(s_1 … s_k) to get high-scoring samples. 2. Get absolution: weigh each sample. This gives an empirical p-value with N ~ 10^4 samples. [Figure: Prob vs. Score, with S* marked.]

9 Why is this allowed? Importance sampling, with x = subsequence. Desired estimate: expectation of log-odds. Step 1: sample from P_0(x) and count. Step 2: multiply and divide by Q(x). Step 3: sample from Q(x) and reweight. How to choose Q?
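The three steps can be written out as a short derivation. This sketch assumes the desired quantity is the p-value of a threshold S*, written as the expectation of an indicator on the score. The equations on the original slide were not recovered, so the notation here is a reconstruction.

```latex
\begin{align*}
p\text{-value}(S^*)
  &= E_{P_0}\big[\mathbf{1}\{\mathrm{Score}(x) \ge S^*\}\big]
   = \sum_x P_0(x)\,\mathbf{1}\{\mathrm{Score}(x) \ge S^*\} \\
  &= \sum_x Q(x)\,\frac{P_0(x)}{Q(x)}\,\mathbf{1}\{\mathrm{Score}(x) \ge S^*\}
   = E_{Q}\Big[\frac{P_0(x)}{Q(x)}\,\mathbf{1}\{\mathrm{Score}(x) \ge S^*\}\Big] \\
  &\approx \frac{1}{N}\sum_{n=1}^{N} \frac{P_0(x^{(n)})}{Q(x^{(n)})}\,
     \mathbf{1}\{\mathrm{Score}(x^{(n)}) \ge S^*\},
     \qquad x^{(n)} \sim Q
\end{align*}
```

The identity holds as long as Q(x) > 0 for every x with P_0(x) > 0 and Score(x) ≥ S*. The estimator has low variance when Q places much of its mass on high-scoring sequences.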

10 Choosing Sampling Distribution. [Figure: score densities under sampling distributions Q_1 = Background, Q_5, and Q_10 = Motif, with an under-sampled region marked. Axes: Score vs. Density.]

11 Choosing Sampling Distribution. Rescale. Combine. Comprehensive coverage. [Figure: combined sampling distribution and mixing ratio. Axes: Score vs. Density.]
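One way to picture "rescale and combine" is a mixture of distributions that interpolate between the background and the motif. The sketch below is an illustration under stated assumptions, not the authors' exact construction: each component is a per-position geometric interpolation between background and PSSM (a hypothetical choice of rescaling), the mixing ratios are equal, and all function names are invented for this example.

```python
import math
import random

LETTERS = "ACGT"


def score(seq, pssm, bg):
    """PSSM log-odds score against the background."""
    return sum(math.log(pssm[i][c] / bg[c]) for i, c in enumerate(seq))


def tilted_component(pssm, bg, t):
    """Per-position distribution proportional to bg[c]**(1-t) * pssm[i][c]**t.

    t = 0 gives the background, t = 1 gives the motif (assumed rescaling).
    """
    comp = []
    for col in pssm:
        raw = {c: bg[c] ** (1 - t) * col[c] ** t for c in LETTERS}
        z = sum(raw.values())
        comp.append({c: raw[c] / z for c in LETTERS})
    return comp


def seq_prob(seq, comp):
    """Probability of seq under a position-independent component."""
    return math.prod(comp[i][c] for i, c in enumerate(seq))


def mixture_is_p_value(pssm, bg, s_star, n, rng, m=5):
    """Importance-sampling p-value estimate with an equal-weight mixture of m >= 2 components."""
    comps = [tilted_component(pssm, bg, j / (m - 1)) for j in range(m)]
    total = 0.0
    for _ in range(n):
        comp = rng.choice(comps)  # pick a component with equal mixing ratio
        seq = [rng.choices(LETTERS, weights=[col[c] for c in LETTERS])[0] for col in comp]
        if score(seq, pssm, bg) >= s_star:
            p0 = math.prod(bg[c] for c in seq)           # P_0(x)
            q = sum(seq_prob(seq, cj) for cj in comps) / m  # mixture Q(x)
            total += p0 / q                              # importance weight
    return total / n


# Toy example (hypothetical values): uniform background, 4-position motif.
bg = {c: 0.25 for c in LETTERS}
pssm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1} for _ in range(4)]
print(mixture_is_p_value(pssm, bg, 4.0, 10_000, random.Random(0)))
```

Because the background is one of the components, Q(x) is positive wherever P_0(x) is, so the reweighting stays valid. The intermediate components are meant to cover the score range that neither the background nor the motif samples well.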

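12 PSSM Example. [Figure: p-value vs. Score for a PSSM, comparing the Naive, MAST (Bailey et al. 98), Normal, and CIS estimates. The legend gives sample counts, of which only "(40 000)" is legible.] What if we want something else?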

13 Beyond PSSM Models. Main point: capture dependencies between binding site positions; improve site predictions. Suggested by several recent papers: Barash et al. (2003), King & Roth (2003), Zhou & Liu (2004), … Dependency models have many possible variants: trees, mixtures of PSSMs, mixtures of trees, etc. Tree example: [Figure: tree over positions X_1, X_2, X_3, X_4, X_5.] Challenge: compute p-values for general models.

14 Tree Model Example. Naïve sampling: not efficient. MAST (Bailey et al. 98): not applicable. Normal approximation: not accurate. [Figure: p-value vs. Score for the Naive, Normal, and CIS estimates. The legend gives sample counts, of which only "(40 000)" is legible.]

15 Decreased Estimator Variability. 10 repeats of sampling. [Figure: p-value vs. Score for the Naive, Normal, and CIS estimates across the repeats.]

16 CIS: Summary. General form: applies to a wide range of probabilistic models. Computationally efficient. Handles low p-values accurately. Available online, at:

17 Thank you. Joint work with: Nir Friedman, Gal Elidan, Tommy Kaplan.