EXPECTATION MAXIMIZATION MEETS SAMPLING IN MOTIF FINDING
Zhizhuo Zhang

OUTLINE
Review of Mixture Model and EM Algorithm
Importance Sampling
Re-sampling EM
Extending EM
Integrating Other Features
Results

REVIEW MOTIF FINDING: MIXTURE MODELING
Given a dataset X, a motif model θ, and a background model θ0, the likelihood of the observed data X is defined as a mixture of a motif component and a background component. Optimizing this likelihood directly is NP-hard; the EM algorithm solves the problem using the concept of missing data. The missing data Zi is a Boolean flag for each site indicating whether it is a binding site (motif component) or not (background component).
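The likelihood formula itself was shown only as an image and is not preserved in the transcript; a standard two-component mixture formulation consistent with the description above (an assumed reconstruction, with π the motif mixing weight and N sites) is

\[
P(X \mid \theta, \theta_0, \pi) \;=\; \prod_{i=1}^{N} \Big[\, \pi\, P(X_i \mid \theta) + (1-\pi)\, P(X_i \mid \theta_0) \,\Big],
\]

with Zi = 1 if site Xi was generated by the motif component and Zi = 0 if it came from the background.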

REVIEW MOTIF FINDING: EM
E-step: compute the posterior probability that each site belongs to the motif component.
M-step: re-estimate the model parameters from these posteriors.
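The update formulas were also shown as images; for the mixture above, the standard EM updates (reconstructed as an assumption, with γ_i the motif posterior and θ_{j,b} the PWM probability of base b at column j) are

\[
\text{E-step:}\quad \gamma_i = P(Z_i = 1 \mid X_i) = \frac{\pi\, P(X_i \mid \theta)}{\pi\, P(X_i \mid \theta) + (1-\pi)\, P(X_i \mid \theta_0)},
\]
\[
\text{M-step:}\quad \pi \leftarrow \frac{1}{N}\sum_{i=1}^{N} \gamma_i,
\qquad
\theta_{j,b} \leftarrow \frac{\sum_{i} \gamma_i\, \mathbb{1}[X_{i,j} = b]}{\sum_{i} \gamma_i},
\]

usually with pseudocounts added to the PWM update.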

PROS AND CONS
Pros: purely probabilistic modeling; EM is a well-known method; the complexity of each iteration is linear.
Cons: each iteration examines all sites, most of which are background sites; EM is sensitive to its starting condition; the motif length is assumed to be given.

SAMPLING IDEA (1)
A simple example: 20 As and 10 Bs: AAAAAAAAAAAAAAAAAAAABBBBBBBBBB
Define a sampling function Q(x), with Q(x) = 1 when x is sampled, e.g., P(Q(A)=1) = 0.1 and P(Q(B)=1) = 0.2.
The sampled data might be AABB.
We can recover the original counts from "AABB": 2 As in the sample / 0.1 = 20 As in the original; 2 Bs in the sample / 0.2 = 10 Bs in the original.
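A minimal sketch of this recovery idea (hypothetical code, not part of SEME): draw each symbol with its known inclusion probability, then divide the sampled counts by those probabilities.

```python
import random

random.seed(0)
data = "A" * 20 + "B" * 10
q = {"A": 0.1, "B": 0.2}   # inclusion probabilities P(Q(x) = 1)

# Sample each symbol independently with probability q[x].
sample = [x for x in data if random.random() < q[x]]

# Recover the original counts by inverse-probability weighting.
estimates = {s: sum(x == s for x in sample) / q[s] for s in q}
print(sample, estimates)   # estimates fluctuate around {'A': 20.0, 'B': 10.0}
```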

SAMPLING IDEA (2)
Almost any sampling function can recover the statistics of the original data; this is known as importance sampling.
We can define a good sampling function over the sequence data that prefers to sample binding sites rather than background sites.
Because of its greater parameter complexity, the motif model needs more samples than the background model to reach the same level of accuracy.
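The identity behind this (standard importance sampling, stated here for context) is

\[
\mathbb{E}_{p}[\, f(x) \,] \;=\; \mathbb{E}_{q}\!\left[\, f(x)\, \frac{p(x)}{q(x)} \,\right],
\]

so any statistic computed on the sampled data can be de-biased by weighting each sampled point by the inverse of its inclusion probability.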

RE-SAMPLING EM
Given a sampling function Q(·) and the sampled data X_Q:
E-step: the same as in the original EM.
M-step: the sufficient statistics are re-weighted to undo the sampling bias.

RE-SAMPLING EM (illustrated with a diagram in the original slides)
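Since the re-weighted M-step was shown only as a formula image, the sketch below illustrates one way a single re-sampling EM iteration could look, following the inverse-probability weighting described above (the data layout and helper name are assumptions, not SEME code).

```python
import numpy as np

# sites : (n, W) integer array of sampled W-mers (0=A, 1=C, 2=G, 3=T)
# q     : (n,) inclusion probabilities P(Q(site) = 1) used when sampling
# theta : (W, 4) motif PWM, theta0 : (4,) background, pi : motif weight
def resampling_em_step(sites, q, theta, theta0, pi, pseudo=0.01):
    n, W = sites.shape
    # E-step: identical to the original EM -- posterior that a sampled site is a motif occurrence.
    log_motif = np.log(theta[np.arange(W), sites]).sum(axis=1)
    log_bg = np.log(theta0[sites]).sum(axis=1)
    motif_term = pi * np.exp(log_motif)
    gamma = motif_term / (motif_term + (1.0 - pi) * np.exp(log_bg))
    # M-step: weight every sufficient statistic by 1/q to undo the sampling bias.
    w = gamma / q
    counts = np.full((W, 4), pseudo)
    for b in range(4):
        counts[:, b] += (w[:, None] * (sites == b)).sum(axis=0)
    new_theta = counts / counts.sum(axis=1, keepdims=True)
    new_pi = w.sum() / (1.0 / q).sum()
    return new_theta, new_pi, gamma
```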

HOW TO FIND A GOOD SAMPLING FUNCTION
Intuitively, the motif PWM itself would be the natural sampling function, but we cannot know the motif PWM beforehand.
Fortunately, an approximate PWM model already does a good job in practice.

HOW TO FIND A GOOD APPROXIMATE PWM?
Unknown length. Unknown distribution.

E XTENDING EM Start from all over-represented 5-mers Similarly, we find a motif model(PWM) contains the given 5-mer which maximizes the likelihood of the observed data. We define a extending EM process which optimizes the flanking columns included in the final PWM.
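One plausible way to collect over-represented 5-mer seeds (hypothetical helper, not the SEME implementation; the enrichment score against a 0-order background is an assumption):

```python
from collections import Counter

def overrepresented_5mers(seqs, base_freq, top=50):
    # Count every 5-mer across all sequences.
    counts = Counter(s[i:i + 5] for s in seqs for i in range(len(s) - 4))
    total = sum(counts.values())

    def enrichment(kmer):
        # Expected count under an independent (0-order) background model.
        expected = total
        for b in kmer:
            expected *= base_freq[b]
        return counts[kmer] / max(expected, 1e-9)

    return sorted(counts, key=enrichment, reverse=True)[:top]

seeds = overrepresented_5mers(["ACGTACTTGACGT", "TTACTTGGGCAT"],
                              {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
```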

EXTENDING EM
Imagine we have a length-25 PWM θ with the 5-mer q = "ACTTG" in the middle, which is wide enough to target any motif shorter than 15 bp (Wmax). The flanking columns start out uniform:

Po   1     2     ...   24    25
A    0.25  0.25  ...   0.25  0.25
C    0.25  0.25  ...   0.25  0.25
G    0.25  0.25  ...   0.25  0.25
T    0.25  0.25  ...   0.25  0.25

(the middle columns, which encode ACTTG, are elided in the slide)
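A sketch of how such a seed PWM could be initialized (hypothetical; the 0.91 confidence placed on the seed bases is an assumed value, not one from the slides):

```python
import numpy as np

def init_seed_pwm(kmer="ACTTG", width=25, conf=0.91):
    base = "ACGT"
    pwm = np.full((width, 4), 0.25)        # uniform flanking columns
    start = (width - len(kmer)) // 2       # plant the seed in columns 11-15 (1-based)
    for j, b in enumerate(kmer):
        col = np.full(4, (1.0 - conf) / 3)
        col[base.index(b)] = conf
        pwm[start + j] = col
    return pwm
```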

EXTENDING EM
We use two indices to keep track of the start and end of the real motif within the wide PWM.

EXTENDING EM
The M-step is the same as in the original EM, but we also need to decide which columns should be included.
The decision is based on the increase in log-likelihood obtained by including column j.
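The slide's formula for this gain is not preserved; one standard way to write it (an assumed reconstruction, with f_{j,b} the motif frequency of base b in column j, p_b the background frequency, and the sum of the γ_i the expected number of motif occurrences) is

\[
\Delta \mathcal{L}_j \;=\; \Big( \sum_{i} \gamma_i \Big) \sum_{b \in \{A,C,G,T\}} f_{j,b}\, \log \frac{f_{j,b}}{p_b},
\]

i.e. the expected motif count times the KL divergence of column j from the background; a column is included only if this gain is large enough.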

CONSIDER OTHER FEATURES IN EM
Other features: positional bias, strand bias, sequence-rank bias.
We integrate them into the mixture model via a new likelihood ratio, with a Boolean variable that determines whether each feature is included or not.

CONSIDER OTHER FEATURES IN EM
If the feature data are modeled as a multinomial, a chi-square test is used to decide whether the feature should be included.
The multinomial parameters φ can also be learned in the M-step.
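A plausible sketch of such a test (hypothetical code; the exact statistic and threshold used by SEME are not given in the transcript): bin the feature values of predicted motif sites, compare them against the bins of background sites, and enable the feature only if the bias is significant.

```python
import numpy as np
from scipy.stats import chi2_contingency

def include_feature(motif_bins, background_bins, alpha=1e-3):
    # 2 x K contingency table of counts (motif sites vs background sites per feature bin).
    table = np.vstack([motif_bins, background_bins])
    chi2, pvalue, dof, expected = chi2_contingency(table)
    return pvalue < alpha

# Example: positional bias, with site counts in 5 position bins.
use_position = include_feature([40, 35, 10, 8, 7], [20, 21, 19, 20, 20])
```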

ALL TOGETHER

PWM Model, Position Prior Model, Peak Rank Prior Model (components of the combined model, shown as a diagram in the original slides).

SIMULATION RESULTS

REAL DATA RESULTS
163 ChIP-seq datasets; comparison against 6 popular motif finders; half of the datasets are used for training and half for testing.

REAL DATA RESULTS
De novo AP1 model, de novo FOXA1 model, de novo ER model (motif logos shown in the original slides).

CONCLUSION
SEME can perform EM on biased sampled data while still estimating the parameters without bias.
It varies the PWM size within the EM procedure by starting from a short 5-mer.
It automatically learns and selects other feature information during the EM iterations.