Sequential Pattern Discovery under a Markov Assumption

Sequential Pattern Discovery under a Markov Assumption
Darya Chudova, Padhraic Smyth

Introduction
Problem: identification of recurrent patterns in large data sets of categorical sequences (e.g., motifs in a DNA sequence).
As an example, consider the pattern ADDABB embedded in a background process:
...BACADBADBBC[ADDABB]BACDBDBA[ADDACB]DAC...
Note that the second occurrence, ADDACB, contains a single substitution error.
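To make the setup concrete, here is a minimal generator sketch (ours, not the paper's; the alphabet, substitution rate eps = 0.1, and start probability F = 0.005 are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet = list("ABCD")
pattern = "ADDABB"
eps = 0.1    # per-position substitution probability (epsilon)
F = 0.005    # probability of starting a pattern at any background step

def generate(n_symbols):
    """Emit uniform background symbols; occasionally embed a noisy pattern copy."""
    out = []
    while len(out) < n_symbols:
        if rng.random() < F:
            for ch in pattern:
                if rng.random() < eps:  # substitution error: swap in another symbol
                    ch = rng.choice([a for a in alphabet if a != ch])
                out.append(ch)
        else:
            out.append(rng.choice(alphabet))
    return "".join(out[:n_symbols])

print(generate(200))
```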

Models for the Patterns
Model 1, True Model: the true models generating both patterns and background are known. This corresponds to the Bayes classifier and the Bayes error.
Model 2, Supervised Training: the general form of the model and the locations of the patterns are known; parameters must be estimated.
Model 3, Unsupervised Training: only the general form of the model is known. All parameters and pattern locations are unknown.

Contributions of the Paper
Provide an accurate approximate expression for the Bayes error under a Markov assumption.
Illustrate how alphabet size, pattern length, pattern frequency, and pattern autocorrelation affect the Bayes error rate.
Empirically investigate several well-known algorithms in the Markov context.
Apply the theoretical framework to motif-finding problems.

Hidden Markov Model
[Figure: a background state B and four pattern states P1-P4. Each pattern state emits its designated symbol (B, B, B, D) with probability 0.9; the background state emits A, B, C, D uniformly (0.25 each). Transitions: B to itself with probability 0.99, B to P1 with probability 0.01, and each pattern state to the next with probability 1.0.]
The background state B can only transition to itself or to the first pattern state P1.
Each pattern state Pi can only transition to state Pi+1, for 0 < i < L.
The last pattern state PL can only transition back to the background state B.
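A minimal sketch of this model's matrices (state order B, P1..P4; symbol order A, B, C, D). The figure does not show how the remaining 0.1 of emission mass in each pattern state is distributed, so spreading it uniformly here is an assumption:

```python
import numpy as np

# States: 0 = B, 1..4 = P1..P4.  Symbols: 0 = A, 1 = B, 2 = C, 3 = D.
A = np.zeros((5, 5))
A[0, 0], A[0, 1] = 0.99, 0.01             # B -> B, B -> P1
for i in range(1, 4):
    A[i, i + 1] = 1.0                      # Pi -> Pi+1
A[4, 0] = 1.0                              # P4 -> B

E = np.zeros((5, 4))
E[0] = 0.25                                # background emits uniformly
for state, sym in zip(range(1, 5), [1, 1, 1, 3]):  # designated symbols B, B, B, D
    E[state] = 0.1 / 3                     # assumed: leftover mass spread uniformly
    E[state, sym] = 0.9

pi = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # assumed: start in the background state

assert np.allclose(A.sum(axis=1), 1) and np.allclose(E.sum(axis=1), 1)
```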

The Parameters
n_A: size of the observable alphabet.
L: length of the pattern.
ε: probability of a substitution error at each pattern position.
n_s: expected number of substitutions in a pattern, n_s = L · ε.
F: frequency of pattern occurrence, so that the expected number of patterns in a sequence of length N is F · N.
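Under this substitution-noise model, a pattern state emits its designated symbol with probability 1 − ε and each of the other n_A − 1 symbols with probability ε/(n_A − 1). A small sketch (the function name is ours):

```python
import numpy as np

def pattern_emissions(pattern_syms, n_A, eps):
    """Row k: emission distribution of pattern state P(k+1).
    Designated symbol w.p. 1 - eps; the other symbols share eps uniformly."""
    L = len(pattern_syms)
    E_pat = np.full((L, n_A), eps / (n_A - 1))
    E_pat[np.arange(L), pattern_syms] = 1.0 - eps
    return E_pat

E_pat = pattern_emissions([0, 3, 3, 0, 1, 1], n_A=4, eps=0.1)  # ADDABB over {A,B,C,D}
n_s = 6 * 0.1  # expected substitutions per pattern occurrence: n_s = L * eps = 0.6
```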

Bayes Error Rate
Under an iid assumption:
P_e^* = \sum_{o} \min_{h} \{ p(h = B \mid o),\; p(h = P_{1\ldots L} \mid o) \}\, p(o)
For the Markov case:
P_e^* = \lim_{N \to \infty} \frac{1}{N} \sum_{i} \min \{ p(h_i = B \mid O),\; p(h_i = P_{1\ldots L} \mid O) \}
[Speaker note: explain Bayes' rule.]
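The Markov-case expression can be checked by simulation: sample a long sequence from the true model, compute per-position posteriors with the forward-backward recursions, and count how often the more probable label (background vs. pattern) disagrees with the true hidden state. A sketch reusing the A, E, pi defined above:

```python
import numpy as np

def sample_hmm(A, E, pi, N, rng):
    """Draw a length-N hidden path and observation sequence from the HMM."""
    S, M = E.shape
    h = np.empty(N, dtype=int)
    o = np.empty(N, dtype=int)
    for t in range(N):
        h[t] = rng.choice(S, p=pi if t == 0 else A[h[t - 1]])
        o[t] = rng.choice(M, p=E[h[t]])
    return h, o

def posteriors(o, A, E, pi):
    """Scaled forward-backward; returns gamma[t, s] = p(h_t = s | O)."""
    N, S = len(o), A.shape[0]
    alpha = np.empty((N, S))
    c = np.empty(N)
    alpha[0] = pi * E[:, o[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, N):
        alpha[t] = (alpha[t - 1] @ A) * E[:, o[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta = np.ones((N, S))
    for t in range(N - 2, -1, -1):
        beta[t] = A @ (E[:, o[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
h, o = sample_hmm(A, E, pi, N=200_000, rng=rng)
gamma = posteriors(o, A, E, pi)
background_wins = gamma[:, 0] >= 0.5          # two-class decision: B vs. P1..PL
bayes_err = np.mean(background_wins != (h == 0))
print(f"estimated per-symbol Bayes error: {bayes_err:.5f}")
```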

Analytical Expressions for the Bayes Error Rate
A closed-form expression is difficult to obtain.
Instead, use an iid approximation in which each position i is classified independently, based on the symbols o_i, \ldots, o_{i+L-1}:
P_e^* \approx P_e^{IID}
The expression for P_e^{IID} is still very complex to evaluate or interpret.

IID-pure
Make the further simplifying assumption that each substring of length L starting at position i is generated either by a run of L background states or by a run of L pattern states. The associated error is
\hat{P}_e^{IID} = \sum_{l=0}^{L} \binom{L}{l} (n_A - 1)^{l} \min \left\{ (1 - \varepsilon)^{L - l} \left( \frac{\varepsilon}{n_A - 1} \right)^{l} F,\; \left( \frac{1}{n_A} \right)^{L} (1 - F) \right\}
where l indexes the number of substituted positions.
P_e^* \approx P_e^{IID} \approx \hat{P}_e^{IID}
Normalized Bayes error rate: P_{Ne}^* = P_e^* / F.
[Speaker note: explain the symbols.]
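A direct implementation of this closed form (a sketch under the reconstruction above; names are ours):

```python
from math import comb

def p_e_iid_pure(L, n_A, eps, F):
    """IID-pure Bayes error: sum over Hamming distance l from the pattern."""
    total = 0.0
    for l in range(L + 1):
        p_pat = (1 - eps) ** (L - l) * (eps / (n_A - 1)) ** l   # per-string, pattern
        p_bg = (1 / n_A) ** L                                   # per-string, background
        total += comb(L, l) * (n_A - 1) ** l * min(p_pat * F, p_bg * (1 - F))
    return total

print(p_e_iid_pure(L=5, n_A=4, eps=0.1, F=0.005))
```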

Normalized Error
The normalized error rate increases with increasing expected number of substitutions n_s.
The normalized error rate decreases with increasing pattern frequency F.
The normalized Bayes error rate decreases with increasing pattern length if the metric used by Sze et al. (2002) is held constant.
[Speaker note: draw Figures 2 and 3; compare with Sze et al. (2002), who define the expected percentage of symbols with substitution errors in a pattern.]

Insights from the Analytical Expression
Even as the substitution error ε → 0, the trial case with L = 5, n_A = 4, and F = 0.005 still misclassifies about 20% (normalized error): short patterns over a small alphabet also arise by chance in the background.
For fixed pattern length and pattern frequency, if ε > 0.28, all pattern symbols are classified as background.
For fixed L and ε, if F is less than about 3 in a thousand, all patterns are classified as background.
Introducing insertions increases the Bayes error rate.
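These numbers can be checked with the p_e_iid_pure sketch above, normalizing by F:

```python
# Near-zero substitution noise still leaves roughly 19-20% normalized error
# for L = 5, n_A = 4, F = 0.005 (chance background matches to the pattern):
print(p_e_iid_pure(L=5, n_A=4, eps=1e-9, F=0.005) / 0.005)   # ~0.194

# Sweeping epsilon past ~0.28, the exact-match string is no longer
# classified as pattern and the normalized error saturates near 1:
for eps in (0.25, 0.28, 0.31):
    print(eps, p_e_iid_pure(L=5, n_A=4, eps=eps, F=0.005) / 0.005)
```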

The Effect of Pattern Structure
The Bayes error is higher for structured patterns: the error for BBBBBBBBBB is higher than for BCBCBCBCBC, which in turn is higher than for BCCBCCBCCB.
The Bayes error for highly autocorrelated patterns is higher.
[Speaker note: discuss the graphs.]

Three Pattern Discovery Algorithms
The motif sampler of Liu et al. (1995) (IID-Gibbs).
The MEME algorithm of Bailey and Elkan (1995) (IID-EM).
An HMM-based algorithm (HMM-EM).
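For orientation, here is a minimal site-sampler sketch in the spirit of IID-Gibbs (one motif occurrence per sequence, uniform background). This is not the authors' implementation; the function name, defaults, and pseudocount scheme are ours:

```python
import numpy as np

def gibbs_motif_sampler(seqs, L, n_A, n_iter=200, pseudo=0.5, seed=0):
    """Sketch: resample each sequence's single motif start position from a
    position weight matrix (PWM) built on the other sequences' positions."""
    rng = np.random.default_rng(seed)
    pos = [int(rng.integers(0, len(s) - L + 1)) for s in seqs]
    for _ in range(n_iter):
        for i, s in enumerate(seqs):
            counts = np.full((L, n_A), pseudo)           # pseudocounts
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(L):
                        counts[k, other[pos[j] + k]] += 1
            pwm = counts / counts.sum(axis=1, keepdims=True)
            # Weight each candidate start by PWM likelihood over a uniform background.
            starts = len(s) - L + 1
            w = np.array([np.prod(pwm[np.arange(L), s[t:t + L]]) for t in range(starts)])
            w *= float(n_A) ** L                         # divide out (1/n_A)^L background
            pos[i] = int(rng.choice(starts, p=w / w.sum()))
    return pos

# Usage: sequences as integer arrays over {0, ..., n_A - 1}, e.g.
# pos = gibbs_motif_sampler([seq1, seq2, seq3], L=6, n_A=4)
```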

Comparing the Three Algorithms
All three algorithms perform equally well with a strong prior on the pattern frequency.
With a weak prior, IID-Gibbs remains unaffected, but the other algorithms worsen, with HMM-EM performing worst.
Asymptotically, all seem to converge to the true error rate.
[Speaker note: draw Figures 8 and 9.]

Component-Wise Breakdown of the Error
The basic Bayes error.
Additional error due to noise in the parameter estimates.
Further additional error from not knowing where the patterns are located.
High accuracy is obtained only if all of the above components are small.

Finding Real Motifs
An HMM was fitted to data from E. coli DNA-binding protein families.
The Bayes error rate appears to be independent of the training sample size.
