Probabilistic Automaton
Ashish Srivastava, Harshil Pathak

Outline
● Introduction to Probabilistic Automata
● Deterministic Probabilistic Finite Automata (DPFA)
● Probabilistic Finite Automata (PFA)
● Probably Approximately Correct (PAC) learnability

Motivation
Probabilistic automata serve the purpose of modeling and analyzing, in a formal and precise way, asynchronous, concurrent systems with discrete probabilistic choice, for example:
● randomized, distributed algorithms
● probabilistic communication protocols (e.g. the Binary Exponential Backoff protocol)
● fault-tolerant systems
● speech recognition

Probabilistic Automaton
● It is an extension (generalization) of finite automata.
● It incorporates the probability of a transition into the transition function.
● Languages recognised by probabilistic automata are called stochastic languages.

Definition
A probabilistic automaton consists of:
● a finite set of states Q
● a finite set of input symbols Σ
● a transition function δ : Q × Σ → 2^Q
● transition probabilities P : Q × Σ × Q → [0, 1]
● final-state probabilities F : Q → [0, 1]
The stochastic matrix P gives the probability of moving from one state to another on a particular symbol. For every state, the final-state probability and the outgoing transition probabilities must sum to one:
∀q ∈ Q:  F(q) + Σ_{a ∈ Σ, q' ∈ Q} P(q, a, q') = 1
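To make the definition concrete, here is a minimal Python sketch assuming a small hypothetical two-state automaton (the state names, symbols, and probabilities below are invented for illustration and are not taken from the slides). It stores P and F as dictionaries and checks the normalization constraint above.

```python
# A hypothetical two-state PFA over the alphabet {a, b}.
# trans[q][a] maps successor states q' to P(q, a, q'); final[q] is F(q).
trans = {
    "q0": {"a": {"q0": 0.2, "q1": 0.3}, "b": {"q1": 0.1}},
    "q1": {"a": {"q0": 0.5}, "b": {"q1": 0.2}},
}
final = {"q0": 0.4, "q1": 0.3}

def check_normalization(trans, final, tol=1e-9):
    """Verify that F(q) + sum over a, q' of P(q, a, q') equals 1 for every state q."""
    for q in trans:
        total = final[q] + sum(p for by_symbol in trans[q].values()
                               for p in by_symbol.values())
        assert abs(total - 1.0) < tol, f"state {q} sums to {total}, not 1"

check_normalization(trans, final)
print("all states are properly normalized")
```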

Distributions over strings
Given a finite alphabet Σ, the set Σ* of all strings over Σ is enumerable, and therefore a distribution can be defined over it. A probabilistic language D is a probability distribution over Σ*. The probability of a string x ∈ Σ* under the distribution D is a non-negative value denoted Pr_D(x), and these probabilities must sum to one. For example, over Σ = {a}, setting Pr(aⁿ) = (1/2)^(n+1) for every n ≥ 0 defines a valid distribution, since Σ_{n≥0} (1/2)^(n+1) = 1.

Usefulness
Probabilistic automata do not tell us whether a string belongs to a language; instead they define a distribution over strings, which makes them good candidates for grammar induction (e.g. having seen "abbaba" so far, what is the next symbol?). This distribution, if learnt from data, can in turn be used to disambiguate, by finding the most probable string corresponding to a pattern, or to predict, by proposing the next symbol for a given prefix, when the structure of the automaton is unknown. If the structure is known, the problem reduces to a probability estimation problem. A small next-symbol prediction sketch follows.
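As an illustration of prediction, here is a minimal sketch assuming the same kind of hypothetical two-state PFA as above, read generatively: in state q the process stops with probability F(q), otherwise it emits a symbol and moves according to P. It scores the possible next symbols after a given prefix.

```python
# Hypothetical PFA (same shape as the earlier sketch); start is an assumed
# initial-state distribution.
trans = {
    "q0": {"a": {"q0": 0.2, "q1": 0.3}, "b": {"q1": 0.1}},
    "q1": {"a": {"q0": 0.5}, "b": {"q1": 0.2}},
}
final = {"q0": 0.4, "q1": 0.3}
start = {"q0": 1.0}

def prefix_weights(prefix):
    """alpha[q] = probability of emitting `prefix` so far and sitting in state q."""
    alpha = dict(start)
    for sym in prefix:
        nxt = {}
        for q, w in alpha.items():
            for q2, p in trans[q].get(sym, {}).items():
                nxt[q2] = nxt.get(q2, 0.0) + w * p
        alpha = nxt
    return alpha

def next_symbol_scores(prefix, alphabet=("a", "b")):
    """Weight of each possible next symbol after `prefix`.
    Normalizing these scores (together with the stopping weight
    sum_q alpha[q] * final[q]) gives proper conditional probabilities."""
    alpha = prefix_weights(prefix)
    return {sym: sum(w * p
                     for q, w in alpha.items()
                     for p in trans[q].get(sym, {}).values())
            for sym in alphabet}

print(next_symbol_scores("ab"))  # which of 'a' or 'b' is the more likely continuation
```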

PFA

Probability of the string "aba" in the given PFA
Pr(aba) = 0.7 × 0.4 × 0.1 × … + … × 0.4 × 0.35 × 0.2 = …
(each path that reads "aba" contributes the product of its transition probabilities and the final-state probability of its last state; the path probabilities are then summed)
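Since the automaton from the slide's figure is not reproduced here, the following is a hedged sketch of the same path-sum computation for a hypothetical two-state PFA: it enumerates every path that reads the string and sums the products of transition and final-state probabilities.

```python
# Hypothetical two-state PFA over {a, b}; not the automaton from the slide's figure.
trans = {
    "q0": {"a": {"q0": 0.2, "q1": 0.3}, "b": {"q1": 0.1}},
    "q1": {"a": {"q0": 0.5}, "b": {"q1": 0.2}},
}
final = {"q0": 0.4, "q1": 0.3}

def string_probability(q, s):
    """Sum, over every path that reads s starting in state q, of the product of
    its transition probabilities times the final-state probability of its end state."""
    if not s:
        return final[q]
    sym, rest = s[0], s[1:]
    return sum(p * string_probability(q2, rest)
               for q2, p in trans[q].get(sym, {}).items())

print(string_probability("q0", "aba"))  # Pr(aba) under this toy automaton
```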

epsilon-PFA (a PFA that also allows ε-transitions, i.e. transitions that consume no input symbol)

DPFA
Even though determinism restricts the class of distributions that can be generated, we introduce deterministic probabilistic finite-state automata (DPFA) for the following reasons:
● Parsing is easier, as only one path has to be followed.
● Some problems that are intractable for PFA (finding the most probable string, comparing two distributions) become tractable.
● There are a number of positive learning results for DPFA that do not hold for PFA.

DPFA

Computing Probability of “abab”

PFA is strictly more powerful than DPFA
Every distribution generated by a DPFA can also be generated by a PFA, but not conversely: for example, a PFA can generate an equal mixture of two geometric distributions over a*, a distribution that no DPFA can generate.

Computing Probabilities
● The probability of a string can be computed by dynamic programming in O(n²m) time, using the forward and backward algorithms popularly used for Hidden Markov Models.
● If we want the most probable derivation to define the probability of a string, we can use the Viterbi algorithm.
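A sketch of the forward dynamic program under the same assumptions as the earlier sketches (hypothetical two-state PFA); replacing the sum over incoming paths by a max yields the Viterbi score of the most probable derivation.

```python
# Same hypothetical PFA as in the earlier sketches.
trans = {
    "q0": {"a": {"q0": 0.2, "q1": 0.3}, "b": {"q1": 0.1}},
    "q1": {"a": {"q0": 0.5}, "b": {"q1": 0.2}},
}
final = {"q0": 0.4, "q1": 0.3}
start = {"q0": 1.0}

def forward_probability(s, best_path_only=False):
    """Probability of string s (forward algorithm), or the probability of its
    single most probable derivation when best_path_only=True (Viterbi)."""
    combine = max if best_path_only else sum
    alpha = dict(start)                          # alpha[q]: weight of reaching q
    for sym in s:
        incoming = {}
        for q, w in alpha.items():
            for q2, p in trans[q].get(sym, {}).items():
                incoming.setdefault(q2, []).append(w * p)
        alpha = {q2: combine(ws) for q2, ws in incoming.items()}
    ends = [w * final[q] for q, w in alpha.items()]
    return combine(ends) if ends else 0.0

print(forward_probability("abab"))                       # total probability
print(forward_probability("abab", best_path_only=True))  # Viterbi (best-path) probability
```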

Learning Paradigm for DPFA
Given a class of stochastic languages or distributions C over Σ*, an algorithm A Probably Approximately Correctly (PAC) learns C if there is a polynomial q such that for all c in C and all ε > 0 and δ > 0, when A is given a sample S_n of size n ≥ q(1/ε, 1/δ, |c|) it produces a hypothesis G_n such that
Pr[ D(c || G_n) > ε ] < δ,
where |c| is some measure of the complexity of the target. We say δ is the confidence parameter and ε is the error parameter.

PAC Learning for DPFA

Distance measure
Given two distributions D and D' over Σ*, the Kullback-Leibler divergence (or relative entropy) between D and D' is:
D(D || D') = Σ_{w ∈ Σ*} Pr_D(w) · log( Pr_D(w) / Pr_D'(w) )
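A minimal sketch of the divergence computation; since summing over all of Σ* is not possible, the two distributions below are hypothetical toy distributions whose mass lives on a handful of strings, which is enough to exercise the formula.

```python
import math

# Two hypothetical distributions over a few strings (each sums to 1).
D  = {"": 0.4, "a": 0.3, "ab": 0.2, "aba": 0.1}
D2 = {"": 0.5, "a": 0.2, "ab": 0.2, "aba": 0.1}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) = sum_w p(w) * log(p(w) / q(w)).
    Infinite if q assigns zero probability to a string that p supports."""
    total = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue                  # 0 * log(0 / q) is taken to be 0
        qw = q.get(w, 0.0)
        if qw == 0.0:
            return math.inf
        total += pw * math.log(pw / qw)
    return total

print(kl_divergence(D, D2))  # >= 0, and 0 only if the two distributions coincide
```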

References
1. Clark, Alexander, and Franck Thollard. "PAC-learnability of probabilistic deterministic finite state automata." Journal of Machine Learning Research 5 (2004).
2. De la Higuera, Colin. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press.
3. Thollard, Franck. Probabilistic Finite State Machines.
4. Stoelinga, Mariëlle. "An introduction to probabilistic automata." Bulletin of the EATCS (2002).
5. Vidal, Enrique, et al. "Probabilistic finite-state machines, Part I." IEEE Transactions on Pattern Analysis and Machine Intelligence 27.7 (2005).

Thank You!!