[TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology
Ryan Cotterell, Nanyun Peng, and Jason Eisner

What is Phonology?

Orthography: cat
Phonology: [kæt]
Phonology explains regular sound patterns. It is not phonetics, which deals with acoustics.

Q: What do phonologists do? A: They find sound patterns in sets of words!

A Phonological Exercise

Observed surface forms, arranged by verb (rows) and tense (columns):

           1P Pres. Sg.   3P Pres. Sg.   Past Tense   Past Part.
TALK       [tɔk]          [tɔks]         [tɔkt]       [tɔkt]
THANK      [θeɪŋk]        [θeɪŋks]       [θeɪŋkt]     [θeɪŋkt]
HACK       [hæk]          [hæks]         [hækt]       [hækt]
CRACK      ?              [kɹæks]        [kɹækt]      [kɹækt]
SLAP       [slæp]         ?              [slæpt]      [slæpt]

Posit underlying stems /tɔk/, /θeɪŋk/, /hæk/, /kɹæk/, /slæp/ and suffixes /Ø/, /s/, /t/, /t/. Concatenating stem + suffix predicts the unseen cells: [kɹæk] and [slæps]. Prediction!
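To make the exercise concrete, here is a minimal sketch (not from the paper; the names and structure are illustrative) that fills the missing paradigm cells by pure concatenation of the hypothesized stems and suffixes:

```python
# Minimal sketch: predict unseen surface forms by concatenating hypothesized
# underlying stems and suffixes. Pure concatenation only -- no phonology yet.
stems = {"TALK": "tɔk", "THANK": "θeɪŋk", "HACK": "hæk",
         "CRACK": "kɹæk", "SLAP": "slæp"}
suffixes = {"1P Pres. Sg.": "", "3P Pres. Sg.": "s",
            "Past Tense": "t", "Past Part.": "t"}

def predict(verb: str, tense: str) -> str:
    return stems[verb] + suffixes[tense]

print(predict("CRACK", "1P Pres. Sg."))  # kɹæk  -- the "Prediction!"
print(predict("SLAP", "3P Pres. Sg."))   # slæps
```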

A Model of Phonology: concatenate /tɔk/ + /s/ → [tɔks] "talks".

A Phonological Exercise, continued

Adding more verbs breaks pure concatenation: CODE gives [koʊdz] and [koʊdɪd], BAT gives [bæt] and [bætɪd], and EAT gives [it], [eɪt], [itən]. With stems /koʊd/, /bæt/, /it/, the suffixes no longer surface faithfully: [z] instead of [s], [ɪd] instead of [t], and EAT's past tense is the irregular [eɪt] rather than the expected *[itɪd].

A Model of Phonology: concatenate /koʊd/ + /s/ into the underlying form /koʊd#s/; stochastic phonology then maps it to the surface form [koʊdz] "codes". (Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015.)

A Model of Phonology: concatenate /rizaign/ + /ation/ into /rizaign#ation/; stochastic phonology maps it to [rεzɪgneɪʃn] "resignation".

Generative Phonology
– A system that generates exactly the attested forms
– The primary research program in phonology since the 1950s
– Example: [rezɪɡneɪʃən] "resignation" and [rizainz] "resigns"

Why this matters
– Linguists hand-engineer phonological grammars. Linguistically interesting: can we create an automated phonologist?
– Cognitively interesting: can we model how babies learn phonology?
– "Engineeringly" interesting: can we analyze and generate words we haven't heard before? (i.e., matrix completion for large vocabularies)

A Probability Model. It describes the generating process of the observed surface words:
– We model each morph M(a) ∈ M as an IID sample from a probability distribution M_φ(m).
– We model the surface form S(u) as a sample from a conditional distribution S_θ(s | u).

The Generative Story. The process of generating a surface word:
– Sample the parameters φ and θ from priors.
– For each abstract morpheme a ∈ A, sample the morph M(a) ∼ M_φ.
– Whenever a new abstract word w = a₁a₂··· must be pronounced for the first time, construct its underlying form u by concatenating the morphs M(a₁)M(a₂)···, and sample the surface word S(u) ∼ S_θ(· | u).
– Reuse this S(u) in the future.
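A minimal runnable sketch of this generative story, with toy stand-ins for the real parameterizations (M_φ as a unigram sampler over a small phoneme inventory, S_θ as random segment deletion; both are assumptions for illustration, not the paper's models):

```python
import random

PHONEMES = list("rizajgndæmeʃəs")

def sample_morph(stop=0.3):
    """Toy M_phi: geometric length, uniform phonemes."""
    m = [random.choice(PHONEMES)]
    while random.random() > stop:
        m.append(random.choice(PHONEMES))
    return "".join(m)

def sample_surface(u, p_del=0.1):
    """Toy S_theta(. | u): independently delete each segment with prob p_del."""
    s = "".join(ch for ch in u if random.random() > p_del)
    return s or u

morphs = {}   # M(a): one morph sampled per abstract morpheme a
lexicon = {}  # S(u): one surface form sampled per abstract word, then reused

def pronounce(word):
    """word is a tuple of abstract morphemes, e.g. ('resign', '3sg')."""
    for a in word:
        if a not in morphs:
            morphs[a] = sample_morph()       # sample M(a) ~ M_phi once
    u = "".join(morphs[a] for a in word)     # underlying form: concatenation
    if word not in lexicon:
        lexicon[word] = sample_surface(u)    # sample S(u) ~ S_theta(.|u) once
    return lexicon[word]                     # reuse S(u) in the future

print(pronounce(("resign", "3sg")))
print(pronounce(("resign", "3sg")))  # identical: the surface form is reused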

Why Probability? A language's morphology and phonology are essentially deterministic. Advantages of a probabilistic model:
– Soft models admit efficient learning and inference
– Quantification of irregularity ("sing" vs. "sang")
Our use is orthogonal to phonologists' use of probability, e.g., to explain gradient phenomena.

Phonology as an Edit Process

The underlying (upper) string is rewritten left to right into the surface (lower) string by a sequence of stochastic edits. Each edit action is chosen conditioned on three contexts: the upper left context, the lower left context, and the upper right context. For the upper string /rizaigns/ becoming the surface [rizajnz], the edit sequence is:

COPY r, COPY i, COPY z, COPY a, COPY i, DEL g (output ε), COPY n, SUB s → z

At each step the model assigns a probability to every possible action, e.g. at the DEL step:

Action    Prob
DEL       .75
COPY      .01
SUB(A)    .05
SUB(B)
INS(A)    .02
INS(B)    .01
...

These action probabilities are computed from features of the contexts via learned feature-function weights, so the whole transduction from upper string to surface form is a probabilistic finite-state process.

Phonological Attributes: binary attributes (+ and -).

Two kinds of features score each edit in context:

Faithfulness features fire on edits that change the string, e.g. EDIT(g, ε), EDIT(+cons, ε), EDIT(+voiced, ε) for deleting the /g/.

Markedness features fire on surface bigrams, e.g. BIGRAM(a, i), BIGRAM(-high, -low), BIGRAM(+back, -back).

These features are inspired by Optimality Theory, a popular constraint-based formalism for phonology.
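The per-step action distribution is a log-linear (maxent) model over such features. A minimal sketch under assumed toy feature templates and weights, using only two of the three contexts for brevity (the paper's feature set and contexts are richer):

```python
import math

def features(action, lower_left, upper_right):
    """Toy feature templates: faithfulness on edits, markedness on bigrams."""
    kind, out = action
    feats = []
    if kind in ("DEL", "SUB"):                 # faithfulness: string changed
        feats.append(("EDIT", upper_right[:1], out))
    if kind in ("COPY", "SUB", "INS"):         # markedness: surface bigram
        surface = upper_right[:1] if kind == "COPY" else out
        feats.append(("BIGRAM", lower_left[-1:], surface))
    return feats

def action_distribution(actions, weights, lower_left, upper_right):
    """P(action | contexts) = softmax of summed feature weights."""
    scores = [sum(weights.get(f, 0.0)
                  for f in features(a, lower_left, upper_right))
              for a in actions]
    z = sum(math.exp(s) for s in scores)
    return {a: math.exp(s) / z for a, s in zip(actions, scores)}

# Toy weights: deleting this /g/ is cheap; the surface bigram [ig] is marked.
weights = {("EDIT", "g", "ε"): 2.0, ("BIGRAM", "i", "g"): -1.5}
actions = [("COPY", None), ("DEL", "ε"), ("SUB", "z")]
print(action_distribution(actions, weights,
                          lower_left="rizai", upper_right="gns"))
# DEL dominates, as in resign -> [rizajn]
```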

Outline
– A generative model for phonology: generative phonology; a probabilistic model; a stochastic edit process for phonology
– Inference and learning: a hill-climbing example; the EM algorithm with finite-state operations
– Evaluation and results

A Generative Model of Phonology

A directed graphical model of the lexicon. Morphs such as /rizajgn/, /dæmn/, /z/, and /eɪʃən/ are latent variables; concatenation factors combine them into underlying forms /rizajgnz/, /dæmnz/, /rizajgneɪʃən/, /dæmneɪʃən/; stochastic phonology factors map each underlying form to an observed surface form: [rizˈajnz], [dˈæmz], [rˌɛzɪgnˈeɪʃən], [dˌæmnˈeɪʃən].

Graphical models are flexible. German "geliebt" (loved) is built from three morphs, gə + liːb + t, giving underlying /gəliːbt/ and surface [gəliːpt]. Matrix completion assumes each word is built from one stem (row) + one suffix (column): WRONG. In a graphical model, a word can be built from any number of morphemes (parents): RIGHT.

A Generative Model of Phonology

(Approximate) inference in this directed graphical model of the lexicon:
– MCMC – Bouchard-Côté (2007)
– Belief Propagation – Dreyer and Eisner (2009)
– Expectation Propagation – Cotterell and Eisner (2015)
– Dual Decomposition – Peng et al. (2015)

Each belief in the model is a distribution over strings, e.g. over the surface form:

Form           Prob
dæmeɪʃən       .80
dæmneɪʃən      .10
dæmineɪʃən     .001
dæmiineɪʃən    .0001
…              …
chomsky        …

Since the support is infinite, the distribution is encoded as a weighted finite-state automaton.
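A toy illustration of why a weighted finite-state automaton can encode such a distribution compactly (explicit dictionaries stand in for a real FST library such as OpenFst; the states, arcs, and weights below are invented for illustration):

```python
# Toy weighted finite-state acceptor over surface candidates for /dæmneɪʃən/.
ARCS = {0: {"d": (1, 1.0)},
        1: {"æ": (2, 1.0)},
        2: {"m": (3, 1.0)},
        3: {"e": (4, 0.8),    # the /n/ was deleted: dæmeɪʃən
            "n": (5, 0.2)},   # the /n/ survived:    dæmneɪʃən
        5: {"e": (4, 1.0)},
        4: {"ɪ": (6, 1.0)},
        6: {"ʃ": (7, 1.0)},
        7: {"ə": (8, 1.0)},
        8: {"n": (9, 1.0)}}
FINAL = {9: 1.0}  # final-state weights

def score(s: str) -> float:
    """Weight of s: product of arc weights along its path, times final weight."""
    state, w = 0, 1.0
    for ch in s:
        arc = ARCS.get(state, {}).get(ch)
        if arc is None:
            return 0.0        # no accepting path: weight zero
        state, arc_w = arc
        w *= arc_w
    return w * FINAL.get(state, 0.0)

print(score("dæmeɪʃən"))   # 0.8
print(score("dæmneɪʃən"))  # 0.2
```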

Discovering the Underlying Forms = Inference in a Graphical Model. The surface forms [rˌɛzɪgnˈeɪʃən], [dˈæmz], [rizˈajnz] are observed; the underlying forms and morphs (????) must be inferred.

Belief Propagation (BP) in a Nutshell

BP passes messages around the graphical model over the morphs (/rizajgn/, /dæmn/, /z/, /eɪʃən/), the underlying forms, and the surface forms:
– Factor-to-variable messages
– Variable-to-factor messages
Each message is encoded as a finite-state machine. At a variable, the point-wise product of incoming messages (finite-state intersection) yields the marginal belief, e.g. the distribution over underlying forms of "resigns":

UR          Prob
rizajgnz    .95
rezajnz     .02
rezigz      .02
rezgz       .0001
…           …
chomsky     …
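The belief computation at a string-valued variable is just a pointwise product of incoming messages. A toy sketch where messages are finite dictionaries rather than automata (real messages are WFSAs over infinitely many strings, and the product is finite-state intersection; the numbers are invented):

```python
def pointwise_product(*messages):
    """Unnormalized product of incoming messages, normalized into a belief."""
    support = set.intersection(*(set(m) for m in messages))
    belief = {s: 1.0 for s in support}
    for m in messages:
        for s in support:
            belief[s] *= m[s]
    z = sum(belief.values())
    return {s: p / z for s, p in belief.items()}

# Toy messages into the underlying-form variable for "resigns":
msg_from_surface_factor = {"rizajgnz": 0.6, "rezajnz": 0.3, "rizajnz": 0.1}
msg_from_concat_factor  = {"rizajgnz": 0.9, "rezajnz": 0.05, "rizajnz": 0.05}
print(pointwise_product(msg_from_surface_factor, msg_from_concat_factor))
# rizajgnz dominates, as in the belief table above
```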

Training the Model. Trained with EM (Dempster et al., 1977).
– E-step: finite-state belief propagation
– M-step: train the stochastic phonology with gradient descent

Datasets. Experiments on 7 languages from different families:
– English (CELEX)
– Dutch (CELEX)
– German (CELEX)
– Maori (Kenstowicz)
– Tangale (Kenstowicz)
– Indonesian (Kenstowicz)
– Catalan (Kenstowicz)

A Generative Model of Phonology. How do you pronounce an unseen word? Given the morphs /dæmn/ and /eɪʃən/, the model concatenates them into the underlying form /dæmneɪʃən/ and predicts the surface pronunciation [dˌæmnˈeɪʃən].

Evaluation. Metrics (lower is always better):
– 1-best error rate (did we get it right?)
– Cross-entropy (what probability did we give the right answer?)
– Expected edit distance (how far away on average are we?)
Each metric is averaged over many training-test splits.
Comparisons:
– Lower bound: phonology as noisy concatenation
– Upper bound: oracle URs from linguists

Exploring the Evaluation Metrics

Given the predicted distribution over the surface form:

Form           Prob
dæmeɪʃən       .80
dæmneɪʃən      .10
dæmineɪʃən     .001
dæmiineɪʃən    .0001
…              …
chomsky        …

– 1-best error rate: is the 1-best correct?
– Cross-entropy: what is the probability of the correct answer?
– Expected edit distance: how close am I on average?
Average over many training-test splits.
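A minimal sketch of the three metrics applied to such a predicted distribution (the Levenshtein routine is standard; the remaining probability mass on other strings is ignored here for simplicity):

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming over one row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1,         # delete
                                       row[j - 1] + 1,     # insert
                                       prev + (ca != cb))  # substitute
    return row[len(b)]

def evaluate(pred: dict, truth: str):
    one_best = max(pred, key=pred.get)
    error = float(one_best != truth)                  # 1-best error rate
    xent = -math.log2(pred.get(truth, 1e-12))         # cross-entropy, in bits
    exp_ed = sum(p * edit_distance(s, truth)          # expected edit distance
                 for s, p in pred.items())
    return error, xent, exp_ed

pred = {"dæmeɪʃən": 0.80, "dæmneɪʃən": 0.10,
        "dæmineɪʃən": 0.001, "dæmiineɪʃən": 0.0001}
print(evaluate(pred, "dæmneɪʃən"))  # (1.0, ~3.32 bits, ~0.8)
```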

German Results (chart; error bars computed with bootstrap resampling).

CELEX Results (chart).

Phonological Exercise Results (chart).

Conclusion
– We presented a novel framework for computational phonology
– New datasets for research in the area
– A fair evaluation strategy for phonological learners

Fin. Thank you for your attention!

A Generative Model of Phonology (backup slide): the full directed graphical model of the lexicon, with morphs /rizajgn/, /dæmn/, /z/, /eɪʃən/, underlying forms /rizajgnz/, /dæmnz/, /rizajgneɪʃən/, /dæmneɪʃən/, and surface forms [rizˈajnz], [dˈæmz], [rˌɛzɪgnˈeɪʃən], [dˌæmnˈeɪʃən].

Gold UR Recovery (results chart).