[TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology
Ryan Cotterell, Nanyun Peng, and Jason Eisner
What is Phonology?
Orthography: cat
Phonology: [kæt]
Phonology explains regular sound patterns.
What is Phonology?
Orthography: cat → Phonology: [kæt] → Phonetics
Phonology explains regular sound patterns; it is not phonetics, which deals with acoustics.
Q: What do phonologists do?
A: They find sound patterns in sets of words!
A Phonological Exercise
Find the sound pattern in a table of verb forms (rows: verbs; columns: tenses). Some cells are unobserved:

Verb    1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
TALK    [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
THANK   [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
HACK    [hæk]         [hæks]        [hækt]      [hækt]
CRACK   ?             [kɹæks]       [kɹækt]     ?
SLAP    [slæp]        ?             [slæpt]     ?
The forms decompose into stems and suffixes:
Stems: /tɔk/, /θeɪŋk/, /hæk/, /kɹæk/, /slæp/
Suffixes: /Ø/ (1P Pres. Sg.), /s/ (3P Pres. Sg.), /t/ (Past Tense), /t/ (Past Part.)
Prediction! The missing cells follow from stem + suffix: CRACK 1P Pres. Sg. = [kɹæk], SLAP 3P Pres. Sg. = [slæps].
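The prediction step is mechanical once the decomposition is fixed. A minimal Python sketch using the toy stems and suffixes from the exercise:

```python
# Toy stems and suffixes from the exercise; each surface form is
# predicted as plain concatenation stem + suffix.
stems = {"TALK": "tɔk", "THANK": "θeɪŋk", "HACK": "hæk",
         "CRACK": "kɹæk", "SLAP": "slæp"}
suffixes = {"1P Pres. Sg.": "", "3P Pres. Sg.": "s",
            "Past Tense": "t", "Past Part.": "t"}

for verb, stem in stems.items():
    row = [f"[{stem}{suffix}]" for suffix in suffixes.values()]
    print(f"{verb:6}", "  ".join(row))
# CRACK's row includes the predicted [kɹæk]; SLAP's includes [slæps].
```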
A Model of Phonology
Concatenate: tɔk + s → [tɔks] "talks"
Now extend the table with CODE and BAT:
CODE: [koʊdz], [koʊdɪd]
BAT: [bæt], [bætɪd]
New stems /koʊd/ and /bæt/, but the suffixes surface differently: z instead of s, and ɪd instead of t.
And add EAT: [it], [eɪt], [itən]. Here the past tense is irregular: [eɪt] instead of the expected regular *[itɪd], and the stem /it/ surfaces differently across its cells.
A Model of Phonology
Concatenate: koʊd + s → koʊd#s
Phonology (stochastic): koʊd#s → [koʊdz] "codes"
Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015.
Concatenate: rizaign + ation → rizaign#ation
Phonology (stochastic): rizaign#ation → [rεzɪgneɪʃn] "resignation"
Generative Phonology
A system that generates exactly the attested forms. This has been the primary research program in phonology since the 1950s. Example: [rezɪɡneɪʃən] "resignation" and [rizainz] "resigns" share a stem that surfaces differently in each word.
Why this matters
Linguists hand-engineer phonological grammars.
Linguistically interesting: can we create an automated phonologist?
Cognitively interesting: can we model how babies learn phonology?
"Engineeringly" interesting: can we analyze and generate words we haven't heard before? (i.e., matrix completion for large vocabularies)
A Probability Model
Describes the generating process of the observed surface words:
- We model the morph M(a) ∈ M as an IID sample from a probability distribution M_φ(m).
- We model the surface form S(u) as a sample from a conditional distribution S_θ(s | u).
The Generative Story
The process of generating a surface word:
- Sample the parameters φ and θ from priors.
- For each abstract morpheme a ∈ A, sample the morph M(a) ∼ M_φ.
- Whenever a new abstract word a₁, a₂, ... must be pronounced for the first time, construct its underlying form u by concatenating the morphs M(a₁), M(a₂), ..., and sample the surface word S(u) ∼ S_θ(· | u).
- Reuse this S(u) in the future.
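As a rough illustration only, the story can be sketched as runnable Python; morph_prior and apply_phonology below are toy stand-ins for M_φ and S_θ, not the paper's actual parameterization:

```python
import random

lexicon = {}  # memoized morphs: abstract morpheme -> underlying morph

def morph_prior(a):
    # Toy stand-in for M_phi: a fixed table instead of a real prior.
    return {"RESIGN": "rizajgn", "3SG": "z"}[a]

def sample_morph(a):
    # Sample M(a) ~ M_phi once per abstract morpheme, then reuse it.
    if a not in lexicon:
        lexicon[a] = morph_prior(a)
    return lexicon[a]

def apply_phonology(u):
    # Toy stand-in for S_theta(s | u): stochastically simplify "gn"
    # clusters, loosely mimicking rizajgn#z -> rizajnz.
    return u.replace("gn", "n") if random.random() < 0.9 else u

def pronounce(abstract_word):
    u = "".join(sample_morph(a) for a in abstract_word)  # concatenate
    return apply_phonology(u)                            # S(u) ~ S_theta

print(pronounce(["RESIGN", "3SG"]))  # usually 'rizajnz'
```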
Why Probability?
A language's morphology and phonology are deterministic. Advantages of a probabilistic model:
- Soft models admit efficient learning and inference.
- Quantification of irregularity ("sing" and "sang").
Our use is orthogonal to phonologists' use of probability, e.g., to explain gradient phenomena.
Phonology as an Edit Process
The phonology transduces the underlying string into the surface string one symbol at a time. At each step it chooses a stochastic edit action (COPY, DELETE, SUBSTITUTE, INSERT), conditioned on three contexts: the upper left context (underlying symbols already read), the lower left context (surface symbols already written), and the upper right context (underlying symbols still to come).

Example: underlying r i z a i g n s → surface r i z a i n z
COPY r, COPY i, COPY z, COPY a, COPY i, DELETE g (outputs ε), COPY n, SUBSTITUTE z for s.
At each step, the action is drawn from a probability distribution, e.g. at the point where g is deleted:

Action   Prob
DEL      .75
COPY     .01
SUB(A)   .05
SUB(B)   .03
...
INS(A)   .02
INS(B)   .01
...

These action probabilities come from a log-linear model: feature functions of the contexts, weighted by learned feature weights, score each action. The whole process is a transduction from the upper (underlying) string to the surface form.
Phonological Attributes
Each phone is described by binary attributes (+ and −), e.g., +voiced, −high.
The features come in two families, inspired by Optimality Theory, a popular constraint-based formalism for phonology:
Faithfulness features fire on edits, e.g., EDIT(g, ε), EDIT(+cons, ε), EDIT(+voiced, ε).
Markedness features fire on surface configurations, e.g., BIGRAM(a, i), BIGRAM(−high, −low), BIGRAM(+back, −back).
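A minimal sketch of how such a log-linear action distribution can be computed; the feature names and weights below are invented for illustration, and the real model's features and contexts are richer:

```python
import math

WEIGHTS = {"ACTION=COPY": 1.0, "ACTION=DEL": -1.0,
           "EDIT(g,eps)": 3.0}  # invented weights

def features(action, ul, ll, ur):
    # Invented features in the spirit of the slides: an action bias,
    # a faithfulness feature on deletions, a markedness bigram on copies.
    feats = [f"ACTION={action.split('(')[0]}"]
    if action == "DEL":
        feats.append(f"EDIT({ul[-1]},eps)")        # faithfulness
    if action == "COPY":
        feats.append(f"BIGRAM({ll[-1]},{ul[-1]})")  # markedness
    return feats

def action_distribution(actions, ul, ll, ur):
    # Softmax over feature scores = log-linear distribution over actions.
    scores = {a: sum(WEIGHTS.get(f, 0.0) for f in features(a, ul, ll, ur))
              for a in actions}
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}

# Reading the g of rizaig|ns, having written rizai so far:
print(action_distribution(["COPY", "DEL", "SUB(z)"],
                          ul="rizaig", ll="rizai", ur="ns"))
# DEL gets the most mass, driven by the EDIT(g, eps) feature weight.
```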
Outline
A generative model for phonology
- Generative Phonology
- A Probabilistic Model
- Stochastic Edit Process for Phonology
Inference and Learning
- A Hill-Climbing Example
- EM Algorithm with Finite-State Operations
Evaluation and Results
A Generative Model of Phonology
A directed graphical model of the lexicon. Underlying morphs (rizajgn, dæmn, z, eɪʃən) are concatenated into underlying words (rizajgnz, dæmnz, rizajgneɪʃən, dæmneɪʃən), each of which passes through the stochastic phonology to yield an observed surface word (rizˈajnz, dˈæmz, rˌɛzɪgnˈeɪʃən, dˌæmnˈeɪʃən).
Graphical models are flexible
gə + liːb + t → gəliːbt "geliebt" (German: loved)
Matrix completion assumes each word is built from one stem (row) plus one suffix (column): WRONG.
A graphical model allows a word to be built from any number of morphemes (parents): RIGHT.
(Approximate) Inference in the graphical model:
- MCMC: Bouchard-Côté (2007)
- Belief Propagation: Dreyer and Eisner (2009)
- Expectation Propagation: Cotterell and Eisner (2015)
- Dual Decomposition: Peng et al. (2015)
Inference computes a distribution over each unobserved string, e.g. a distribution over surface forms:

UR           Prob
dæmeɪʃən     .80
dæmneɪʃən    .10
dæmineɪʃən   .001
dæmiineɪʃən  .0001
...          ...
chomsky      .000001

Such a distribution over infinitely many strings is encoded as a weighted finite-state automaton.
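To see why an automaton suffices where a table cannot, here is a toy hand-built weighted acceptor that scores strings; it is an illustration with made-up weights, not the model's learned machine:

```python
# Hand-built toy weighted acceptor. Arcs: (state, symbol) -> (next, weight).
ARCS = {
    (0, "d"): (1, 1.0), (1, "æ"): (2, 1.0), (2, "m"): (3, 1.0),
    (3, "n"): (3, 0.1),                     # each extra n costs 0.1
    (3, "e"): (4, 0.9), (4, "ɪ"): (5, 1.0), (5, "ʃ"): (6, 1.0),
    (6, "ə"): (7, 1.0), (7, "n"): (8, 1.0),
}
FINAL = {8: 1.0}  # state 8 is accepting

def score(string, state=0):
    weight = 1.0
    for sym in string:
        if (state, sym) not in ARCS:
            return 0.0                      # no path: probability zero
        state, arc_weight = ARCS[(state, sym)]
        weight *= arc_weight
    return weight * FINAL.get(state, 0.0)

print(score("dæmeɪʃən"))    # 0.9
print(score("dæmneɪʃən"))   # ~0.09
print(score("dæmnneɪʃən"))  # ~0.009, and so on for infinitely many strings
```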
Discovering the Underlying Forms = Inference in a Graphical Model
The underlying morphs and words are unobserved (????); only the surface words rˌɛzɪgnˈeɪʃən, dˈæmz, rizˈajnz are observed.
Belief Propagation (BP) in a Nutshell
Messages pass around the graphical model: factor-to-variable messages and variable-to-factor messages, each encoded as a (weighted) finite-state machine. At a variable, the point-wise product of incoming messages (finite-state intersection) yields the marginal belief, e.g. a distribution over underlying forms:

UR        Prob
rizajgnz  .95
rezajnz   .02
rezigz    .02
rezgz     .0001
...       ...
chomsky   .000001
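The pointwise product is easiest to see with explicit, truncated distributions; in the actual model the messages are weighted FSAs and the product is computed by finite-state intersection. A toy sketch with invented numbers:

```python
def pointwise_product(*messages):
    # Multiply incoming messages pointwise, then renormalize.
    support = set.intersection(*(set(m) for m in messages))
    belief = {u: 1.0 for u in support}
    for m in messages:
        for u in support:
            belief[u] *= m[u]
    z = sum(belief.values())
    return {u: b / z for u, b in belief.items()}

# Invented numbers: two factors each score candidate URs for the RESIGN
# morph, one from the verb rizˈajnz, one from the noun rˌɛzɪgnˈeɪʃən.
msg_from_verb = {"rizajgn": 0.60, "rizajn": 0.35, "rezign": 0.05}
msg_from_noun = {"rizajgn": 0.70, "rizajn": 0.05, "rezign": 0.25}

print(pointwise_product(msg_from_verb, msg_from_noun))
# rizajgn dominates (~0.93): only it explains both surface words well.
```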
Training the Model
Trained with EM (Dempster et al., 1977).
E-step: finite-state belief propagation.
M-step: train the stochastic phonology with gradient descent.
Datasets
Experiments on 7 languages from different families:
- English (CELEX)
- Dutch (CELEX)
- German (CELEX)
- Maori (Kenstowicz)
- Tangale (Kenstowicz)
- Indonesian (Kenstowicz)
- Catalan (Kenstowicz)
How do you pronounce this word?
Given the learned model, we can predict an unobserved surface form: for the underlying word dæmneɪʃən, the model predicts the pronunciation dˌæmnˈeɪʃən.
Evaluation
Metrics (lower is always better):
- 1-best error rate (did we get it right?)
- cross-entropy (what probability did we give the right answer?)
- expected edit distance (how far away on average are we?)
Each metric is averaged over many training-test splits.
Comparisons:
- Lower bound: phonology as noisy concatenation
- Upper bound: oracle URs from linguists
Exploring the Evaluation Metrics
Given the predicted distribution over surface forms (as in the table above):
- 1-best error rate: is the 1-best prediction correct?
- Cross-entropy: what is the probability of the correct answer?
- Expected edit distance: how close are we on average?
Each is averaged over many training-test splits.
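All three metrics can be read off a predicted distribution. A small sketch using the toy distribution from the table above, assuming the true surface form is dæmneɪʃən:

```python
import math

def edit_distance(a, b):
    # Standard Levenshtein dynamic program.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # (mis)match
        prev = cur
    return prev[-1]

predicted = {"dæmeɪʃən": 0.80, "dæmneɪʃən": 0.10, "dæmineɪʃən": 0.001}
truth = "dæmneɪʃən"  # assumed gold surface form

one_best = max(predicted, key=predicted.get)
error = int(one_best != truth)                   # 1-best error: 1
xent = -math.log2(predicted.get(truth, 1e-12))   # cross-entropy: ~3.32 bits
exp_ed = sum(p * edit_distance(s, truth) for s, p in predicted.items())

print(error, round(xent, 2), round(exp_ed, 2))   # 1 3.32 0.8
```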
German Results (error bars via bootstrap resampling)
CELEX Results
Phonological Exercise Results
Conclusion
We presented a novel framework for computational phonology, new datasets for research in the area, and a fair evaluation strategy for phonological learners.
Fin
Thank you for your attention!
Gold UR Recovery