
1 [TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology. Ryan Cotterell, Nanyun Peng, and Jason Eisner

2 What is Phonology?


8 Orthography: cat. Phonology: [kæt]. Phonology explains regular sound patterns.

9 What is Phonology? Orthography: cat. Phonology: [kæt]. Phonology explains regular sound patterns (not phonetics, which deals with acoustics).

10 Q: What do phonologists do? A: They find sound patterns in sets of words!

11 A Phonological Exercise
[tɔk] [tɔks] [tɔkt]
[θeɪŋk] [θeɪŋks] [θeɪŋkt]
[hæk] [hæks] [hækt]

12–14 A Phonological Exercise: arrange the forms in a table, verbs (rows) by tenses (columns); for TALK, the past tense and past participle are both [tɔkt]:

        1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
TALK    [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
THANK   [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]
HACK    [hæk]         [hæks]        [hækt]

15 A Phonological Exercise: add CRACK and SLAP, each with unobserved cells:

        1P Pres. Sg.  3P Pres. Sg.  Past Tense
CRACK                 [kɹæks]       [kɹækt]
SLAP    [slæp]                      [slæpt]

16–17 A Phonological Exercise: factor the table, so that each surface form is a stem plus a suffix.
Suffixes: 1P Pres. Sg. /Ø/, 3P Pres. Sg. /s/, Past Tense /t/, Past Part. /t/
Stems: /tɔk/, /θeɪŋk/, /hæk/, /kɹæk/, /slæp/

18 A Phonological Exercise: Prediction! The factorization fills in the unobserved cells: CRACK 1P Pres. Sg. [kɹæk]; SLAP 3P Pres. Sg. [slæps].

19 A Model of Phonology: /tɔk/ + /s/, concatenate → [tɔks] "talks"

20–22 A Phonological Exercise: add CODE, BAT, and EAT, and pure concatenation starts to fail.

CODE   [koʊdz]  [koʊdɪd]              (z instead of s; ɪd instead of t)
BAT    [bæt]    [bætɪd]               (ɪd instead of t)
EAT    [it], past [eɪt], past part. [itən]   (irregular: [eɪt] instead of the expected regular past)

The same suffixes surface differently ([z], [ɪd]) depending on the stem's final sound: a regular sound pattern for phonology to explain.
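To make the pattern concrete, here is a minimal sketch of the regular suffix alternations in the table: /s/ surfaces as [z] after a voiced segment, and /t/ surfaces as [ɪd] after t or d. This is a hand-written toy rule, not the paper's learned phonology, and the phoneme inventory is a simplified assumption.

```python
# Toy allomorphy rules (illustrative only): /s/ -> [z] after a voiced
# segment, /t/ -> [ɪd] after t or d. Simplified phoneme inventory.
VOICED = set("bdgvzmnlrw") | set("aeiouæɔʊɪ")

def third_sg(stem: str) -> str:
    """Attach 3P Pres. Sg. /s/ with its surface allomorph."""
    return stem + ("z" if stem[-1] in VOICED else "s")

def past(stem: str) -> str:
    """Attach past /t/ with its surface allomorph."""
    if stem[-1] in ("t", "d"):
        return stem + "ɪd"
    return stem + ("d" if stem[-1] in VOICED else "t")

for stem in ["tɔk", "hæk", "koʊd", "bæt", "slæp"]:
    print(stem, third_sg(stem), past(stem))
# tɔk tɔks tɔkt / hæk hæks hækt / koʊd koʊdz koʊdɪd / bæt bæts bætɪd / slæp slæps slæpt
```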

23 A Model of Phonology: /koʊd/ + /s/, concatenate → koʊd#s, phonology (stochastic) → [koʊdz] "codes". (Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015)

24 A Model of Phonology: /rizaign/ + /ation/, concatenate → rizaign#ation, phonology (stochastic) → [rɛzɪgneɪʃn] "resignation"

25 Generative Phonology: a system that generates exactly the attested forms. The primary research program in phonology since the 1950s. Example: [rɛzɪgneɪʃən] "resignation" and [rizainz] "resigns" (the underlying g surfaces in one but not the other).

26 Why this matters: Linguists hand-engineer phonological grammars. Linguistically interesting: can we create an automated phonologist? Cognitively interesting: can we model how babies learn phonology? "Engineeringly" interesting: can we analyze and generate words we haven't heard before? (i.e., matrix completion for large vocabularies)

27 A Probability Model describes the generating process of the observed surface words: we model the morph M(a) of each abstract morpheme a as an IID sample from a probability distribution M_φ(m), and we model the surface form S(u) as a sample from a conditional distribution S_θ(s | u).

28 The Generative Story. The process of generating a surface word:
– Sample the parameters φ and θ from priors.
– For each abstract morpheme a ∈ A, sample the morph M(a) ∼ M_φ.
– Whenever a new abstract word w = a₁a₂··· must be pronounced for the first time, construct its underlying form u by concatenating the morphs M(a₁)M(a₂)···, and sample the surface word S(u) ∼ S_θ(· | u).
– Reuse this S(u) in the future.
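A minimal sketch of this generative story in Python. M_φ and S_θ below are toy stand-ins (a geometric-length morph sampler and a copy/substitute/delete channel); every name and constant is an illustrative assumption, not the paper's model.

```python
import random

PHONEMES = list("abdefgijklmnoprstuvz")

def sample_morph(phi):
    """M_phi: sample an underlying morph (toy geometric length)."""
    length = 1
    while random.random() < phi["continue_prob"] and length < 8:
        length += 1
    return "".join(random.choice(PHONEMES) for _ in range(length))

def sample_surface(u, theta):
    """S_theta(. | u): stochastically edit the underlying form."""
    out = []
    for seg in u.replace("#", ""):               # '#' marks morph boundaries
        r = random.random()
        if r < theta["p_copy"]:
            out.append(seg)                      # COPY
        elif r < theta["p_copy"] + theta["p_sub"]:
            out.append(random.choice(PHONEMES))  # SUB
        # else: DEL (emit nothing)
    return "".join(out)

def pronounce(word, morphs, memo, phi, theta):
    """word = tuple of abstract morphemes; morphs and surface forms are
    sampled once and then reused, as in the generative story."""
    for a in word:
        if a not in morphs:
            morphs[a] = sample_morph(phi)        # M(a) ~ M_phi
    if word not in memo:
        u = "#".join(morphs[a] for a in word)    # concatenate the morphs
        memo[word] = sample_surface(u, theta)    # S(u) ~ S_theta(. | u)
    return memo[word]

phi, theta = {"continue_prob": 0.7}, {"p_copy": 0.9, "p_sub": 0.05}
morphs, memo = {}, {}
print(pronounce(("RESIGN", "3SG"), morphs, memo, phi, theta))
print(pronounce(("RESIGN", "ATION"), morphs, memo, phi, theta))
```

Note how the two words share the sampled morph for RESIGN: that sharing is what lets the model generalize across a paradigm.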

29 Why Probability? A language's morphology and phonology are deterministic. Advantages of a probabilistic model: soft models admit efficient learning and inference, and they quantify irregularity ("sing" vs. "sang"). Our use of probability is orthogonal to phonologists' use of it, e.g., to explain gradient phenomena.

30–41 Phonology as an Edit Process
The stochastic phonology reads the underlying (upper) string left to right and writes the surface (lower) string one edit action at a time, conditioning each action on the upper left context, the upper right context, and the lower left context. For the upper string r i z a i g n s, the edit sequence is:
COPY r · COPY i · COPY z · COPY a · COPY i · DEL (g → ε) · COPY n · SUB (s → z)
yielding the surface string r i z a i n z ("resigns").

42–44 Each action is sampled from a contextual probability distribution, e.g.:

Action   Prob
DEL      .75
COPY     .01
SUB(A)   .05
SUB(B)   .03
...
INS(A)   .02
INS(B)   .01
...

These action probabilities are parameterized by feature function weights.

45–48 Successive slides label the parts of the scoring formula: the feature function weights, the features, the surface form, the transduction, and the upper string.
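A minimal sketch of such a contextual edit distribution. The action set, feature templates, and weights below are illustrative assumptions, not the paper's learned model.

```python
import math

# Toy contextual edit model: p(action | context) = exp(w . f) / Z.
ACTIONS = ["COPY", "DEL", "SUB(z)", "SUB(a)", "INS(z)", "INS(a)"]

def features(action, upper_left, upper_right, lower_left):
    """Indicator features on an (action, context) pair: faithfulness-style
    features look at what the edit does to the upper string; markedness-style
    features look at the surface bigram the edit would create."""
    feats = [f"ACT={action}"]
    if upper_right:                                   # symbol about to be consumed
        feats.append(f"ACT={action},UP={upper_right[0]}")
    if lower_left and action.startswith(("SUB", "INS")):
        out = action[4]                               # the emitted surface symbol
        feats.append(f"BIGRAM({lower_left[-1]},{out})")
    return feats

def action_probs(weights, upper_left, upper_right, lower_left):
    """Locally normalized distribution over edit actions in this context."""
    scores = {a: math.exp(sum(weights.get(f, 0.0)
                              for f in features(a, upper_left, upper_right, lower_left)))
              for a in ACTIONS}
    z = sum(scores.values())
    return {a: round(s / z, 3) for a, s in scores.items()}

# With a strong weight on deleting g, DEL dominates in the slide's context:
weights = {"ACT=COPY": 1.0, "ACT=DEL,UP=g": 4.0}
print(action_probs(weights, upper_left="rizai", upper_right="gns", lower_left="rizai"))
```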

49 Phonological Attributes: binary attributes (+ and −)

50–53 Phonology as an Edit Process: features on each edit action.
Faithfulness features fire on the edit itself, e.g. EDIT(g, ε), EDIT(+cons, ε), EDIT(+voiced, ε).
Markedness features fire on the surface context the edit creates, e.g. BIGRAM(a, i), BIGRAM(−high, −low), BIGRAM(+back, −back).
Inspired by Optimality Theory, a popular constraint-based phonology formalism.

54 Outline
A generative model for phonology – Generative Phonology – A Probabilistic Model – Stochastic Edit Process for Phonology
Inference and Learning – A Hill-Climbing Example – EM Algorithm with Finite-State Operations
Evaluation and Results

55–64 A Generative Model of Phonology
A directed graphical model of the lexicon, built up one node at a time:
– Morphs (latent): rizajgn, dæmn, z, eɪʃən
– Underlying forms (latent, formed by concatenating morphs): rizajgnz, dæmnz, rizajgneɪʃən, dæmneɪʃən
– Surface forms (observed, generated by the stochastic phonology): rizˈajnz "resigns", dˈæmz "damns", rˌɛzɪgnˈeɪʃən "resignation", dˌæmnˈeɪʃən "damnation"

65 Graphical models are flexible: gə + liːb + t → underlying gəliːbt → surface [gəliːpt] "geliebt" (German: loved). Matrix completion assumes each word is built from one stem (row) + one suffix (column): WRONG. A graphical model lets a word be built from any number of morphemes (parents): RIGHT.

66–69 A Generative Model of Phonology
(Approximate) inference in the directed graphical model; prior work includes:
– MCMC: Bouchard-Côté (2007)
– Belief Propagation: Dreyer and Eisner (2009)
– Expectation Propagation: Cotterell and Eisner (2015)
– Dual Decomposition: Peng et al. (2015)

70–71 Inference produces a distribution over each unobserved string, e.g. over a surface form:

UR            Prob
dæmeɪʃən      .80
dæmneɪʃən     .10
dæmineɪʃən    .001
dæmiineɪʃən   .0001
...
chomsky       .000001
...

Such distributions over the infinite set of strings are encoded as weighted finite-state automata.
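A minimal sketch of encoding a distribution over strings as a weighted acceptor. This toy trie-shaped WFSA (the class and method names are assumptions for illustration) can only store a finite support, whereas the paper's automata can weight infinitely many strings.

```python
# Toy trie-shaped weighted finite-state acceptor over strings.
class WFSA:
    def __init__(self):
        self.arcs = {0: {}}     # state -> {symbol: next state}
        self.final = {}         # final state -> probability of its string
        self._next = 1

    def add_string(self, s, prob):
        """Add one string, sharing prefixes with strings already stored."""
        state = 0
        for ch in s:
            if ch not in self.arcs[state]:
                self.arcs[state][ch] = self._next
                self.arcs[self._next] = {}
                self._next += 1
            state = self.arcs[state][ch]
        self.final[state] = prob

    def score(self, s):
        """Walk the automaton; return the string's probability (0 if absent)."""
        state = 0
        for ch in s:
            if ch not in self.arcs[state]:
                return 0.0
            state = self.arcs[state][ch]
        return self.final.get(state, 0.0)

belief = WFSA()
belief.add_string("dæmeɪʃən", 0.80)
belief.add_string("dæmneɪʃən", 0.10)
belief.add_string("dæmineɪʃən", 0.001)
print(belief.score("dæmneɪʃən"))   # 0.1
print(belief.score("chomsky"))     # 0.0: outside this toy's finite support
```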

72–73 Discovering the Underlying Forms = Inference in a Graphical Model
Only the surface forms (rˌɛzɪgnˈeɪʃən, dˈæmz, rizˈajnz) are observed; the underlying forms and morphs (????) must be inferred.

74–81 Belief Propagation (BP) in a Nutshell
BP passes messages around the graphical model of morphs (rizajgn, dæmn, z, eɪʃən), underlying forms (rizajgnz, dæmnz, rizajgneɪʃən, dæmneɪʃən), and surface forms (rizˈajnz, dˈæmz, rˌɛzɪgnˈeɪʃən, dˌæmnˈeɪʃən):
– Factor-to-variable messages and variable-to-factor messages are distributions over strings, encoded as finite-state machines.
– The point-wise product (finite-state intersection) of the messages arriving at a variable yields its marginal belief, e.g. a distribution over an underlying form:

UR          Prob
rizajgnz    .95
rezajnz     .02
rezigz      .02
rezgz       .0001
...
chomsky     .000001
...
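A minimal sketch of the point-wise product step, assuming (unrealistically) that each message is a dict with a small explicit support; the paper instead encodes messages as weighted finite-state machines and multiplies them by FSM intersection.

```python
def marginal_belief(*messages):
    """Multiply incoming messages pointwise and renormalize."""
    support = set(messages[0])
    for m in messages[1:]:
        support &= set(m)              # strings all messages agree are possible
    belief = {s: 1.0 for s in support}
    for m in messages:
        for s in support:
            belief[s] *= m[s]
    z = sum(belief.values())
    return {s: w / z for s, w in belief.items()}

# Hypothetical messages arriving at the underlying-form variable for "resigns":
msg_from_concatenation = {"rizajgnz": 0.6, "rizajnz": 0.3, "rezgz": 0.1}
msg_from_surface_form  = {"rizajgnz": 0.8, "rizajnz": 0.1, "chomsky": 0.1}
print(marginal_belief(msg_from_concatenation, msg_from_surface_form))
# rizajgnz dominates, as in the belief table above
```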

82 Training the Model
Trained with EM (Dempster et al., 1977).
E-Step: finite-state belief propagation.
M-Step: train the stochastic phonology with gradient descent.
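A toy, self-contained EM loop in the same spirit: a short candidate list stands in for the infinite space of underlying forms that the paper handles with finite-state BP, and the "phonology" has a single illustrative parameter, the probability of deleting g. Entirely a sketch under these assumptions, not the paper's training procedure.

```python
CANDIDATE_URS = ["rizajgnz", "rizajnz"]
OBSERVED = ["rizajnz", "rizajgnz"]      # toy surface forms

def likelihood(surface, ur, p_del):
    """Toy channel: each g is independently deleted with prob p_del;
    all other segments are copied faithfully."""
    if ur.replace("g", "") != surface.replace("g", ""):
        return 0.0
    n_g, kept = ur.count("g"), surface.count("g")
    if kept > n_g:                      # the channel cannot insert a g
        return 0.0
    return (p_del ** (n_g - kept)) * ((1 - p_del) ** kept)

p_del = 0.5
prior = {ur: 1.0 / len(CANDIDATE_URS) for ur in CANDIDATE_URS}

for it in range(5):
    # E-step: posterior over underlying forms under the current phonology
    exp_del = exp_keep = 0.0
    for s in OBSERVED:
        post = {ur: prior[ur] * likelihood(s, ur, p_del) for ur in CANDIDATE_URS}
        z = sum(post.values())
        for ur, p in post.items():
            p /= z
            exp_del += p * (ur.count("g") - s.count("g"))
            exp_keep += p * s.count("g")
    # M-step: closed-form update here (the paper's phonology needs gradient descent)
    p_del = exp_del / (exp_del + exp_keep)
    print(f"iter {it}: p_del = {p_del:.3f}")
```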

83 Datasets. Experiments on 7 languages from different families:
– English (CELEX)
– Dutch (CELEX)
– German (CELEX)
– Maori (Kenstowicz)
– Tangale (Kenstowicz)
– Indonesian (Kenstowicz)
– Catalan (Kenstowicz)

84–85 A Generative Model of Phonology: how do you pronounce this word? Given the morphs dæmn and eɪʃən and the underlying form dæmneɪʃən, the model predicts the unobserved surface form: dˌæmnˈeɪʃən.

86 Evaluation. Metrics (lower is always better):
– 1-best error rate (did we get it right?)
– cross-entropy (what probability did we give the right answer?)
– expected edit distance (how far away are we on average?)
Each metric is averaged over many training-test splits.
Comparisons: lower bound: phonology as noisy concatenation; upper bound: oracle URs from linguists.

87–90 Exploring the Evaluation Metrics
Given the model's distribution over a surface form:

UR            Prob
dæmeɪʃən      .80
dæmneɪʃən     .10
dæmineɪʃən    .001
dæmiineɪʃən   .0001
...
chomsky       .000001
...

– 1-best error rate: is the 1-best correct?
– Cross-entropy: what probability did we give the correct answer?
– Expected edit distance: how close are we on average?
Each metric is averaged over many training-test splits.
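A minimal sketch of computing the three metrics from an explicit predicted distribution (the paper computes them from finite-state beliefs); the example distribution echoes the table above.

```python
import math

def edit_distance(a, b):
    """Standard Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / copy
        prev = cur
    return prev[-1]

def evaluate(pred_dist, gold):
    one_best = max(pred_dist, key=pred_dist.get)
    return {
        "1-best error": float(one_best != gold),
        "cross-entropy (bits)": -math.log2(pred_dist.get(gold, 1e-12)),
        "expected edit distance": sum(p * edit_distance(s, gold)
                                      for s, p in pred_dist.items()),
    }

pred = {"dæmeɪʃən": 0.80, "dæmneɪʃən": 0.10, "dæmineɪʃən": 0.001}
print(evaluate(pred, gold="dæmneɪʃən"))
# here the 1-best is wrong, but the gold form still receives probability 0.10
```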

91 German Results (error bars via bootstrap resampling)

92 CELEX Results

93 Phonological Exercise Results

94 Conclusion. We presented a novel framework for computational phonology, new datasets for research in the area, and a fair evaluation strategy for phonological learners.

95 Fin. Thank you for your attention!

96 A Generative Model of Phonology: the full directed graphical model of the lexicon (morphs rizajgn, dæmn, z, eɪʃən; underlying forms rizajgnz, dæmnz, rizajgneɪʃən, dæmneɪʃən; surface forms rizˈajnz, dˈæmz, rˌɛzɪgnˈeɪʃən, dˌæmnˈeɪʃən).

97 Gold UR Recovery

