Published by Jarred Lemmon. Modified over 9 years ago.
1
Finite-State Transducers
Shallow Processing Techniques for NLP
Ling570
October 10, 2011
2
Announcements
- Wednesday online GP meeting scheduling
- Seminar on Friday: Luke Zettlemoyer (CSE), Automatic grammar induction
- Treehouse Friday: Classifiers – Memory Lane
3
Roadmap
- Motivation: FST applications
- FST perspectives
- FSTs and Regular Relations
- FST Operations
4
FSTs
- Finite automaton that maps between two strings
- Automaton with two labels per arc, written input:output
5
FST Applications
- Tokenization
- Segmentation
- Morphological analysis
- Transliteration
- Parsing
- Translation
- Speech recognition
- Spoken language understanding
- …
9
Approaches to FSTs
- FST as recognizer: takes a pair of input:output strings; accepts if the pair is in the language, otherwise rejects
- FST as generator: outputs pairs of strings in the language
- FST as translator: reads an input string and prints an output string
- FST as set relator: computes relations between sets
12
FSTs & Regular Relations
- FSAs: equivalent to regular languages
- FSTs: equivalent to regular relations (sets of pairs of strings)
- Regular relations:
  - For all $(x, y)$ in $\Sigma_1 \times \Sigma_2$, $\{(x, y)\}$ is a regular relation
  - The empty set is a regular relation
  - If $R_1$, $R_2$ are regular relations, then $R_1 R_2$, $R_1 \cup R_2$, and $R_1^*$ are regular relations
17
Regular Relation Closures
- By definition, regular relations are closed under (like regular languages):
  - Concatenation: $R_1 R_2$
  - Union: $R_1 \cup R_2$
  - Kleene *: $R_1^*$
- Unlike regular languages, they are NOT closed under:
  - Intersection: for $R_1 = \{(a^n b^*, c^n)\}$ and $R_2 = \{(a^* b^n, c^n)\}$, the intersection is $\{(a^n b^n, c^n)\}$, which is not regular
  - Difference
  - Complementation
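The intersection counterexample can be checked on a finite slice of the two relations. The sketch below is only an illustration: the helper names `r1` and `r2` and the length bound are assumptions, not part of the slides. The reason the result is not regular is that its input side, a^n b^n, is the classic non-regular language, whereas the input projection of any regular relation is regular.

```python
def r1(bound):
    """Finite slice of R1 = {(a^n b^k, c^n)}: n and k range over 0..bound."""
    return {("a" * n + "b" * k, "c" * n)
            for n in range(bound + 1) for k in range(bound + 1)}

def r2(bound):
    """Finite slice of R2 = {(a^k b^n, c^n)}: n and k range over 0..bound."""
    return {("a" * k + "b" * n, "c" * n)
            for n in range(bound + 1) for k in range(bound + 1)}

# Pairs that belong to both relations (within the bound).
common = r1(3) & r2(3)
for x, y in sorted(common, key=lambda pair: len(pair[0])):
    print(repr(x), repr(y))
```

Every surviving pair has equally many a's, b's, and c's, which is exactly the counting constraint a finite-state device cannot enforce.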
21
Regular Relation Closures
- Regular relations are also closed under:
  - Composition
  - Inversion
- Operations:
  - Projection
  - Identity & cross-product of regular languages
26
FST Formal Definition
A finite-state transducer is a 6-tuple:
- A finite set of states: $Q$
- A finite set of input symbols: $\Sigma$
- A finite set of output symbols: $\Gamma$
- A finite set of initial states: $I$
- A finite set of final states: $F$
- A set of transitions between states: $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times Q$
FSAs are a special case of FSTs
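The tuple above maps fairly directly onto a small data structure. The following is a minimal sketch, not a reference implementation: the class and field names are assumptions, symbols are single characters, the empty string stands in for ε, and the search assumes there is no cycle of ε-input arcs that keeps emitting output.

```python
from dataclasses import dataclass

EPS = ""  # epsilon: the empty string stands for "no symbol"

@dataclass(frozen=True)
class FST:
    states: frozenset        # Q
    in_alphabet: frozenset   # Sigma
    out_alphabet: frozenset  # Gamma
    initial: frozenset       # I
    final: frozenset         # F
    delta: frozenset         # set of (source, in_sym, out_sym, target)

def transduce(fst, x):
    """Return the set of all outputs y such that (x, y) is accepted."""
    results = set()
    seen = set()
    # Each agenda item: (state, input position, output so far).
    agenda = [(q, 0, "") for q in fst.initial]
    while agenda:
        item = agenda.pop()
        if item in seen:
            continue
        seen.add(item)
        q, i, out = item
        if i == len(x) and q in fst.final:
            results.add(out)
        for (src, a, b, dst) in fst.delta:
            if src != q:
                continue
            if a == EPS:                        # consume no input
                agenda.append((dst, i, out + b))
            elif i < len(x) and x[i] == a:      # consume one input symbol
                agenda.append((dst, i + 1, out + b))
    return results

# One state with an a:b self-loop, i.e. the relation {(a^n, b^n)}.
T = FST(frozenset({0}), frozenset("a"), frozenset("b"),
        frozenset({0}), frozenset({0}), frozenset({(0, "a", "b", 0)}))
print(transduce(T, "aaa"))
```

Returning a set rather than a single string reflects that δ is a relation, so one input may have zero, one, or several outputs.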
28
FST Operations
- Union
- Concatenation
30
FST Operations
- Inversion: switching input and output labels
  - If $T$ maps from $I$ to $O$, then $T^{-1}$ maps from $O$ to $I$
- Composition:
  - If $T_1$ is a transducer from $I_1$ to $O_2$ and $T_2$ is a transducer from $O_2$ to $O_3$, then $T_1 \circ T_2$ is a transducer from $I_1$ to $O_3$
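Both operations can be sketched directly on arc sets. This is a simplified illustration under stated assumptions: arcs are `(source, in_sym, out_sym, target)` tuples, the function names are made up for this example, and the composition handles only ε-free transducers (ε arcs need extra care that is omitted here). In the composed machine, initial and final states are pairs of the original initial and final states.

```python
def invert(arcs):
    """Swap input and output labels on every arc (src, in, out, dst)."""
    return {(src, b, a, dst) for (src, a, b, dst) in arcs}

def compose(arcs1, arcs2):
    """Product construction for epsilon-free transducers.

    An arc p --a:b--> p2 in T1 pairs with an arc q --b:c--> q2 in T2
    to give (p, q) --a:c--> (p2, q2) in the composition.
    """
    return {((p, q), a, c, (p2, q2))
            for (p, a, b, p2) in arcs1
            for (q, b2, c, q2) in arcs2
            if b == b2}

t1 = {(0, "a", "b", 0)}   # maps a^n to b^n
t2 = {(0, "b", "c", 0)}   # maps b^n to c^n
print(invert(t1))
print(compose(t1, t2))
```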
35
FST Examples
- $R(T) = \{(\varepsilon, \varepsilon), (a, b), (aa, bb), (aaa, bbb), \ldots\}$
- $R(T) = \{(a, x), (ab, xy), (abb, xyy), \ldots\}$
38
FST Application Examples
- Case folding: He said → he said
- Tokenization: “He ran.” → “ He ran . ”
- POS tagging: They can fish → PRO VERB NOUN
39
FST Application Examples
- Pronunciation: B AH T EH R → B AH DX EH R
- Morphological generation: Fox s → Foxes
- Morphological analysis: cats → cat s
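The generation example (Fox s → Foxes) relies on a spelling rule that inserts an e between a sibilant-final stem and the suffix. The sketch below expresses that rule as a plain function rather than a compiled transducer; the function name and the exact set of stem endings are assumptions made for illustration, and irregular plurals are not handled.

```python
import re

def pluralize(stem):
    """Attach the plural suffix, inserting e after s, x, z, ch, or sh."""
    if re.search(r"(s|x|z|ch|sh)$", stem):
        return stem + "es"
    return stem + "s"

for stem in ["fox", "cat", "church"]:
    print(stem, "->", pluralize(stem))
```

In an FST-based system the same rule would typically be one transducer composed with the lexicon, which also makes it invertible for analysis.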
45
FST Algorithms
- Recognition: is a given string pair $(x, y)$ accepted by the FST? $(x, y) \to$ yes/no
- Composition: given a pair of transducers $T_1$ and $T_2$, create a new transducer $T_1 \circ T_2$
- Transduction: given an input string and an FST, compute the output string: $x \to y$
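Recognition can be sketched as a search over configurations (state, position in x, position in y). This is an illustrative version under assumptions: arcs are `(source, in_label, out_label, target)` tuples, an empty label plays the role of ε, and the function name `accepts` is made up for this example.

```python
def accepts(arcs, initial, final, x, y):
    """True if the pair (x, y) labels some path from an initial to a final state."""
    seen = set()
    agenda = [(q, 0, 0) for q in initial]  # (state, position in x, position in y)
    while agenda:
        item = agenda.pop()
        if item in seen:
            continue
        seen.add(item)
        q, i, j = item
        if q in final and i == len(x) and j == len(y):
            return True
        for (src, a, b, dst) in arcs:
            if src != q:
                continue
            # An empty label ("") is epsilon and consumes nothing on that side.
            if x.startswith(a, i) and y.startswith(b, j):
                agenda.append((dst, i + len(a), j + len(b)))
    return False

# Machine for {(a, x), (ab, xy), (abb, xyy), ...}: a:x once, then b:y any number of times.
arcs = {(0, "a", "x", 1), (1, "b", "y", 1)}
print(accepts(arcs, {0}, {1}, "abb", "xyy"))
print(accepts(arcs, {0}, {1}, "ab", "x"))
```

Because the number of configurations is bounded by the number of states times the two string lengths, the visited set guarantees termination even when ε arcs form cycles.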
49
WFST Definition
A probabilistic finite-state transducer is specified by:
- A finite set of states: $Q$
- A finite set of input symbols: $\Sigma$
- A finite set of output symbols: $\Gamma$
- A finite set of initial states: $I$
- A finite set of final states: $F$
- A set of transitions: $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \times Q$
- Initial state probabilities: $Q \to \mathbb{R}^+$
- Transition probabilities: $\delta \to \mathbb{R}^+$
- Final state probabilities: $Q \to \mathbb{R}^+$
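With these three probability functions, the score of a path is its initial-state probability times the transition probabilities along it times the final-state probability, and the score of a pair (x, y) sums over all paths for that pair. The sketch below is a minimal illustration: the function name, the dictionary encoding, and the example weights are all invented, and it assumes every arc consumes exactly one input symbol (no ε input).

```python
from collections import defaultdict

def weighted_transduce(arcs, init_p, final_p, x):
    """Map each output y to the summed probability of all paths for (x, y).

    arcs: dict mapping (src, in_sym, out_sym, dst) -> probability
    init_p / final_p: dict mapping state -> probability
    """
    # forward[(state, output so far)] = probability mass reaching that point
    forward = defaultdict(float)
    for q, p in init_p.items():
        forward[(q, "")] += p
    for sym in x:
        nxt = defaultdict(float)
        for (q, out), p in forward.items():
            for (src, a, b, dst), w in arcs.items():
                if src == q and a == sym:
                    nxt[(dst, out + b)] += p * w
        forward = nxt
    result = defaultdict(float)
    for (q, out), p in forward.items():
        if q in final_p:
            result[out] += p * final_p[q]
    return dict(result)

# Hypothetical weights: each "a" becomes "x" with 0.75 or "y" with 0.25.
arcs = {(0, "a", "x", 0): 0.75, (0, "a", "y", 0): 0.25}
scores = weighted_transduce(arcs, {0: 1.0}, {0: 1.0}, "aa")
print(max(scores, key=scores.get), scores)
```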
53
Summary
- FSTs
  - Equivalent to regular relations
  - Transduce strings to strings
  - Useful for a range of applications
- Closed under union, concatenation, Kleene *, inversion, composition
- Project to FSAs
- Not closed under intersection, complementation, difference
- Algorithms: recognition, composition, transduction
54
Morphology and FSTs
55
Roadmap
- Motivation: Representing words
- A little (mostly English) Morphology
- Stemming
- FSTs & Morphology
- FSTs & Phonology
59
Surface Variation & Morphology
- Searching (à la Google) for documents about: televised sports
- Many possible surface forms:
  - televised, television, televise, …
  - sports, sport, sporting, …
- How can we match?
  - Convert surface forms to a common base form
  - Stemming or morphological analysis
64
The Lexicon
- Goal: represent all the words in a language
- Approach? Enumerate all words?
  - Doable for English
    - Typical for ASR (Automatic Speech Recognition)
    - English is morphologically relatively impoverished
  - Other languages? Wildly impractical
    - Turkish: 40,000 forms/verb
    - uygarlaştıramadıklarımızdanmışsınızcasına: “(behaving) as if you are among those whom we could not civilize”
71
Morphological Parsing
- Goal: take a surface word form and generate a linguistic structure of component morphemes
- A morpheme is the minimal meaning-bearing unit in a language
  - Stem: the morpheme that forms the central meaning unit in a word
  - Affix: prefix, suffix, infix, circumfix
    - Prefix: e.g., possible → impossible
    - Suffix: e.g., walk → walking
    - Infix: e.g., hingi → humingi (Tagalog)
    - Circumfix: e.g., sagen → gesagt (German)
76
Two Perspectives
- Stemming:
  - writing → write (or writ)
  - Beijing → Beije
- Morphological analysis:
  - writing → write+V+prog
  - cats → cat+N+pl
  - writes → write+V+3rdpers+Sg
80
Ambiguity in Morphology
- Alternative analyses:
  - Flies → fly+N+Pl
  - Flies → fly+V+3rdpers+Sg
  - Saw → see+V+past
  - Saw → saw+N
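Because a surface form can have several analyses, an analyzer naturally returns a set rather than a single answer. The toy sketch below reproduces the analyses listed above using a hand-written lexicon and suffix stripping; the lexicon contents, the irregular-form table, and the function name are all hypothetical, and a real system would encode this knowledge as a transducer instead of Python dictionaries.

```python
# Hypothetical toy lexicon: stem -> set of categories.
LEXICON = {"fly": {"N", "V"}, "see": {"V"}, "saw": {"N"}, "cat": {"N"}}
# Hypothetical table of irregular surface forms and their analyses.
IRREGULAR = {"saw": {"see+V+past"}}

def analyze(word):
    """Return every analysis of a surface form that the toy lexicon licenses."""
    w = word.lower()
    analyses = set(IRREGULAR.get(w, ()))
    # The word may itself be a bare stem.
    for cat in LEXICON.get(w, ()):
        analyses.add(f"{w}+{cat}")
    # Candidate stems obtained by undoing the -s / -es / -ies suffix.
    stems = set()
    if w.endswith("ies"):
        stems.add(w[:-3] + "y")
    if w.endswith("es"):
        stems.add(w[:-2])
    if w.endswith("s"):
        stems.add(w[:-1])
    for stem in stems:
        for cat in LEXICON.get(stem, ()):
            analyses.add(f"{stem}+N+Pl" if cat == "N" else f"{stem}+V+3rdpers+Sg")
    return analyses

for word in ["Flies", "Saw", "cats"]:
    print(word, sorted(analyze(word)))
```

Choosing among the returned analyses requires context, such as a POS tagger, which is outside the scope of this sketch.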
83
Multi-linguality in Morphology
- Morphologically impoverished languages: e.g., English
- Isolating languages: e.g., Chinese
- Morphologically rich languages: e.g., Turkish
87
Combining Morphemes
- Inflection: stem + grammatical morpheme → same class
  - E.g., help + ed → helped
- Derivation: stem + grammatical morpheme → new class
  - E.g., walk + er → walker (N)
- Compounding: multiple stems → new word
  - E.g., doghouse, catwalk, …
- Clitics: stem + clitic
  - E.g., I + ll → I’ll; he + is → he’s