Letter to Phoneme Alignment Using Graphical Models
N. Bolandzadeh, R. Rabbany
Dept of Computing Science, University of Alberta
Text to Speech Problem
Conversion of Text to Speech (TTS)
◦ Automated Telecom Services
◦ by Phone
◦ Banking Systems
◦ Handicapped People
Pronunciation
Pronunciation of the words
◦ Dictionary words: dictionary lookup
◦ Non-dictionary words: phonetic analysis
Why not dictionary lookup alone?
◦ Language is alive; new words are added
◦ Proper nouns
Machine learning gives higher accuracy
◦ L2P alignment is needed
Problem
Letter to Phoneme Alignment
◦ Letter: c a k e
◦ Phoneme: k ei k
L2P is also used in Automatic Speech Recognition & Spelling Correction
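It's not Trivial! Why?
No Consistency (the same letter maps to different phonemes, and different letters to the same phoneme)
◦ City /s/
◦ Cake /k/
◦ Kid /k/
No Transparency (letter count vs. phoneme count)
◦ K i d (3) /k i d/ (3)
◦ S i x (3) /s i k s/ (4)
◦ Q u e u e (5) /k j u:/ (3)
◦ A x e (3) /a k s/ (3)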
Framework
Input (word, phoneme string):
Brick         brIk
Brightening   br2tHIN
British       brItIS
Bronx         brQNks
Bugle         bjugP
Buoy          b4
Output (aligned letters, aligned phonemes):
b|r|i|ck|              b|r|I|k|
b|r|ig|ht|en|i|ng|     b|r|2|t|H|I|N|
b|r|i|t|i|sh|          b|r|I|t|I|S|
b|r|o|n|x|             b|r|Q|N|ks|
b|u|g|le|              b|ju|g|P|
bu|oy|                 b|4|
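The aligned output format on the Framework slide (chunks terminated by `|`) can be read back into letter/phoneme pairs. The following is a minimal sketch; the function name `parse_aligned` is hypothetical and not part of the original slides, and it assumes both sides always carry the same number of chunks.

```python
from typing import List, Tuple


def parse_aligned(letters: str, phonemes: str) -> List[Tuple[str, str]]:
    """Pair letter chunks with phoneme chunks from two '|'-delimited strings."""
    # Each chunk is terminated by '|', so splitting leaves a trailing empty piece.
    letter_chunks = [c for c in letters.split("|") if c]
    phoneme_chunks = [c for c in phonemes.split("|") if c]
    if len(letter_chunks) != len(phoneme_chunks):
        raise ValueError("letter and phoneme sides have different chunk counts")
    return list(zip(letter_chunks, phoneme_chunks))


# Example taken from the slide: "Brick" aligned with "brIk".
print(parse_aligned("b|r|i|ck|", "b|r|I|k|"))
```

Keeping the pairs as tuples makes it easy to count how often a given letter chunk maps to a given phoneme chunk.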
Evaluation
◦ No aligned dictionary is available, so learning is unsupervised
◦ Previously, the aligner was tied to a generator
◦ Evaluation on the percentage of correctly predicted phonemes and words
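The two percentages named on the Evaluation slide could be computed roughly as follows. This is only a sketch: the slides do not define the metric precisely, so it assumes that each word is a sequence of phoneme units and that the prediction and the reference for a word have the same length, so units can be compared position by position. The function name `accuracy` is hypothetical.

```python
from typing import List, Sequence, Tuple


def accuracy(predicted: List[Sequence[str]],
             reference: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (phoneme accuracy, word accuracy) as percentages.

    Assumption: predicted[i] and reference[i] have equal length.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("need two non-empty word lists of equal size")
    unit_total = unit_correct = word_correct = 0
    for pred, ref in zip(predicted, reference):
        if len(pred) != len(ref):
            raise ValueError("prediction and reference differ in length")
        matches = sum(p == r for p, r in zip(pred, ref))
        unit_correct += matches
        unit_total += len(ref)
        # A word counts as correct only if every unit matches.
        word_correct += matches == len(ref)
    return (100.0 * unit_correct / unit_total,
            100.0 * word_correct / len(reference))


# Made-up predictions for illustration; the second word has one wrong phoneme.
predicted = [["b", "r", "I", "k"], ["b", "ju", "g", "l"]]
reference = [["b", "r", "I", "k"], ["b", "ju", "g", "P"]]
print(accuracy(predicted, reference))  # (87.5, 50.0)
```

Word accuracy is the stricter of the two numbers because a single wrong phoneme makes the whole word count as wrong.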
Model of our problem
Letters:   B | r | i | t | i | sh |
Phonemes:  B | r | I | t | I | S |
Static Model, Structure
◦ Independent sub alignments
[Diagram: each sub-alignment node is connected to two letter nodes and two phoneme nodes: (l1, l2, p1, p2, a1), (l3, l4, p3, p4, a2), ..., (l n-1, ln, p m-1, pm, ak)]
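One way to read "independent sub alignments" is that the joint distribution over the sub-alignment variables factorizes into one term per sub-alignment. The expression below is a sketch under that assumption; the notation \mathbf{l}^{(i)} and \mathbf{p}^{(i)} (the letters and phonemes attached to a_i) is introduced here for illustration and does not appear in the slides, and the direction of the dependencies is likewise assumed.

```latex
P(a_1, \dots, a_k \mid l_1, \dots, l_n,\; p_1, \dots, p_m)
  = \prod_{i=1}^{k} P\bigl(a_i \mid \mathbf{l}^{(i)}, \mathbf{p}^{(i)}\bigr)
```

Under this reading, each factor can be estimated and evaluated separately, which is what makes the model static.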
Static Model, Learning
EM
◦ Initialize Parameters
◦ Expectation Step: Parameters → Alignments
◦ Maximization Step: Alignments → Parameters
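The EM loop above can be illustrated with a small self-contained sketch. It is not the authors' graphical-model implementation: it swaps in a simple joint chunk model, enumerates every monotone alignment exhaustively (feasible only for short words), and assumes chunks of at most `max_len` letters and phonemes. The names `segmentations`, `score`, and `em_align` are hypothetical.

```python
import math
from collections import defaultdict


def segmentations(letters, phonemes, max_len=2):
    """Yield every monotone alignment as a list of (letter_chunk, phoneme_chunk)."""
    if not letters and not phonemes:
        yield []
        return
    for i in range(1, min(max_len, len(letters)) + 1):
        for j in range(1, min(max_len, len(phonemes)) + 1):
            for rest in segmentations(letters[i:], phonemes[j:], max_len):
                yield [(letters[:i], phonemes[:j])] + rest


def score(alignment, prob):
    """Probability of an alignment as a product of independent chunk pairs."""
    return math.prod(prob[unit] for unit in alignment)


def em_align(pairs, iterations=10, max_len=2):
    candidates = [list(segmentations(l, tuple(p), max_len)) for l, p in pairs]
    if any(not cands for cands in candidates):
        raise ValueError("some pair has no alignment under max_len")
    units = {unit for cands in candidates for a in cands for unit in a}
    # Initialize parameters: uniform over all observed chunk pairs.
    prob = {unit: 1.0 / len(units) for unit in units}
    for _ in range(iterations):
        # Expectation step: parameters -> posterior over alignments,
        # accumulated as expected counts of each chunk pair.
        counts = defaultdict(float)
        for cands in candidates:
            scores = [score(a, prob) for a in cands]
            total = sum(scores)
            for a, s in zip(cands, scores):
                for unit in a:
                    counts[unit] += s / total
        # Maximization step: expected counts -> new parameters.
        norm = sum(counts.values())
        prob = {unit: counts[unit] / norm for unit in units}
    best = [max(cands, key=lambda a: score(a, prob)) for cands in candidates]
    return best, prob


# Words and phonemes taken from the "No Transparency" examples.
pairs = [("kid", ["k", "i", "d"]), ("six", ["s", "i", "k", "s"]), ("axe", ["a", "k", "s"])]
best, prob = em_align(pairs)
for (word, _), alignment in zip(pairs, best):
    print(word, alignment)
```

A practical aligner would replace the exhaustive enumeration with a forward-backward pass; the enumeration is kept here only because it makes the two steps easy to see.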
Result of Static Model
Method         Letters   Words
Static Model   81.34%    43.5%
Dynamic Model
◦ Sequence of data
◦ Unrolled model for T=3 slices
[Diagram: three slices, each with a sub-alignment node connected to two letter nodes and two phoneme nodes: (l1, l2, p1, p2, a1), (l3, l4, p3, p4, a2), (l5, l6, p5, p6, ak)]
Questions