Letter to Phoneme Alignment Using Graphical Models
N. Bolandzadeh, R. Rabbany
Dept of Computing Science, University of Alberta

Presentation transcript:

Text to Speech Problem
Conversion of Text to Speech (TTS):
◦ Automated Telecom Services
◦ by Phone
◦ Banking Systems
◦ Handicapped People

Pronunciation
Pronunciation of words:
◦ Dictionary words
◦ Non-dictionary words
Phonetic analysis
Dictionary lookup?
◦ Language is alive; new words are added
◦ Proper nouns
Machine learning → higher accuracy → L2P alignment is needed

Problem
Letter to Phoneme Alignment
◦ Letter: c a k e
◦ Phoneme: k ei k
L2P, Automatic Speech Recognition & Spelling Correction

It's not trivial! Why?
No consistency:
◦ City → /s/
◦ Cake → /k/
◦ Kid → /k/
No transparency:
◦ K i d (3) → /k i d/ (3)
◦ S i x (3) → /s i k s/ (4)
◦ Q u e u e (5) → /k j u:/ (3)
◦ A x e (3) → /a k s/ (3)

Framework
Unaligned dictionary entries (word, phonemes):
Brick        brIk
Brightening  br2tHIN
British      brItIS
Bronx        brQNks
Bugle        bjugP
Buoy         b4
Aligned entries (letters, phonemes):
b|r|i|ck|             b|r|I|k|
b|r|ig|ht|en|i|ng|    b|r|2|t|H|I|N|
b|r|i|t|i|sh|         b|r|I|t|I|S|
b|r|o|n|x|            b|r|Q|N|ks|
b|u|g|le|             b|ju|g|P|
bu|oy|                b|4|
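The aligned entries above pair the i-th letter chunk with the i-th phoneme chunk. A minimal Python sketch of reading that format follows; the function name `parse_aligned` is hypothetical and only illustrates the '|'-delimited notation shown on the slide.

```python
def parse_aligned(letters: str, phonemes: str) -> list[tuple[str, str]]:
    """Pair the '|'-delimited letter chunks with the phoneme chunks."""
    # A trailing '|' produces an empty final piece, so drop empty pieces.
    l_chunks = [c for c in letters.split("|") if c]
    p_chunks = [c for c in phonemes.split("|") if c]
    if len(l_chunks) != len(p_chunks):
        raise ValueError("letter and phoneme sides have different chunk counts")
    return list(zip(l_chunks, p_chunks))


pairs = parse_aligned("b|r|i|ck|", "b|r|I|k|")
print(pairs)  # [('b', 'b'), ('r', 'r'), ('i', 'I'), ('ck', 'k')]
```

Each tuple is one sub-alignment, for example the two-letter chunk "ck" mapped to the single phoneme "k".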

Evaluation
◦ No aligned dictionary is available
◦ Unsupervised learning
◦ Previously, the aligner was tied to a generator
◦ Evaluation on the percentage of correctly predicted phonemes and words
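The two percentages named above can be computed roughly as follows. This is a sketch under assumptions the slide does not state: phonemes are compared position by position, a missing position counts as an error, and the function name `accuracy` is hypothetical.

```python
def accuracy(predicted: list[str], reference: list[str]) -> tuple[float, float]:
    """Return (phoneme accuracy, word accuracy) for parallel lists of
    predicted and reference phoneme strings, one string per word."""
    total = correct = words_ok = 0
    for pred, ref in zip(predicted, reference):
        total += len(ref)
        # zip stops at the shorter string, so missing phonemes score as wrong.
        correct += sum(p == r for p, r in zip(pred, ref))
        words_ok += pred == ref  # a word counts only if it matches exactly
    return correct / total, words_ok / len(reference)


# Reference strings are taken from the dictionary slide; "b5" is a made-up error.
phoneme_acc, word_acc = accuracy(["brIk", "b5"], ["brIk", "b4"])
print(phoneme_acc, word_acc)
```

In this toy call five of six phonemes and one of two words are correct, which shows why word accuracy is typically much lower than phoneme accuracy.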

Model of our problem
B | r | i | t | i | sh |
B | r | I | t | I | S |

Static Model, Structure
Independent sub-alignments
[Figure: alignment nodes a1, a2, ..., ak; a1 is connected to letters l1, l2 and phonemes p1, p2; a2 to l3, l4 and p3, p4; ak to l(n-1), ln and p(m-1), pm]

Static Model, Learning
EM
◦ Initialize parameters
◦ Expectation step: parameters → alignments
◦ Maximization step: alignments → parameters
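The EM loop above can be sketched in Python. The slides do not give the parameterization, so everything here is an assumption made for illustration: chunks of at most two letters and two phonemes, one joint probability table over (letter chunk, phoneme chunk) pairs, uniform initialization, and the hypothetical names `expected_counts`, `train` and `best_alignment`. It is one common many-to-many formulation, not necessarily the authors' model.

```python
from collections import defaultdict

MAX_L, MAX_P = 2, 2  # assumed chunk-size limits, not stated on the slides


def moves(i, j, n, m):
    """Chunk sizes (dl, dp) that fit when starting at letter i, phoneme j."""
    return [(dl, dp)
            for dl in range(1, MAX_L + 1) if i + dl <= n
            for dp in range(1, MAX_P + 1) if j + dp <= m]


def score(table, pair):
    """Probability of a chunk pair; table=None means uniform initialization."""
    return 1.0 if table is None else table.get(pair, 0.0)


def expected_counts(word, phones, table):
    """E-step for one word: expected count of each chunk pair (forward-backward)."""
    n, m = len(word), len(phones)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    beta = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = beta[n][m] = 1.0
    for i in range(n + 1):                       # forward pass
        for j in range(m + 1):
            for dl, dp in moves(i, j, n, m):
                pair = (word[i:i + dl], phones[j:j + dp])
                alpha[i + dl][j + dp] += alpha[i][j] * score(table, pair)
    for i in range(n, -1, -1):                   # backward pass
        for j in range(m, -1, -1):
            for dl, dp in moves(i, j, n, m):
                pair = (word[i:i + dl], phones[j:j + dp])
                beta[i][j] += score(table, pair) * beta[i + dl][j + dp]
    counts = defaultdict(float)
    z = alpha[n][m]                              # total weight of all alignments
    if z == 0.0:
        return counts                            # word cannot be aligned
    for i in range(n + 1):
        for j in range(m + 1):
            for dl, dp in moves(i, j, n, m):
                pair = (word[i:i + dl], phones[j:j + dp])
                counts[pair] += alpha[i][j] * score(table, pair) * beta[i + dl][j + dp] / z
    return counts


def train(data, iterations=10):
    """Alternate E-steps (parameters -> alignments) and M-steps (alignments -> parameters)."""
    table = None
    for _ in range(iterations):
        totals = defaultdict(float)
        for word, phones in data:                # E-step over the dictionary
            for pair, c in expected_counts(word, phones, table).items():
                totals[pair] += c
        norm = sum(totals.values())              # M-step: renormalize the counts
        table = {pair: c / norm for pair, c in totals.items()}
    return table


def best_alignment(word, phones, table):
    """Most probable chunk sequence under the learned table (Viterbi)."""
    n, m = len(word), len(phones)
    best = {(0, 0): (1.0, None)}                 # node -> (score, previous node)
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) not in best:
                continue
            for dl, dp in moves(i, j, n, m):
                s = best[(i, j)][0] * score(table, (word[i:i + dl], phones[j:j + dp]))
                if s > best.get((i + dl, j + dp), (0.0, None))[0]:
                    best[(i + dl, j + dp)] = (s, (i, j))
    if (n, m) not in best:
        return []
    chunks, node = [], (n, m)
    while best[node][1] is not None:             # trace back to the start
        prev = best[node][1]
        chunks.append((word[prev[0]:node[0]], phones[prev[1]:node[1]]))
        node = prev
    return chunks[::-1]


# Lowercased entries from the dictionary slide, one character per phoneme symbol.
data = [("brick", "brIk"), ("brightening", "br2tHIN"), ("british", "brItIS"),
        ("bronx", "brQNks"), ("bugle", "bjugP"), ("buoy", "b4")]
table = train(data)
print(best_alignment("brick", "brIk", table))
```

With only six words the learned alignments are not expected to be reliable; the point is the shape of the loop, where each iteration recomputes expected chunk-pair counts from the current table and then re-estimates the table from those counts.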

Result of Static Model
Method        Letters   Words
Static Model  81.34%    43.5%

Dynamic Model
Sequence of data
Unrolled model for T=3 slices
[Figure: three slices; a1 is connected to letters l1, l2 and phonemes p1, p2; a2 to l3, l4 and p3, p4; ak to l5, l6 and p5, p6]

Questions