Speech Translation on a PDA By: Santan Challa Instructor: Dr. Christel Kemke

Introduction  Based on the article "PDA Translates Speech" by Kimberley Patch [1].  A combined effort of researchers from CMU, Cepstral, LLC, Multimodal Technologies Inc., and Mobile Technologies Inc.  What is the aim? Two-way translation of medical information between English and Arabic.  System used: iPaq handheld computer

System  iPaq handheld computer  64 MB of memory  Requirements: two recognizers, translators, synthesizers

Different Phases  Automatic Speech Recognition (ASR)  Speech Translation  Speech Synthesis

Automatic Speech Recognition  ASR: technology that recognizes and executes voice commands  Steps in ASR: feature extraction, acoustic modeling, language modeling, pattern classification, utterance verification, decision

Speech Recognition Process [2]  [Figure: block diagram of a speech recognizer: feature extraction feeding pattern classification (informed by acoustic modeling and language modeling), followed by utterance verification and a decision stage.]
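Read as software, the block diagram maps onto a simple pipeline. Below is a minimal Python sketch of those stages; every function and the toy "models" here are hypothetical stand-ins for illustration, not the Speechalator implementation.

```python
# Hypothetical sketch of the stages in the block diagram. Each stage is a
# toy stand-in: real recognizers use trained acoustic and language models.

def extract_features(samples):
    """Feature extraction: raw samples -> one feature vector per frame."""
    frame = 160  # 10 ms frames at a 16 kHz sampling rate (assumed)
    return [samples[i:i + frame] for i in range(0, len(samples), frame)]

def classify_patterns(features, models):
    """Pattern classification: score every word model, keep the best."""
    return max(models, key=lambda word: models[word](features))

def verify_utterance(features, models):
    """Utterance verification: confidence = margin between top two scores."""
    scores = sorted((m(features) for m in models.values()), reverse=True)
    return scores[0] - (scores[1] if len(scores) > 1 else 0.0)

def recognize(samples, models, threshold=0.0):
    """Decision: accept the best hypothesis only if confidence is high."""
    features = extract_features(samples)
    word = classify_patterns(features, models)
    confidence = verify_utterance(features, models)
    return word if confidence > threshold else None

# Toy usage: "models" are just callables mapping features to a score.
models = {"yes": lambda f: float(len(f)), "no": lambda f: 1.0}
print(recognize(list(range(1600)), models))  # -> yes
```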

Feature Extraction  Features:- Attributes pertaining to a person that enable a speech recognizer to distinguish the phonemes in each word[3]. Energy:

Visual Display of Frequencies  Spectrogram. The energy levels are decoded to extract the features, which are stored in a feature vector for further processing [3].

Feature Extraction  Speech Signal ->Microphone->Analog signal.  Digitization of analog signal to store in the computer.  Digitization involves sampling (Common sampling rates…8000hz to 16,000hz).  Features are extracted from the digitized speech.  Results in feature vector (numerical measurements of speech attributes [3])  Speech recognizer uses the feature vectors to decode the digitized speech signal.

Acoustic Modeling  Numerical representation of sound (utterances of words in a language).  Comparison of the speech features of the digitized speech signal with the features of existing models.  Determination of the sound is probabilistic by nature.  The Hidden Markov Model (HMM) is a statistical technique that forms the basis for the development of acoustic models.  HMMs give the statistical likelihood of a particular sequence of words or phonemes [3].  HMMs are used in both speech training and speech recognition.
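A minimal sketch of how such a likelihood is computed, using the textbook forward algorithm on a toy discrete HMM (the states, symbols, and probabilities below are invented for illustration):

```python
def forward_likelihood(observations, init, trans, emit):
    """P(observation sequence | HMM) via the forward algorithm.

    init[s]     : probability of starting in state s
    trans[s][t] : probability of moving from state s to state t
    emit[s][o]  : probability of state s emitting observation o
    """
    alpha = {s: init[s] * emit[s][observations[0]] for s in init}
    for obs in observations[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in alpha) * emit[t][obs]
                 for t in init}
    return sum(alpha.values())

# Two phone-like states emitting coarse acoustic symbols "lo" and "hi".
init  = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit  = {"A": {"lo": 0.9, "hi": 0.1}, "B": {"lo": 0.2, "hi": 0.8}}
print(forward_likelihood(["lo", "lo", "hi"], init, trans, emit))
```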

HMMs Cont'd  Based on the Markov chain: a sequence of random variables whose next value depends only on the current value [3]. [Figure: state-transition diagram of a Markov chain.]
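Since the transcript lost the diagram, a tiny numeric stand-in: a two-state Markov chain given by a transition table, from which successive states are sampled (states and probabilities invented).

```python
import random

# A two-state Markov chain over coarse speech classes: the probability of
# the next state depends only on the current state.
transitions = {
    "speech":  {"speech": 0.8, "silence": 0.2},
    "silence": {"speech": 0.3, "silence": 0.7},
}

def next_state(current):
    """Sample the next state given only the current one."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights)[0]

state, path = "silence", []
for _ in range(10):
    state = next_state(state)
    path.append(state)
print(path)  # e.g. ['speech', 'speech', 'silence', ...]
```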

Other Speech Recognition Components  Pattern classifier: the pattern classification component groups the patterns generated by the acoustic modeling component; speech patterns having similar speech features are grouped together.  The correctness of the words generated by the pattern classifier is measured by the utterance verification component.  What the Speechalator prototype [4] uses: an HMM-based recognizer, designed and developed by Multimodal Technologies Inc. The speech recognizer needs 1 MB of memory, and the acoustic models occupy 3 MB.
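As a toy illustration of grouping patterns by feature similarity (simple template matching, not the HMM-based classifier the prototype actually uses):

```python
def euclidean(a, b):
    """Distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(feature_vector, templates):
    """Group the incoming pattern with the closest stored speech pattern."""
    return min(templates, key=lambda w: euclidean(feature_vector, templates[w]))

# Invented per-word feature templates and one incoming vector.
templates = {"yes": [0.9, 0.1, 0.4], "no": [0.2, 0.8, 0.5]}
print(classify([0.8, 0.2, 0.35], templates))  # -> yes
```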

Speech Translation

 What is Machine Translation (MT)? Translation of text or speech from one language to another with the help of software.  Types of MT: direct translation (word-to-word), transfer-based translation, interlingua translation

Why MT is difficult  Ambiguity: sentences and words can have several meanings (lexical ambiguity, structural ambiguity, semantic ambiguity).  Structural differences between languages.  Idioms cannot be translated word-for-word.

Approaches in Machine Translation  [Figure: the Machine Translation Triangle (Vauquois Triangle). Direct translation runs along the base from source language to target language; transfer-based translation rises through analysis of the source language; interlingua (IL) translation sits at the apex, with analysis on the source side and synthesis on the target side.]

Differences between the three translation architectures:  Direct translation: word-to-word translation.  Transfer-based translation: requires knowledge of both the source and the target language. Suited to bilingual translation. Intermediate representations are language-dependent. Parses the source-language sentence and applies transfer rules that map grammatical segments of the source language onto the target language (see the sketch below).
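A toy sketch of the transfer idea, with an invented three-word lexicon and a single hypothetical reordering rule (real transfer systems use full parses and large rule sets):

```python
# Toy transfer-based translation sketch (invented rules and vocabulary).
# English "DET ADJ NOUN" transfers to a "DET NOUN ADJ" structure, as in French.

lexicon = {"the": "la", "red": "rouge", "car": "voiture"}
pos     = {"the": "DET", "red": "ADJ", "car": "NOUN"}

def transfer(words):
    """Apply one structural transfer rule: DET ADJ NOUN -> DET NOUN ADJ."""
    tags = [pos[w] for w in words]
    if tags == ["DET", "ADJ", "NOUN"]:
        words = [words[0], words[2], words[1]]
    return words

def translate(sentence):
    """Parse (trivially), transfer the structure, then substitute words."""
    reordered = transfer(sentence.lower().split())
    return " ".join(lexicon[w] for w in reordered)

print(translate("the red car"))  # -> la voiture rouge
```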

Differences between the three translation architectures cont'd..  Interlingual translation: generates a language-independent representation, called an interlingua (IL), of the meaning of sentences or segments of sentences in the source language. A text in the source language can then be converted into any target language; hence it is suited to multilingual translation.

More on Machine Translation  Knowledge-Based MT (KBMT): completely analyzes and understands the meaning of the source text [5], then translates it into target-language text. Performance relies heavily on the amount of world knowledge available for analyzing the source language. Knowledge is represented in the form of frames, e.g. [Event: Murder is a: Crime], as sketched below.
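The slide's frame could be written down as a simple data structure; the extra slots below ("agent", "instrument") are illustrative additions, not part of the slide's example.

```python
# The slide's frame as a minimal data structure. Slots beyond "is a"
# are invented; they would be filled during analysis of the source text.
murder_frame = {
    "event": "Murder",
    "is a": "Crime",
    "agent": None,
    "instrument": None,
}
print(murder_frame["is a"])  # -> Crime
```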

Machine Translation Cont'd  Example-Based MT (EBMT): sentences are analyzed on the basis of similar example sentences analyzed previously.  What does the Speechalator prototype use? Statistical MT (SBMT) [5]: uses previously analyzed corpora; no linguistic information is required; n-gram modeling is used (see the sketch below).
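A minimal sketch of the n-gram idea behind SBMT: estimate bigram probabilities from a tiny invented corpus and use them to score how fluent a candidate sentence is (real systems combine such a language model with a translation model).

```python
from collections import Counter

corpus = ["the patient has a fever", "the patient needs water",
          "a fever needs rest"]  # tiny invented training corpus

# Count unigrams and bigrams over the corpus.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_score(sentence, alpha=0.1):
    """Product of smoothed bigram probabilities P(w_i | w_{i-1})."""
    words = sentence.split()
    vocab = len(unigrams)
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return score

# The language model prefers fluent orderings among candidate outputs.
print(bigram_score("the patient has a fever"))   # high
print(bigram_score("patient the fever a has"))   # low
```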

Speech Synthesis

 Generation of a human voice from a given text or phonetic description [6]: Text-To-Speech (TTS) systems.
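As a toy illustration of the text-to-phonetic-description step, a dictionary lookup that maps words to ARPAbet-like phonemes (the mini-dictionary is invented; real TTS front ends add letter-to-sound rules and prosody):

```python
# Toy grapheme-to-phoneme lookup: the first step of a TTS front end.
# Real systems fall back to letter-to-sound rules for out-of-dictionary words.

pronunciations = {
    "speech": ["S", "P", "IY", "CH"],
    "on":     ["AA", "N"],
    "a":      ["AH"],
    "pda":    ["P", "IY", "D", "IY", "EY"],
}

def to_phonemes(text):
    """Map each word of the text to its phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(pronunciations.get(word, ["<unk>"]))
    return phonemes

print(to_phonemes("Speech on a PDA"))
# ['S', 'P', 'IY', 'CH', 'AA', 'N', 'AH', 'P', 'IY', 'D', 'IY', 'EY']
```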

Snapshot of Speechalator

Conclusions  Speechalator is a good achievement in both mobile technology and NLP.  Simple push-to-talk button interface.  Uses optimized speech recognizers and speech synthesizers.  The architecture allows components to be placed both on-device and on a server.  Presently most of the components are ported to the device.  Performance: 80% accuracy; translation takes 2-3 seconds; presently restricted to a single domain (medical).

Future Work  Increase the accuracy of the device in noisy environments.  Build more learning algorithms.  Multilingual speech recognition.  Achieve domain independence.

References
1. Kimberley Patch. "PDA Translates Speech." Technology and Research News (TRN), 17/24 December.
2. Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. "Speech and Language Processing for Next-Millennium Communication Services." Proceedings of the IEEE, 88(8).
3. ASR home page.
4. "Speechalator: Two-Way Speech-to-Speech Translation on a Consumer PDA." Eurospeech 2003, Geneva, Switzerland.
5. Joseph Seaseley. "Machine Translation: A Survey of Approaches." University of Michigan, Ann Arbor.
6. Thierry Dutoit. "A Short Introduction to Text-to-Speech Synthesis (TTS)."