Speech Translation on a PDA By: Santan Challa Instructor Dr. Christel Kemke.

Introduction  Based on an article “PDA translates Speech” by Kimberley Patch[1].  Combined effort of researchers from CMU, Cepstral, LLC, Multimodal Technologies Inc. and Mobile Technologies Inc.  What is the Aim? Two-way translation of medical information from English to Arabic and Arabic to English.  System Used: iPaq handheld computer

System  iPaq handheld computer  64 MB memory  Requirements Two recognizers Translators Synthesizers

Different Phases  Automatic Speech Recognition (ASR)  Speech Translation  Speech Synthesis

Automatic Speech Recognition  ASR-Technology that recognizes and executes voice commands  Steps in ASR Feature Extraction Acoustic modeling Language modeling Pattern Classification Utterance verification Decision

Speech Recognition Process[2]  [Figure: functions of a speech recognizer, with blocks for feature extraction, pattern classification, acoustic modeling, language modeling, utterance verification, and decision]

Feature Extraction  Features:- Attributes pertaining to a person that enable a speech recognizer to distinguish the phonemes in each word[3]. Energy:

Visual Display of Frequencies  Spectrogram. The energy levels are decoded to extract the features, which are stored in a feature vector for further processing[3].

Feature Extraction  Speech Signal ->Microphone->Analog signal.  Digitization of analog signal to store in the computer.  Digitization involves sampling (Common sampling rates…8000hz to 16,000hz).  Features are extracted from the digitized speech.  Results in feature vector (numerical measurements of speech attributes [3])  Speech recognizer uses the feature vectors to decode the digitized speech signal.

Acoustic Modeling  Numerical representation of sound (utterances of words in a language).  Comparison of speech features of digitized speech signal with the features of existing models.  Determination of sound is probabilistic by nature.  Hidden Markov Model (HMM) is a statistical technique which forms basis for the development of acoustic models.  HMMs give the statististical likelihood of particular sequence of words or phonemes[3]  HMMs are used in both speech training and speech recognition

HMMs Cont’d  Depend on the Markov Chain. (a sequence of random variables whose next values depend on the previous values[3] as represented below).

Other Speech Recognition Components  Pattern classifier: the pattern classification component groups the patterns generated by the acoustic modeling component. Speech patterns with similar speech features are grouped together.  The utterance verification component measures the correctness of the words generated by the pattern classifier.  What the Speechalator prototype[4] uses: an HMM-based recognizer, designed and developed by Multimodal Technologies Inc. The speech recognizer needs 1 MB of memory and the acoustic models occupy 3 MB of memory.

Speech Translation

What is Machine Translation (MT)? Translation of speech or text from one language to another with the help of software.  Types of MT: direct translation (word-to-word), transfer-based translation, interlingua translation

Why MT is difficult  Ambiguity: Sentence and words have different meanings. Lexical Ambiguity, Structural Ambiguity, Semantically Ambiguous.  Structural Differences between Language  Idioms cannot be translated

Approaches in Machine Translation  [Figure: Machine Translation Triangle (Vauquois Triangle), showing source language, analysis, interlingua (IL), synthesis, and target language, with direct translation and transfer as the lower paths]

Differences between the three translation architectures:  Direct translation: word-to-word translation.  Transfer-based: requires knowledge of both the source and the target language.  Suited to bilingual translation.  Intermediate representations are language dependent.  Parses the source language sentence and applies transfer rules that map grammatical segments between the source and target languages.

Differences between the three translation architectures cont’d  Interlingual translation: generates a language-independent representation called the interlingua (IL) for the meaning of sentences or segments of sentences in the source language. A text in the source language can then be converted into any target language, which makes this approach suited to multilingual translation.
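The difference between direct and transfer-based translation can be seen on a three-word phrase. The sketch below is a toy, not the Speechalator translator: it uses English to French rather than English to Arabic, and the lexicon, the word classes, and the single adjective-noun reordering rule are all assumptions chosen for the example.

```python
# Hypothetical three-word lexicon and word classes, for illustration only.
LEXICON = {"the": "la", "red": "rouge", "car": "voiture"}
ADJECTIVES = {"red"}
NOUNS = {"car"}

def direct_translate(sentence):
    """Direct translation: replace each word, keep the source word order."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

def transfer_translate(sentence):
    """Transfer-based: apply a structural rule first, then translate the words.

    The one rule here maps English ADJ NOUN to French NOUN ADJ.
    """
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered += [words[i + 1], words[i]]  # swap adjective and noun
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON.get(w, w) for w in reordered)

print(direct_translate("the red car"))    # la rouge voiture (wrong word order)
print(transfer_translate("the red car"))  # la voiture rouge
```

The transfer rule is specific to this language pair, which is why transfer-based systems suit bilingual translation, while an interlingua avoids pair-specific rules at the cost of a full meaning representation.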

More on Machine Translation  Knowledge Based MT (KBMT): Completely analyze and understand the meaning of the source text [5]. Translate into target language text. Performance heavily relies on the amount of world knowledge present to analyze the source language. Knowledge represented in the form of frames. [Event: Murder is a: Crime]

Machine Translation Cont’d  Example Based MT (EBMT): Sentence are analyzed on the basis of similar example sentences analyzed previously. What Speechalator Prototype Uses?  Statistical based MT (SBMT) [5]: Uses Corpora that is analyzed previously. No linguistic information required. N-gram modeling used

Speech Synthesis

Generation of a human voice from a given text or phonetic description [6].  Text-To-Speech (TTS) systems.

Snapshot of Speechalator

Conclusions  Speechalator is an good achievement in both mobile technology and NLP.  Simple push-to-talk button interface.  Uses optimized Speech recognizers and speech synthesizers.  This architecture allows components to be placed both on-device and on a server.  Presently most of the components are ported to the device.  Performance: 80% accuracy Takes 2-3 seconds for translation Presently restricted to a domain…

Future Work  Increase accuracy of the device to deal with noisy environments.  Build more learning algorithms.  Multi-lingual speech recognizer.  To achieve Domain independence.

References
1. Kimberley Patch. PDA Translates Speech. Technology and Research News (TRN), 17/24 December, 2003.
2. Richard V. Cox, Lawrence R. Rabiner, Candace A. Kamm. Speech and Language Processing for Next-Millennium Communication Services. Proceedings of the IEEE, 88(8):1314-1337, Feb 2000.
3. ASR Home Page. http://www.isip.msstate.edu/projects/speech/
4. Speechalator: Two-Way Speech-To-Speech Translation on a Consumer PDA. Eurospeech 2003, Geneva, Switzerland, pages 1-4.
5. Joseph Seaseley. Machine Translation: A Survey of Approaches. University of Michigan, Ann Arbor.
6. Thierry Dutoit. A Short Introduction to Text-to-Speech Synthesis (TTS). http://tcts.fpms.ac.be/synthesis/introtts.html
