Language to Language Translation: A Way to Homogeneous India. Team effort of: Anasree Chatterjee & Diwa Arunashree. Mentor: Prof. K.T. Talele.

Introduction
- What is language?
- Need for proper communication
- Hazards of miscommunication
- Hence the need for our system
- Key users of our system
- Why this system?

Our system overview: speech input in English or Hindi, output in any one of 8 languages. Enjoy the words!

Speak in one language and listen in another, in just 3 steps:
1. Speech to Text: English or Hindi speech is converted to English or Hindi text (e.g. English).
2. Text to Text: that text is converted to text in the selected output language (e.g. English text to Bengali text).
3. Text to Speech: the output text is spoken (e.g. Bengali text to Bengali speech).
Input speech is in English or Hindi; output speech can be in 8 different languages.
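The three-step flow can be sketched as a chain of interchangeable stages. This is a minimal illustration, not the project's code: `speak_and_listen` and the stage signatures are hypothetical, and the toy stages in the `__main__` block merely stand in for the real PocketSphinx-based recognizer, the word database and the concatenative synthesizer.

```python
from typing import Callable

# Hypothetical stage signatures: each stage receives the language it works in.
SpeechToText = Callable[[bytes, str], str]   # (audio, input language) -> text
TextToText = Callable[[str, str], str]       # (text, output language) -> text
TextToSpeech = Callable[[str, str], bytes]   # (text, output language) -> audio

INPUT_LANGUAGES = ("English", "Hindi")


def speak_and_listen(audio: bytes, input_lang: str, output_lang: str,
                     stt: SpeechToText, t2t: TextToText,
                     tts: TextToSpeech) -> bytes:
    """Speech -> text -> translated text -> speech, in three steps."""
    if input_lang not in INPUT_LANGUAGES:
        raise ValueError(f"input speech must be one of {INPUT_LANGUAGES}")
    text = stt(audio, input_lang)        # step 1: speech to text
    translated = t2t(text, output_lang)  # step 2: text to text
    return tts(translated, output_lang)  # step 3: text to speech


if __name__ == "__main__":
    # Toy stand-in stages (hypothetical) so the chain runs end to end.
    def fake_stt(audio, lang):
        return "India"

    def fake_t2t(text, lang):
        return {("India", "Bengali"): "ইংডিযা"}[(text, lang)]

    def fake_tts(text, lang):
        return text.encode("utf-8")  # stand-in for synthesized audio

    out = speak_and_listen(b"", "English", "Bengali", fake_stt, fake_t2t, fake_tts)
    print(out.decode("utf-8"))  # ইংডিযা
```

Keeping each stage behind a plain function type means any one of them (for example the recognizer) can be swapped without touching the other two.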

Speech to Text Architecture!
Voice Input → Analog to Digital → Feature Extraction → Speech Engine/Decoder (using the Acoustic Model, Language Model and Phonetic Lexicon) → Store Word in a File

1. Voice input
2. Analog to digital conversion
3. Feature extraction and noise filtering
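These front-end steps can be illustrated with three small functions. This is a sketch under stated assumptions: 16-bit quantization, a pre-emphasis filter y[n] = x[n] - α·x[n-1] with α = 0.97, and overlapping analysis frames are common choices in speech front ends (PocketSphinx computes its MFCC features internally), but the slides do not say which parameters this project used, and all function names are hypothetical. At 16 kHz, 25 ms frames with a 10 ms hop are 400 and 160 samples.

```python
def quantize_16bit(samples):
    """Map analog-style amplitudes in [-1.0, 1.0] to signed 16-bit integers (clipping outside)."""
    return [max(-32768, min(32767, round(x * 32767))) for x in samples]


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def frame_signal(samples, frame_len=400, hop=160):
    """Split a signal into overlapping frames; a trailing partial frame is dropped.

    400/160 samples correspond to 25 ms frames with a 10 ms hop at 16 kHz.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

Each frame would then be turned into a feature vector; the vectors, not the raw samples, are what the acoustic model scores.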

Speech to Text Architecture: Acoustic Model

Components of ASR (contd.): Acoustic Model
- Statistical representations of the sounds that make up each word, modelled with Hidden Markov Models (HMMs).
- Built from audio recordings and their text transcriptions, processed by training software.
- Tool: CMU SphinxTrain.
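To show what "statistical representation with an HMM" means in practice, the sketch below scores an observation sequence against a small discrete HMM with the standard forward algorithm. It is a generic illustration, not SphinxTrain's model format: real Sphinx acoustic models use Gaussian mixture densities over feature vectors, whereas here the observations are hypothetical integer symbol IDs and all probabilities are made up.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) via the forward algorithm.

    obs:   sequence of observation symbol IDs
    start: start[s]     = P(first state is s)
    trans: trans[p][s]  = P(next state is s | current state is p)
    emit:  emit[s][o]   = P(observing symbol o | state s)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # alpha[s] = P(observations so far, current state is s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


if __name__ == "__main__":
    # Toy 2-state left-to-right model for one "sound" (made-up numbers).
    start = [1.0, 0.0]
    trans = [[0.5, 0.5], [0.0, 1.0]]
    emit = [[0.5, 0.5], [0.25, 0.75]]
    print(forward_likelihood([0, 1], start, trans, emit))  # 0.3125
```

Training (what SphinxTrain does at much larger scale) adjusts `start`, `trans` and `emit` so that the transcribed recordings receive high likelihood.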

Speech to Text Architecture: Phonetic Lexicon

Components of ASR (contd.): Phonetic Lexicon
- Contains the phonetic representation of every word in the vocabulary, i.e. words plus their phonemes.
- Used to pick valid words from the output of the acoustic model.
- A phoneme is the basic unit of sound.
- Notation: ITRANS-3 for Hindi; English phonetics for English.
- Built with a phonetizer.
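A Sphinx-style dictionary file is plain text with one word per line followed by its phonemes, with alternative pronunciations marked by a numeric suffix such as `water(2)`. A minimal loader, assuming that layout (the sample entries and the name `parse_dic` are illustrative and hypothetical, not copied from Hindi.dic or the CMU dictionary), could look like this:

```python
import re

# Alternate pronunciations carry a "(n)" suffix, e.g. "water(2)".
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_dic(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = VARIANT_SUFFIX.sub("", fields[0])
        lexicon.setdefault(word, []).append(fields[1:])
    return lexicon


# Illustrative entries only (hypothetical phoneme spellings).
SAMPLE = """\
india IH N D IY AH
water W AO T ER
water(2) W AA T ER
paanii p aa n ii
"""

if __name__ == "__main__":
    lexicon = parse_dic(SAMPLE)
    print(lexicon["water"])  # [['W', 'AO', 'T', 'ER'], ['W', 'AA', 'T', 'ER']]
```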

Hindi: Hindi speech (sound wave) → phonemes → Hindi.dic → Hindi word in ITRANS-3 → IT3-to-UTF-8 conversion → word in Unicode (UTF-8) script.
Examples (Devanagari / Bengali / Telugu / Malayalam / Tamil script):
In:d:iyaa → इंडिया / ইংডিযা / ఇండియా / ഇംഡിയാ / இடியா
Paanii → पानी / পানী / పానీ / പാനീ / பானீ
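The IT3-to-UTF-8 step is a transliteration: romanized spellings are rewritten as Unicode script. The sketch below does greedy longest-match transliteration into Devanagari using a deliberately tiny, hypothetical symbol table. It is not the full ITRANS-3 scheme (it ignores the `:` marks in `In:d:iyaa`, nasalization and conjunct rules), it is not the project's converter, and it assumes lowercase input.

```python
# Tiny hypothetical symbol table (a few consonants and vowels only).
CONSONANTS = {"k": "\u0915", "t": "\u0924", "d": "\u0926", "n": "\u0928",
              "p": "\u092a", "m": "\u092e", "y": "\u092f", "r": "\u0930"}
INDEPENDENT_VOWELS = {"a": "\u0905", "aa": "\u0906", "i": "\u0907",
                      "ii": "\u0908", "u": "\u0909", "uu": "\u090a"}
# Dependent vowel signs (matras); "a" is the inherent vowel, so it adds nothing.
VOWEL_SIGNS = {"a": "", "aa": "\u093e", "i": "\u093f",
               "ii": "\u0940", "u": "\u0941", "uu": "\u0942"}
VIRAMA = "\u094d"  # marks a consonant with no following vowel


def match_vowel(text, i):
    """Longest vowel symbol starting at position i, or None."""
    for length in (2, 1):
        chunk = text[i:i + length]
        if len(chunk) == length and chunk in VOWEL_SIGNS:
            return chunk
    return None


def itrans_to_devanagari(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
            vowel = match_vowel(text, i)
            if vowel is None:
                out.append(VIRAMA)            # bare consonant
            else:
                out.append(VOWEL_SIGNS[vowel])  # vowel sign after a consonant
                i += len(vowel)
        else:
            vowel = match_vowel(text, i)
            if vowel is None:
                raise ValueError(f"unsupported symbol at position {i}: {ch!r}")
            out.append(INDEPENDENT_VOWELS[vowel])  # vowel at start of a syllable
            i += len(vowel)
    return "".join(out)


if __name__ == "__main__":
    print(itrans_to_devanagari("paanii"))  # पानी
```

Other scripts (Bengali, Telugu, Malayalam, Tamil) would need their own tables; Unicode lays their blocks out in parallel, which is one reason a common romanized script is convenient.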


English: English speech (sound wave) → phonemes → Cmu07.dic → English word. Tools: PocketSphinx, SphinxTrain.

Speech to Text Architecture: Language Model

Components of ASR (contd.): Language Model
- A statistical language model assigns a probability to a sequence of m words by means of a probability distribution.
- It captures the underlying grammatical structure of the language.
- Use: it restricts the word search.
- The most common language models are n-gram LMs.
- Tool: CMUCLMTK.
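An n-gram model approximates the probability of a word sequence by conditioning each word only on the few words before it; for a bigram model, P(w1, ..., wm) ≈ P(w1 | <s>) · P(w2 | w1) · ... · P(</s> | wm). The sketch below uses maximum-likelihood estimates with no smoothing (an assumption for brevity: unseen bigrams get probability 0, whereas toolkits such as CMUCLMTK apply discounting and back-off). The function names are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams


def sentence_prob(sentence, histories, bigrams):
    """P(sentence) as a product of MLE bigram probabilities (no smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if histories[prev] == 0:
            return 0.0                            # unseen history
        prob *= bigrams[(prev, cur)] / histories[prev]
    return prob


if __name__ == "__main__":
    h, b = train_bigram(["i like tea", "i like coffee"])
    print(sentence_prob("i like tea", h, b))   # 0.5
    print(sentence_prob("i like milk", h, b))  # 0.0
```

A decoder can use such scores to prefer, or prune away, word hypotheses, which is how the language model restricts the search.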

Steps to build the language model with the CMU-Cambridge LM Toolkit:
1. Create CORPUS.TXT.
2. Compute word frequencies.
3. Build the vocabulary file.
4. Build the n-gram file.
5. Produce the language model in .ARPA format (CORPUS.ARPA).
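The word-frequency and vocabulary steps amount to counting and ranking words. CMUCLMTK performs them with its own command-line tools (e.g. `text2wfreq` and `wfreq2vocab`); the Python sketch below only mirrors the idea, and the function names and the `max_size` cut-off are hypothetical.

```python
from collections import Counter


def word_frequencies(corpus_lines):
    """Count every whitespace-separated token in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.split())
    return counts


def vocabulary(freqs, max_size=None):
    """Keep the most frequent words (ties broken alphabetically), returned sorted."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    if max_size is not None:
        ranked = ranked[:max_size]
    return sorted(word for word, _ in ranked)


if __name__ == "__main__":
    corpus = ["<s> i like tea </s>", "<s> i like coffee </s>"]
    freqs = word_frequencies(corpus)
    print(freqs["like"])                  # 2
    print(vocabulary(freqs, max_size=3))  # ['</s>', '<s>', 'i']
```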

.ARPA File
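An ARPA-format file is plain text: a `\data\` header stating how many n-grams of each order there are, one section per order whose lines give a log10 probability, the n-gram and (in back-off models) an optional back-off weight, and a closing `\end\` line. The sketch below writes a unigram-only model from word counts using unsmoothed relative frequencies; this is an assumption for illustration, since CMUCLMTK's output includes discounting and back-off weights. `unigram_arpa` is a hypothetical name.

```python
import math
from collections import Counter


def unigram_arpa(freqs):
    """Write an unsmoothed unigram model in ARPA text format."""
    total = sum(freqs.values())
    lines = ["\\data\\", f"ngram 1={len(freqs)}", "", "\\1-grams:"]
    for word in sorted(freqs):
        log_prob = math.log10(freqs[word] / total)  # ARPA stores log10 probabilities
        lines.append(f"{log_prob:.4f} {word}")
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(unigram_arpa(Counter("i like tea i like coffee".split())), end="")
```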

Speech to Text Architecture: Speech Engine

Components of ASR (contd.): Speech Engine / Decoder
Aspects of speech decoding:
- Compares the input speech data with the acoustic models; a modified version of the DTW (Dynamic Time Warping) algorithm is used.
- Determines which part of the signal is speech and filters out silence.
- Tool: CMU Sphinx (PocketSphinx).
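The slide names a modified DTW algorithm but does not describe the modification, so the sketch below shows only classic, unconstrained DTW, used for template matching: an input is recognized as the stored template with the lowest warping cost. Scalar sequences and absolute difference stand in for feature-vector frames and a vector distance. Note that PocketSphinx's own search is HMM-based (Viterbi beam search); this illustrates just the DTW idea. `dtw_distance` and `recognize` are hypothetical names.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost between two sequences (no band constraint)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b element reused
                                 cost[i][j - 1],      # b advances, a element reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the stored template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


if __name__ == "__main__":
    templates = {"yes": [1, 2, 2, 3], "no": [5, 5, 5]}
    print(recognize([1, 2, 3], templates))  # yes
```

Warping lets a slowly spoken word (the repeated `2`) still match its template at zero cost.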

Samples of PocketSphinx acting as the decoder (screenshots not reproduced).

Text to Text Architecture
Retrieve the stored word from the file (e.g. India) → find it in the database → retrieve the script of the word in the selected language (e.g. इंडिया / ইংডিযা / ఇండియా / ഇംഡിയാ / இடியா).

Use & Creation of Database!
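One minimal way to create and query such a word database uses SQLite from Python's standard library. The table name `translit`, its columns and the helper names are hypothetical, since the slides do not show the project's schema; the rows are the "India" examples from the Text to Text slide.

```python
import sqlite3


def create_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translit ("
        " word TEXT NOT NULL, language TEXT NOT NULL, script TEXT NOT NULL,"
        " PRIMARY KEY (word, language))"
    )


def add_word(conn, word, scripts):
    """scripts: {language: the word written in that language's script}."""
    conn.executemany(
        "INSERT OR REPLACE INTO translit (word, language, script) VALUES (?, ?, ?)",
        [(word.lower(), lang, text) for lang, text in scripts.items()],
    )
    conn.commit()


def lookup(conn, word, language):
    """Return the stored script for (word, language), or None if absent."""
    row = conn.execute(
        "SELECT script FROM translit WHERE word = ? AND language = ?",
        (word.lower(), language),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_db(conn)
    add_word(conn, "India", {"Hindi": "इंडिया", "Bengali": "ইংডিযা", "Telugu": "ఇండియా",
                             "Malayalam": "ഇംഡിയാ", "Tamil": "இடியா"})
    print(lookup(conn, "India", "Bengali"))  # ইংডিযা
```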

Text to Speech Architecture!
Input text in UTF-8 encoding → Text parser → Phonetic synthesizer: text-to-phonetic-script conversion using grapheme-to-phoneme rules → Speech synthesizer: CV pair algorithm and sound concatenation, drawing on the speech sound database.
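The final sound-concatenation stage joins the waveforms of successive units. The sketch below works on already-decoded sample lists (decoding the .gsm files is out of scope) and can optionally cross-fade a few samples at each joint to soften discontinuities. The cross-fade and the name `concatenate_units` are assumptions for illustration; the slides do not describe how joints are smoothed.

```python
def concatenate_units(units, overlap=0):
    """Join unit waveforms (lists of samples) in order.

    With overlap > 0, the last `overlap` samples of the output so far are
    linearly cross-faded with the first samples of the next unit.
    """
    out = []
    for unit in units:
        unit = list(unit)
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight of the incoming unit ramps up
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out
```

With `overlap=0` this is plain concatenation; a small overlap shortens the result by `overlap` samples per joint.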

Grapheme to Phoneme Conversion!
The phonetic description is syllable based. 8 kinds of sounds are allowed (C = consonant, V = vowel, H = half sound):
- V: a plain vowel
- CV: a consonant followed by a vowel
- VC: a vowel followed by a consonant
- CVC: a consonant followed by a vowel followed by a consonant
- HCV: a half consonant, followed by a CV
- HCVC: a half consonant, followed by a CVC
- 0C: a consonant alone
- G[0-9]*: a silence gap of the specified length (typical gaps …)
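The 8-way inventory can be encoded as a lookup so that a phonetic script is validated before synthesis. A sketch under stated assumptions: `describe_unit` is a hypothetical name, and the gap length is kept as the raw digit string because the slide does not give its unit.

```python
import re

# The syllable-based unit kinds from the slide (C = consonant, V = vowel, H = half sound).
UNIT_KINDS = {
    "V": "a plain vowel",
    "CV": "a consonant followed by a vowel",
    "VC": "a vowel followed by a consonant",
    "CVC": "a consonant followed by a vowel followed by a consonant",
    "HCV": "a half consonant, followed by a CV",
    "HCVC": "a half consonant, followed by a CVC",
    "0C": "a consonant alone",
}
GAP = re.compile(r"G([0-9]*)")  # G[0-9]*: silence gap of the given length


def describe_unit(kind):
    """Describe one unit kind, or raise ValueError if it is not in the inventory."""
    if kind in UNIT_KINDS:
        return UNIT_KINDS[kind]
    match = GAP.fullmatch(kind)
    if match:
        length = match.group(1) or "unspecified"
        return f"a silence gap of length {length}"
    raise ValueError(f"not one of the 8 unit kinds: {kind!r}")
```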

Consonants & Vowels!
CONSONANTS: (table not reproduced)
VOWELS: (table not reproduced)


Text to Phonetic Script!
Unicode text is converted into a common phonetic script, which is the input to the speech synthesizer. (Examples not reproduced.)
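As an illustration of grapheme-to-phoneme rules of this kind, the sketch below walks a Devanagari word and emits a skeleton of the unit kinds listed earlier: a consonant with a vowel sign (or with its inherent vowel) gives CV, a consonant with virama (्) becomes a half consonant H that attaches to the next CV, and an independent vowel gives V. These are simplified rules chosen for illustration, not the project's: schwa deletion, anusvara and nukta are ignored, VC/CVC units are never produced, and clusters of three or more consonants yield "HHCV", which is outside the inventory. `unit_skeleton` is a hypothetical name.

```python
VIRAMA = "\u094d"  # halant: suppresses the inherent vowel


def is_consonant(ch):
    return "\u0915" <= ch <= "\u0939"  # क .. ह


def is_independent_vowel(ch):
    return "\u0904" <= ch <= "\u0914"  # ऄ .. औ


def is_vowel_sign(ch):
    return "\u093e" <= ch <= "\u094c"  # ा .. ौ


def unit_skeleton(word):
    """Return unit kinds, e.g. ["CV", "CV"] for पानी (simplified rules)."""
    units = []
    halves = 0  # half consonants waiting to attach to the next CV
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if is_consonant(ch) and nxt == VIRAMA:
            halves += 1                          # e.g. क् in क्या
            i += 2
        elif is_consonant(ch):
            units.append("H" * halves + "CV")    # explicit or inherent vowel
            halves = 0
            i += 2 if is_vowel_sign(nxt) else 1
        elif is_independent_vowel(ch):
            units.extend(["0C"] * halves)        # unattached halves stand alone
            halves = 0
            units.append("V")
            i += 1
        else:
            i += 1  # anusvara, nukta, punctuation etc. are ignored here
    units.extend(["0C"] * halves)  # word-final half consonants stand alone
    return units
```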


Sound Database!
Sound files are GSM-compressed, i.e. stored in ".gsm" format. Sound units stored in the database:
- Total size of db: … MB
- CV pairs: *
- VC pairs: *
- V: …
- C: …
- Halfs: ky kr kl kll kv ksh khy khr khl khv gy gr gl gv gn ghy ghr ghv ghn chy chr chv jy jv ty tr tv thy thr dy dr dv dhy dhr dhv ny nr nv tty ttr ttv ddy ddr ddv py pr pl pll fr fl by br bl bhy bhr bhl my mr vy vr vl

Sound Concatenation
- CV files: named x.y.gsm (consonant number, vowel number)
- V files: named x.gsm (vowel number)
- VC files: named x.y.gsm (vowel number, consonant number)
- Halfs files: named x.y.gsm (the numbers of the 2 consonants)
- 0C files: named x.gsm (consonant number)
- 4 more files: cvoffsets, vcoffsets, hoffsets, voffsets
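The naming scheme can be captured in a small resolver. The numbering tables below are hypothetical (the slides do not list the project's consonant and vowel numbers). Because CV, VC and half files all use the x.y.gsm pattern, the function returns the unit kind alongside the file name so they can be kept apart, for example in separate folders (an assumption). The four offset files are not modelled.

```python
# Hypothetical numbering tables; the real tables are not given in the slides.
CONSONANT_NO = {"k": 1, "kh": 2, "g": 3, "gh": 4}
VOWEL_NO = {"a": 1, "aa": 2, "i": 3, "ii": 4}


def unit_file(kind, *symbols):
    """Return (kind, file name) for one sound unit, following the x.y.gsm / x.gsm naming."""
    if kind == "CV":
        consonant, vowel = symbols
        name = f"{CONSONANT_NO[consonant]}.{VOWEL_NO[vowel]}.gsm"
    elif kind == "VC":
        vowel, consonant = symbols
        name = f"{VOWEL_NO[vowel]}.{CONSONANT_NO[consonant]}.gsm"
    elif kind == "Half":
        first, second = symbols
        name = f"{CONSONANT_NO[first]}.{CONSONANT_NO[second]}.gsm"
    elif kind == "V":
        (vowel,) = symbols
        name = f"{VOWEL_NO[vowel]}.gsm"
    elif kind == "0C":
        (consonant,) = symbols
        name = f"{CONSONANT_NO[consonant]}.gsm"
    else:
        raise ValueError(f"unknown unit kind: {kind!r}")
    return kind, name
```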


Constraints:
- Training is tedious.
- Only 2 input languages.
- Phone generation for all Indian languages is difficult.
Future scope:
- Can be trained for all Indian languages.
- Increase accuracy.
- Better quality of the text-to-speech synthesizer modules.
- A larger dictionary (approx. … words).
Extended modules:
- S2T, T2S and T2T File Reader
- S2T Reporter

BOL INDIA BOL PRIVATE LIMITED
Masters of Computer Application, Sardar Patel Institute of Technology, Andheri (West), Mumbai-58.
Anasree Chatterjee (Director), Diwa Arunashree (Director), Prof. K.T. Talele (Joint Director), Shivani Nadkarni (Joint Director), Aditya Naravane (Joint Director)
"Language to Language Translator: A Way to Homogeneous India"
Languator is designed especially for the 3 Ts: Travelers, Tourists and, on a par with them, people who have Transferable jobs. To a certain extent it will also serve the needs of S2T reporters.