1-P-30 Speech-to-Speech Translation using Dual Learning and Prosody Conversion Zhaojie Luo, Yoichi Takashima, Tetsuya Takiguchi, and Yasuo Ariki (Kobe University)

Presentation transcript:

Background and Problems

1. The traditional pipeline approach greatly increases the cumulative loss of information when the individual systems are combined.
2. The prosody of the source utterance is discarded by the ASR system and is not accessible to the TTS system in the target language.

Goal

1. Reduce the error rate when combining the Automatic Speech Recognition (ASR), neural machine translation (NMT), and Text-to-Speech (TTS) systems.
2. Convert prosody across languages (speech prosody conversion), e.g. carrying the prosody of "Hello" over to "你好" ("Hello" in Chinese).
3. Apply voice conversion (VC) to the speech-to-speech translation system, e.g. for an emotional robot that speaks in the user's own voice.

Proposed method

Traditional approaches to speech-to-speech machine translation (S2SMT) use a pipeline architecture: the input speech passes through ASR, MT, and TTS in sequence. In this paper, we propose a speech-to-speech translation system combined with a voice conversion model: the TTS output (e.g. "早上好" translated as "Good morning") is converted using features collected from the speaker's voice, yielding speech-to-speech translation in the speaker's own voice. To combine the most popular ASR, NMT, TTS, and VC models, we use dual learning (rough sketches of the cascade and of a dual-learning update are given after the Results section).

TTS: We used Tacotron 2, a neural network architecture that synthesizes speech directly from text.
VC: We used StarGAN, which requires no parallel utterances, transcriptions, or time-alignment procedures for training the speech generator.
NMT: For the standard NMT model, we carried out Chinese-to-English (Zh-En) experiments using the IWSLT 2015 data.
ASR: We used the pre-trained Deep Speech 2 model, which was trained on 9,400 hours of labeled speech containing 11 million Mandarin utterances.

To reduce the error rate, we aim to apply dual learning to combine the ASR, NMT, and TTS models.

Experiments and Results

[Table: MCD results with and without dual learning for the different VC settings.]
Native to Native: VC from one native speaker to another native speaker.
TTS to Native: VC from the TTS voice to a native speaker.
TTS to Non-Native: VC from the TTS voice to a non-native speaker.

[Table: BLEU results of translating Chinese to English with the standard NMT model.]
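To make the cascade concrete, below is a minimal sketch of the traditional pipeline the poster criticizes. The wrapper objects (asr, nmt, tts) and their methods are hypothetical placeholders, not the API of any specific toolkit; the point is that source prosody never reaches the TTS step.

```python
def speech_to_speech(source_audio, asr, nmt, tts):
    """Cascaded S2ST: source speech -> text -> translated text -> target speech.

    asr/nmt/tts are hypothetical model wrappers; any real system
    (e.g. Deep Speech 2, a Zh-En NMT model, Tacotron 2) could fill these roles.
    """
    source_text = asr.transcribe(source_audio)   # e.g. "早上好"; prosody is discarded here
    target_text = nmt.translate(source_text)     # e.g. "Good morning"
    return tts.synthesize(target_text)           # spoken in the TTS voice, not the speaker's
```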
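The poster does not spell out its dual-learning objective, so the following is only a generic sketch of one dual-learning update in the style commonly used for NMT, with a Zh-to-En primal model and an En-to-Zh dual model; every interface here (sample_with_logprob, log_likelihood) is an assumption for illustration, not the authors' implementation.

```python
import torch

def dual_learning_step(x_zh, nmt_zh2en, nmt_en2zh, lm_en, optimizer):
    """One generic dual-learning update for a Zh->En / En->Zh model pair.

    The round trip Zh -> En -> Zh yields a reconstruction reward, and a
    target-side language model scores the intermediate English output,
    so the primal model can be updated from monolingual Chinese input.
    """
    y_en, logp_fwd = nmt_zh2en.sample_with_logprob(x_zh)    # primal translation step
    r_lm = lm_en.log_likelihood(y_en)                       # fluency reward
    r_rec = nmt_en2zh.log_likelihood(x_zh, given=y_en)      # reconstruction reward
    reward = 0.5 * r_lm + 0.5 * r_rec
    loss = -(reward.detach() * logp_fwd).mean()             # REINFORCE-style gradient estimate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same closed-loop idea extends to the ASR/TTS pair: transcribe the TTS output (or synthesize the ASR output) and reward agreement with the original input.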
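The VC results are reported in MCD (mel-cepstral distortion), which has a standard definition; below is a small reference implementation, assuming the two mel-cepstral sequences have already been time-aligned (e.g. by DTW) and the 0th (energy) coefficient has been dropped.

```python
import numpy as np

def mel_cepstral_distortion(mc_target, mc_converted):
    """MCD in dB between two aligned mel-cepstral sequences of shape (frames, dims).

    Lower MCD means the converted speech is spectrally closer to the target.
    """
    diff = np.asarray(mc_target) - np.asarray(mc_converted)
    return float(np.mean((10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))))
```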
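The NMT results are reported in BLEU. The poster does not say which scorer was used; sacrebleu is one common choice, shown here on toy data.

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["good morning everyone"]          # toy system outputs, one per segment
references = [["good morning , everyone"]]      # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```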