Modern speech synthesis: communication aid personalisation Sarah Creer Stuart Cunningham Phil Green Clinical Applications of Speech Technology University.

Slides:



Advertisements
Similar presentations
By: Hossein and Hadi Shayesteh Supervisor: Mr J.Connan.
Advertisements

Device receives electronic signal transmitted from signs containing information A device that can communicate GPS location relative to the destination.
Presented by Erin Palmer. Speech processing is widely used today Can you think of some examples? Phone dialog systems (bank, Amtrak) Computers dictation.
An Integrated Toolkit Deploying Speech Technology for Computer Based Speech Training with Application to Dysarthric Speakers Athanassios Hatzis, Phil Green,
SIPCom 8-4 Speech Processing, MM7 - Speech Synthesis - Speech Recognition (Part 1 of 3) Børge Lindberg
Entropy and Dynamism Criteria for Voice Quality Classification Applications Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis.
The Computerised FDA Application Formulating A System of Acoustic Objective Measures for the Frenchay Dysarthria Assessment Tests.
Communicating with Robots using Speech: The Robot Talks (Speech Synthesis) Stephen Cox Chris Watkins Ibrahim Almajai.
Languages Dialect and Accents
December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed.
MULTI LINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES NIXON PATEL Bhrigus Inc Multilingual & International Speech.
Course Overview Lecture 1 Spoken Language Processing Prof. Andrew Rosenberg.
Bootstrapping a Language- Independent Synthesizer Craig Olinsky Media Lab Europe / University College Dublin 15 January 2002.
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
Automatic Prosody Labeling Final Presentation Andrew Rosenberg ELEN Speech and Audio Processing and Recognition 4/27/05.
Communication is the goal of writing Technique 1: Write about people in action as the subject or object of many sentences.
Text-To-Speech Synthesis An Overview. What is a TTS System  Goal A system that can read any text Automatic production of new sentences Not just audio.
The Chinese University of Hong Kong Department of Computer Science and Engineering Lyu0202 Advanced Audio Information Retrieval System.
Chapter 15 Speech Synthesis Principles 15.1 History of Speech Synthesis 15.2 Categories of Speech Synthesis 15.3 Chinese Speech Synthesis 15.4 Speech Generation.
Designing a User Interface for People with Disabilities u u
1 Speech synthesis 2 What is the task? –Generating natural sounding speech on the fly, usually from text What are the main difficulties? –What to say.
Text-To-Speech System for Marathi Miss. Deepa V. Kadam Indian Institute of Technology, Bombay.
A Text-to-Speech Synthesis System
Speech Synthesis Markup Language -----Aim at Extension Dr. Jianhua Tao National Laboratory of Pattern Recognition (NLPR) Institute of Automation, Chinese.
Track: Speech Technology Kishore Prahallad Assistant Professor, IIIT-Hyderabad 1Winter School, 2010, IIIT-H.
Numerical Text-to-Speech Synthesis System Presentation By: Sevakula Rahul Kumar.
Chelsea Johnson, Cortney Jones, Amber Cunningham, and Dylan Bush.
Clinical Applications of Speech Technology Phil Green Speech and Hearing Research Group Dept of Computer Science University of Sheffield
New technologies supporting people with severe speech disorders Mark Hawley Barnsley District General Hospital and University of Sheffield.
STANDARDIZATION OF SPEECH CORPUS Li Ai-jun, Yin Zhi-gang Phonetics Laboratory, Institute of Linguistics, Chinese Academy of Social Sciences.
Studying how English works, its history and its impact on others so we can better understand our linguistic identity and our heritage NAME DATE The Unit.
Supervisor: Dr. Eddie Jones Electronic Engineering Department Final Year Project 2008/09 Development of a Speaker Recognition/Verification System for Security.
Voice Onset Time In Chinese Learners of English Major Sharpe.
Clinical Applications of Speech Technology Phil Green Speech and Hearing Research Group Dept of Computer Science University of Sheffield
The role of prosody in dialect synthesis and authentication Kyuchul Yoon Division of English Kyungnam University Spring 2008 Joint Conference of KSPS.
Introduction to IT Presented by: Ishan Agarwal ABV-IIITM, Gwalior.
Synthesis of Child Speech With HMM Adaptation and Voice Conversion Oliver Watts, Junichi Yamagishi, Member, IEEE, Simon King, Senior Member, IEEE, and.
Vergina: A Modern Greek Speech Database for Speech Synthesis Alexandros Lazaridis Theodoros Kostoulas Todor Ganchev Iosif Mporas Nikos Fakotakis Artificial.
SPEECH SYNTHESIS --AusTalk Zhijie Shao Master of Computer Science Supervisor: Trent Lewis.
STARDUST – Speech Training And Recognition for Dysarthric Users of Assistive Technology Mark Hawley et al Barnsley District General Hospital and University.
By Luisa and Noriko. Description of class  Language Proficiency level: low intermediate  Class size: 12 students  Age: 15 to 25  Native Language background:
Segmental encoding of prosodic categories: A perception study through speech synthesis Kyuchul Yoon, Mary Beckman & Chris Brew.
Multimodal User Interface with Natural Language Classification for Clinicians At Point of Care Health Informatics Showcase Peter Budd Sponsors: NCCH -
Bernd Möbius CoE MMCI Saarland University Lecture 7 8 Dec 2010 Unit Selection Synthesis B Möbius Unit selection synthesis Text-to-Speech Synthesis.
Page 1 NOLISP, Paris, May 23rd 2007 Audio-Visual Audio-Visual Subspaces Audio Visual Reduced Audiovisual Subspace Principal Component & Linear Discriminant.
E.g.: MS-DOS interface. DIR C: /W /A:D will list all the directories in the root directory of drive C in wide list format. Disadvantage is that commands.
AMSP : Advanced Methods for Speech Processing An expression of Interest to set up a Network of Excellence in FP6 Prepared by members of COST-277 and colleagues.
Glencoe Introduction to Multimedia Chapter 8 Audio 1 sound effect An artificially created or enhanced sound used to achieve an effect (without speech or.
HMM training strategy for incremental speech synthesis.
Statistics for Mission The Church of Scotland and Scotland’s Census.
Imposing native speakers’ prosody on non-native speakers’ utterances: Preliminary studies Kyuchul Yoon Spring 2006 NAELL The Division of English Kyungnam.
Ways to generate computer speech Record a human speaking every sentence HAL will ever speak (not likely) Make a mathematical model of the human vocal.
Performance Comparison of Speaker and Emotion Recognition
Higher Vision, language and movement. Strong AI Is the belief that AI will eventually lead to the development of an autonomous intelligent machine. Some.
All Your Voices Are Belong to Us: Stealing Voices to Fool Humans and Machines Dibya Mukhopadhyay, Maliheh Shirvanian, Nitesh Saxena University of Alabama.
Rapid Development in new languages Limited training data (6hrs) provided by NECTEC from 34 speakers, + 8 spks for development and test Romanization of.
HMM-Based Speech Synthesis Erica Cooper CS4706 Spring 2011.
Chapter 7 Professional and Social Communication How to responsibly communicate with others.
Making it Meaningful  Dialects of American English as YOU see them Dialects of American English  Does everyone speak using a dialect? Information about.
Of An Expert System.  Introduction  What is AI?  Intelligent in Human & Machine? What is Expert System? How are Expert System used? Elements of ES.
Phone-Level Pronunciation Scoring and Assessment for Interactive Language Learning Speech Communication, 2000 Authors: S. M. Witt, S. J. Young Presenter:
Dialect Simulation through Prosody Transfer: A preliminary study on simulating Masan dialect with Seoul dialect Kyuchul Yoon Division of English, Kyungnam.
LearningHouse.com | (502) Understanding and Complying With the Americans With Disabilities Act (ADA)
Speaker Recognition UNIT -6. Introduction  Speaker recognition is the process of automatically recognizing who is speaking on the basis of information.
Speech Recognition Xiaofeng Lai. What is speech recognition?  Speech recognition :  This is the ability of a machine or program to identify words and.
영어교육에 있어서의 영어억양의 역할 (The role of prosody in English education) Korea Nazarene University Kyuchul Yoon English Division Kyungnam University.
A Text-free Approach to Assessing Nonnative Intonation Joseph Tepperman, Abe Kazemzadeh, and Shrikanth Narayanan Signal Analysis and Interpretation Laboratory,
IIS for Speech Processing Michael J. Watts
G. Anushiya Rachel Project Officer
Mr. Darko Pekar, Speech Morphing Inc.
Presentation transcript:

Modern speech synthesis: communication aid personalisation Sarah Creer Stuart Cunningham Phil Green Clinical Applications of Speech Technology University of Sheffield

Introduction Building voices for VIVOCA (communication aids) Speech synthesis techniques Future research: personalisation of synthetic voices

Current speech synthesis: communication aids High quality voices available E.g. Toby Churchill Lightwriter – DECtalk™ (Fonix) for American English – Acapela for British English Personalisation limited: age, gender, language

Personalisation Voice = identity – Gender – Age – Geographic background – Socio-economic background – Ethnic background – As that individual Maintains social relationships Maintains social closeness Sets group membership

VIVOCA Voice Input Voice Output Communication Aid Speech Recognition Dysarthric speech input Text-to-Speech Synthesis Recognised text Intelligible synthesised speech output Retain elements of clients’ identity for synthesised speech output Intelligible and personalised synthesised speech output

VIVOCA: personalisation Sheffield/Barnsley user group Retain local accent – geographic identity Speaker database – Arctic database: sentences Professional local speakers – Ian McMillan – Christa Ackroyd

Concatenative synthesis Input data Text input Synthesised speech Speech recordings Unit segmentation Unit database Unit selection Concatenation + smoothing i a sh Festvox: +… ++ …

Concatenative synthesis High quality Natural sounding Sounds like original speaker  Need a lot of data (~600 sentences)  Can be inconsistent  Difficult to manipulate prosody

HMM synthesis yes yes

HMM synthesis procedure Input data Text input Speaker model Synthesised speech Speech recordings TrainingSynthesis e t HTS

HMM synthesis Consistent Intelligible Needs relatively little input (~20 mins) Can be adapted with small amount of data (>5 sentences) Easier to manipulate  Buzzy quality  Less natural than concatenative

Future research Further personalisation for individuals with progressive speech disorders – Capturing the essence of a voice Voice banking – Before deterioration Adaptation using HMM synthesis – Before or during deterioration

Thank you This work is sponsored by EPSRC Doctoral Training grant For further details of VIVOCA see: