STANDARDIZATION OF SPEECH CORPUS
Li Ai-jun, Yin Zhi-gang
Phonetics Laboratory, Institute of Linguistics, Chinese Academy of Social Sciences

1. Introduction
A speech corpus (database) is a collection of speech signals together with their annotations and documentation. Speech corpora are the basis both for phonetic research and for developing speech synthesis and speech recognition systems.

- Speech synthesis: TTS (Text-to-Speech). Example: “Welcome to the 20th International CODATA Conference” in Standard Chinese, in Sichuan dialect, and in a child’s voice (Mandarin)
- Speech recognition: ASR (Automatic Speech Recognition), e.g. IBM ViaVoice, Microsoft Office XP……
- Phonetic research

Importance of standardization research
In China, many speech research and development institutions are developing their own speech corpora, supported by the 863 and 973 programs, the National Science Foundation of China, … It is therefore very important to be able to share these corpora conveniently, so as to avoid wasting time and money and to make research more efficient. Standardization research on speech corpora is necessary, and specifications should be stipulated.

2. Standardization research on speech corpora
1) Legal issues
2) Standardization of the speech corpus collection procedure
3) Standardization of the speech corpus itself

1) Legal issues
Legal documents for a speech corpus: a property-rights statement for the corpus (database), agreements with the speakers, agreements with the users, …

2) Standardization of the speech corpus collection procedure
[Fig. 1: The collection procedure of a speech corpus]

3) Standardization of the speech corpus
- Specification of speakers: describes the speakers' characteristics.
- Specification of corpus design: describes the corpus organization and contents.
- Specification of recording: describes the technical recording specifications and the recording platform.
- Specification of annotation: describes the annotation conventions.
- Specification of validation: states explicitly the criteria the corpus should fulfill and gives an overview of the features to be checked.
- Specification of distribution: describes the distribution plan, the distribution principles, and the storage medium.

3. Detailed specifications, exemplified by RASC863
Example corpus: RASC863 (Regional Accented Speech Corpus, funded by the National 863 Project). RASC863 covers four regional accents: Chongqing, Shanghai, Guangzhou, and Xiamen.
- 800 speakers (200 per accent × 4 accents)
- 70 GB of data

RASC863-1. Specification of speakers
The specification of speakers describes the number of speakers to be recorded for each language and their characteristics: age, education level, gender, and dialectal coverage. Sometimes it also has to describe the speaking styles: read speech, answering speech, command/control speech, descriptive speech, non-prompted speech, spontaneous speech, neutral vs. emotional speech, and dialogue.

RASC863: distribution of speakers
Speakers are counted by gender (male/female) at each level of the following items:
- Age: … (y), … (y), older than 50 (y)
- Education: junior high school, senior high school, undergraduate/graduate
- Accent category: L1, L2, L…
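Below is a sketch of how the speaker specification and the 200-per-accent quota might be checked during recruitment. It is written in Python. The code tallies speakers per (level, gender) and reports the accents that are still below quota. Several parts are hypothetical: the `Speaker` record, the two-way `age_band` split, and the example speakers. Only the four accents and the 200-speaker quota come from RASC863.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Speaker:
    speaker_id: str
    gender: str     # "male" or "female"
    age: int        # years
    education: str  # e.g. "junior high school", "senior high school", "undergraduate/graduate"
    accent: str     # regional accent, e.g. "Chongqing"

def age_band(s: Speaker) -> str:
    # Hypothetical two-way split: only the "older than 50" level is legible on the slide.
    return "older than 50" if s.age > 50 else "50 or younger"

def tally(speakers, key):
    """Count speakers per (level, gender) for one stratification item."""
    return Counter((key(s), s.gender) for s in speakers)

def quota_shortfall(speakers, quotas):
    """Map each accent below its quota to the number of speakers still missing."""
    counts = Counter(s.accent for s in speakers)
    return {a: need - counts[a] for a, need in quotas.items() if counts[a] < need}

# RASC863 targets 200 speakers for each of its four accents.
QUOTAS = {a: 200 for a in ("Chongqing", "Shanghai", "Guangzhou", "Xiamen")}

speakers = [  # made-up example records
    Speaker("CQ001", "male", 34, "senior high school", "Chongqing"),
    Speaker("CQ002", "female", 52, "undergraduate/graduate", "Chongqing"),
    Speaker("SH001", "female", 27, "junior high school", "Shanghai"),
]
print(tally(speakers, age_band))
print(quota_shortfall(speakers, QUOTAS))
```

Passing a different `key` (for example `lambda s: s.education`) gives the per-gender counts for any other item in the distribution table.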

RASC863-2. Specification of corpus design
The aim of corpus design is to determine what is to be recorded and to prepare the necessary scripts. The RASC863 prompt sheet for each speaker contains:
- Item 0: spontaneous speech, 4 to 5 minutes
- Items 1-15: spontaneous speech, answers to 15 questions
- Read speech: 23 common sentences
- Read speech: 15 dialectal words
- Read speech: 110 phonetically balanced sentences (fewer than 30 syllables each)
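Below is a Python sketch of a script check for the prompt sheet, assuming the sheet is stored as lists of strings keyed by item. It verifies two things: the counts of the read items, and the "fewer than 30 syllables" rule. Syllables are approximated by counting Han characters. That approximation is an assumption of this sketch, not part of the RASC863 specification, and erhua "儿" suffixes are over-counted. The item keys are hypothetical.

```python
import re

# Approximation (an assumption, not part of the RASC863 spec): each Han character
# counts as one Mandarin syllable. Erhua "儿" suffixes are over-counted by this rule.
HAN = re.compile(r"[\u4e00-\u9fff]")
MAX_SYLLABLES = 30  # prompt-sheet rule: fewer than 30 syllables per sentence

# Per-speaker counts of the read items on the RASC863 prompt sheet.
EXPECTED = {"common sentences": 23, "dialectal words": 15, "balanced sentences": 110}

def syllable_count(sentence: str) -> int:
    """Approximate syllable count: number of CJK Unified Ideographs in the string."""
    return len(HAN.findall(sentence))

def too_long(sentences):
    """Return (index, syllables) for each sentence that breaks the <30-syllable rule."""
    counts = [(i, syllable_count(s)) for i, s in enumerate(sentences)]
    return [(i, n) for i, n in counts if n >= MAX_SYLLABLES]

def check_sheet(sheet):
    """Return {item: (found, expected)} for every read item with the wrong count."""
    return {item: (len(sheet.get(item, [])), n)
            for item, n in EXPECTED.items() if len(sheet.get(item, [])) != n}
```

Punctuation such as "。" lies outside the ideograph range and is therefore not counted.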

RASC863-3. Specification of recording
The specification of recording usually contains the recording guide, technical parameters, recording procedures, recording log files, etc.
- Hardware: notebook computer, USB sound card (M-Audio USBPre), microphones (Sennheiser earphone microphone, CR 722 capacitor microphone)
- Software: Cool Edit Pro 2.0, YYSRecorder
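Recording specifications usually fix technical parameters such as the sampling rate and the bit depth, and each file can be checked against them with Python's standard `wave` module. The slides do not state RASC863's actual values. The `TARGET` values below (16 kHz, 16-bit, mono) are therefore hypothetical placeholders for illustration.

```python
import wave

# Hypothetical target parameters: the slides do not give RASC863's sampling rate
# or bit depth, so replace these placeholders with the real specification.
TARGET = {"sample_rate": 16000, "bits_per_sample": 16, "channels": 1}

def wav_params(path):
    """Read basic header parameters from a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "sample_rate": w.getframerate(),
            "bits_per_sample": 8 * w.getsampwidth(),  # sample width is given in bytes
            "channels": w.getnchannels(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

def check_recording(path, target=TARGET):
    """Return human-readable mismatches between the file header and the target."""
    p = wav_params(path)
    return [f"{k}: expected {v}, got {p[k]}" for k, v in target.items() if p[k] != v]
```

Running this check right after each session, and writing the results to the recording log file, catches misconfigured sound-card settings before many files accumulate.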

RASC863-4. Specification of corpus structure
The corpus structure concerns the internal organization of the corpus, the file naming rules, and the storage media for distribution. In RASC863, each recorded sound corresponds to one metadata file and one wave file. The metadata file describes the detailed information related to that recorded sound file:
- Session ID, speaker ID, date of recording, recording place, speaking style
- Acoustic and technical description: recorded sound name, environmental conditions, microphones, sampling rate, bits per sample
- Annotation part: annotation convention, ……
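Below is a Python sketch of writing and reading one metadata file per recorded sound. The field names are taken from the slide. The "key: value" line layout and the "*****" separator between sections are hypothetical; they mirror the slide's grouping but are not the documented RASC863 file format.

```python
from pathlib import Path

# Field names follow the slide; the "key: value" layout and "*****" separators
# are hypothetical illustrations, not the actual RASC863 file format.
SECTIONS = [
    ["Session ID", "Speaker ID", "Date of Recording", "Recording place", "Speaking style"],
    ["Recorded sound name", "Environmental conditions", "Microphones",
     "Sampling rate", "Bits per sample"],
    ["Annotation convention"],
]

def write_metadata(path, record):
    """Write one metadata file for one recorded sound; a missing field raises KeyError."""
    lines = []
    for i, section in enumerate(SECTIONS):
        if i:
            lines.append("*****")  # section separator
        lines.extend(f"{field}: {record[field]}" for field in section)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

def read_metadata(path):
    """Parse a metadata file back into a dict of strings, skipping separators."""
    record = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line and line != "*****":
            key, _, value = line.partition(": ")
            record[key] = value
    return record
```

Raising on a missing field is deliberate: an incomplete metadata file is easier to fix at recording time than at validation time.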

RASC863-5. Specification of annotation
The specification of annotation describes the annotation format, rules, tools, and consistency criteria. Speech corpus annotation includes speech-to-character transcription, segmental annotation, and prosodic annotation. If more than one transcriber is transcribing or annotating simultaneously, their annotation consistency should be checked first. RASC863 uses C-ToBI 3.0 and SAMPA-C.
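The slides do not say which consistency measure RASC863 uses. A common choice for checking agreement between two transcribers is Cohen's kappa, sketched below in Python. It assumes both transcribers have labelled the same, already aligned units, for example one prosodic break label per syllable. The label values in the usage example are made up.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' labels on the same, already aligned units."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement rate
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)  # chance agreement
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa([1, 1, 3, 4, 1, 3], [1, 3, 3, 4, 1, 1])` gives 4/6 raw agreement but a kappa of 5/11 (about 0.45) once chance agreement is discounted. Comparing segment boundaries (rather than labels) would first need a tolerance-based alignment, which this sketch does not cover.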

RASC863-6. Legal agreement
A very important matter is the agreement between the producer and the speaker, often called the speaker agreement. It should clearly state how the recorded speech data, and possibly some of the speaker's personal information, will be used.

RASC863-7. Specification of validation and distribution
The corpus validation criterion is applied in the final validation, after pre-validation and after production of the whole corpus is finished. It checks the quality of the corpus and provides reference criteria for users. Corpus distribution can be handled by a distribution organization or by the producing institution itself. The producer should provide information about the corpus to the distributor and users, and legal agreements among producer, distributor, and user should be signed before formal distribution.
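One simple structural check that final validation could include is confirming the rule that every recorded sound has exactly one wave file and one metadata file. Below is a Python sketch of that check. It assumes the two files share a stem and differ only in extension. The `.wav` and `.txt` extensions are hypothetical defaults, not the actual RASC863 naming rules.

```python
from pathlib import Path

def unpaired_files(corpus_dir, wave_ext=".wav", meta_ext=".txt"):
    """List stems that have a wave file but no metadata file, and vice versa.

    The extensions are hypothetical defaults; set them to the corpus's naming rules.
    """
    root = Path(corpus_dir)
    # Relative paths without the extension identify one recorded sound each.
    waves = {p.relative_to(root).with_suffix("") for p in root.rglob(f"*{wave_ext}")}
    metas = {p.relative_to(root).with_suffix("") for p in root.rglob(f"*{meta_ext}")}
    return {"missing_metadata": sorted(map(str, waves - metas)),
            "missing_wave": sorted(map(str, metas - waves))}
```

An empty result for both keys is a necessary, but not sufficient, condition for passing validation; content checks such as the header and annotation checks still apply per file.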

Discussion
- Speech corpora of the Phonetics Laboratory (supported by the 863 and 973 programs, the National Science Foundation of China, and the U.S. National Science Foundation)
- Ongoing research: human-machine conversation based on spontaneous speech ……
- The corpus specifications will be extended and refined.

THANK YOU!