Introducing The Buckeye Speech Corpus
Kyuchul Yoon, English Division, Kyungnam University
March 21, 2008, School of English, Kyung Hee University

Presentation transcript:

1 Introducing The Buckeye Speech Corpus
Kyuchul Yoon, English Division, Kyungnam University
March 21, 2008, School of English, Kyung Hee University

2 The Buckeye Speech Corpus
What is it? / Project Personnel / Collection & Recording / Transcription & Analysis / Why create the corpus?
The Buckeye Corpus of conversational speech
– 40 speakers in Columbus, OH conversing freely with an interviewer
– Orthographically transcribed and phonetically labeled
– Audio/text files & time-aligned phonetic labels (Xwaves, Wavesurfer)
– Available to researchers in academia and industry

3 The Buckeye Speech Corpus: Project Personnel
Principal Investigators
– Mark Pitt (Department of Psychology)
– Eric Fosler-Lussier (Department of Computer Science and Engineering)
– Elizabeth Hume (Department of Linguistics)
– Keith Johnson (Department of Linguistics)
Post-doctoral researchers (4)
Graduate students (7)
Undergraduate students (15)

4 The Buckeye Speech Corpus: Collection & Recording
Collection of speech completed by spring […]
40 speakers, all natives of Central Ohio (i.e. born in or near Columbus, or moved there no later than age 10)
Sample design is stratified for age and sex
– Class was not strictly controlled
– Most are middle class to upper working class

5 The Buckeye Speech Corpus: Collection & Recording
From 40 speakers, about 300,000 words of speech were collected (about 40 hours)
– This large sample should ensure that the estimates of the forms and frequency of phonological variation are representative of the population under study
– There should be a large number of tokens of many variant forms appearing in different phonetic environments
– Useful for studying variation
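To put the rounded totals on this slide in perspective, the short sketch below derives per-speaker and per-minute averages. It uses only the approximate figures quoted above (300,000 words, 40 speakers, 40 hours); the function name `corpus_averages` is illustrative, and the words-per-minute value is a rough average over total recording time, not a measured speaking rate.

```python
def corpus_averages(total_words: int, speakers: int, hours: float) -> tuple[float, float]:
    """Return (words per speaker, words per minute) from rounded corpus totals."""
    words_per_speaker = total_words / speakers
    words_per_minute = total_words / (hours * 60)
    return words_per_speaker, words_per_minute


# Approximate figures quoted on the slide.
per_speaker, per_minute = corpus_averages(300_000, 40, 40)
print(f"about {per_speaker:.0f} words per speaker")
print(f"about {per_minute:.0f} words per minute of recording")
```

With these totals, each speaker contributes roughly 7,500 words on average, which is what makes it plausible to find many tokens of a variant form within a single speaker.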

6 The Buckeye Speech Corpus: Collection & Recording
Qualified speakers had a conversation about everyday topics such as politics, sports, traffic, schools, etc.
A modified sociolinguistic interview format was chosen
Interviews were conducted in a small seminar room by the (male) postdoc and the (female) graduate assistant

7 The Buckeye Speech Corpus: Transcription & Analysis
A detailed description of the procedures and conventions used in creating the corpus can be found in the manual
Sound files and text transcriptions
– Digital recordings were transferred onto a PC using a digital I/O card
– Recorded conversations were transcribed into written English text by undergraduate transcribers using Soundscriber software
– Transcripts are stored as ASCII text files

8 The Buckeye Speech Corpus: Transcription & Analysis
Automatic word and phone alignment
– Sound files and written transcriptions were input to an automatic phonetic transcription program, Entropics Aligner
– Aligner uses acoustic phone models trained on the TIMIT corpus of spoken English. It comes with a dictionary that lists several alternative pronunciations for many words
– RAs used Aligner to select the best-fitting pronunciation of each word from the alternatives listed in the dictionary, and to align the selected words and their phones to a portion of the sound wave

9 The Buckeye Speech Corpus: Transcription & Analysis
Hand realignment
– Errors produced by the Aligner were corrected by phonetically trained RAs
– Corrections were made when the Aligner's labels were placed at the wrong locations or when a label that is not part of Aligner's segmental repertoire was needed
– During hand alignment, the appropriate transcription of a given sequence was decided from combined waveform and spectrographic displays of the signal in Entropics waves+ or Wavesurfer

10 The Buckeye Speech Corpus: Transcription & Analysis
The .words / .phones / .log label files
– The alignment procedure creates three (ASCII text) 'label' files corresponding to each sound file
– The first contains the word labels and offset times
– The second contains the phone labels and offset times
– The third label file is a log of notes supplied by the labelers, marking instances of unusual voice quality, manner of speaking, nasality, etc.
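The label files described above pair each label with an offset time, so a label's duration is the gap between its offset and the previous one. The sketch below shows one way to read such a file. The layout is an assumption, not taken from the slides: it supposes an Xwaves-style file whose header ends with a line containing only `#`, followed by rows of the form `offset_time numeric_code label`, with any extra fields after the first `;`. The sample text, the `Label` type, and `parse_label_lines` are hypothetical and for illustration only; check the corpus manual for the actual conventions.

```python
from typing import Iterable, List, NamedTuple


class Label(NamedTuple):
    start: float  # previous label's offset (0.0 for the first label)
    end: float    # this label's offset time, in seconds
    text: str     # word or phone label


def parse_label_lines(lines: Iterable[str]) -> List[Label]:
    """Parse label rows that follow a header terminated by a '#' line (assumed layout)."""
    labels: List[Label] = []
    in_body = False
    prev_end = 0.0
    for raw in lines:
        line = raw.strip()
        if not in_body:
            # Skip header lines until the assumed '#' separator.
            in_body = line == "#"
            continue
        if not line:
            continue
        parts = line.split(None, 2)  # offset time, numeric code, rest of the line
        if len(parts) < 2:
            continue
        end = float(parts[0])
        # Keep only the first ';'-separated field as the label text.
        text = parts[2].split(";")[0].strip() if len(parts) == 3 else ""
        labels.append(Label(prev_end, end, text))
        prev_end = end
    return labels


# Hypothetical sample in the assumed layout (not real corpus data).
SAMPLE = """signal example
type 0
#
    0.50  122 {B_TRANS}
    0.75  122 okay; ow k ay; k ay; NN
    1.25  122 so; s ow; s ow; RB
"""

for label in parse_label_lines(SAMPLE.splitlines()):
    print(f"{label.text:10s} {label.start:.2f} to {label.end:.2f} s")
```

Storing both start and end on each label makes later duration-based queries (for example, finding unusually short tokens of a word) a simple subtraction.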

11 The Buckeye Speech Corpus: Why create the corpus?
Can be used for pure research as well as for applied research and product development
As a resource for pure research
– The corpus provides one of the richest sources of data on pronunciation variation in conversational speech
  Auditory word recognition in psycholinguistics
  Rules of pronunciation variation in phonology
  Age- and gender-related conditioning of pronunciation variation in sociolinguistics
  Effects of pronunciation variation on automatic speech recognition

12 The Buckeye Speech Corpus: Why create the corpus?
On the applied side
– Training acoustic models for speech recognition systems
– Lexicon training for handling pronunciation variation
– Testbed for grammar training

13 Corpus Citation
Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E. and Fosler-Lussier, E. (2007). Buckeye Corpus of Conversational Speech (2nd release). Columbus, OH: Department of Psychology, Ohio State University (Distributor).

Related Publications
Raymond, William D., Robin Dautricourt, and Elizabeth Hume. (2006). Word-medial /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change, 18(1).
Pitt, Mark, Keith Johnson, Elizabeth Hume, Scott Kiesling, and William Raymond. (2005). The Buckeye Corpus of Conversational Speech: Labeling Conventions and a Test of Transcriber Reliability. Speech Communication, 45.
Pitt, Mark and Keith Johnson. (2003). Using pronunciation data as a starting point in modeling word recognition. Paper presented at the 15th International Congress of Phonetic Sciences.
Johnson, Keith. (2003). Aligning phonetic transcriptions with their citation forms. Acoustic Research Letters Online.
Johnson, Keith. (2003). Massive reduction in conversational American English. Proceedings of the Workshop on Spontaneous Speech: Data and Analysis. August, Tokyo, JP.
Raymond, William D., Robin Dautricourt, and Elizabeth Hume. (Submitted, 2003). Medial /t,d/ deletion in spontaneous speech. Manuscript submitted to Language Variation and Change.
Raymond, William D. (2003). An analysis of coding consistency in the transcription of spontaneous speech from the Buckeye corpus. Proceedings of the Workshop on Spontaneous Speech: Data and Analysis. August, Tokyo, JP.
Raymond, William D., Mark Pitt, Keith Johnson, Elizabeth Hume, Matthew Makashay, Robin Dautricourt, and Craig Hilts. (2002). An analysis of transcription consistency in spontaneous speech from the Buckeye corpus. Proceedings of ICSLP-02. September, Denver.

14 What it looks like

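15 Coming up next
– Introduction to the Buckeye Corpus search script
– How to save internet broadcasts, and an introduction to commercial programs
– Introduction to the formant manipulation / synthesis scripts
– Introduction to the script for adjusting voice bar / prevoicing / VOT duration
– Introduction to the automatic TextGrid generation script
– …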