Published by Barnaby Watts. Modified over 9 years ago.
1
What I did on my Summer “Vacation”
Jeremy Morris
10/06/2006
2
Summer at AFRL - DAGSI
- AFRL
  - Air Force Research Labs
  - Wright-Patterson AFB, Dayton OH
- DAGSI Student/Faculty Research Fellowship program
  - Dayton Area Graduate Studies Institute
  - Effort to encourage collaboration between Ohio universities and AFRL
3
Summer at AFRL - SCREAM Lab
- SCREAM Lab
  - Speech and Communication Research, Engineering, Analysis and Modeling Lab
  - Interest in a wide variety of speech research issues for the military
    - Speech-to-speech translation, rapid development of speech recognition systems, etc.
4
Summer at AFRL - Why us?
- SCREAM Lab members were interested in collaborating with OSU
- SCREAM Lab working on research in using phonological features in speech recognition
  - Perceived overlap with ASAT project
5
Review - Phonological Features
- For the ASAT Project, we have been using phonological feature detectors
- We train detectors on a particular phonological feature
  - e.g. manner or place for consonants; height, frontness, etc. for vowels
- We then combine these features together for ASR purposes
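Note: the slide does not say how the detector outputs are combined. The sketch below assumes the simplest option, concatenating each detector's class probabilities into one vector per frame. The detector names, class counts and probabilities are hypothetical toy values.

```python
def combine_features(detector_outputs):
    """Concatenate per-detector class probabilities into one frame vector.

    detector_outputs maps a detector name to its list of class
    probabilities for a single frame. Detectors are visited in sorted
    name order so the layout of the combined vector is stable.
    """
    combined = []
    for name in sorted(detector_outputs):
        combined.extend(detector_outputs[name])
    return combined


# Hypothetical outputs of three detectors for one frame.
frame = {
    "manner": [0.7, 0.2, 0.1],
    "place": [0.1, 0.9],
    "height": [0.5, 0.5],
}
vector = combine_features(frame)
```

The combined vector would then serve as the per-frame observation for a downstream recognizer.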
6
Phonological Features (cont.)
- SCREAM Lab very interested in phonological feature detectors
  - Need for quick development of new ASR systems for new languages
  - A full set of phonological feature detectors would allow reuse of acoustic data for training across new languages
- Multi-lingual detectors are clearly needed to get full coverage of all features
7
Phonological Features (cont.)
- Our phonological feature detectors
  - Monolingual (English only)
  - Trained using a set of multi-layer perceptron neural networks
  - Output a set of phonological feature class probabilities
- SCREAM Lab feature detectors
  - Monolingual and multilingual
  - Trained using Gaussian Mixture Models
  - Output a set of likelihoods
  - Based on work by Tanja Schultz (CMU)
8
Summer at AFRL - Proposal
- Besides acoustic models, new ASR systems for new languages have other needs
- An ASR system needs a lexicon mapping phones-to-words
  - Normally hand-constructed
  - Requires time and expertise
9
Summer at AFRL - Proposal
- Our proposal: look at methods of bootstrapping new lexicons from:
  - Acoustic data
  - Word-level transcripts
  - Phonological feature detector outputs
- How?
  - Start by looking at work on deriving Acoustic Sub-Word Units
10
Summer at AFRL - Proposal
- Acoustic Sub-Word Units (ASWUs)
  - Similar to phones in that they are smaller pieces of words
  - BUT automatically derived from acoustics instead of manually defined
  - Used to derive both a sub-word unit set and a lexicon for that set simultaneously
  - Research in this area has been mainly to improve ASR performance
11
Summer at AFRL - Proposal
- Can we use these methods along with phonological features as inputs to induce new lexicons?
  - Using phonological features, the sub-word units may be mappable to standard IPA phone labels
12
Summer at AFRL - Proposal
- The proposed system is inspired by an ASWU approach by Singh et al. (2002)
  - Notable for not requiring word boundaries to be marked for training
- Start with a basic dictionary (including a starting phoneset size)
- Train a set of acoustic models on the training data with that dictionary
- Alter the basic dictionary in a manner that improves your pronunciations
- Repeat until a stopping criterion is reached
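Note: the train / alter / repeat loop on this slide can be sketched as below. This is a skeleton under stated assumptions, not the system from the talk. The callables `train`, `alter` and `score` are hypothetical placeholders, and "stop when the score no longer improves" is an assumed stopping criterion that the slide does not specify.

```python
def induce_lexicon(lexicon, train, alter, score, max_iters=10):
    """Iteratively refine a lexicon.

    train(lexicon) -> models
    alter(lexicon, models) -> new candidate lexicon
    score(lexicon, models) -> float, higher is better
    All three callables are hypothetical placeholders.
    """
    models = train(lexicon)
    best = score(lexicon, models)
    for _ in range(max_iters):
        candidate = alter(lexicon, models)
        candidate_models = train(candidate)
        candidate_score = score(candidate, candidate_models)
        if candidate_score <= best:  # assumed stopping criterion
            break
        lexicon, models, best = candidate, candidate_models, candidate_score
    return lexicon


# Toy stand-ins so the loop can be run end to end.
def toy_train(lexicon):
    return None  # no real acoustic models in this sketch


def toy_alter(lexicon, models):
    # Drop the last symbol of every pronunciation longer than one symbol.
    return {w: (p[:-1] if len(p) > 1 else p) for w, p in lexicon.items()}


def toy_score(lexicon, models):
    # Pretend the ideal pronunciation of ABLE has three symbols.
    return -abs(len(lexicon["ABLE"]) - 3)


result = induce_lexicon({"ABLE": ["A", "B", "L", "E"]}, toy_train, toy_alter, toy_score)
```

With these toy functions the loop accepts one alteration (dropping the final E) and stops when the next one lowers the score.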
13
Summer at AFRL - Proposal
- Start with a basic dictionary
  - Start with an assumption that the number of phones in a word is related to the number of letters in the orthography
- Basic dictionary maps word to sequence of letters in that word:
    ABLE    A B L E
    BANNED  B A N N E D
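Note: a minimal sketch of the letter-based starting dictionary. Upper-casing the words and dropping non-alphabetic characters are assumptions made here for illustration. The function names are hypothetical.

```python
def basic_lexicon(words):
    """Map each word to the sequence of its letters."""
    return {
        word.upper(): [ch for ch in word.upper() if ch.isalpha()]
        for word in words
    }


def phone_set(lexicon):
    """Return the sorted set of labels used in the lexicon."""
    return sorted({label for pron in lexicon.values() for label in pron})


lexicon = basic_lexicon(["ABLE", "BANNED"])
labels = phone_set(lexicon)
```

The size of `labels` gives the starting phone set size for this toy vocabulary.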
14
Summer at AFRL - Proposal
- Train a set of acoustic models
  - Using the basic dictionary, map words in the transcript to these “pronunciations”
  - Train an HMM using the output of the feature detectors as its input, and the above mapping as training labels
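Note: the first sub-step, mapping a word-level transcript to a label sequence, might look like the sketch below. The HMM training itself is not shown. The function name is hypothetical, and whitespace tokenization is an assumption.

```python
def transcript_to_labels(transcript, lexicon):
    """Expand a word-level transcript into one flat label sequence.

    No word-boundary markers are emitted. A word missing from the
    lexicon raises KeyError.
    """
    labels = []
    for word in transcript.split():
        labels.extend(lexicon[word.upper()])
    return labels


lexicon = {
    "ABLE": ["A", "B", "L", "E"],
    "BANNED": ["B", "A", "N", "N", "E", "D"],
}
labels = transcript_to_labels("able banned", lexicon)
```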
15
Summer at AFRL - Proposal
- Alter the basic dictionary
  - Using some metric, find a candidate “phone” to be modified
    - We’ve looked at a couple of metrics (more on this later)
  - Once the phone is identified, see if the phone should be “split” or “deleted”
    - A “split” indicates that the given phone label actually represents two different sounds, and so should be replaced with two different phone labels
    - A “delete” indicates that for a particular word or words the model fits better if that phone label is removed from the pronunciation
16
Summer at AFRL - Proposal
- Split example:
    BE       B E
    DEVELOP  D E1 V E1 L O P
- Delete examples:
    ABLE       A B L E  ::  ABLE  A B L
    ABANDONED  A B A N D O N D
17
Summer at AFRL - Proposal
- For splits, all possible alterations are added to a temporary lexicon
- For deletes, we alter the HMM to add a possible deletion arc for the phone
- After lexicon or HMM is altered, word transcript is force-aligned using new possible pronunciations
  - Best pronunciations are pulled from this alignment and used to build new lexicon
  - Steps are repeated using the new lexicon in place of the basic lexicon
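Note: a sketch of how candidate pronunciations could be enumerated before forced alignment. For splits, every occurrence of the chosen label may keep the old label or take the new one. For deletes, this sketch lists pronunciation variants with occurrences removed, which is a simplification: the talk handles deletes with a deletion arc in the HMM rather than in the lexicon. The function names are hypothetical.

```python
from itertools import product


def split_variants(pron, phone, new_phone):
    """All pronunciations where each `phone` is kept or replaced by `new_phone`."""
    options = [(p, new_phone) if p == phone else (p,) for p in pron]
    return [list(variant) for variant in product(*options)]


def delete_variants(pron, phone):
    """All pronunciations where each `phone` is either kept or dropped."""
    options = [((p,), ()) if p == phone else ((p,),) for p in pron]
    return [
        [label for part in choice for label in part]
        for choice in product(*options)
    ]


develop = split_variants(["D", "E", "V", "E", "L", "O", "P"], "E", "E1")
able = delete_variants(["A", "B", "L", "E"], "E")
```

Forced alignment would then choose among these variants for each word. The number of variants doubles with every occurrence of the label in a word.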
18
Summer at AFRL - Proposal
- How do we determine the candidate “phone label” to alter?
  - Initially, modelled each phone with two Gaussians in the HMM
  - Compared the two Gaussians to each other using their KL divergence
    - Took the phone label with the largest KL divergence as the one to alter
    - Idea was that each Gaussian described a cluster; the further these centers were from each other, the more probable they were describing two different phones
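Note: a sketch of the selection step, assuming diagonal-covariance Gaussians and a symmetrized KL divergence (the sum of both directions). The slide does not state either choice. The phone labels and parameters below are hypothetical toy values.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Sum of KL(p || q) and KL(q || p), since KL itself is asymmetric."""
    return (kl_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kl_diag_gauss(mu_q, var_q, mu_p, var_p))


def pick_candidate(phone_gaussians):
    """Return the label whose two Gaussians are furthest apart.

    phone_gaussians maps label -> ((mu1, var1), (mu2, var2)).
    """
    def divergence(label):
        (mu1, var1), (mu2, var2) = phone_gaussians[label]
        return symmetric_kl(mu1, var1, mu2, var2)
    return max(phone_gaussians, key=divergence)


# Toy one-dimensional example.
toy = {
    "A": (([0.0], [1.0]), ([3.0], [1.0])),
    "F": (([0.0], [1.0]), ([0.5], [1.0])),
}
candidate = pick_candidate(toy)
```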
19
Summer at AFRL - Proposal
- KL-divergence metric did not work well
  - System would pick candidates that a human would find unreasonable (such as “F” or “Q”)
  - System would split or delete these phones multiple times, continually returning to the same phone label
20
Summer at AFRL - Proposal
- Why did the KL divergence perform this way?
  - Suspicion: Large variations in the two Gaussians in areas that do not matter for that phone pushed up the scores (e.g. vowel features for consonants)
  - Splitting these phones only allowed the coverage to spread wider, drawing the system back to those phones
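Note: one way to check this suspicion is to look at each feature dimension's contribution to the divergence. The sketch assumes diagonal-covariance Gaussians, so the KL divergence is a sum over dimensions. The numbers are hypothetical: dimension 0 stands for a feature relevant to the phone, dimension 1 for an irrelevant one with a small variance.

```python
import math


def kl_per_dim(mu_p, var_p, mu_q, var_q):
    """Per-dimension terms of KL(p || q) for diagonal Gaussians."""
    return [
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    ]


# Hypothetical parameters of the two Gaussians for one phone label.
mu_a, var_a = [0.20, 0.1], [0.01, 0.001]
mu_b, var_b = [0.25, 0.4], [0.01, 0.001]

contributions = kl_per_dim(mu_a, var_a, mu_b, var_b)
share_of_dim_1 = contributions[1] / sum(contributions)
```

In this toy case nearly all of the divergence comes from dimension 1, so a phone can score high for reasons unrelated to its identity.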
21
Summer at AFRL - Proposal
- What next?
- Tried Mahalanobis distance metric, with poor results also
- Returned to Acoustic Sub-Word papers for inspiration
  - Instead of looking at cluster stats, multiple papers use an average frame likelihood metric for each phone cluster to determine candidate phone for altering
  - Have started moving my code to use this framework; preliminary passes show promise, but no results quite yet
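Note: a sketch of an average frame likelihood metric. It assumes that a forced alignment supplies one phone label per frame, that the model supplies one log-likelihood per frame, and that the phone with the lowest average is the candidate to alter. The slide does not spell out that selection rule. All values are hypothetical.

```python
def average_frame_loglik(frame_logliks, alignment):
    """Average per-frame log-likelihood for each phone label.

    frame_logliks: one log-likelihood per frame.
    alignment: the phone label assigned to each frame.
    """
    totals, counts = {}, {}
    for loglik, phone in zip(frame_logliks, alignment):
        totals[phone] = totals.get(phone, 0.0) + loglik
        counts[phone] = counts.get(phone, 0) + 1
    return {phone: totals[phone] / counts[phone] for phone in totals}


def worst_phone(averages):
    """Label with the lowest average log-likelihood (assumed selection rule)."""
    return min(averages, key=averages.get)


logliks = [-1.0, -2.0, -8.0, -6.0, -1.5]
alignment = ["A", "A", "E", "E", "B"]
averages = average_frame_loglik(logliks, alignment)
candidate = worst_phone(averages)
```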
22
Conclusion - It’s 75 miles to Dayton
- Advice for those thinking of doing work at WPAFB
  - Working in the SCREAM Lab was great
    - Hundreds of processors, tons of multi-lingual corpora
    - Friendly people, decent work environment (if a bit dark)
  - Many hoops to jump through, even just for a summer student
    - ID badges, computer usage training, etc.
  - Sometimes feels like you’re working at a corporation…
    - until the guys in uniform come around
  - The base is built like a campus crossed with a prison
    - Cinderblock is the building material of choice.
  - Don’t forget your ID badge
    - It’s 75 miles from Columbus to Dayton