Published by Margaret Cain. Modified over 9 years ago.
1
Recognition of foreign names spoken by native speakers
Frederik Stouten & Jean-Pierre Martens
Ghent University, Electronics and Information Systems (ELIS)
ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent
2
ELIS-DSSP Sint-Pietersnieuwstraat 41 B-9000 Gent
Interspeech 07 - August 30th 07

Overview
Problem statement
Methodology
–computing phonological scores
–foreignizable phonemes
Experiments
–baseline system
–systems with methodology implementation
Conclusions
4
Problem statement
Automatic attendant or car navigation systems
–lexicon may contain > 100K words
–many of foreign origin
A native speaker of Dutch can pronounce Andrew as
–nativized: A n d r E w
–intermediate: E n d r u w
–foreignized: E n d r u
5
Problem situation
Standard solutions
–foreign grapheme-to-phoneme converters (g2p's) + mapping to native phonemes
–include foreign phoneme acoustic models
Our proposal
–combine scores of standard acoustic models and a phonologically inspired back-off model (both models trained on native speech only)
–use foreign g2p's without phoneme mapping
–introduce foreignizable phonemes instead of traditional foreign-to-native phoneme mappings
6
Combining scores
Two-stream score per acoustic model state $q$
–standard model: $\log p_A(x \mid q)$
–phonological back-off model: $\log p_B(x \mid q)$
Control parameters
–$g_{1q}$, $g_{2q}$ = state-dependent stream weights (different risk of foreignized pronunciation)
–$\alpha$, $\beta$ = state-independent scaling coefficients (to give both streams the same overall mean and variance)
–equidistant samples on $g_{1q} + g_{2q} = 1$ (a common factor on both weights has no effect)
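The combination formula itself did not survive in this transcript. As a minimal sketch, assuming a log-linear form in which the back-off score is first rescaled by α and shifted by β and the two streams are then weighted by g_1q and g_2q = 1 - g_1q, the per-state score might look as follows. The function names and the exact formula are assumptions for illustration, not the authors' definition.

```python
def two_stream_score(log_pa, log_pb, g1, alpha=1.0, beta=0.0):
    """Weighted combination of two log-scores for one state and one frame.

    Assumed form (not taken from the slides):
        g1 * log_pa + g2 * (alpha * log_pb + beta),  with g2 = 1 - g1
    """
    if not 0.0 <= g1 <= 1.0:
        raise ValueError("g1 must lie in [0, 1]")
    g2 = 1.0 - g1  # the weights are constrained to the line g1 + g2 = 1
    return g1 * log_pa + g2 * (alpha * log_pb + beta)


def equidistant_weights(n_steps):
    """Equidistant (g1, g2) candidate pairs on the line g1 + g2 = 1."""
    return [(k / n_steps, 1.0 - k / n_steps) for k in range(n_steps + 1)]
```

With `g1 = 1` the score reduces to the standard acoustic model, and with `g1 = 0` only the rescaled back-off score remains.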
8
Combining scores
Computation of $\log p_B(x \mid q)$
–phonological feature space: binary features $f_i$ ($i = 1, \dots, 25$)
–map each state to phonological space
  select features of a state on the basis of a forced alignment of the speech with the standard acoustic models
  select $f_i$ with a large enough mean of $P(f_i \mid x) / P(f_i)$ on the state
  other strategy for foreignizable phonemes (see further)
–compute posterior probabilities $P(f_i \mid x)$
  configuration of 4 neural networks
–convert posterior probabilities to log-likelihood
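The feature selection step can be illustrated with a small sketch. It assumes that the frames aligned to one state are already available as lists of posteriors P(f_i | x), and it uses a threshold of 1.5 purely as a placeholder, since the slides only say "large enough". The function name is hypothetical.

```python
def select_positive_features(posteriors, priors, threshold=1.5):
    """Pick the features that are characteristic of one state.

    posteriors: list of frames aligned to the state, each a list of P(f_i | x)
    priors:     list of prior probabilities P(f_i)
    threshold:  hypothetical cut-off on the mean ratio P(f_i | x) / P(f_i)
    """
    n_frames = len(posteriors)
    selected = []
    for i, prior in enumerate(priors):
        # Mean posterior-to-prior ratio of feature i over the state's frames.
        mean_ratio = sum(frame[i] / prior for frame in posteriors) / n_frames
        if mean_ratio >= threshold:
            selected.append(i)
    return selected
```

A ratio well above 1 means the feature is detected on this state far more often than on average, which is what makes it a candidate positive feature.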
9
Combining scores
Come to final two-stream score
–$g_{2q}$ less dependent on $q$ than …
–$g_{2q} \log p_B(x)$ = discardable
–computation of $\log \left[ P_B(q \mid x) / P_B(q) \right]$
  $P_q$: positive features that are 'on' for state $q$
  $N_q$: negative features that are absent or 'off' for $q$
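Why the term in $\log p_B(x)$ can be dropped follows from Bayes' rule. The short derivation below uses only the symbols on the slide; it assumes that the back-off stream enters the combined score as $g_{2q} \log p_B(x \mid q)$, up to the scaling by $\alpha$ and $\beta$.

```latex
\begin{align}
\log p_B(x \mid q)
  &= \log \frac{P_B(q \mid x)\, p_B(x)}{P_B(q)} \\
  &= \log \frac{P_B(q \mid x)}{P_B(q)} + \log p_B(x)
\end{align}
```

If $g_{2q}$ hardly varies with $q$, the product $g_{2q} \log p_B(x)$ adds nearly the same amount to every competing state at a given frame, so it does not affect which hypothesis wins and only the ratio $P_B(q \mid x) / P_B(q)$ needs to be computed.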
10
Combining scores
Assuming independent phonological features (PHFs) we get two terms, (1) and (2)
Start with only positive features (term (1))
–problem: unequal number of features for different $q$
–solution: take the average, i.e. $w_{qp} \times (1)$ with $w_{qp} = 1 / \mathrm{card}(P_q)$
–experiment showed this is better
Add negative features (term (2))
–supposed to represent the same probability
–experiment shows 75% correlation between (1) and (2)
–keeping (1) + (2) is slightly better than discarding (2)
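The two terms are not legible in this transcript. One plausible reading, given here only as a sketch, is that term (1) sums log P(f_i | x) / P(f_i) over the positive features and term (2) sums log (1 - P(f_i | x)) / (1 - P(f_i)) over the negative features. Both forms, the averaging of term (2), and the function name are assumptions.

```python
import math


def backoff_score(post, prior, positive, negative=()):
    """Assumed back-off score for one state and one frame.

    post:     posteriors P(f_i | x) for all features
    prior:    priors P(f_i) for all features
    positive: indices of the state's positive features (the set P_q)
    negative: indices of the state's negative features (the set N_q)
    """
    if not positive:
        raise ValueError("a state needs at least one positive feature")
    # Term (1): average over positive features, i.e. weight 1 / card(P_q).
    term1 = sum(math.log(post[i] / prior[i]) for i in positive) / len(positive)
    # Term (2): averaged here as well, which is an assumption of this sketch.
    term2 = 0.0
    if negative:
        term2 = sum(
            math.log((1.0 - post[i]) / (1.0 - prior[i])) for i in negative
        ) / len(negative)
    return term1 + term2
```

Averaging keeps states with many positive features from getting systematically larger or smaller scores than states with few, which is the problem the slide names.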
11
Introducing foreignizable phonemes
Baseline pronunciation of a foreign name
–take foreign language g2p output
–map foreign phonemes to best native equivalent
Our pronunciation
–if the equivalent has different PHFs, keep info of the original foreignizable phoneme: /NativePhon/_/ForeignPhon/
–e.g. /rr/ → /r/_/rr/ (Dutch /r/ originating from English /rr/)
–6 such phonemes for English → Dutch
–use positive PHFs of /ForeignPhon/ (knowledge based)
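The labelling rule can be sketched in a few lines. The mapping table and the feature sets below are toy values invented for illustration; only the /NativePhon/_/ForeignPhon/ naming scheme and the /rr/ example come from the slide.

```python
def label_phoneme(foreign, native_map, foreign_phf, native_phf):
    """Return the lexicon symbol for one phoneme of a foreign g2p output.

    The native equivalent is used as-is when it has the same phonological
    features; otherwise the foreign origin is kept in the symbol name.
    """
    native = native_map[foreign]
    if foreign_phf[foreign] != native_phf[native]:
        return f"/{native}/_/{foreign}/"  # foreignizable phoneme
    return f"/{native}/"


# Hypothetical toy tables, not the authors' feature inventory.
native_map = {"rr": "r", "s": "s"}
foreign_phf = {"rr": {"approximant"}, "s": {"fricative"}}
native_phf = {"r": {"trill"}, "s": {"fricative"}}
```

With these toy tables, `label_phoneme("rr", native_map, foreign_phf, native_phf)` yields `/r/_/rr/`, while `s` maps to plain `/s/`.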
12
Introducing foreignizable phonemes
Pronunciation variants
–mix of standard and new approach
14
Experiments
Recognition of English names
–database from Nuance (Cremelie, N. and ten Bosch, L.)
–2050 English name utterances
–21 different names
–26 native speakers of Dutch
Recognizer
–standard acoustic models: cross-word triphones, trained on Dutch read speech
–PHF detector: neural network configuration, trained on Dutch read speech
–vocabulary: 21 English names + 1779 Dutch names
–lexicon: different transcriptions for each name (see next slide)
17
Baseline system
No back-off model used
Effects of different types of transcriptions measured
Most important findings
1. English transcriptions (alone) perform much better than Dutch transcriptions (alone): they model foreign pronunciations
2. Dutch transcriptions remain indispensable: they model native pronunciations
18
Systems with back-off model
System FOREIGN
–consider one foreignizable phoneme at a time
–same $g_1$ on all its states: find the optimal value under the condition that $g_1 = 1$ for all other phonemes
–repeat the process until all foreignizable phonemes are treated
System NATIVE
–same $g_1$ on all states
–search for the best $g_1$
System ALL
–foreignizable phonemes: $g_1$ from FOREIGN
–other phonemes: same $g_1$, taken from NATIVE
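The tuning procedure for system FOREIGN can be sketched as a per-phoneme search. The sketch reads the slide literally: each foreignizable phoneme is tuned in isolation while every other phoneme keeps g_1 = 1. The `evaluate` callback, which would run the recognizer on development data and return an accuracy, is hypothetical, as are the function name and the candidate grid.

```python
def tune_foreign_weights(foreignizable, evaluate, candidates):
    """Find one stream weight g1 per foreignizable phoneme.

    foreignizable: phoneme symbols to tune, e.g. ["/r/_/rr/"]
    evaluate:      hypothetical callback; takes a dict {phoneme: g1} and
                   returns recognition accuracy. Phonemes missing from the
                   dict are assumed to use g1 = 1 (standard model only).
    candidates:    equidistant g1 values to try, e.g. [0.0, 0.25, ..., 1.0]
    """
    tuned = {}
    for phoneme in foreignizable:
        best_g1, best_acc = 1.0, None
        for g1 in candidates:
            # Only the phoneme under study deviates from g1 = 1.
            acc = evaluate({phoneme: g1})
            if best_acc is None or acc > best_acc:
                best_g1, best_acc = g1, acc
        tuned[phoneme] = best_g1
    return tuned
```

System NATIVE would correspond to a single search of the same kind with one shared g_1, and system ALL to merging both results.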
19
Systems with back-off model
Main result: relative improvement of 11%
Other results
–$g_1 < 0.5$ for system FOREIGN
–$g_1 > 0.5$ for system NATIVE
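The slides do not define relative improvement. A common convention, assumed here, is the reduction in error rate divided by the baseline error rate. The numbers in the usage note are made up and are not results from the talk.

```python
def relative_improvement(baseline_error, new_error):
    """Relative error-rate reduction, as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_error - new_error) / baseline_error
```

For instance, going from a hypothetical 20% error rate to 15% gives `relative_improvement(20.0, 15.0)`, i.e. a relative improvement of 25%.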
20
Latest work
Seek confirmation of results on other data
Autonomata database (STEVIN project)
–60000 names, 5000 different names, 240 speakers
–French + English + Dutch names
–French + English + Dutch speakers
–French + English + Dutch g2p outputs per name
–large relative improvement (RI) by using foreign g2p's on French and English
–much larger RI with our methodology than in the experiments reported here
–paper submitted to ASRU-2007
21
Conclusions as of today
Large improvements in foreign name recognition by adding foreign g2p outputs (RI of around 40%)
Substantial extra improvements by adding the new methodology (RI of up to 30%)