
1  ICCS-NTUA: WP1+WP2
    Prof. Petros Maragos, NTUA, School of ECE
    Computer Vision, Speech Communication and Signal Processing (CVSP) Research Group
    URL: http://cvsp.cs.ntua.gr
    HIWIRE

2  HIWIRE Involved CVSP Members (ICCS-NTUA, HIWIRE Meeting, Granada, June 2005)
    Group Leader: Prof. Petros Maragos
    Ph.D. Students / Graduate Research Assistants:
      D. Dimitriadis (speech: recognition, modulations)
      V. Pitsikalis (speech: recognition, fractals/chaos, NLP)
      A. Katsamanis (speech: modulations, statistical processing, recognition)
      G. Papandreou (vision: PDEs, active contours, level sets, AV-ASR)
      G. Evangelopoulos (vision/speech: texture, modulations, fractals)
      S. Leykimiatis (speech: statistical processing, microphone arrays)

3  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

4  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
       - Modulation Features: Results 1st Year
       - Fractal Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

5  WP1: Noise Robustness
    Platform: HTK
    Baseline + Evaluation:
      Aurora 2, Aurora 3, TIMIT+NOISE
    Modulation Features:
      AM-FM Modulations
      Teager Energy Cepstrum
    Fractal Features:
      Dynamical Denoising
      Correlation Dimension
      Multiscale Fractal Dimension
    Hybrid-Merged Features
    Improvements: up to +62% (Aurora 3), up to +36% (Aurora 2), up to +61% (Aurora 2)

6  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
       - Speech Modulation Features: Results 1st Year
       - Fractal Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

7  Speech Modulation Features
    Filterbank Design
    Short-Term AM-FM Modulation Features:
      Short-Term Mean Inst. Amplitude (IA-Mean)
      Short-Term Mean Inst. Frequency (IF-Mean)
      Frequency Modulation Percentages (FMP)
    Short-Term Energy Modulation Features:
      Average Teager Energy, Cepstrum Coef. (TECC)
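The slide names the features only; as a hedged illustration (not the HIWIRE front-end itself), the sketch below computes one band's short-term AM-FM features with the Teager-Kaiser energy operator and the DESA-2 energy separation algorithm. The Gabor filter settings, the amplitude-weighted frame statistics, and the FMP definition below are assumptions chosen for the example.

```python
# Illustrative sketch: AM-FM demodulation of one filterbank band via the
# Teager-Kaiser energy operator and DESA-2, followed by short-term statistics.
import numpy as np

def teager(x):
    """Teager-Kaiser energy: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x):
    """DESA-2 demodulation: instantaneous amplitude and frequency (rad/sample)."""
    y = x[2:] - x[:-2]                       # symmetric difference
    psi_x = teager(x)[1:-1]                  # both aligned to samples 2..N-3
    psi_y = np.maximum(teager(y), 1e-12)
    psi_x = np.maximum(psi_x, 1e-12)
    omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(psi_y)
    return amp, omega

def band_modulation_features(x, fs, fc, bw, frame=0.025, shift=0.010):
    """Gabor bandpass around fc (Hz), then frame-level IA-Mean, IF-Mean, FMP."""
    t = np.arange(-0.01, 0.01, 1.0 / fs)
    gabor = np.exp(-(bw * t) ** 2) * np.cos(2 * np.pi * fc * t)   # real Gabor filter
    xb = np.convolve(x, gabor, mode="same")
    amp, omega = desa2(xb)
    f_inst = omega * fs / (2.0 * np.pi)      # instantaneous frequency in Hz
    n, hop = int(frame * fs), int(shift * fs)
    feats = []
    for start in range(0, len(f_inst) - n, hop):
        a, f = amp[start:start + n], f_inst[start:start + n]
        w = a ** 2 / max(np.sum(a ** 2), 1e-12)   # amplitude-weighted stats (one common choice)
        ia_mean = np.mean(a)
        if_mean = np.sum(w * f)
        fmp = np.sqrt(np.sum(w * (f - if_mean) ** 2)) / max(if_mean, 1e-12)
        feats.append((ia_mean, if_mean, fmp))
    return np.array(feats)

# e.g. band_modulation_features(x, fs=16000, fc=1000.0, bw=400.0)
```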

8  Modulation Acoustic Features (block diagram)
    Processing chain: speech -> regularization + multiband filtering -> nonlinear processing / demodulation -> statistical processing (+ V.A.D.) -> robust feature transformation / selection
    Energy Features: Teager Energy Cepstrum Coeff. (TECC)
    AM-FM Modulation Features: Mean Inst. Ampl. (IA-Mean), Mean Inst. Freq. (IF-Mean), Freq. Mod. Percent. (FMP)
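For the energy branch, a minimal sketch of Teager Energy Cepstrum Coefficients (TECC) is given below, assuming a simple FIR filterbank with roughly linear band edges, 25 ms / 10 ms framing, and a DCT over log mean Teager energies; the actual filterbank design and settings used in HIWIRE may differ.

```python
# Hedged sketch of TECC: filterbank -> Teager-Kaiser energy per band ->
# short-term average -> log -> DCT. Band placement and framing are assumptions.
import numpy as np
from scipy.fft import dct
from scipy.signal import firwin, lfilter

def tecc(x, fs, n_bands=25, n_ceps=13, frame=0.025, shift=0.010):
    edges = np.linspace(100.0, 0.45 * fs, n_bands + 1)   # band edges in Hz (assumption)
    n, hop = int(frame * fs), int(shift * fs)
    n_frames = 1 + max(0, (len(x) - n) // hop)
    E = np.zeros((n_frames, n_bands))
    for b in range(n_bands):
        h = firwin(101, [edges[b], edges[b + 1]], pass_zero=False, fs=fs)  # band-pass FIR
        xb = lfilter(h, [1.0], x)
        psi = np.maximum(xb[1:-1] ** 2 - xb[:-2] * xb[2:], 1e-12)          # Teager-Kaiser energy
        for i in range(n_frames):
            E[i, b] = np.mean(psi[i * hop:i * hop + n])                    # short-term average
    return dct(np.log(E), type=2, axis=1, norm="ortho")[:, :n_ceps]
```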

9  TIMIT-Based Speech Databases
    TIMIT Database:
      Training Set: 3696 sentences, ~35 phonemes/utterance
      Testing Set: 1344 utterances, 46680 phonemes
      Sampling Frequency: 16 kHz
    Feature Vectors:
      MFCC+C0+AM-FM + 1st and 2nd time derivatives
      Stream weights: stream (1) for MFCC and stream (2) for AM-FM
    3-state left-right HMMs, 16 mixtures
    All-pair, unweighted grammar
    Performance Criterion: Phone Accuracy Rates (%)
    Back-end System: HTK v3.2.0
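For reference, the phone accuracy rate as computed by HTK's HResults is

```latex
\mathrm{Acc}\,(\%) \;=\; \frac{N - D - S - I}{N} \times 100,
```

where N is the number of reference phone labels and D, S, I are the deletion, substitution and insertion counts of the aligned hypothesis.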

10  Results: TIMIT+Noise, up to +106%

11  Aurora 3 - Spanish
    Connected digits, sampling frequency 8 kHz
    Training Set:
      WM (Well-Matched): 3392 utterances (quiet 532, low 1668 and high noise 1192)
      MM (Medium-Mismatch): 1607 utterances (quiet 396 and low noise 1211)
      HM (High-Mismatch): 1696 utterances (quiet 266, low 834 and high noise 596)
    Testing Set:
      WM: 1522 utterances (quiet 260, low 754 and high noise 508), 8056 digits
      MM: 850 utterances (quiet 0, low 0 and high noise 850), 4543 digits
      HM: 631 utterances (quiet 0, low 377 and high noise 254), 3325 digits
    2 back-end ASR systems (HTK and BLasr)
    Feature Vectors: MFCC+AM-FM (or Auditory+AM-FM), TECC
    All-pair, unweighted grammar (or word-pair grammar)
    Performance Criterion: Word (digit) Accuracy Rates

12  Results: Aurora 3 (HTK), up to +62%

13  Databases: Aurora 2
    Task: speaker-independent recognition of digit sequences (TI-Digits at 8 kHz)
    Training (8440 utterances per scenario, 55M/55F):
      Clean (8 kHz, G712)
      Multi-Condition (8 kHz, G712): 4 noises (artificial): subway, babble, car, exhibition; 5 SNRs: 5, 10, 15, 20 dB, clean
    Testing, artificially added noise:
      7 SNRs: [-5, 0, 5, 10, 15, 20 dB, clean]
      A: noises as in multi-condition training, G712 (28028 utterances)
      B: restaurant, street, airport, train station, G712 (28028 utterances)
      C: subway, street (MIRS) (14014 utterances)

14  Results: Aurora 2, up to +12%

15  Work To Be Done on Modulation Features
    Refinements w.r.t. AM-FM features
    Fusion with other features

16  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
       - Speech Modulation Features: Results 1st Year
       - Fractal Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

17  Fractal Features (block diagram)
    Speech signal -> N-d (noisy) embedding -> geometrical filtering (local SVD) -> N-d cleaned (filtered) embedding
      Filtered Dynamics Correlation Dimension (FDCD, 8 features)
      Multiscale Fractal Dimension (MFD, 6 features)
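As a rough illustration of the correlation-dimension part of this front-end (assumed parameters, and omitting the local-SVD geometrical filtering and the multiscale fractal dimension steps), a Grassberger-Procaccia-style estimate over a delay embedding might look like:

```python
# Minimal illustration: delay embedding of a speech frame and a correlation-
# dimension estimate from the slope of log C(r) vs. log r.
import numpy as np

def delay_embed(x, dim=10, tau=1):
    """N-d delay embedding [x(n), x(n+tau), ..., x(n+(dim-1)*tau)]."""
    n_vec = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau:i * tau + n_vec] for i in range(dim)], axis=1)

def correlation_dimension(x, dim=10, tau=1, n_radii=10):
    """Slope of log C(r) vs log r, with C(r) the fraction of point pairs within r."""
    X = delay_embed(np.asarray(x, dtype=float), dim, tau)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # pairwise distances
    d = d[np.triu_indices(len(X), k=1)]
    r_lo = max(np.percentile(d, 5), 1e-12)
    r_hi = np.percentile(d, 60)
    radii = np.logspace(np.log10(r_lo), np.log10(r_hi), n_radii)
    C = np.array([np.mean(d < r) for r in radii])
    mask = C > 0
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)
    return slope

# e.g. on a 25 ms frame at 16 kHz (400 samples) the pairwise-distance matrix stays small
```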

18  Databases: Aurora 2
    Task: speaker-independent recognition of digit sequences (TI-Digits at 8 kHz)
    Training (8440 utterances per scenario, 55M/55F):
      Clean (8 kHz, G712)
      Multi-Condition (8 kHz, G712): 4 noises (artificial): subway, babble, car, exhibition; 5 SNRs: 5, 10, 15, 20 dB, clean
    Testing, artificially added noise:
      7 SNRs: [-5, 0, 5, 10, 15, 20 dB, clean]
      A: noises as in multi-condition training, G712 (28028 utterances)
      B: restaurant, street, airport, train station, G712 (28028 utterances)
      C: subway, street (MIRS) (14014 utterances)

19  Results: Aurora 2, up to +40%

20  Results: Aurora 2, up to +27%

21  Results: Aurora 2, up to +61%

22  Future Directions on Fractal Features
    Refine fractal feature extraction
    Application to Aurora 3
    Fusion with other features

23  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

24  Visual Front-End
    Aim: extract a low-dimensional visual speech feature vector from video
    Visual front-end modules:
      Speaker's face detection
      ROI tracking
      Facial model fitting
      Visual feature extraction
    Challenges:
      Very high-dimensional signal - which features are proper?
      Robustness
      Computational efficiency

25  Face Modeling
    ● A well-studied problem in Computer Vision: Active Appearance Models, Morphable Models, Active Blobs
    ● Both shape and appearance can enhance lipreading
    ● The shape and appearance of human faces "live" in low-dimensional manifolds
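In the standard AAM formulation behind these slides, shape and appearance are linear expansions around a mean,

```latex
s \;=\; s_0 + \sum_{i=1}^{n_s} p_i\, s_i, \qquad
A(\mathbf{x}) \;=\; A_0(\mathbf{x}) + \sum_{i=1}^{n_a} \lambda_i\, A_i(\mathbf{x}),
```

where the modes s_i, A_i come from PCA (the eigenshapes and eigenfaces of slide 58) and the low-dimensional parameters p_i, lambda_i serve as the visual features.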

26  Image Fitting Example (AAM fitting shown at steps 2, 6, 10, 14 and 18)

27  Example: Face Interpretation Using AAM
    (Video panels: original video; shape track superimposed on original video; reconstructed face)
    This is what the visual-only speech recognizer "sees"!
    ● Generative models like AAM allow us to evaluate the output of the visual front-end

28  Evaluation on the CUAVE Database

29  Audio-Visual ASR: Database
    Subset of the CUAVE database used:
      36 speakers (30 training, 6 testing)
      5 sequences of 10 connected digits per speaker
      Training set: 1500 digits (30x5x10)
      Test set: 300 digits (6x5x10)
    CUAVE also contains more complex data sets: speaker moving around, speaker shows profile, continuous digits, two speakers (to be used in future evaluations)
    CUAVE was kindly provided by Clemson University

30  Recognition Results (Word Accuracy)
    Data:
      Training: ~500 digits (29 speakers)
      Testing: ~100 digits (4 speakers)

                       Audio   Visual   Audiovisual
    Classification      99%     46%        85%
    Recognition         98%     26%        78%

31  Future Work
    Visual front-end:
      Better trained AAM
      Temporal tracking
    Feature fusion:
      Experimentation with alternative DBN architectures
      Automatic stream weight determination
    Integration with non-linear acoustic features
    Experiments on other audio-visual databases
    Systematic evaluation of visual features

32  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
       - Modulation Features: Results 1st Year
       - Fractal Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

33  User Robustness, Speaker Adaptation
    VTLN Baseline:
      Platform: HTK
      Database: AURORA 4
      Fs = 8 kHz
      Scenarios: Training, Testing
      Comparison with MLLR
    Collection of Non-Native Speech Data: Completed
      10 speakers
      100 utterances/speaker

34  Vocal Tract Length Normalization
    Implementation: HTK
    Warping factor estimation:
      Maximum Likelihood (ML) criterion
    Frequency warping
    (Figures from Hain99, Lee96)
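With the ML criterion mentioned above (cf. the Lee96 reference), the warping factor is typically selected as

```latex
\hat{\alpha} \;=\; \arg\max_{\alpha}\; \Pr\!\left(X^{(\alpha)} \,\middle|\, \lambda,\, W\right),
```

where X^(alpha) are the features computed with the frequency axis warped by alpha (e.g. a piecewise-linear warp of the filterbank), lambda is the acoustic model and W the transcription (known in supervised VTLN, hypothesized from a first decoding pass otherwise). The maximization is usually a grid search over a small range of warping factors.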

35  VTLN
    Training:
      AURORA 4 baseline setup
      Clean (SIC), Multi-Condition (SIM), Noisy (SIN)
    Testing:
      Estimate warping factor using adaptation utterances (supervised VTLN): per-speaker warping factor (1, 2, 10, 20 utterances)
      2-pass decoding:
        1st pass: get a hypothesis transcription; alignment and ML to estimate a per-utterance warping factor
        2nd pass: decode the properly normalized utterance
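A hedged control-flow sketch of the 2-pass procedure above; the helper callables are hypothetical stand-ins for the actual HTK steps (warped feature extraction, Viterbi decoding, forced alignment), and the warp grid is an assumption.

```python
# Sketch of 2-pass VTLN decoding; only the control flow is illustrated.
import numpy as np

ALPHAS = np.round(np.arange(0.88, 1.13, 0.02), 2)   # typical VTLN warp grid (assumption)

def two_pass_vtln_decode(wav, hmms, warped_feats, decode, align_loglik):
    """warped_feats(wav, alpha) -> features; decode(feats, hmms) -> transcription;
    align_loglik(feats, transcription, hmms) -> forced-alignment log-likelihood."""
    # 1st pass: decode unwarped features to obtain a hypothesis transcription
    hyp = decode(warped_feats(wav, 1.0), hmms)
    # Estimate the per-utterance warping factor with the ML criterion
    best_alpha = max(ALPHAS, key=lambda a: align_loglik(warped_feats(wav, a), hyp, hmms))
    # 2nd pass: decode the properly normalized utterance
    return decode(warped_feats(wav, best_alpha), hmms)
```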

36  Databases: Aurora 4
    Task: 5000-word continuous speech recognition
    WSJ0 (16 / 8 kHz) + artificially added noise
    2 microphones: Sennheiser, other
    Filtering: G712, P341
    Noises: car, babble, restaurant, street, airport, train station
    Training (7138 utterances per scenario):
      Clean: Sennheiser mic.
      Multi-Condition: Sennheiser and other mic., 75% with artificially added noise at SNR 10-20 dB
      Noisy: Sennheiser, artificially added noise at SNR 10-20 dB
    Testing (330 utterances per test set, or reduced sets of 166 utterances; 8 speakers):
      SNR: 5-15 dB
      Test sets 1-7: Sennheiser microphone
      Test sets 8-14: other microphone

37  VTLN Results, Clean Training

38  VTLN Results, Multi-Condition Training

39  VTLN Results, Noisy Training

40  Future Directions for Speaker Normalization
    Estimate warping transforms at the signal level:
      Exploit instantaneous amplitude or frequency signals to estimate the warping parameters
      Normalize the signal
    Effective integration with model-based adaptation techniques (collaboration with TSI)

41  ICCS-NTUA in HIWIRE: 1st Year Evaluation
     Databases: Completed
     Baseline: Completed
    WP1
     Noise Robust Features: Results 1st Year
     Audio-Visual ASR: Baseline + Visual Features
     Multi-microphone array: Exploratory Phase
     VAD: Prelim. Results
    WP2
     Speaker Normalization: Baseline
     Non-native Speech Database: Completed

42 WP1: Appendix Slides Aurora 3

43  ASR Results I: TIMIT-Based Speech Databases (Correct Phone Accuracies, %)

    Features         TIMIT   NTIMIT  TIMIT+Babble  TIMIT+White  TIMIT+Pink  TIMIT+Car  Av. Rel. Improv.
    MFCC*            58.40    42.42      27.71         17.72       18.60       52.75          -
    MFCC*+IA-Mean    59.61    43.53      39.25         26.03       31.05       56.50        17.62
    MFCC*+IF-Mean    59.34    43.70      36.87         25.38       30.92       55.30        15.58
    MFCC*+FMP        59.92    43.69      38.60         26.15       32.84       55.97        18.17
    TEner. CC        58.89    42.40      41.61         34.74       38.40       54.35        24.26

    * MFCC+C0+D+DD, # states=3, # mixtures=16

44  Experimental Results IIa (HTK): Aurora 3, Spanish Task (Correct Word Accuracies, %)

    Features                     WM      MM      HM     Average   Av. Rel. Improv.
    Aurora Front-End (WI007)    92.94   80.31   51.55    74.93           -
    MFCC*                       93.68   92.73   65.18    83.86        +35.62 %
    MFCC*+IA-Mean               94.05   92.22   77.70    87.99         52.09 %
    MFCC*+IF-Mean               90.71   89.52   72.36    84.20         36.98 %
    MFCC*+FMP                   94.41   92.46   82.73    89.87         59.59 %
    TEnerCC+log(Ener)+CMS       93.64   91.61   86.85    90.70         62.90 %

    * MFCC+log(Ener)+D+DD+CMS, # states=14, # mixtures=14
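The "Av. Rel. Improv." column is consistent with relative word-error-rate reduction against the Aurora front-end (WI007) baseline, computed on the scenario averages; for example, for TECC:

```latex
\mathrm{RI} \;=\; \frac{\mathrm{WER}_{\mathrm{WI007}} - \mathrm{WER}_{\mathrm{feat}}}{\mathrm{WER}_{\mathrm{WI007}}}
\;=\; \frac{(100 - 74.93) - (100 - 90.70)}{100 - 74.93}
\;=\; \frac{25.07 - 9.30}{25.07} \;\approx\; 62.9\,\%.
```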

45  Aurora 3 Configs
    HM: 14 states, 12 mixtures
    MM: 16 states, 6 mixtures
    WM: 16 states, 16 mixtures

46 WP1: Appendix Slides Aurora 2

47  Baseline: Aurora 2
    Database structure:
      2 training scenarios, 3 test sets, [4+4+2] conditions, 7 SNRs per condition: total of 2x70 tests
    Presentation of selected results:
      Average over SNR
      Average over condition
      Training scenarios: clean vs. multi-condition training
      Noise level: low vs. high SNR
      Condition: worst vs. easy conditions
      Features: MFCC+D+A vs. MFCC+D+A+CMS
    Setup: # states 18 [10-22], # mixtures [3-32], MFCC+D+A+CMS

48  Average Baseline Results: Aurora 2 (average over all SNRs and all conditions)

    Training Scenario | Developed baseline: PLAIN    CMS    Best PLAIN   Best CMS | Baseline*
    Clean             |                             58.26     55.48       59.10  |  57.58
    Multi             |                     78.66   81.90     84.04       82.04  |  84.17

    * Average HTK results as reported with the database.
    Plain: MFCC+D+A; CMS: MFCC+D+A+CMS.
    Mixture #: clean training (both Plain and CMS) 3; multi training Plain 22, CMS 32.
    Best: select for each condition/noise the # of mixtures with the best result.

49  Results: Aurora 2, up to +12%

50  Results: Aurora 2, up to +40%

51  Results: Aurora 2, up to +27%

52  Results: Aurora 2, up to +61%

53  Aurora 2 Distributed, Multicondition Training - Full (Word Accuracy, %)

               Set A                                     Set B                                          Set C
    SNR      Subway  Babble   Car   Exhib.   Avg      Restaur. Street  Airport Station   Avg      Subway M Street M   Avg     Overall
    Clean     98.68   98.52  98.39   98.49  98.52      98.68   98.52    98.39   98.49   98.52      98.50    98.58    98.54     98.52
    20 dB     97.61   97.73  98.03   97.41  97.70      96.87   97.58    97.44   97.01   97.23      97.30    96.55    96.93     97.35
    15 dB     96.47   97.04  97.61   96.67  96.95      95.30   96.31    96.12   95.53   95.82      96.35    95.53    95.94     96.29
    10 dB     94.44   95.28  95.74   94.11  94.89      91.96   94.35    93.29   92.87   93.12      93.34    92.50    92.92     93.79
    5 dB      88.36   87.55  87.80   87.60  87.83      83.54   85.61    86.25   83.52   84.73      82.41    82.53    82.47     85.52
    0 dB      66.90   62.15  53.44   64.36  61.71      59.29   61.34    65.11   56.12   60.47      46.82    54.44    50.63     59.00
    -5 dB     26.13   27.18  20.58   24.34  24.56      25.51   27.60    29.41   21.07   25.90      18.91    24.24    21.58     24.50
    Average   88.76   87.95  86.52   88.03  87.82      85.39   87.04    87.64   85.01   86.27      83.24    84.31    83.78     86.39

54  Aurora 2 Distributed, Clean Training - Full (Word Accuracy, %)

               Set A                                     Set B                                          Set C
    SNR      Subway  Babble   Car   Exhib.   Avg      Restaur. Street  Airport Station   Avg      Subway M Street M   Avg     Overall
    Clean     98.93   99.00  98.96   99.20  99.02      98.93   99.00    98.96   99.20   99.02      99.14    98.97    99.06     99.03
    20 dB     97.05   90.15  97.41   96.39  95.25      89.99   95.74    90.64   94.72   92.77      93.46    95.13    94.30     94.07
    15 dB     93.49   73.76  90.04   92.04  87.33      76.24   88.45    77.01   83.65   81.34      86.77    88.91    87.84     85.04
    10 dB     78.72   49.43  67.01   75.66  67.71      54.77   67.11    53.86   60.29   59.01      73.90    74.43    74.17     65.52
    5 dB      52.16   26.81  34.09   44.83  39.47      31.01   38.45    30.33   27.92   31.93      51.27    49.21    50.24     38.61
    0 dB      26.01    9.28  14.46   18.05  16.95      10.96   17.84    14.41   11.57   13.70      25.42    22.91    24.17     17.09
    -5 dB     11.18    1.57   9.39    9.60   7.94       3.47   10.46     8.23    8.45    7.65      11.82    11.15    11.49      8.53
    Average   69.49   49.89  60.60   65.39  61.34      52.59   61.52    53.25   55.63   55.75      66.16    66.12    66.14     60.06

55 WP1: Appendix Slides Audio Visual: Details

56  Introduction: Motivations for AV-ASR
    Audio-only ASR does not work reliably in many scenarios:
      Noisy background (e.g. car cabin, cockpit)
      Interference between talkers
    Need to enhance the auditory signal when it is not reliable
    Human speech perception is multimodal:
      Different modalities are weighted according to their reliability
      Hearing-impaired people can lipread
      McGurk effect (McGurk & MacDonald, 1976)
    Machines should also be able to exploit multimodal information

57  Audio-Visual Feature Fusion
    Audio-visual feature integration is highly non-trivial:
      Audio and visual speech asynchrony (~100 ms)
      Relative reliability of the streams can vary wildly
    Many approaches to feature fusion in the literature:
      Early integration
      Intermediate integration
      Late integration
    Highly active research area (mainly machine learning)
    The class of Dynamic Bayesian Networks (DBNs) seems particularly suited to the problem:
      Stream interaction explicitly modeled
      Model parameter inference is more difficult than in HMMs
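In the usual multi-stream HMM form (as in HTK), which is what the two-stream setup with equal weights on slide 59 corresponds to, the per-stream likelihoods are combined multiplicatively with exponent weights:

```latex
b_j(\mathbf{o}_t) \;=\; \prod_{s \in \{A,V\}} \Big[\, b_{js}\!\left(\mathbf{o}_{st}\right) \Big]^{\gamma_s},
```

where b_{js} is the audio or visual observation density of state j and the stream weights gamma_A, gamma_V encode the relative reliability of the two modalities.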

58  Visual Front-End: AAM Parameters
    First frame of each of the 36 videos manually annotated
    68 points on the whole face as shape landmarks
    Color appearance sampled at 10000 pixels
    Eigenvectors retained explain 70% of the variance: 5 eigenshapes and 10 eigenfaces
    Initial condition at each new frame: the converged solution of the previous frame
    Inverse-compositional gradient descent algorithm
    Coarse-to-fine refinement (Gaussian pyramid, 3 scales)
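A minimal sketch of the variance-retention step described above (keep enough PCA modes to explain 70% of the training variance); this is an illustration, not the actual AAM training code:

```python
# Keep the leading PCA eigenvectors explaining a target fraction of the variance.
import numpy as np

def pca_retain(X, var_fraction=0.70):
    """X: (n_samples, n_dims) row vectors. Returns (mean, components)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # economy-size SVD
    var = S ** 2 / (len(X) - 1)                         # per-component variances
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_fraction) + 1)
    return mean, Vt[:k]                                 # e.g. 5 eigenshapes or 10 eigenfaces

# Projection of a new shape/appearance vector x onto the model:
# params = (x - mean) @ components.T
```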

59  AV-ASR Experiment Setup
    Features:
      Audio: 39 features (MFCC_D_A)
      Visual (upsampled from 30 Hz to 100 Hz):
        5 shape features (Sh)
        10 appearance features (App)
      Audio-Visual: 39+45 features (MFCC_D_A + SHAPP_D_A)
    Two-stream HMM:
      8-state, left-to-right, whole-digit HMMs with no state skipping
      Single-Gaussian observation probability densities
      Separate audio and video feature streams with equal weights (1,1)
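A small sketch of bringing the 30 Hz visual features to the 100 Hz audio frame rate; plain linear interpolation is used here as one simple choice, since the interpolation method is not specified on the slide:

```python
# Upsample a (n_frames_in, n_dims) visual feature matrix to a higher frame rate.
import numpy as np

def upsample_features(feats, rate_in=30.0, rate_out=100.0):
    feats = np.asarray(feats, dtype=float)
    n_in = len(feats)
    t_in = np.arange(n_in) / rate_in
    t_out = np.arange(0.0, (n_in - 1) / rate_in + 1e-9, 1.0 / rate_out)
    # linearly interpolate each feature dimension on the output time grid
    return np.stack([np.interp(t_out, t_in, feats[:, d]) for d in range(feats.shape[1])], axis=1)
```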

60 WP1: Appendix Slides Aurora 4

61  Aurora 4, Multi-Condition Training
    7138 utterances total:
      3569 utterances (Sennheiser mic): 893 with no noise added, 2676 with 1 of 6 noises added at SNRs between 10 and 20 dB
      3569 utterances (2nd mic): 893 with no noise added, 2676 with 1 of 6 noises added at SNRs between 10 and 20 dB

62  Aurora 4, Noisy Training
    7138 utterances total (all Sennheiser mic):
      3569 utterances: 893 with no noise added, 2676 with 1 of 6 noises added at SNRs between 10 and 20 dB
      3569 utterances: 893 with no noise added, 2676 with 1 of 6 noises added at SNRs between 10 and 20 dB


