Published by Benjamin Bates. Modified over 9 years ago.

Automatic Detection-based Phone Recognition on TIMIT
Hung-Shin Lee (李鴻欣)
Institute of Information Science, Academia Sinica
12 July, 2011 @ IIS, Academia Sinica
Based on Chen and Wang in ISCSLP'08 and Interspeech'09

Page-2 Detection-Based ASR
[Diagram: human speech recognition (Human SR) runs from knowledge detection through knowledge integration to higher-level knowledge; detection-based (DB) ASR mirrors this with detectors, an integrator, and results]
– Detectors (knowledge detection): phonological attr., prosodic attr., acoustic attr., …
– Integrator (knowledge integration): HMM, CRF, …
– Results (higher-level knowledge): phone, syllable, word, sentence, semantic info, …

Page-3 Phonological Systems

| | SPE (Sound Pattern of English) | MV (Multi-valued Feature) | GP (Government Phonology) |
|---|---|---|---|
| Literature | (N. Chomsky & M. Halle, 1968) | (S. King, 2000)? | (J. Harris, 1994) |
| Feature types | Production-based, binary | Production-based, 2-10 values | Sound structure primes, binary |
| Number of features | 13 | 6 | 11 |
| Examples | anterior, nasal, round | centrality, front-back, manner, phonation, place, roundness | |

Page-4 Phonological Feature Detection (1)
[Diagram: MLP detectors for the binary feature sets]
– Input layer: 9 frames (i-4 … i+4) of 13 MFCCs
– MLP (detectors): hidden layer; recurrent, time-delay
– Output: posterior probabilities, then quantization
– SPE_14: 0101…01
– GP_11: 011…01
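The input and output stages of the detector can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge padding, the 0.5 threshold, and the function names `stack_context` and `quantize` are assumptions; the slide only states that 9 frames of 13 MFCCs go in and that posteriors are quantized.

```python
from typing import List

Frame = List[float]


def stack_context(frames: List[Frame], i: int, half_width: int = 4) -> Frame:
    """Concatenate frames i-half_width .. i+half_width into one input vector.

    Indices outside the utterance are clamped to the first/last frame
    (assumed edge handling; the slide does not specify it).
    """
    last = len(frames) - 1
    stacked: Frame = []
    for j in range(i - half_width, i + half_width + 1):
        stacked.extend(frames[min(max(j, 0), last)])
    return stacked


def quantize(posteriors: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-feature posteriors into a binary vector (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in posteriors]


# Toy utterance: 20 frames of 13 placeholder "MFCC" values (not real speech).
utterance = [[float(t)] * 13 for t in range(20)]
mlp_input = stack_context(utterance, i=10)
print(len(mlp_input))                       # 9 frames x 13 MFCCs = 117
print(quantize([0.91, 0.08, 0.55, 0.40]))   # [1, 0, 1, 0]
```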

Page-5 Phonological Feature Detection (2)
[Diagram: one MLP per multi-valued feature]
– Input: 9 frames (i-4 … i+4) of 13 MFCCs; time-delay
– 6 MV features, each with its own detector: MLP (Centrality), MLP (Front-Back), MLP (Roundness), …
– Quantized outputs (e.g. 0100, 100, 010) are concatenated into MV_29: 0100100…010
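The concatenation step can be illustrated with a short sketch. It assumes that each multi-valued detector is quantized by taking the most probable value (argmax), which the slide does not state, and it covers only the three features whose example codes appear on the slide; the posterior values and function names are made up for illustration.

```python
from typing import Dict, List


def one_hot_argmax(posteriors: List[float]) -> List[int]:
    """Quantize one multi-valued feature: 1 at the most probable value, 0 elsewhere."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return [1 if k == best else 0 for k in range(len(posteriors))]


def concat_mv(outputs: Dict[str, List[float]], order: List[str]) -> List[int]:
    """Concatenate the one-hot codes of several features in a fixed order."""
    vector: List[int] = []
    for name in order:
        vector.extend(one_hot_argmax(outputs[name]))
    return vector


# Made-up posteriors for one frame; sizes follow the example codes on the slide.
frame_outputs = {
    "centrality": [0.1, 0.7, 0.1, 0.1],  # -> 0100
    "front_back": [0.8, 0.1, 0.1],       # -> 100
    "roundness": [0.2, 0.6, 0.2],        # -> 010
}
code = concat_mv(frame_outputs, ["centrality", "front_back", "roundness"])
print("".join(map(str, code)))  # 0100100010
```

With all 6 MV features included, the same concatenation would yield the 29-dimensional MV_29 vector.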

Page-6 Conditional Random Field (CRF) Integrator
General Chain CRF
[Formula and figure: a chain-structured CRF with a state feature function and a transition feature function; each output y_i is linked to its neighbour y_{i-1} and to the inputs x_{i-1}, x_i, x_{i+1}]
λ_j, μ_k: feature function weight parameters
– Input X: phonological features
– Output Y: phone
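For reference, a general chain CRF is commonly written in the form below. This is the textbook formulation, not a transcription of the slide: the symbols t_j and s_k, and the assignment of λ_j to transition functions and μ_k to state functions, are assumptions.

```latex
p(\mathbf{y} \mid \mathbf{x}) =
  \frac{1}{Z(\mathbf{x})}
  \exp\!\Bigg(
    \sum_{i} \sum_{j} \lambda_j \, t_j(y_{i-1}, y_i, \mathbf{x}, i)
    + \sum_{i} \sum_{k} \mu_k \, s_k(y_i, \mathbf{x}, i)
  \Bigg),
\qquad
Z(\mathbf{x}) =
  \sum_{\mathbf{y}'}
  \exp\!\Bigg(
    \sum_{i} \sum_{j} \lambda_j \, t_j(y'_{i-1}, y'_i, \mathbf{x}, i)
    + \sum_{i} \sum_{k} \mu_k \, s_k(y'_i, \mathbf{x}, i)
  \Bigg)
```

Here t_j is a transition feature function over adjacent outputs, s_k is a state feature function over a single output, and Z(x) normalizes over all candidate phone sequences y'.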

Page-7 CRF Integrator – Training Issues
Required labels for CRF training
– Phone: y
– Phonological features: x
Training data
– Detected-data trained CRF (DT CRF): speech is passed through the MLP detectors, giving phonological features (with errors), which are paired with the phone labels
– Oracle-data trained CRF (OT CRF): the phone labels are mapped to phonological features (mapping phones → phonological features), which are paired with the phone labels
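Oracle training data can be generated from phone labels alone, as in this sketch. The three-feature table is a hypothetical excerpt for illustration (the actual SPE, GP, and MV tables are not given in the slides), and the row layout follows the whitespace-separated column format that CRF++ reads, with the label in the last column.

```python
from typing import Dict, List

# Hypothetical excerpt of a phone -> binary feature table (illustration only).
FEATURES = ["nasal", "anterior", "round"]
PHONE_TO_FEATURES: Dict[str, List[int]] = {
    "n": [1, 1, 0],
    "m": [1, 1, 0],
    "ow": [0, 0, 1],
    "k": [0, 0, 0],
}


def oracle_features(frame_phones: List[str]) -> List[List[int]]:
    """Map each frame's phone label to its error-free feature vector."""
    return [list(PHONE_TO_FEATURES[p]) for p in frame_phones]


def training_rows(frame_phones: List[str], features: List[List[int]]) -> List[str]:
    """One row per frame: feature columns followed by the phone label."""
    return [" ".join(map(str, x)) + " " + y for x, y in zip(features, frame_phones)]


labels = ["n", "n", "ow", "ow", "k"]
for row in training_rows(labels, oracle_features(labels)):
    print(row)
```

For detected-data training, the same `training_rows` call would receive the quantized detector outputs instead of `oracle_features(labels)`.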

Page-8 Experiments
Corpus: TIMIT
– No SA1, SA2
– Training set (3296 utts), Dev set (400 utts)
– Test set (1344 utts)
Phone set: TIMIT61
– Evaluation: CMU/MIT 39
Baseline
– CI-HMM
Toolkits
– Nico Toolkit (for MLP), CRF++ (for CRF)
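Scoring on the 39-phone set requires folding the 61 TIMIT labels before comparison. The sketch below uses the folding commonly attributed to Lee and Hon (1989); that the experiments used exactly this table is an assumption, and the names `FOLD` and `fold_to_39` are illustrative.

```python
from typing import Dict, List

# Commonly used TIMIT 61 -> 39 folding (assumed to match "CMU/MIT 39").
FOLD: Dict[str, str] = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil",
    "dcl": "sil", "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}


def fold_to_39(phones: List[str]) -> List[str]:
    """Fold TIMIT61 labels to the 39-phone set; the glottal stop 'q' is dropped."""
    return [FOLD.get(p, p) for p in phones if p != "q"]


timit61 = "h# n eh ow kcl k w eh ae eh s tcl t ix n".split()
print(" ".join(fold_to_39(timit61)))
# sil n eh ow sil k w eh ae eh s sil t ih n
```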
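
Page-9 Results (1)

Model: OT CRF; Test: OD features

| Features | Phone Corr. % | Phone Acc. % |
|---|---|---|
| SPE14 | 93.28 | 93.20 |
| GP11 | 98.39 | 98.36 |
| MV29 | 88.75 | 88.56 |

Model: OT/DT CRF; Test: DD features

| Model | Features | Phone Corr. % | Phone Acc. % |
|---|---|---|---|
| HMM-baseline | | 69.02 | 63.45 |
| OT CRF | SPE14 | 66.19 | 29.68 |
| OT CRF | GP11 | 69.03 | 31.38 |
| OT CRF | MV29 | 59.24 | 30.33 |
| DT CRF | SPE14 | 56.56 | 55.27 |
| DT CRF | GP11 | 55.74 | 54.53 |
| DT CRF | MV29 | 51.84 | 50.68 |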
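The two metrics differ only in how insertions are treated, which is why a system can score a high Corr. and a low Acc. at the same time. The sketch below assumes the usual definitions, Corr = (N − S − D) / N and Acc = (N − S − D − I) / N, and a unit-cost edit-distance alignment; scoring tools may weight the error types differently, so the counts here are illustrative rather than a reproduction of the reported numbers.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] against hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            diag = (c, s, d, n) if ref[i - 1] == hyp[j - 1] else (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(diag, deletion, insertion)
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


def phone_scores(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Return (Corr %, Acc %) under the assumed definitions."""
    subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * (n - subs - dels) / n, 100.0 * (n - subs - dels - ins) / n


ref = "sil n eh ow sil k".split()
hyp = "sil n n eh aa sil k k".split()   # two insertions, one substitution
corr, acc = phone_scores(ref, hyp)
print(round(corr, 2), round(acc, 2))    # 83.33 50.0
```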

Page-10 Results (2)

System Fusion

| Methods | # Systems | Phone Corr. (%) | Phone Acc. (%) |
|---|---|---|---|
| HMM baseline | 1 | 69.02 | 63.45 |
| OT: SPE+GP+MV | 3 | 61.97 | 60.65 |
| DT: SPE+GP+MV | 3 | 52.90 | 52.06 |
| OT+DT: SPE+GP+MV | 6 | 60.81 | 59.20 |
| OT: SPE+GP+MV +HMM | 4 | 65.53 | 64.31 |
| DT: SPE+GP+MV +HMM | 4 | 59.57 | 58.64 |
| OT+DT: SPE+GP+MV +HMM | 7 | 64.22 | 62.59 |

Page-11 System Fusion with CRF
[Diagram: chain CRF used as a combiner]
– Input X (x_{i-1}, x_i, x_{i+1}): phone sequences from the SPE Sys., MV Sys., GP Sys., and HMM Sys.
– Output Y (y_{i-1}, y_i): combined results (phone)
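One way to prepare input for such a combiner is to place each system's phone hypothesis in its own column, with the reference phone as the label. The sketch assumes frame-synchronous outputs from every system and the whitespace-separated column layout read by CRF++; the function name and the toy sequences are illustrative, not taken from the experiments.

```python
from typing import Dict, List


def fusion_rows(system_outputs: Dict[str, List[str]],
                order: List[str],
                labels: List[str]) -> List[str]:
    """Build one row per frame: one column per system, then the reference phone."""
    lengths = {len(system_outputs[name]) for name in order} | {len(labels)}
    if len(lengths) != 1:
        raise ValueError("all systems and the labels must cover the same frames")
    rows = []
    for t, label in enumerate(labels):
        columns = [system_outputs[name][t] for name in order]
        rows.append(" ".join(columns + [label]))
    return rows


# Toy frame-level hypotheses (made up for illustration).
outputs = {
    "SPE": ["n", "n", "eh"],
    "MV": ["n", "m", "eh"],
    "GP": ["n", "n", "ih"],
    "HMM": ["n", "n", "eh"],
}
rows = fusion_rows(outputs, ["SPE", "MV", "GP", "HMM"], ["n", "n", "eh"])
print("\n".join(rows))
```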

Page-12 Two Types of AFDT Imperfection
[Figure: the phone sequence "h# n eh ow kcl k w eh ae eh s tcl t ix n" shown against two AF tracks, AF(A) and AF(A')]
– AF asynchrony
– AFDT errors

Page-13 CRF Training (1)
– Oracle Data Training: phone y; AFs x_t obtained from the phone labels through a mapping table (Phone → AFs)
– Detected Data Training: phone y; AFs x_t produced by AFDT, including detected errors

Page-14 CRF Training (2)
– Aligned Data Training: phone y; AFs x_t taken from the AF sequence via AFDT
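
Page-15 Results (3)

| Condition | System | Phone Corr. (%) | Phone Acc. (%) |
|---|---|---|---|
| Upper bound | OT CRF | 98.31 | 98.28 |
| Upper bound | AT CRF | 71.49 | 70.31 |
| Real case | OT CRF | 70.55 | 34.38 |
| Real case | DT CRF | 57.30 | 56.14 |
| Real case | AT CRF | 64.87 | 62.32 |

– Introducing AF asynchrony lowers accuracy by 27.97 points (98.28 → 70.31)
– Detection errors cause a further 7.99-point drop in accuracy (70.31 → 62.32)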
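The two drops quoted on this slide follow directly from the accuracy column, as this short check shows. The variable names are illustrative; the figures are the ones reported above.

```python
# Phone accuracies (%) as reported on the slide.
upper_bound_ot = 98.28   # OT CRF, upper bound
upper_bound_at = 70.31   # AT CRF, upper bound
real_case_at = 62.32     # AT CRF, real case

asynchrony_drop = round(upper_bound_ot - upper_bound_at, 2)
detection_drop = round(upper_bound_at - real_case_at, 2)

print(asynchrony_drop)  # 27.97
print(detection_drop)   # 7.99
```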

Page-16 AF Asynchrony Compensation
AF asynchrony is caused by context variation
We can reduce AF asynchrony by letting our systems learn context variation directly
– Long-term information
[Diagram: 23-dim Mel features over 310 ms are split into left context and right context; each part passes through Windows + DCTs into an MLP; labelled dimensions: 144 Dim, 72 Dim]
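The "Windows + DCTs" stage can be sketched for a single Mel band as follows. Everything beyond the slide's labels is an assumption: a 10 ms frame shift (so 310 ms is treated as 31 frames), half-Hamming windows, a shared centre frame, and the number of DCT coefficients kept. The sketch does not attempt to reproduce the 144 and 72 dimensions shown on the slide.

```python
import math
from typing import List, Tuple


def half_hamming(n: int, rising: bool) -> List[float]:
    """Rising or falling half of a Hamming window of length 2n - 1."""
    full = [0.54 - 0.46 * math.cos(2 * math.pi * k / (2 * n - 2)) for k in range(2 * n - 1)]
    return full[:n] if rising else full[n - 1:]


def dct_ii(signal: List[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * (t + 0.5) * k / n) for t, x in enumerate(signal))
            for k in range(num_coeffs)]


def long_term_features(trajectory: List[float], num_coeffs: int) -> Tuple[List[float], List[float]]:
    """Split one band's temporal trajectory at its centre, window each half, apply a DCT."""
    centre = len(trajectory) // 2
    left, right = trajectory[:centre + 1], trajectory[centre:]
    left_w = half_hamming(len(left), rising=True)
    right_w = half_hamming(len(right), rising=False)
    left_c = dct_ii([x * w for x, w in zip(left, left_w)], num_coeffs)
    right_c = dct_ii([x * w for x, w in zip(right, right_w)], num_coeffs)
    return left_c, right_c


# Placeholder trajectory of one Mel band over 31 frames (not real speech).
band = [math.sin(0.2 * t) for t in range(31)]
left_coeffs, right_coeffs = long_term_features(band, num_coeffs=6)
print(len(left_coeffs), len(right_coeffs))  # 6 6
```

Repeating this for each of the 23 Mel bands and concatenating the coefficients would give one left-context and one right-context vector per frame.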
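
Page-17 Results (4)

| Test data type | System | Corr | Acc |
|---|---|---|---|
| - | CI-HMM | 69.02 | 63.45 |
| - | CD-HMM | 75.76 | 65.78 |
| Detected (real case) | OT CRF (±3) | 75.24 | 47.97 |
| Detected (real case) | Long Term AFDT + DT CRF (±3) | 64.58 | 63.12 |
| Ideal (upper bound) | Long Term AFDT + AT CRF | 74.96 | 73.64 |
| Ideal (upper bound) | MFCC AFDT + AT CRF (±3) | 72.87 | 71.62 |
| Ideal (upper bound) | Long Term AFDT + AT CRF (±3) | 76.83 | 74.97 |
| Detected (real case) | Long Term AFDT + AT CRF | 69.83 | 66.97 |
| Detected (real case) | MFCC AFDT + AT CRF (±3) | 66.21 | 63.16 |
| Detected (real case) | Long Term AFDT + AT CRF (±3) | 71.01 | 67.67 |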
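If "(±3)" denotes a context of three neighbouring frames on each side of the current CRF input (an assumption; the slides do not define it), the corresponding CRF++ feature template could be generated as below. The `%x[row,col]` macro and the `B` bigram line are standard CRF++ template syntax; the function name and the choice of 14 input columns are illustrative.

```python
def crfpp_template(num_columns: int, context: int = 3) -> str:
    """Emit CRF++ unigram templates for every column over a +/- context window."""
    lines = []
    index = 0
    for col in range(num_columns):
        for offset in range(-context, context + 1):
            lines.append(f"U{index:03d}:%x[{offset},{col}]")
            index += 1
    lines.append("B")  # bigram template over adjacent output labels
    return "\n".join(lines)


template = crfpp_template(num_columns=14, context=3)
print(template.splitlines()[0])     # U000:%x[-3,0]
print(len(template.splitlines()))   # 14 columns x 7 offsets + 1 bigram line = 99
```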

Page-18 Conclusions
A well-designed phonological feature system is important
– AF asynchrony minimization training and AF-phone synchronization could also be investigated
An oracle-trained CRF is able to retrieve more phonological information from speech
– High phone correct rate (but sensitive to detection errors)
– Helpful for combination
Detection-based ASR is promising
– The front-end detector is a major issue

Page-19 AF and Phone Alignment Using AFDT
[Figure: a phone sequence aligned with an AF sequence along the time axis t]