1 Dynamic Match Lattice Spotting Spoken Term Detection Evaluation Queensland University of Technology Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof.


1 Dynamic Match Lattice Spotting
Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace

2 Overview
Phonetic-based index → open-vocabulary
Based on lattice-spotting technique
Two-tier database
Dynamic-match rules
Algorithmic optimisations
NOTE: Patented technology

3 Concept
[Diagram: the search term "greasy" undergoes phone decomposition into a target phone sequence (griys), which is then looked up among the indexed phone sequences.]

4 Concept
[Diagram: dynamic matching compares the target sequence (griys) against observed sequences (graxsih, thaynrnx, owdmnxae, ...) and assigns each a cost; the mismatched phones ax and ih are highlighted.]

5 Indexing
[Block diagram: Audio → Feature Extraction → Segmentation → Speech Recognition → Lattices → Sequence Generation → Sequence DB → Hyper-Sequence Generation → Hyper-Sequence DB]

6 Hyper-sequence Mapping
Map individual phones to “parent” classes
  – We use Vowels, Fricatives, Glides, Stops and Nasals
Simple example
  – Parent classes: Vowels, Consonants
  – Map each phone to its parent class to create the hyper-sequence
[Diagram: Sequence DB → Hyper-Sequence DB]

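7 Hyper-sequence Mapping
Search term: griys
Hyper-sequence: CCVCV
[Diagram: the Hyper-Sequence DB holds entries such as CCVCV, VCCVC and CVCVC, each pointing to matching entries in the Sequence DB (groysih, tlowpiy, nxsehray, draxbae, bfaxdaa, oybraaf, ehgriym, ...). The search term's hyper-sequence, CCVCV, selects the Sequence DB entries to examine.]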
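The two-tier lookup on slides 6 and 7 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it uses the slide's simplified Vowel/Consonant classes, the partial vowel set and the function names are assumptions, the five-phone target g r iy s iy is an assumed reading of the slide's search term, and it matches hyper-sequences exactly, whereas the real system may match them more flexibly.

```python
from collections import defaultdict

# Assumed, partial vowel inventory covering only the phones on the slides.
VOWELS = {"aa", "ae", "ax", "ay", "eh", "ih", "iy", "ow", "oy"}


def to_hyper_sequence(phones):
    """Map each phone to its parent class: V (vowel) or C (consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


def build_hyper_index(sequences):
    """Group phone sequences under their hyper-sequence (the second tier)."""
    index = defaultdict(list)
    for seq in sequences:
        index[to_hyper_sequence(seq)].append(seq)
    return index


# Toy sequence DB using phone sequences that appear on slide 7.
sequence_db = [
    ("g", "r", "oy", "s", "ih"),
    ("t", "l", "ow", "p", "iy"),
    ("oy", "b", "r", "aa", "f"),
    ("eh", "g", "r", "iy", "m"),
]
hyper_db = build_hyper_index(sequence_db)

# Assumed five-phone target for "greasy".
target = ("g", "r", "iy", "s", "iy")
candidates = hyper_db[to_hyper_sequence(target)]
```

Only the sequences filed under CCVCV are passed on to the more expensive phone-level matching, which is the point of the second tier.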

8 Searching
[Block diagram with components: Term, Split long terms, Phone decomp., Hyper-mapping, Hyper-Sequence DB, Sequence DB, Dynamic Matching, Keyword Verification, Merge long terms, Results]

9 Dynamic Matching
Minimum Edit Distance (MED), i.e. Levenshtein distance
Insertions, deletions, substitutions
Finds the minimum cost of transformation
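For reference, the standard Levenshtein recurrence applied to phone sequences looks like this. It is a generic textbook sketch with unit costs; the function name and the example sequences (taken loosely from slide 4, with an assumed five-phone target) are illustrative only.

```python
def min_edit_distance(target, observed):
    """Levenshtein distance between two phone sequences, unit costs."""
    # prev[j] = cost of turning the first i-1 target phones
    # into the first j observed phones.
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]  # turning i target phones into nothing costs i deletions
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (t != o),   # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


target = ["g", "r", "iy", "s", "iy"]      # assumed target
observed = ["g", "r", "ax", "s", "ih"]    # observed sequence from slide 4
distance = min_edit_distance(target, observed)
```

Here the two sequences differ by two substitutions, so the distance is 2.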

10 Dynamic Matching
Substitution costs
  – Derived from phone confusion statistics
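The slide does not say how costs are derived from the confusion statistics. As one simple assumption, the sketch below sets the cost of substituting phone a with phone b to 1 minus the observed probability that a is recognised as b, so that frequently confused phones are cheap to swap. The confusion counts and all names are hypothetical.

```python
def substitution_costs(confusions):
    """confusions[ref][hyp] = number of times ref was recognised as hyp.

    Assumed rule: cost(ref -> hyp) = 1 - P(hyp | ref); matches cost 0.
    """
    costs = {}
    for ref, row in confusions.items():
        total = sum(row.values())
        for hyp, count in row.items():
            costs[(ref, hyp)] = 0.0 if ref == hyp else 1.0 - count / total
    return costs


def weighted_med(target, observed, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance with per-phone-pair substitution costs."""
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * del_cost]
        for j, o in enumerate(observed, start=1):
            # Unseen phone pairs fall back to a full-cost substitution.
            sub = 0.0 if t == o else sub_cost.get((t, o), 1.0)
            curr.append(min(prev[j] + del_cost,
                            curr[j - 1] + ins_cost,
                            prev[j - 1] + sub))
        prev = curr
    return prev[-1]


# Hypothetical confusion counts for the phone iy.
costs = substitution_costs({"iy": {"iy": 80, "ih": 15, "ax": 5}})
score = weighted_med(["g", "r", "iy", "s", "iy"],
                     ["g", "r", "ax", "s", "ih"], costs)
```

With these made-up counts, iy → ih costs 0.85 and iy → ax costs 0.95, so the example pair scores about 1.8 rather than the unit-cost 2.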

11 Optimisations
Prefix sequence optimisation
Early stopping optimisation
Linearised MED search approximation
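The slide only names these optimisations. One common reading of early stopping, sketched here as an assumption rather than as the presenters' method, is to abandon an MED computation as soon as every cell in the current row already exceeds the acceptance threshold, since non-negative costs mean the final distance cannot be lower than the row minimum.

```python
def bounded_med(target, observed, max_cost):
    """Unit-cost edit distance, or None once it provably exceeds max_cost."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (t != o)))
        # Every alignment path passes through this row, so the final
        # distance is at least min(curr); stop if that is already too high.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None


target = ["g", "r", "iy", "s", "iy"]  # assumed target
close = bounded_med(target, ["g", "r", "ax", "s", "ih"], max_cost=2)
far = bounded_med(target, ["th", "ay", "n", "r", "nx"], max_cost=1)
```

The second call returns None without filling the whole matrix, which is where the saving comes from when most indexed sequences are poor matches.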

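12 Long Term Merging
[Diagram: the long term "olympic sites" is split into shorter phone sequences (owlihmp, ksayts), each part is searched separately, and the partial results are merged into final results.]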
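The merge step might look something like the following sketch. The slides do not describe the merge criterion, so the rule used here (the second part must start within a small tolerance of where the first part ends) and the max_gap parameter are assumptions, as are the hit times.

```python
def merge_hits(first_hits, second_hits, max_gap=0.5):
    """Join hits for two halves of a split term into whole-term hits.

    Each hit is a (start_seconds, end_seconds) pair. A pair is merged when
    the second half starts within max_gap seconds of the first half's end,
    which also tolerates a slight overlap between the two halves.
    """
    merged = []
    for start1, end1 in first_hits:
        for start2, end2 in second_hits:
            if abs(start2 - end1) <= max_gap:
                merged.append((start1, end2))
    return merged


# Hypothetical hit times for the two parts of a split term.
first = [(1.0, 1.4), (7.0, 7.4)]
second = [(1.45, 1.9), (3.0, 3.5)]
whole_term_hits = merge_hits(first, second)
```

Only the pair that is adjacent in time survives, giving a single merged hit spanning 1.0 s to 1.9 s.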

13 Keyword Verification
Acoustic
  – Use the acoustic score from the lattice to boost occurrences with high confidence
Neural Network
  – Produce a confidence score by fusing:
    MED score and acoustic score
    Term phone length
    Term phone classes

14 Results
Maximum Term-Weighted Value on EvalSet terms:

Source type   DevSet phone error rate   Primary system   Contrastive: No Acous.   Contrastive: LTS Only
Bnews         24%                       0.246            0.245                    0.208
CTS           45%                       0.104            0.102                    0.080
Confmtg       56%                       0.021            0.019                    0.016

Index size: 558 MB/Sh (297 MB/Sh for No Acous.)
Index speed: 18x real-time
Search speed: 3 hr searched / CPU-sec
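For readers unfamiliar with the metric, Term-Weighted Value can be computed roughly as below. The slides do not define it; this sketch follows the definition used in the NIST 2006 Spoken Term Detection evaluation as commonly stated (one non-target trial per second of speech, beta = 999.9), and the function name and counts are illustrative. Maximum TWV is this value at the best single decision threshold.

```python
BETA = 999.9  # weight on false alarms, as commonly quoted for NIST STD 2006


def term_weighted_value(term_stats, speech_seconds, beta=BETA):
    """term_stats: one (n_true, n_correct, n_spurious) tuple per term.

    Terms with no true occurrences are skipped, since their miss
    probability is undefined.
    """
    losses = []
    for n_true, n_correct, n_spurious in term_stats:
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials: one per second of speech, minus true occurrences.
        p_false_alarm = n_spurious / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_false_alarm)
    return 1.0 - sum(losses) / len(losses)


# Hypothetical counts: one term, 10 true occurrences in an hour of speech.
perfect = term_weighted_value([(10, 10, 0)], speech_seconds=3600)
half_missed = term_weighted_value([(10, 5, 0)], speech_seconds=3600)
```

A perfect system scores 1.0, and because of the large beta a handful of false alarms per hour can cost as much as missing a substantial share of the true occurrences.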

15 Conclusion
Open-vocabulary and phone-based
Patented technology utilises
  – sequence and hyper-sequence databases
  – optimisations for rapid searches
Advantages
  – Other languages
  – Economy of scale

16 Conclusion
Limitations
  – Indexing speed and size
  – Need to split long sequences
Future work
  – Keyword Verification
    Word-level information (e.g. LVCSR)
    Acoustic features (e.g. prosody)
  – Indexing/searching frameworks
  – Spoken Document Retrieval and other semantic applications

17 References
1. A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
2. K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication.
3. CMU Speech Group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
4. S. J. Young, P. C. Woodland, W. J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.
5. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.

