Published by Magdalen Stone. Modified over 9 years ago.
BioKIT - Real-time decoder for biosignal processing

Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz

Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology (KIT)
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Goals and Motivation

- Fast setup of experiments and flexible code-base
  - Python scripting layer for fast setup of experiments
  - Modular C++ core to allow for expansion with new algorithms
- Flexible processing and modeling
  - Matrices are represented as NumPy arrays in Python
  - Any functions in SciPy library are directly available
- Processing of big amounts of data
  - Utterance-level parallelization and sharing of components between threads
- Online-capability
  - Example scripts allow for easy adaptation to different applications
- Error analysis
  - Integrated tool allows for analysis of decoding passes
- Accessibility
  - General terminology
  - Tutorials for various use-cases

Experiments and Results

EMG-based speech recognition:
- Electrical potentials of a user’s facial muscles are captured in order to recognize speech
- Multi-stream setup consisting of 8 streams
- Each stream corresponds to a phonetic feature and is modeled with GMMs
- 20.88% WER on the EMG-UKA corpus (session-dependent, 108 word vocabulary)

Airwriting recognition:
- User’s hand is used as a stylus and text is written in the air and captured by a wearable device
- Handwriting motion is measured by an accelerometer and a gyroscope
- Corpus contains recordings of 9 subjects with 80 sentences each
- 11% WER with an 8000 word vocabulary (leave-one-out cross-validation)
- 3% WER for the user-dependent case

Automatic Speech Recognition – Real-time factor:
- Real-time factor of BioKIT using a Kaldi trained DNN acoustic model on Vietnamese
- DNN output layer size is 2,630
- Language model is 3/5-gram respectively
- Test is run on an Intel Core i7-3770 with 3.4GHz
- Executing 4 threads in parallel

Automatic Speech Recognition – Decoding:
- Comparing error rates of BioKIT and Kaldi
- Using Kaldi trained DNN acoustic models
- Tested on Bulgarian (BG), Czech (CZ), German (GE), Mandarin (CH), and Vietnamese (VN)
  - With exception of Mandarin tested with two different language models each
- Pruning parameters were similar
  - Same number of active nodes
  - Same global pruning beam
- Both achieve similar performance
  - Differences in results not significant at a significance level of 0.05 except for BGs and CH systems

System  Words  PPL    N-grams  BioKIT  Kaldi
BGs     100k   386    1.7m     12.50%  12.84%
BGb     100k   306    47.3m    11.76%  12.16%
CZs     33k    1,644  4.3m     9.19%   9.23%
CZb     33k    1,469  15.3m    8.73%   8.66%
GEs     37k    673    2.2m     10.89%  10.85%
GEb     37k    552    33.3m    9.50%   9.76%
CH      71k    503    5.0m     17.14%  16.90%
VNs     30k    247    1.7m     8.17%   8.10%
VNb     30k    179    50.6m    6.99%   7.10%

Error Analysis

- Capabilities of current tool are similar to work presented by Lin Chase*
- Statistics for analysis can be directly created with the decoding
- Results yield confusion tables as well as a sentence by sentence listing of errors

Reference-Frames:   73 - 85   86 - 103  104 - 147
Hypothesis-Frames:  73 - 85   86 - 103  104 - 147
Reference:          with(3)   that      surge has(2)
Hypothesis:         with(3)   that      searches
Scorer-Ref:         758.64    995.68    2799.51
Scorer-Hypo:        758.64    995.68    2725.93
TSM-Ref:            5.06      58.48     179.89
TSM-Hypo:           5.06      58.48     136.58
Error-Category:     CORRECT             SCORER_TSM_ERROR

*L. L. Chase, “Error-Responsive Feedback Mechanisms for Speech Recognizers,” Ph.D. dissertation, Pittsburgh, PA: Carnegie Mellon University, 1997.

Conclusion

- Toolkit is suitable for several human-machine interfaces
- Can be used with a variety of different biosignals
- Easily extendable due to two layer design
- Comparable results to the Kaldi decoder
- Error Analysis yields additional feedback
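
The results above are reported as word error rates (WER). As a rough illustration of how such a figure is commonly computed, the sketch below uses the standard word-level edit distance (substitutions, deletions and insertions divided by the number of reference words). It is not BioKIT code: the function name `word_error_rate` is hypothetical, and the example strings are taken from the error analysis excerpt with the variant markers such as `(3)` removed by hand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; not part of BioKIT or Kaldi.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One substitution ("surge" -> "searches") and one deletion ("has")
# against four reference words.
print(word_error_rate("with that surge has", "with that searches"))  # 0.5
```

A corpus-level WER is usually obtained by summing the edit distances and the reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.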