KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

BioKIT – Real-time decoder for biosignal processing
Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz
Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology (KIT)

Goals and Motivation
- Fast experiment setup and a flexible code base: a Python scripting layer for quickly setting up experiments, on top of a modular C++ core that can be extended with new algorithms
- Flexible processing and modeling: matrices are represented as NumPy arrays in Python, so any function in the SciPy library is directly available
- Processing of large amounts of data: utterance-level parallelization and sharing of components between threads
- Online capability: example scripts allow for easy adaptation to different applications
- Error analysis: an integrated tool allows decoding passes to be analyzed
- Accessibility: general terminology and tutorials for various use cases

Experiments and Results

EMG-based speech recognition:
- Electrical potentials of the user's facial muscles are captured in order to recognize speech
- Multi-stream setup consisting of 8 streams; each stream corresponds to a phonetic feature and is modeled with GMMs
- 20.88% WER on the EMG-UKA corpus (session-dependent, 108-word vocabulary)

Airwriting recognition:
- The user's hand serves as a stylus: text is written in the air and captured by a wearable device
- Handwriting motion is measured by an accelerometer and a gyroscope
- The corpus contains recordings of 9 subjects with 80 sentences each
- 11% WER with an 8,000-word vocabulary (leave-one-out cross-validation); 3% WER in the user-dependent case

Automatic speech recognition – real-time factor:
- Real-time factor of BioKIT using a Kaldi-trained DNN acoustic model on Vietnamese
- DNN output layer size is 2,630; the language model is a 3-gram and a 5-gram, respectively
- The test is run on an Intel Core i CPU at 3.4 GHz, executing 4 threads in parallel

Automatic speech recognition – decoding:
- Error rates of BioKIT and Kaldi are compared, using Kaldi-trained DNN acoustic models
- Tested on Bulgarian (BG), Czech (CZ), German (GE), Mandarin (CH), and Vietnamese (VN); with the exception of Mandarin, each language is tested with two different language models
- Pruning parameters were kept similar: the same number of active nodes and the same global pruning beam
- Both decoders achieve similar performance; differences in the results are not significant at a significance level of 0.05, except for the BGs and CH systems

System   Words   PPL     N-grams   BioKIT   Kaldi
BGs      100k    386     1.7m      12.50%   12.84%
BGb      100k    …       …m        11.76%   12.16%
CZs      33k     1,644   4.3m      9.19%    9.23%
CZb      33k     1,…     …m        8.73%    8.66%
GEs      37k     673     2.2m      10.89%   10.85%
GEb      37k     …       …m        9.50%    9.76%
CH       71k     503     5.0m      17.14%   16.90%
VNs      30k     247     1.7m      8.17%    8.10%
VNb      30k     …       …m        6.99%    7.10%

Error Analysis
- Capabilities of the current tool are similar to the work presented by Lin Chase*
- Statistics for the analysis can be created directly during decoding
- Results yield confusion tables as well as a sentence-by-sentence listing of errors

Example entry from the sentence-by-sentence listing (each entry also records reference/hypothesis frames and scorer/TSM alignments):
Reference:  with(3) that surge has(2)
Hypothesis: with(3) that searches
Error category: CORRECT / SCORER_TSM_ERROR

*L. L. Chase, "Error-Responsive Feedback Mechanisms for Speech Recognizers," Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA.

Conclusion
- The toolkit is suitable for several human-machine interfaces and can be used with a variety of different biosignals
- Easily extendable thanks to the two-layer design
- Achieves results comparable to the Kaldi decoder
- Error analysis yields additional feedback
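Because the toolkit represents matrices as plain NumPy arrays, any SciPy routine can be applied to them directly. A minimal sketch of what that enables; the sampling rate, band edges, and synthetic signal are illustrative assumptions for this sketch, not values or API calls from BioKIT:

```python
import numpy as np
from scipy.signal import butter, lfilter

# Assumed for illustration only: 600 Hz sampling and a 20-250 Hz pass band.
fs = 600.0
t = np.arange(0, 1.0, 1.0 / fs)

# Synthetic single-channel "EMG" signal: a 50 Hz tone plus noise.
rng = np.random.default_rng(0)
emg = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)

# 4th-order Butterworth band-pass, applied directly to the NumPy array.
b, a = butter(4, [20.0, 250.0], btype="band", fs=fs)
filtered = lfilter(b, a, emg)  # same shape as the input channel
```

Any other SciPy function (resampling, spectrograms, statistics) can be dropped into a processing chain the same way, since the data never leaves the NumPy representation.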
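The real-time factor quoted for the Vietnamese decoding runs is simply wall-clock processing time divided by the duration of the input signal. A self-contained sketch; the numbers are made up for illustration:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock decoding time / duration of the decoded signal.

    An RTF below 1.0 means the decoder keeps up with the incoming signal,
    which is what an online-capable decoder needs.
    """
    return processing_seconds / audio_seconds

# Illustrative numbers only: 90 s of wall-clock time for 120 s of audio.
rtf = real_time_factor(90.0, 120.0)  # -> 0.75, i.e. faster than real time
```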
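All error rates in the comparison table are word error rates: the word-level edit distance between reference and hypothesis, divided by the number of reference words. A self-contained sketch, applied to the reference/hypothesis pair from the error-analysis example:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic program over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)

# The pair from the error-analysis example: one substitution
# ("surge" -> "searches") and one deletion ("has") over 4 reference words.
print(wer("with that surge has", "with that searches"))  # -> 0.5
```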