The Speech Solution www.infovox.se www.babeltech.com GENERAL OVERVIEW OF THE BABEL DEMONSTRATOR SYSTEM RESPITE PROJECT.

Presentation transcript:

The Speech Solution GENERAL OVERVIEW OF THE BABEL DEMONSTRATOR SYSTEM RESPITE PROJECT

Some general features
Based on the WaveSurfer program developed by KTH.
Why?
- Platform independent: Tcl/Tk-based programming
- Plug-in based, so easy to extend
- Free
What?
- A plug-in that integrates the work of all the partners in a single program

Babel demonstrator
The demo interface provides access to the ASR process at different anchor points, such as the sampled speech signal, acoustic features, state likelihoods, and recognized sentences.

Customisable interface
Each block can be processed independently by calling a user-defined external program. The only constraint is compatibility with the input/output data format.
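As an illustration of treating a block as an external program, the sketch below pipes a block's input to a command and collects its output. The slides do not say how the external program receives its data, so the stdin/stdout convention, the `run_block` name, and the pass-through stand-in program are all assumptions made for this example.

```python
import subprocess
import sys


def run_block(command, input_bytes):
    """Run one processing block as an external program and return its output.

    Assumption: the program reads its input from stdin and writes its result
    to stdout. The slides only require compatible input/output data formats.
    """
    result = subprocess.run(
        command, input=input_bytes, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


# Hypothetical stand-in for a user-defined program: copies stdin to stdout.
PASSTHROUGH = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

if __name__ == "__main__":
    print(run_block(PASSTHROUGH, b"\x01\x02\x03"))
```

Any program could replace the stand-in, provided it reads and writes the data format the neighbouring blocks expect.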

Data display
- Display is fully handled by the ASR interface, through specific plug-ins for the 3 data types: samples, acoustic features, and probabilities/likelihoods.
- The different data streams are automatically time-aligned with the speech signal.
- A block's internal data can be displayed (multi-stream format).
- Dynamic internal data management: the data display is updated automatically when necessary.
The definition of the data format must include all the information required by these constraints.
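Time alignment comes down to mapping a frame's position in time onto sample positions in the speech signal. The sketch below shows that arithmetic, assuming each frame carries a time index in milliseconds (as the data format described later specifies) and that this index marks the start of the frame. The function name is hypothetical.

```python
def frame_sample_span(time_ms, frame_length_ms, sample_rate_hz):
    """Return (first_sample, end_sample_exclusive) covered by one frame.

    Assumption: the time index marks the beginning of the frame.
    """
    start = round(time_ms * sample_rate_hz / 1000.0)
    length = round(frame_length_ms * sample_rate_hz / 1000.0)
    return start, start + length


if __name__ == "__main__":
    # A 25 ms frame starting at 10 ms in a 16 kHz signal.
    print(frame_sample_span(10.0, 25.0, 16000))
```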

Data format
Frame based. Each frame can contain several synchronized data streams of any of the pre-defined data types.
Binary header:
  Sample rate in Hz         unsigned long
  Frame length in ms        float
  Frame shift in ms         float
  Number of streams         unsigned long
  Name stream #1            32 char string
  Type stream #1            unsigned long
  Frame size stream #1      unsigned long
  Name stream #2            32 char string
  Type stream #2            unsigned long
  Frame size stream #2      unsigned long
The stream names are used to identify the displayed window panes.
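A minimal sketch of reading this header follows. The slide does not give the byte order or field widths, so the code assumes little-endian data, a 4-byte `unsigned long`, a 4-byte `float`, and NUL-padded ASCII stream names. The function and dictionary key names are hypothetical.

```python
import io
import struct

NAME_LEN = 32  # stream names are 32-character strings per the slide


def read_header(f, byte_order="<"):
    """Read the binary header from a file-like object opened in binary mode.

    Assumptions (not stated on the slide): little-endian byte order,
    4-byte unsigned long, 4-byte float, NUL-padded ASCII names.
    """
    fixed = struct.Struct(byte_order + "IffI")
    sample_rate, frame_length_ms, frame_shift_ms, n_streams = fixed.unpack(
        f.read(fixed.size)
    )
    per_stream = struct.Struct(f"{byte_order}{NAME_LEN}sII")
    streams = []
    for _ in range(n_streams):
        raw_name, stream_type, frame_size = per_stream.unpack(
            f.read(per_stream.size)
        )
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        streams.append({"name": name, "type": stream_type, "frame_size": frame_size})
    return {
        "sample_rate_hz": sample_rate,
        "frame_length_ms": frame_length_ms,
        "frame_shift_ms": frame_shift_ms,
        "streams": streams,
    }


if __name__ == "__main__":
    # Build a made-up two-stream header in memory and parse it back.
    data = struct.pack("<IffI", 16000, 25.0, 10.0, 2)
    data += struct.pack("<32sII", b"speech", 1, 160)
    data += struct.pack("<32sII", b"mfcc", 2, 13)
    print(read_header(io.BytesIO(data)))
```

Passing `byte_order=">"` would cover files written on a big-endian machine, should that turn out to be the case.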

Data format (cont'd)
The stream type is one of the following:
  1 - samples (PCM16)       signed short
  2 - features              float
  3 - probabilities         float
The actual data are then formatted as follows, for each frame:
  Time index in milliseconds for the current frame      float
  Data stream #1
  Data stream #2
  …
The time index is used for time alignment of the different data streams. A time index of -1 marks the end of the current utterance.
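The per-frame layout can be read with a short loop, sketched below. It assumes little-endian data, 4-byte floats, 2-byte signed shorts, a "frame size" counted in values rather than bytes, and an end-of-utterance marker that is not followed by stream data. None of these details are stated on the slides, and the function name is hypothetical.

```python
import io
import struct

# Stream type codes from the slide mapped to struct format characters.
TYPE_FORMATS = {1: "h", 2: "f", 3: "f"}  # PCM16 samples, features, probabilities


def read_frames(f, streams, byte_order="<"):
    """Yield (time_ms, [values per stream]) until the end-of-utterance marker.

    `streams` is a list of (type_code, frame_size) pairs taken from the header.
    Assumption: frame_size is the number of values per frame, not bytes.
    """
    time_struct = struct.Struct(byte_order + "f")
    while True:
        raw = f.read(time_struct.size)
        if len(raw) < time_struct.size:
            return  # file ended without an explicit marker
        (time_ms,) = time_struct.unpack(raw)
        if time_ms == -1.0:
            return  # a time index of -1 ends the current utterance
        frame = []
        for type_code, frame_size in streams:
            fmt = struct.Struct(f"{byte_order}{frame_size}{TYPE_FORMATS[type_code]}")
            frame.append(fmt.unpack(f.read(fmt.size)))
        yield time_ms, frame


if __name__ == "__main__":
    # Two made-up frames: 2 PCM samples and 3 feature values each.
    data = struct.pack("<f2h3f", 0.0, 1, -2, 0.5, 0.25, 0.125)
    data += struct.pack("<f2h3f", 10.0, 3, 4, 1.0, 2.0, 3.0)
    data += struct.pack("<f", -1.0)
    for time_ms, frame in read_frames(io.BytesIO(data), [(1, 2), (2, 3)]):
        print(time_ms, frame)
```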

Data format (cont'd)
The word hypotheses are written in TIMIT format:
  start_time(samples) end_time(samples) word_hyp
for instance:
  Sil one eight six
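Lines in this `start end word` layout are simple to parse. The sketch below assumes one hypothesis per line with whitespace-separated fields. The sample indices in the demo are made up for illustration and do not come from the slides, and the function name is hypothetical.

```python
def parse_word_hypotheses(text):
    """Parse 'start end word' lines into (start_sample, end_sample, word) tuples."""
    words = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, word = parts
        words.append((int(start), int(end), word))
    return words


if __name__ == "__main__":
    # Made-up sample indices, for illustration only.
    example = "0 3200 Sil\n3200 8000 one\n8000 12800 eight\n12800 17600 six\n"
    for hypothesis in parse_word_hypotheses(example):
        print(hypothesis)
```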

Status point
The interface consists of 4 plug-ins:
- samples.plug
- features.plug
- recognize.plug
- probabilities.plug
These plug-ins are compatible with WaveSurfer v1.2 and higher.
Under Windows, copy these files to %HOME%/.wavesurfer/1.3/plugins
Under Linux, copy these files to $HOME/.wavesurfer/1.3/plugins
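The copy step can be scripted. The sketch below uses the four file names and the `.wavesurfer/1.3/plugins` location from the slide; the `install_plugins` helper, its parameters, and the reliance on a `HOME` environment variable on both platforms are assumptions made for this example.

```python
import os
import shutil
import tempfile
from pathlib import Path

PLUGIN_FILES = [
    "samples.plug",
    "features.plug",
    "recognize.plug",
    "probabilities.plug",
]


def install_plugins(source_dir, home=None, version="1.3"):
    """Copy the four plug-in files into <home>/.wavesurfer/<version>/plugins.

    Assumption: when `home` is not given, the HOME environment variable is set.
    """
    home = Path(home if home is not None else os.environ["HOME"])
    target = home / ".wavesurfer" / version / "plugins"
    target.mkdir(parents=True, exist_ok=True)
    for name in PLUGIN_FILES:
        shutil.copy2(Path(source_dir) / name, target / name)
    return target


if __name__ == "__main__":
    # Demonstrate with empty placeholder files in temporary directories.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as home:
        for name in PLUGIN_FILES:
            (Path(src) / name).write_text("")
        print(sorted(p.name for p in install_plugins(src, home=home).iterdir()))
```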

Display of samples
The display of the samples is very similar to that of WaveSurfer. You can plot either the waveform or the spectrogram by choosing it from the menu of the samples window pane.

Display of features
The features are displayed as pseudo-spectrograms. By default, the feature values are normalized, i.e. each feature parameter is normalized over time. This can be changed from the menu of the features window pane. You can also select the range of feature parameters you want to display.
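One way to picture "normalized over time" is to scale every feature parameter independently across all frames. The slides do not say which normalization the plug-in applies, so the min-max scaling to [0, 1] below is an assumption chosen for illustration, and the function name is hypothetical.

```python
def normalize_over_time(frames):
    """Scale each feature parameter to [0, 1] across all frames.

    `frames` is a list of equal-length feature vectors, one per frame.
    Min-max scaling is an assumption; the slides only say that each
    parameter is normalized over time.
    """
    if not frames:
        return []
    columns = list(zip(*frames))
    lows = [min(column) for column in columns]
    spans = [max(column) - min(column) for column in columns]
    return [
        [(v - low) / span if span else 0.0 for v, low, span in zip(frame, lows, spans)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Three frames of two parameters; the second parameter is constant.
    print(normalize_over_time([[0.0, 5.0], [2.0, 5.0], [4.0, 5.0]]))
```

A parameter that never changes has no range to scale, so the sketch maps it to 0.0 rather than dividing by zero.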

Display of probabilities
The probabilities/likelihoods can be either plotted or displayed as a pseudo-spectrogram. Again, some options are available from the menu. You can, for instance, specify the name of a file containing the symbols related to each probability.

Status point
Integration of other partners' work:
- FPMS: integration of the multi-band approach. Display of the frequency-band features and probabilities.
- ICP: ?
- Sheffield University: missing data?

Demonstration …
KTH agreed to integrate the demonstration package into the distribution of the WaveSurfer program.
Link to the RESPITE web page?
Publicly available for research purposes.