1 2009 Almost-Spring Short Course on Speech Recognition
Instructors: Bhiksha Raj and Rita Singh
Welcome



2 What will the course be about
We will cover most relevant topics of speech recognition
The focus will be on the theory and practice
  – We will not discuss code for the most part
  – We will keep maths out of it as far as possible
However, we will discuss algorithms and implementation details

3 Instructors
Bhiksha Raj: Carnegie Mellon University
  – Expert in speech recognition
Rita Singh: Carnegie Mellon University
  – Expert in speech recognition
Peter Wolf: Independent Consultant
  – Previously at Dragon Systems Inc.
  – Sphinx4 expert; expert in speech recognition application development
  – Brought in primarily as a resource for helping with Sphinx4 and answering application-related questions

4 Format of Course
3 lectures daily
  – Morning: 8.00 AM, 1.00 – 1.30 hours
  – Late Morning / Early Afternoon: 11:00 AM
  – Afternoon: 2.30 PM
The schedule is flexible: timings may vary depending on how much is covered
Lectures are expected to last 1.00 – 1.5 hours each
Intervening times are expected to be taken up by exercises

5 Instruction Format
Lectures will be pictorially oriented
Although we will cover general topics, the specific implementations described will be based on CMU Sphinx
  – Most other systems are similar
Exercises will be based on Sphinx

6 Lecture Outline: Day 1
Lecture 1: “Speech recognition for dummies”
  – A quick development of speech recognition as string matching
Lecture 2: “Feature computation”
  – Explaining how features are computed for speech recognition, including all signal processing
Lecture 3: “Hidden Markov Models”
  – Describing HMMs and all associated problems
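As a rough illustration of the "speech recognition as string matching" idea named for Lecture 1, the sketch below picks the stored template that is closest to an observed symbol string under Levenshtein edit distance. This is only an assumption about what such matching could look like, not the course's own material; the template strings and the `recognize` helper are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two symbol sequences."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = cheapest way to turn reference[:i] into hypothesis[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete every symbol of reference[:i]
    for j in range(cols):
        cost[0][j] = j  # insert every symbol of hypothesis[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )
    return cost[-1][-1]


def recognize(observed, templates):
    """Return the word whose template is nearest to the observed string."""
    return min(templates, key=lambda word: edit_distance(templates[word], observed))


# Hypothetical templates: each word is stored as a made-up symbol string.
TEMPLATES = {"cat": "kaet", "cut": "kuht", "dog": "dawg"}

if __name__ == "__main__":
    print(recognize("kaat", TEMPLATES))
```

Real speech is a sequence of feature vectors rather than discrete symbols, so a practical system would compare vectors with a distance measure instead of testing symbols for equality.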

7 Lecture Outline: Day 2
Lecture 1: “Training From Continuous Speech”
  – How to train models from continuous speech
  – Phonemes, why we need them and how to train them
Lecture 2: “Context dependent phonemes”
  – What are context dependent phonemes
  – Various types of context dependent phonemes
  – Training CD phonemes
Lecture 3: “Decision Trees and State Tying”
  – All about decision trees for parameter sharing in ASR systems

8 Lecture Outline: Day 3
Lecture 1: “Training context-dependent models with tied states”
  – A (relatively) short lecture explaining the final overall process for training models
Lecture 2: “Language Modelling”
  – How to model “language” for speech recognition
  – Statistical language modelling
Lecture 3: “Decoding: Basics”
  – Describing the basic ideas behind the decoding strategies for continuous speech

9 Lecture Outline: Day 4
Lecture 1: “Decoding: Advanced”
  – Explaining various more advanced approaches to decoding
    – Arriving at the state of the art
Lecture 2: “Advanced Topics”
  – Adaptation, Normalization, Discriminative Training etc.
Session 3: Open
  – Any spillover
  – Question answering

10 Exercises: Day 1
There will be exercises following most lectures
Lecture 1: None
Lecture 2: Exercise on capture and feature computation from speech signals
Lecture 3: None

11 Exercises: Day 2
Lecture 1: “Training From Continuous Speech”
  – Exercise on training phoneme models and recognizing with them
Lecture 2: “Context dependent phonemes”
  – Exercise on training models for context-dependent phonemes and recognizing with them
Lecture 3: “Decision Trees and State Tying”
  – Exercise on learning decision trees

12 Exercises: Day 3
Lecture 1: “Training context-dependent models with tied states”
  – Exercise on complete training of the ASR system
Lecture 2: “Language Modelling”
  – Exercises on building JSGF grammars and N-gram LMs for speech recognition
Lecture 3: “Decoding: Basics”
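To give a feel for what building an N-gram LM involves, here is a minimal sketch of a bigram model estimated by relative frequency. It is not the method used by the CMU or SRI toolkits: it applies no smoothing or backoff, and the function name, sentence markers and toy corpus are assumptions made for illustration.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Estimate P(word | previous word) by relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        words = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words[:-1], words[1:]))
    return {
        (history, word): count / history_counts[history]
        for (history, word), count in bigram_counts.items()
    }


# Toy corpus, invented for this example.
CORPUS = ["open the door", "close the door", "open the window"]

if __name__ == "__main__":
    lm = train_bigram_lm(CORPUS)
    for (history, word), prob in sorted(lm.items()):
        print(f"P({word} | {history}) = {prob:.3f}")
```

Any bigram that never occurs in the training text is simply absent from the returned table, which is why practical toolkits add smoothing and backoff.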

13 Exercises: Day 4
Lecture 1: “Decoding: Advanced”
  – Decoding with various speech recognition system variants: Sphinx3 flat, Sphinx3 tree, Sphinx4
Lecture 2: “Advanced Topics”
  – No exercises
Session 3: Open
  – No exercises

14 Software to Install
We will be using CMU Sphinx extensively
  – Sphinxtrain
  – Sphinx3 decoder
  – Sphinx4 decoder
  – CMU LM Toolkit or SRI LM Toolkit
We will need additional software to go with it
  – Java, ant, groovy for Sphinx4

15 Sphinx Downloads:

16 Sphinx Downloads:
Sphinxbase:
  – Click on the “sphinxbase” link on the left
  – Click “all releases”
  – Download version: http://downloads.sourceforge.net/cmusphinx/sphinxbase tar.bz2?use_mirror=superb-east
Sphinx3:
  – Click on the “sphinx3” link on the left
  – Click on “all releases”
  – Download version: zip?use_mirror=internap

17 Sphinx Downloads:
Cepview:
  – Click on the “cepview” link on the left
lm3g2dmp:
  – Click on the “lm3g2dmp” link on the left
The above two are visualization / data-structure optimization tools and are not critical
  – But they are small, so you might as well download them
CMU LM toolkit:
You may install the SRI LM toolkit instead
  – It is better maintained; the CMU toolkit is not currently maintained

18 Sphinx Downloads:
Sphinx4:
  – For this workshop, download a copy of Sphinx that is under development at github.com
  – Click on the download link
  – Caveat: some scripts may not run; if so, we will revert to the release version
Sphinx4 will also need
  – Java JDK from
  – Apache ant from
  – A useful scripting tool (some of our latest scripts are in it): Groovy
  – Groovy can be had from
Bookmark this link:
  – html

19 Operating Systems
Sphinxbase and Sphinx3 packages have been tried and tested on Linux
  – We are not Windows people
Suggestion: prefer Linux-based machines
  – You may also try to run these programs on Cygwin under Windows
    – Sphinx* should compile under Cygwin
    – Install “tcsh” under Cygwin
    – We will provide tcsh scripts
Sphinx4 is platform independent
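Before the first exercise it may help to confirm that the supporting tools named in the slides are reachable from the command line. The sketch below is one possible check, assuming the executables are called `java`, `ant`, `groovy` and `tcsh` on your machine; the `find_missing_tools` helper is hypothetical and not part of any Sphinx package.

```python
import shutil

# Executable names are an assumption; adjust them to match your installation.
REQUIRED_TOOLS = ["java", "ant", "groovy", "tcsh"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` searches the PATH, including under Cygwin.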

20 Additional Packages
It would be useful to have a visualization tool
  – Need to visualize matrices as surfaces
    – Matlab would be great
    – If you don’t have Matlab, download Octave

21 Data
You may use any data you wish to
For the exercises, we will attempt to provide a small amount of data
  – As much as can be dealt with on your computers

22 Questions?