Speaker Diarisation and Large Vocabulary Recognition at CSTR: The AMI/AMIDA System
Fergus McInnes, 7 December 2011

Presentation transcript:

Outline
- History: AMI, AMIDA and recent developments
- Architecture: processing graph, modules, directories
- Getting and running the system
- Points for further work

History (1): AMI and AMIDA
- AMI Project (Augmented Multi-party Interaction)
  - Edinburgh, Sheffield, Brno, Twente, IDIAP, ICSI, TNO and others
  - Capture, analysis, indexing and browsing of meeting data
- AMIDA Project (AMI with Distance Access)
  - AMI Consortium as above
  - Extension to videoconferences
- Meeting transcription system
  - Modules developed by multiple partners
  - Multiple versions: IHM/MDM, offline/online, different architectures and platforms
  - Took part in the NIST Rich Transcription evaluations in 2007 and 2009, hence the “RT07” and “RT09” versions

History (2): Developments at CSTR
- Legacy of AMI and AMIDA
  - RT09 offline system for multiple distant microphones (MDM), running on the ECDF compute server Eddie (also individual headset microphone (IHM) and RT07 systems)
  - Used since 2009 by several people at CSTR
- Developments in 2011 (FRM – Cisco Project)
  - Documentation written
  - Scripts and config files tidied up
  - Changes to a few modules
  - Files placed in new Subversion repository
- Additional modules and interfacing to support PodCastle application

Architecture (1): Overall structure
Input: audio signals from multiple microphones
Processing stages:
- Padding and noise reduction
- Beamforming
- Speech/non-speech segmentation
- Speaker diarisation
- Speech recognition
Output: speaker-attributed text with timings and scores

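Architecture (2): Details of speech recognition
[Diagram: processing flow from the waveform through four decoding passes. The boxes shown are listed below; the connections between them are not recoverable from the transcript.]
- Waveform
- PLP coding, CMN/CVN
- VTLN estimation (warp factor per speaker)
- PLP with VTLN, CMN/CVN, HLDA
- Fbank with VTLN, CMN/CVN, HLDA
- Feature merging, CMN/CVN
- Decoding pass 1
- CMLLR adaptation
- Decoding pass 2
- Decoding pass 3
- CMLLR adaptation
- Decoding pass 4
- MLLR adaptation
Decoding by Juicer; HMMs and LM differ from passes 1-2 to passes 3-4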
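CMN/CVN (cepstral mean and variance normalisation) appears at several points in the diagram. As a minimal sketch of the general technique, and not of the system's own implementation, the function below scales each feature dimension to zero mean and unit variance over a set of frames. The function name `cmn_cvn`, the toy data, and the choice to normalise over all frames passed in (for example one speaker's data) are assumptions made for illustration.

```python
from statistics import mean, pstdev

def cmn_cvn(frames):
    """Normalise each feature dimension to zero mean and unit variance.

    frames: list of feature vectors (lists of floats), all the same length.
    """
    dims = list(zip(*frames))                  # one tuple per feature dimension
    mus = [mean(d) for d in dims]
    # A constant dimension has zero deviation; divide by 1.0 to avoid 0/0.
    sigmas = [pstdev(d) or 1.0 for d in dims]
    return [[(x - m) / s for x, m, s in zip(frame, mus, sigmas)]
            for frame in frames]

# Toy data: three frames with two feature dimensions (illustrative only)
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
normed = cmn_cvn(frames)
```

After normalisation the first dimension has mean 0 and variance 1, and the constant second dimension is mapped to 0.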

Architecture (3): ROTK framework (Resource Optimisation Toolkit)
- Modules (strictly module instances, MIs) are connected together in a processing graph (“mpg” file)
- The graph is read by the Python script sgproc, which creates directories, calls the runmod script for each module instance and keeps track of progress

Directory structure per module instance:

MI directory
  in
    idal: links to data files in preceding MIs’ out directories
    in.dlp (list of dal files & job numbers)
    ms0000.dal ms0001.dal ... (data lists)
  out
    odal: output data files
    out.dlp (list of dal files & job numbers)
    ms0000.dal ms0001.dal ... (data lists)
  working files and subdirectories (module-specific)

Annotations on the slide mark which parts are created by sgproc and which by runmod.
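The idea of walking a processing graph in dependency order can be sketched as follows. This is not sgproc itself: the graph contents, the function names `run_graph` and `fake_runmod`, and the decision to create both `in` and `out` directories up front are all assumptions for illustration, and the real mpg file format is not shown in the slides.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from pathlib import Path
import tempfile

# Hypothetical graph: each module instance (MI) maps to the MIs it reads from.
# The names are illustrative and are not taken from a real mpg file.
GRAPH = {
    "beamform": [],
    "segment": ["beamform"],
    "diarise": ["segment"],
    "recognise": ["beamform", "diarise"],
}

def run_graph(graph, root, run_module):
    """Create in/out directories for each MI and run MIs in dependency order."""
    done = []
    for mi in TopologicalSorter(graph).static_order():
        mi_dir = Path(root) / mi
        (mi_dir / "in").mkdir(parents=True, exist_ok=True)
        (mi_dir / "out").mkdir(parents=True, exist_ok=True)
        run_module(mi, mi_dir)   # stands in for the MI's runmod script
        done.append(mi)          # simple progress tracking
    return done

def fake_runmod(mi, mi_dir):
    """Placeholder for a module's work: write a marker file in out/."""
    (mi_dir / "out" / "out.dlp").write_text(f"{mi}\n")

with tempfile.TemporaryDirectory() as root:
    order = run_graph(GRAPH, root, fake_runmod)
```

Every MI appears in `order` after all the MIs it depends on, which is the property a graph runner needs before it can link a module's inputs to its predecessors' outputs.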

Architecture (4): Parallel processing
Dependent on the runmod script for each module, but typically:
- Different recording sessions, and speakers within each session, are processed in parallel (some modules also subdivide a speaker’s data if the amount is large)
- runmod (or a subsidiary script) submits jobs to the grid and records the job numbers
- Jobs for a later MI may be submitted before input data from an earlier MI are ready, using the “-w ” option to submitjob, which calls qsub with “-hold_jid ”
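The job-holding mechanism can be illustrated with a small sketch that builds a qsub command line held on earlier jobs. Grid Engine's `-hold_jid` option takes a comma-separated list of job identifiers. The function name `qsub_command`, the script name and the job numbers are hypothetical, and this does not reproduce what submitjob actually does internally.

```python
def qsub_command(script, hold_jids=()):
    """Build a Grid Engine qsub command line, optionally held on earlier jobs."""
    cmd = ["qsub"]
    if hold_jids:
        # The job stays queued until every listed job has finished.
        cmd += ["-hold_jid", ",".join(str(j) for j in hold_jids)]
    cmd.append(script)
    return cmd

# Hypothetical job numbers recorded for an earlier MI
earlier_jobs = [4101, 4102]
cmd = qsub_command("run_later_mi.sh", earlier_jobs)
print(" ".join(cmd))  # prints: qsub -hold_jid 4101,4102 run_later_mi.sh
```

Building the argument list separately from running it keeps the sketch testable on a machine without a grid scheduler; the list could be passed to `subprocess.run` on a system where qsub is available.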

Architecture (5): File locations
In Subversion repository ( :
- pkg/rotk/b0013 – sgproc and system utilities
- pkg/jet/v0.04 – submitjob and gridenv..csh files
- pkg/mod/ – module-specific files: runmod, subsidiary scripts, source code
- mdm/mpg/*.mpg – processing graphs
- mdm/cfg/*.env – config files for all module instances
- mdm/global.cfg – template for global config file
- mdm/run-mdm.* – templates for top-level scripts to call sgproc with specific processing graphs
On Eddie (in /exports/work/inf_hcrc_cstr_nst/amiasr/asrcore; locations specified in config files):
- exp/sysopt/bin/*/* – program files (sox, SHoUT, HTK, STK, Juicer etc)
- exp/sysopt/mdm-sys09dev/lib/*/* – HMMs, language models etc

Getting and running the system
- Get an account on Eddie
- Get access to repository (give me your UUN)
- Create a system directory and check out a copy of the system there
- Run copy_files (to copy and compile some binary files)
- Create a working directory (somewhere with enough space – probably best under /exports/work)
- Copy global.cfg and run-mdm.pad to and edit to specify and project code
- Create /data and put your wav files there (src- _ses- _chn-.wav – 16 kHz mono)
- Create list of data files (without “.wav” extension) in /default.dal
- Run run-mdm.* – results will appear in /JU-M1-CMLLR4_MLLR32_0_D/out
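The data-list step can be sketched as below: collect the wav files in a data directory, keep those that are 16 kHz mono, and write their names without the “.wav” extension. The function names, the one-name-per-line list format and the choice to skip non-conforming files silently are assumptions; the slides do not specify the exact layout of a dal file.

```python
import wave
from pathlib import Path

def check_wav(path, rate=16000, channels=1):
    """Return True if the file is a wav with the expected rate and channel count."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate() == rate and w.getnchannels() == channels

def write_data_list(data_dir, list_path):
    """Write the base names (no '.wav') of conforming files, one per line."""
    names = sorted(p.stem for p in Path(data_dir).glob("*.wav") if check_wav(p))
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

A call such as `write_data_list("data", "default.dal")` would be made from the working directory; both paths here are placeholders for the locations chosen in the steps above.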

Issues and points for further work
- We don’t have source code for some of the programs (e.g. SFeaCat, SFeaStack); if possible we should replace these with our own or other open-source equivalents
- Many of the scripts are opaque (tcsh scripts calling Perl, building other scripts and then running them, etc)
- Licensing for some components is too restrictive for a commercial application
- Use of Juicer makes it difficult to adapt the LM and vocabulary, which is desirable for many applications