Speech and Music Retrieval INST 734 Doug Oard Module 12.


Agenda
Music retrieval
-> Speech retrieval
Interactive speech retrieval

Spoken Word Collections
Broadcast programming
–News, interview, talk radio, sports, entertainment
Scripted stories
–Books on tape, poetry reading, theater
Spontaneous storytelling
–Oral history, folklore
Incidental recording
–Speeches, oral arguments, meetings, phone calls

Speech Compression
Opportunity:
–Human voices vary in predictable ways
Approach:
–Predict what’s next, then send only the corrections
Standards:
–Rule of thumb: 1 kB/sec for (highly compressed) speech
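The "predict, then send only the corrections" idea can be sketched with a first-order predictor that guesses each sample equals the previous one. This is only an illustration under that assumption: real speech codecs use far more elaborate predictors and also quantize the corrections, and the sample values below are made up.

```python
def encode(samples):
    """Predict each sample as the previous one and emit only the corrections."""
    residuals = []
    prediction = 0  # assumed starting prediction, shared with the decoder
    for s in samples:
        residuals.append(s - prediction)
        prediction = s
    return residuals


def decode(residuals):
    """Rebuild the samples by adding each correction to the running prediction."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples


# Hypothetical, slowly varying waveform samples
samples = [100, 102, 105, 107, 108, 108, 106]
residuals = encode(samples)

# Storage implied by the slide's rule of thumb of 1 kB/sec
kilobytes_per_hour = 1 * 60 * 60
```

After the first value, the corrections are much smaller than the samples themselves, which is what makes them cheaper to send. At 1 kB/sec, an hour of speech takes about 3,600 kB.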

Description Strategies
Transcription
–Manual transcription (with optional post-editing)
Annotation
–Manually assign descriptors to points in a recording
–Recommender systems (ratings, link analysis, …)
Associated materials
–Interviewer’s notes, speech scripts, producer’s logs
Automatic
–Create access points with automatic speech processing

Three-Step Speech Recognition
What sounds were made?
–Convert from waveform to subword units (phonemes)
How could the sounds be grouped into words?
–Identify the most probable word segmentation points
Which of the possible words were spoken?
–Based on likelihood of possible multiword sequences
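Steps two and three above can be illustrated with a toy example. The sketch below assumes step one has already produced a phoneme sequence, and it uses a hypothetical four-word pronunciation dictionary and made-up bigram probabilities. A real recognizer searches a lattice with acoustic scores rather than enumerating every segmentation.

```python
import math

# Hypothetical pronunciation dictionary: phoneme tuple -> word
LEXICON = {
    ("ay",): "I",
    ("ay", "s"): "ice",
    ("s", "k", "r", "iy", "m"): "scream",
    ("k", "r", "iy", "m"): "cream",
}

# Made-up bigram probabilities P(word | previous word); "<s>" marks the start
BIGRAM = {
    ("<s>", "I"): 0.20,
    ("<s>", "ice"): 0.05,
    ("I", "scream"): 0.01,
    ("ice", "cream"): 0.60,
}
UNSEEN = 1e-6  # assumed floor probability for word pairs not in the table


def segmentations(phones):
    """Step 2: yield every way to split the phonemes into dictionary words."""
    if not phones:
        yield []
        return
    for end in range(1, len(phones) + 1):
        word = LEXICON.get(tuple(phones[:end]))
        if word is not None:
            for rest in segmentations(phones[end:]):
                yield [word] + rest


def log_score(words):
    """Step 3: log-probability of a word sequence under the bigram model."""
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAM.get((previous, word), UNSEEN))
        previous = word
    return total


phones = ["ay", "s", "k", "r", "iy", "m"]  # step 1 output, assumed given
candidates = list(segmentations(phones))
best = max(candidates, key=log_score)
```

Both "I scream" and "ice cream" match the same phonemes. Only the language model separates them, and with these made-up probabilities it prefers "ice cream".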

Using Speech Recognition
[Diagram. Processing stages: Phone Detection, Word Construction, Word Selection. Other labels: Phone n-grams, Phone lattice, Words, Transcription dictionary, Language model, One-best transcript, Word lattice]

Phone Lattice

Phoneme Trigrams
Manage -> m ae n ih jh
–Dictionaries provide accurate transcriptions, but only for a single accent and dialect
–Rule-based transcription handles unknown words
Index every overlapping 3-phoneme sequence
–m ae n
–ae n ih
–n ih jh
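The overlapping-trigram indexing described above can be sketched as follows. The function names and the tiny inverted index are illustrative assumptions, not part of the slides. The phoneme sequence for "manage" is the one given on the slide.

```python
from collections import defaultdict


def phoneme_trigrams(phones):
    """Return every overlapping 3-phoneme sequence, in order."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def build_index(pronunciations):
    """Map each phoneme trigram to the set of items whose phonemes contain it."""
    index = defaultdict(set)
    for item_id, phones in pronunciations.items():
        for trigram in phoneme_trigrams(phones):
            index[trigram].add(item_id)
    return index


manage = ["m", "ae", "n", "ih", "jh"]
trigrams = phoneme_trigrams(manage)
index = build_index({"manage": manage})
```

A query term is transcribed to phonemes and split the same way, so an item can match on shared trigrams even when the word is missing from the recognizer's vocabulary.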

Key Results from TREC/TDT
Recognition and retrieval can be decomposed
–Word recognition/retrieval works well in English
Retrieval is robust to recognition errors
–Up to 40% word error rate is tolerable
Retrieval is robust to segmentation errors
–Vocabulary shift/pauses provide strong cues
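For reference, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance table. The example sentences are invented, and official evaluations may normalize text differently before scoring.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1] / len(ref)


# Invented example: two of five reference words are misrecognized
wer = word_error_rate(
    "the survivor described the ghetto",
    "the survivor describe a ghetto",
)
```

Here two of the five reference words are wrong, giving a word error rate of 40%, the level the slide describes as still tolerable for retrieval.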

English Transcription Accuracy
Training: 200 hours from 800 speakers

Other Languages
[Figure: WER [%] over time, at six-month intervals from 10/01, for Czech, Russian, Slovak, Polish and Hungarian. Training conditions labeled on the plot include 25h, 45h + LM Tr, 84h + LM Tr, 100h + LM Tr, + LM Tr+TC, + standard, and + adapt. Plotted values are not recoverable from the transcript.]

Error Analysis (2005)
[Figure: title-query terms marked as "Somewhere in ASR" or "Only in Metadata"; counts in parentheses are (ASR/Metadata)]
Query terms shown: wallenberg (3/36)* rescue jews wallenberg (3/36) eichmann abusive female (8/81) personnel minsko (21/71) ghetto underground art auschwitz labor camps ig farben slave labor telefunken aeg holocaust sinti roma sobibor (5/13) death camp witness eichmann jews volkswagen
CLEF-2005 training + test – (metadata < 0.2), ASR2004A only, Title queries, Inquery 3.1p1

Agenda
Music retrieval
Speech retrieval
-> Interactive speech retrieval