Unlocking Audio/Video Content with Speech Recognition Behrooz Chitsaz Director, IP Strategy Microsoft Research Frank Seide Lead.

Presentation transcript:

Unlocking Audio/Video Content with Speech Recognition Behrooz Chitsaz Director, IP Strategy Microsoft Research Frank Seide Lead Researcher Microsoft Research Kit Thambiratnam Researcher Microsoft Research

Microsoft Research

Multimedia Research: speech, search, video summarization, semantic extraction, face identification, object recognition, visual search, 3D modeling

Speech Applications
Speech as interface: mobile access, search, automation, PC application, web service, text input, dictation
Speech as 1st-class content: indexing, search, metadata extraction, advertising, transcription, meeting notes, closed caption, voicemail, translation, translating phone

Searching Media Today
meta-data
– surrounding & anchor text, URL
– top-N lists, collaborative filtering
– editorial meta-data
file content itself
– keyword search in audio track using speech recognition

Demo

Speech recognition
Spectral Analysis: “Hello World” → $o_1 \ldots o_T$
Matching (Decoding): time alignment → most likely hypothesis
$W' = \arg\max_{(w_1 \ldots w_N)} p(o_1 \ldots o_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)$
Acoustic Models: $p(o_t \ldots o_\tau \mid \text{phoneme})$
Dictionary: $P(\text{phonemes} \mid w)$
Grammar (Language Model): $P(w_1 \ldots w_N)$
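A minimal sketch of the decoding rule on this slide: pick the word sequence that maximizes the acoustic likelihood times the language-model probability, computed here in the log domain. The candidate transcriptions and all scores are hypothetical illustration values, not output from the system presented, and a real decoder searches a large hypothesis space rather than enumerating a short list.

```python
import math

def decode(candidates, acoustic_logprob, lm_logprob):
    """Return the candidate W maximizing log p(O|W) + log P(W)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])

# Hypothetical acoustic scores p(O|W) for three candidate transcriptions.
acoustic = {
    ("hello", "world"): math.log(0.20),
    ("hollow", "world"): math.log(0.25),
    ("hello", "word"): math.log(0.15),
}
# Hypothetical language-model scores P(W) for the same candidates.
language = {
    ("hello", "world"): math.log(0.010),
    ("hollow", "world"): math.log(0.001),
    ("hello", "word"): math.log(0.002),
}

best = decode(list(acoustic), acoustic, language)
print(" ".join(best))
```

In this toy example the acoustically best candidate ("hollow world") loses once the language-model prior is included, which is the point of combining the two terms.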

Speech recognition in a nutshell
Acoustic Models: $p(o_t \ldots o_\tau \mid \text{phoneme})$
Dictionary: $P(\text{phonemes} \mid w)$
Grammar (Language Model): $P(w_1 \ldots w_N)$
Speech recordings + full manual transcripts

Speech recognition
Acoustic Models: $p(o_t \ldots o_\tau \mid \text{phoneme})$
Dictionary: $P(\text{phonemes} \mid w)$
Grammar (Language Model): $P(w_1 \ldots w_N)$
Dictionary entries:
...
microscope   m:s ay:n k:n r:n ax:n s:n k:n ow:n p:e
microsecond  m:s ay:n k:n r:n ax:n s:n eh:n k:n ax:n n:n d:e
microsecond  m:s ay:n k:n r:n ow:n s:n eh:n k:n ax:n n:n d:e
microsoft    m:s ay:n k:n r:n ax:n s:n ao:n f:n t:e
microsoft    m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e
…
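A minimal sketch of how dictionary entries like those on this slide could be loaded into a word-to-pronunciations map. The parsing format (one word followed by space-separated `phoneme:tag` units) is inferred from the slide, and the reading of the `:s`, `:n`, `:e` suffixes as position tags is an assumption; the function name `parse_lexicon` is hypothetical.

```python
from collections import defaultdict

# Entries copied from the slide; the suffix after ":" is assumed to be a position tag.
LEXICON_TEXT = """\
microscope m:s ay:n k:n r:n ax:n s:n k:n ow:n p:e
microsecond m:s ay:n k:n r:n ax:n s:n eh:n k:n ax:n n:n d:e
microsecond m:s ay:n k:n r:n ow:n s:n eh:n k:n ax:n n:n d:e
microsoft m:s ay:n k:n r:n ax:n s:n ao:n f:n t:e
microsoft m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e
"""

def parse_lexicon(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, *units = line.split()
        # Drop the ":tag" suffix and keep only the phoneme symbol.
        phonemes = [unit.split(":")[0] for unit in units]
        lexicon[word].append(phonemes)
    return dict(lexicon)

lexicon = parse_lexicon(LEXICON_TEXT)
print(len(lexicon["microsoft"]))   # number of pronunciation variants
print(lexicon["microscope"][0])
```

Keeping a list of pronunciations per word reflects the slide, where "microsecond" and "microsoft" each appear with two variants.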

Speech recognition
Acoustic Models: $p(o_t \ldots o_\tau \mid \text{phoneme})$
Dictionary: $P(\text{phonemes} \mid w)$
Grammar (Language Model): $P(w_1 \ldots w_N)$
Language model entries:
this is a
this is about
this is absolutely
this is accomplished
this is actually
is a barnyard
is a barometer
is a baseball
is a baseless
is a baseline
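The three-word entries on this slide look like trigram entries of the language model. As a minimal sketch, assuming a plain maximum-likelihood trigram model with no smoothing (the slides do not say how the model is estimated), the probabilities could be computed from counts as follows. The toy corpus and function names are hypothetical.

```python
from collections import Counter

def train_trigrams(sentences):
    """Count trigrams and their two-word histories over padded sentences."""
    trigrams, histories = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for a, b, c in zip(words, words[1:], words[2:]):
            trigrams[(a, b, c)] += 1
            histories[(a, b)] += 1
    return trigrams, histories

def trigram_prob(trigrams, histories, a, b, c):
    """Maximum-likelihood P(c | a, b); returns 0.0 for an unseen history."""
    seen = histories[(a, b)]
    return trigrams[(a, b, c)] / seen if seen else 0.0

# Hypothetical toy corpus built from words that appear on the slide.
corpus = [
    "this is a baseball",
    "this is a barometer",
    "this is about speech",
    "this is actually a baseline",
]
trigrams, histories = train_trigrams(corpus)
print(trigram_prob(trigrams, histories, "this", "is", "a"))
print(trigram_prob(trigrams, histories, "is", "a", "baseball"))
```

A production model would add smoothing or back-off so that unseen word triples do not receive zero probability.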

Challenges: speaker accent, background noise, reverberation, vocabulary, language

lattice-based indexing “into this bank account”

lattice-based indexing “into this bank account”
expected benefits from indexing lattices:
– alternative recognition candidates → recall++
– confidence scores → precision++
– (time information → user experience)
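The three expected benefits can be illustrated with a minimal sketch of an inverted index built over lattice word hits rather than over a single best transcript. The lattice contents, posterior values, and function names below are hypothetical; the slides do not describe the actual index layout.

```python
from collections import defaultdict

# Hypothetical lattice for one clip: (word, start time in seconds, posterior).
# Competing candidates such as "bank" / "tank" share a time slot.
LATTICES = {
    "clip1": [
        ("into", 0.0, 0.90), ("in", 0.0, 0.10), ("to", 0.2, 0.10),
        ("this", 0.4, 0.95),
        ("bank", 0.7, 0.60), ("tank", 0.7, 0.40),
        ("account", 1.1, 0.85),
    ],
}

def build_index(lattices):
    """Inverted index: word -> list of (clip id, start time, posterior)."""
    index = defaultdict(list)
    for clip_id, hits in lattices.items():
        for word, start, posterior in hits:
            index[word].append((clip_id, start, posterior))
    return index

def search_index(index, word, min_confidence=0.0):
    """Return hits for a word, best posterior first, above a threshold."""
    hits = [h for h in index.get(word, []) if h[2] >= min_confidence]
    return sorted(hits, key=lambda h: h[2], reverse=True)

index = build_index(LATTICES)
print(search_index(index, "tank"))                      # alternative candidate is findable
print(search_index(index, "tank", min_confidence=0.5))  # filtered out by confidence
```

Indexing the alternative "tank" is what raises recall, the posterior threshold is what protects precision, and the stored start time lets a player jump to the matching position.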

Vocabulary Adaptation (from NLC group)
Diagram components: Speech, Metadata, Recognizer, Word statistics, NP extraction, Web query builder, Queries, Bing Search, Docs, Base Dict, Base LM, Adapt Dictionary, Adapt Language Model, Adapted Dict, Adapted LM
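One step of vocabulary adaptation can be sketched as follows: given documents retrieved for a media file, collect frequent words that are missing from the base dictionary as candidates for the adapted dictionary. This is only an illustration under assumptions; the slide does not specify how candidates are selected, and the documents, base vocabulary, threshold, and function name are all hypothetical.

```python
import re
from collections import Counter

def oov_candidates(docs, base_vocab, min_count=2):
    """Words absent from the base vocabulary that occur at least min_count times."""
    counts = Counter(
        word
        for doc in docs
        for word in re.findall(r"[a-z']+", doc.lower())
        if word not in base_vocab
    )
    return [word for word, count in counts.most_common() if count >= min_count]

# Hypothetical documents returned by a web search for the clip's metadata.
docs = [
    "Azure hosts the indexing service",
    "The Azure service indexes speech",
    "Speech indexing on Azure",
]
# Hypothetical base dictionary that lacks the product name.
base_vocab = {"the", "hosts", "indexing", "service", "indexes", "speech", "on"}

print(oov_candidates(docs, base_vocab))
```

The same retrieved text could also supply word statistics for adapting the language model, which this sketch does not attempt.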

Architectural decisions

Azure integration
Components: SQL Server(s), Web server(s), Media server(s)
1. Submit audio/video to index
2. Get back AIB
3. Import AIB in SQL
4. Search/Retrieve results
video RSS feed
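Steps 3 and 4 can be sketched with an in-memory SQLite database standing in for SQL Server. The slides do not describe the AIB format, so this sketch assumes it can be unpacked into rows of (word, start time, confidence) per media file; the table name, columns, file name, and function names are hypothetical.

```python
import sqlite3

def import_hits(conn, media_id, hits):
    """Step 3 (assumed form): store unpacked index rows for one media file."""
    conn.executemany(
        "INSERT INTO word_hits (media_id, word, start_sec, confidence) "
        "VALUES (?, ?, ?, ?)",
        [(media_id, word, start, conf) for word, start, conf in hits],
    )
    conn.commit()

def search_hits(conn, word):
    """Step 4: retrieve matching positions, most confident first."""
    cursor = conn.execute(
        "SELECT media_id, start_sec, confidence FROM word_hits "
        "WHERE word = ? ORDER BY confidence DESC",
        (word,),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server instance
conn.execute(
    "CREATE TABLE word_hits "
    "(media_id TEXT, word TEXT, start_sec REAL, confidence REAL)"
)
# Hypothetical rows unpacked from an index file for one video.
import_hits(conn, "talk.wmv", [
    ("speech", 12.5, 0.92),
    ("recognition", 13.0, 0.88),
    ("speech", 95.0, 0.40),
])
print(search_hits(conn, "speech"))
```

A web server would run a query like `search_hits` and hand the media id and start time to the media server for playback from that position.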

Cloud computing made simple: Windows Azure + PowerShell = Cloud computing at your fingertips
Demo: media content submission

Microsoft Research – Tell us if you are interested – Visit us:

Thank you! Questions?