
Unlocking Audio/Video Content with Speech Recognition
Behrooz Chitsaz, Director, IP Strategy, Microsoft Research (behroozc@microsoft.com)
Frank Seide, Lead Researcher, Microsoft Research (fseide@microsoft.com)
Kit Thambiratnam, Researcher, Microsoft Research (kit@microsoft.com)

Microsoft Research

Multimedia Research: speech search, video summarization, semantic extraction, face identification, object recognition, visual search, 3D modeling

Speech Applications
– speech as interface: mobile access, search, automation, PC application, web service, text input, dictation
– speech as 1st-class content: indexing, search, metadata extraction, advertising, transcription, meeting notes, closed captions, voicemail, translation, translating phone

Searching Media Today
– meta-data
  – surrounding & anchor text, URL
  – top-N lists, collaborative filtering
  – editorial meta-data
– file content itself
  – keyword search in audio track using speech recognition

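Speech recognition
speech signal → spectral analysis → feature vectors $o_1,\dots,o_T$ → matching (decoding) with time alignment → most likely hypothesis $(\hat{w}_1,\dots,\hat{w}_N)$, e.g. “Hello World”

$$(\hat{w}_1,\dots,\hat{w}_N) = \arg\max_{w_1,\dots,w_N} p(o_1,\dots,o_T \mid w_1,\dots,w_N)\, P(w_1,\dots,w_N)$$

Knowledge sources used in matching:
– acoustic models $p(o_t,\dots,o_\tau \mid \text{phoneme})$
– dictionary $P(\text{phonemes} \mid w)$
– grammar (language model) $P(w_1,\dots,w_N)$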
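The decision rule above can be illustrated with a toy rescoring sketch. It assumes the recognizer has already produced a short list of candidate word sequences, each with a hypothetical acoustic log-likelihood log p(o_1..o_T | W) and language-model log-probability log P(W); a real decoder searches over all word sequences with time alignment instead of enumerating a few, and every score below is made up for illustration.

```python
from typing import NamedTuple

class Hypothesis(NamedTuple):
    words: tuple[str, ...]
    acoustic_logprob: float  # log p(o_1..o_T | w_1..w_N), hypothetical value
    lm_logprob: float        # log P(w_1..w_N), hypothetical value

def best_hypothesis(candidates: list[Hypothesis]) -> Hypothesis:
    """argmax over W of p(O | W) P(W), computed as a sum of log scores."""
    return max(candidates, key=lambda h: h.acoustic_logprob + h.lm_logprob)

# Made-up scores for three candidate transcriptions of the same audio.
candidates = [
    Hypothesis(("hello", "world"), acoustic_logprob=-120.0, lm_logprob=-6.0),
    Hypothesis(("hollow", "world"), acoustic_logprob=-119.0, lm_logprob=-11.0),
    Hypothesis(("hello", "word"), acoustic_logprob=-123.0, lm_logprob=-7.5),
]

best = best_hypothesis(candidates)
print(" ".join(best.words))
```

Here “hollow world” has the best acoustic score, but the language model term tips the combined score toward “hello world”.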

Speech recognition in a nutshell
– acoustic models $p(o_t,\dots,o_\tau \mid \text{phoneme})$
– dictionary $P(\text{phonemes} \mid w)$
– grammar (language model) $P(w_1,\dots,w_N)$
– training data: speech recordings + full manual transcripts

Speech recognition: dictionary $P(\text{phonemes} \mid w)$, excerpt (one line per pronunciation variant):
...
microscope    m:s ay:n k:n r:n ax:n s:n k:n ow:n p:e
microsecond   m:s ay:n k:n r:n ax:n s:n eh:n k:n ax:n n:n d:e
microsecond   m:s ay:n k:n r:n ow:n s:n eh:n k:n ax:n n:n d:e
microsoft     m:s ay:n k:n r:n ax:n s:n ao:n f:n t:e
microsoft     m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e
…
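A dictionary like the excerpt above maps each word to one or more phone sequences. The sketch below parses entries of the form `word phone:tag ...` into a lookup table. As an assumption not stated on the slide, it spreads P(phonemes | w) uniformly over a word's listed variants; a real system may weight variants differently. The meaning of the `:s`/`:n`/`:e` tags is not given in the source, so they are kept verbatim, and the function names are hypothetical.

```python
from collections import defaultdict

# Entries copied from the dictionary excerpt; each line is one pronunciation variant.
LEXICON_TEXT = """\
microscope m:s ay:n k:n r:n ax:n s:n k:n ow:n p:e
microsecond m:s ay:n k:n r:n ax:n s:n eh:n k:n ax:n n:n d:e
microsecond m:s ay:n k:n r:n ow:n s:n eh:n k:n ax:n n:n d:e
microsoft m:s ay:n k:n r:n ax:n s:n ao:n f:n t:e
microsoft m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e
"""

def parse_lexicon(text: str) -> dict[str, list[tuple[str, ...]]]:
    """Map each word to its list of pronunciation variants (tuples of tagged phones)."""
    lexicon: dict[str, list[tuple[str, ...]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        word, *phones = line.split()
        lexicon[word].append(tuple(phones))
    return dict(lexicon)

def pronunciation_prob(lexicon, word: str, phones: tuple[str, ...]) -> float:
    """P(phonemes | w), assuming (hypothetically) a uniform split over variants."""
    variants = lexicon.get(word, [])
    return 1.0 / len(variants) if phones in variants else 0.0

lexicon = parse_lexicon(LEXICON_TEXT)
variant = tuple("m:s ay:n k:n r:n ow:n s:n ao:n f:n t:e".split())
print(len(lexicon["microsoft"]), pronunciation_prob(lexicon, "microsoft", variant))
```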

Speech recognition: grammar (language model) $P(w_1,\dots,w_N)$, excerpt of three-word entries with log-probability scores:
...
-0.8790  this is a
-2.3045  this is about
-3.1858  this is absolutely
-5.2820  this is accomplished
-1.9542  this is actually
...
-5.8492  is a barnyard
-5.1004  is a barometer
-4.2270  is a baseball
-5.4292  is a baseless
-4.4304  is a baseline
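Assuming the scores above are base-10 log-probabilities of the last word given the two before it, as in the common ARPA n-gram file format (an assumption; the slide does not say), a word sequence can be scored by summing the entries along it. This minimal sketch uses only the trigram entries from the excerpt: it ignores the probabilities of the first two words and the backoff to shorter n-grams that a full language model would need.

```python
# log10 P(w3 | w1 w2) values taken from the language-model excerpt above.
TRIGRAMS = {
    ("this", "is", "a"): -0.8790,
    ("this", "is", "about"): -2.3045,
    ("this", "is", "absolutely"): -3.1858,
    ("this", "is", "accomplished"): -5.2820,
    ("this", "is", "actually"): -1.9542,
    ("is", "a", "barnyard"): -5.8492,
    ("is", "a", "barometer"): -5.1004,
    ("is", "a", "baseball"): -4.2270,
    ("is", "a", "baseless"): -5.4292,
    ("is", "a", "baseline"): -4.4304,
}

def trigram_log10_score(words: list[str]) -> float:
    """Sum log10 P(w_i | w_{i-2} w_{i-1}) for i >= 3 (no backoff, no sentence start)."""
    total = 0.0
    for i in range(2, len(words)):
        key = tuple(words[i - 2 : i + 1])
        if key not in TRIGRAMS:
            raise KeyError(f"trigram {key} not in excerpt (a real LM would back off)")
        total += TRIGRAMS[key]
    return total

score = trigram_log10_score("this is a baseball".split())
print(round(score, 4), 10 ** score)  # log10 score and the corresponding probability
```

With these entries, “this is a baseball” scores higher than “this is a barnyard”, which is how the grammar steers the decoder between acoustically similar candidates.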

Challenges
– speaker accent
– background noise
– reverberation
– vocabulary
– language

Lattice-based indexing
[Figure: recognition lattice for “into this bank account”]

expected benefits from indexing lattices:
– alternative recognition candidates → recall++
– confidence scores → precision++
– (time information → user experience)
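The benefits listed above come from keeping every lattice path instead of only the single best transcript. The sketch below computes each word's posterior probability with the forward-backward algorithm over a small, hand-made lattice for “into this bank account” and stores (media id, start time, end time, posterior) entries in an inverted index. The lattice, its timings and scores, and the names `EDGES`, `NODE_TIME`, `build_index` and `search` are hypothetical; the posterior plays the role of the confidence score, and the `min_posterior` threshold trades recall for precision.

```python
import math
from collections import defaultdict

def log_add(x: float, y: float) -> float:
    """Numerically stable log(exp(x) + exp(y))."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

# Hypothetical lattice: (from_node, to_node, word, probability). Nodes are numbered
# so that every edge goes from a lower to a higher node (topological order).
EDGES = [
    (0, 2, "into", 0.7), (0, 1, "in", 0.3), (1, 2, "to", 1.0),
    (2, 3, "this", 0.8), (2, 3, "the", 0.2),
    (3, 4, "bank", 0.9), (3, 4, "bang", 0.1),
    (4, 6, "account", 0.6), (4, 5, "a", 0.4), (5, 6, "count", 1.0),
]
NODE_TIME = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.9, 5: 1.0, 6: 1.5}  # seconds

def edge_posteriors(edges, start, end):
    """Forward-backward: posterior probability that each edge lies on the spoken path."""
    alpha = defaultdict(lambda: -math.inf, {start: 0.0})
    for a, b, _, p in sorted(edges, key=lambda e: e[0]):  # forward pass
        alpha[b] = log_add(alpha[b], alpha[a] + math.log(p))
    beta = defaultdict(lambda: -math.inf, {end: 0.0})
    for a, b, _, p in sorted(edges, key=lambda e: e[1], reverse=True):  # backward pass
        beta[a] = log_add(beta[a], math.log(p) + beta[b])
    total = alpha[end]
    return [math.exp(alpha[a] + math.log(p) + beta[b] - total) for a, b, _, p in edges]

def build_index(media_id, edges, node_time, start, end):
    """Inverted index: word -> list of (media_id, start_s, end_s, posterior)."""
    index = defaultdict(list)
    for (a, b, word, _), post in zip(edges, edge_posteriors(edges, start, end)):
        index[word].append((media_id, node_time[a], node_time[b], post))
    return index

def search(index, word, min_posterior=0.0):
    """Hits for `word` with posterior >= min_posterior, most confident first."""
    hits = [h for h in index.get(word, []) if h[3] >= min_posterior]
    return sorted(hits, key=lambda h: h[3], reverse=True)

index = build_index("talk-001", EDGES, NODE_TIME, start=0, end=6)
print(search(index, "bank"))                     # best-path word, high posterior
print(search(index, "bang"))                     # alternative candidate still found
print(search(index, "bang", min_posterior=0.5))  # dropped by the confidence threshold
```

A query for “bang” still reaches this recording through the alternative path (recall), a posterior threshold filters such low-confidence hits (precision), and the stored start times allow playback to jump to the hit.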

Vocabulary Adaptation
– metadata → NP extraction → web query builder → queries → Bing Search → docs → word statistics
– word statistics + base dict → adapt dictionary → adapted dict
– word statistics + base LM → adapt language model → adapted LM
– speech → recognizer (with adapted dict + adapted LM)
(from NLC group)
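The adaptation step can be sketched as follows, under assumptions the slide does not spell out: the retrieved documents are plain text, word statistics are unigram counts, words missing from the base dictionary become candidates for the adapted dictionary (their pronunciations would still have to be generated, which is not shown), and the adapted language model is a linear interpolation of the base and document unigram distributions with a hypothetical weight `lam`. The query-building and Bing Search steps are replaced by a fixed list of documents, and all names are illustrative.

```python
import re
from collections import Counter

def word_statistics(docs: list[str]) -> Counter:
    """Unigram counts over the retrieved documents (lower-cased, letters only)."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts

def adapt(base_dict: set[str], base_lm: dict[str, float], docs: list[str], lam: float = 0.5):
    """Return (new words for the dictionary, interpolated unigram LM)."""
    counts = word_statistics(docs)
    total = sum(counts.values())
    if total == 0:  # nothing retrieved: keep the base models unchanged
        return set(), dict(base_lm)
    new_words = {w for w in counts if w not in base_dict}
    vocab = set(base_lm) | set(counts)
    adapted_lm = {
        w: lam * base_lm.get(w, 0.0) + (1 - lam) * counts[w] / total for w in vocab
    }
    return new_words, adapted_lm

# Stand-in for documents returned by the web search step (hypothetical text).
docs = ["Azure cloud services", "the cloud"]
base_dict = {"the", "bank", "account"}
base_lm = {"the": 0.5, "bank": 0.3, "account": 0.2}

new_words, adapted_lm = adapt(base_dict, base_lm, docs)
print(sorted(new_words))
print({w: round(p, 3) for w, p in sorted(adapted_lm.items())})
```

Linear interpolation keeps the adapted distribution normalized while giving topic words from the retrieved documents non-zero probability.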

Architectural decisions

Azure integration
Components: SQL Server(s), web server(s), media server(s), video RSS feed
1. Submit audio/video to index
2. Get back AIB
3. Import AIB in SQL
4. Search/Retrieve results
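Steps 3 and 4 can be sketched with Python's built-in `sqlite3` module standing in for SQL Server. The AIB format is not described here, so the sketch assumes, hypothetically, that each AIB has already been decoded into (word, start time, confidence) rows for one media file; the table layout, the URLs, and the names `import_aib` and `search` are illustrative only.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """One row per indexed word occurrence (hypothetical layout)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hits (
               media_url TEXT, word TEXT, start_s REAL, confidence REAL)"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON hits(word)")

def import_aib(conn, media_url, rows):
    """Step 3 (sketch): load decoded index rows for one media file into SQL."""
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?, ?)",
        [(media_url, w.lower(), t, c) for w, t, c in rows],
    )
    conn.commit()

def search(conn, word, min_confidence=0.0):
    """Step 4 (sketch): return (media_url, start_s, confidence), best first."""
    cur = conn.execute(
        "SELECT media_url, start_s, confidence FROM hits "
        "WHERE word = ? AND confidence >= ? ORDER BY confidence DESC",
        (word.lower(), min_confidence),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical rows and URLs standing in for two returned AIBs.
import_aib(conn, "http://media.example/talk1.wmv", [("bank", 12.4, 0.9), ("account", 12.9, 0.6)])
import_aib(conn, "http://media.example/talk2.wmv", [("bank", 3.1, 0.4)])
for url, start, conf in search(conn, "bank"):
    print(f"{url} @ {start:.1f}s (confidence {conf:.2f})")
```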

Cloud computing made simple
Windows Azure + PowerShell = cloud computing at your fingertips
Demo: media content submission

Microsoft Research
– Tell us if you are interested: mmms@microsoft.com
– Visit us:
  http://research.microsoft.com/mavis
  http://research.microsoft.com
  http://twitter.com/MSFTResearch
  http://www.facebook.com/microsoftresearch#
  http://www.flickr.com/photos/msr_redmond/

Thank you! Questions?
