Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS.

Presentation transcript:

Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS

PUT RAW DATA NOW and then LINK DATA

PUT RAW DATA NOW
– Text
– Data (numbers, statistics)
– Data (audio, video)

LINKED DATA
– Information is in the relationships between data
– Find the relationships between them

IBM's Watson and Jeopardy!

Proposal (NSERC Strategic Project Proposal)
Information Extraction in radio and television documents
– Industrial partners: CEDROM-SNi, Irosoft
– Universities and research center: CRIM, ÉTS, INRS-EMT, McGill

Process Raw Audio Data
– Automatic Speech Recognition (ASR)
– Parsing
– Indexation
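The indexation step at the end of this chain can be pictured with a minimal sketch. It assumes the recognizer returns words with start times; the `asr_output` list and the `build_index` helper are hypothetical names for illustration, not part of the project described in the slides.

```python
from collections import defaultdict


def build_index(words):
    """Map each lowercased word to the start times (in seconds) where it was recognized."""
    index = defaultdict(list)
    for token, start in words:
        index[token.lower()].append(start)
    return dict(index)


# Hypothetical ASR output: (word, start time in seconds).
asr_output = [("Japan", 0.4), ("earthquake", 0.9), ("hits", 1.5), ("Japan", 12.2)]

index = build_index(asr_output)
print(index["japan"])
```

Looking up a word in such an index gives the positions in the recording where playback could start, which is what makes a spoken document searchable.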

Closed-captioning / Subtitling VOICEWRITER

Closed-captioning / Subtitling
Done with the help of a VoiceWriter that:
– Respeaks
– Adds punctuation
– Selects the proper dictionary
– Does not speak during advertising
– Summarizes information when more than one speaker talks at the same time or when the speech rate is too fast
– Translates

How to process raw audio data?
– ASR
– Parsing
– Indexation
– Audio Diarization
– Speaker Diarization
– Speaker Recognition
– Speaker Role
– Punctuation
– Structural Segmentation
– Topic Recognition

Audio Diarization
Aims to segment an audio recording into acoustically homogeneous parts:
– Distinguish between speech and music
– Distinguish between advertising and news

Speaker Diarization
Aims to segment a speech signal into its speaker turns (who spoke when).
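As a rough illustration of what "turns" means, the sketch below covers only the last step of diarization: it assumes some earlier clustering stage has already assigned a speaker label to every fixed-length frame, and it merges consecutive identical labels into turns. The function name, the labels and the one-second frame length are assumptions made for this example.

```python
from itertools import groupby


def frames_to_turns(labels, frame_sec=1.0):
    """Merge consecutive identical frame labels into (speaker, start, end) turns."""
    turns = []
    start = 0
    for speaker, run in groupby(labels):
        length = len(list(run))
        turns.append((speaker, start * frame_sec, (start + length) * frame_sec))
        start += length
    return turns


# Hypothetical per-frame labels produced by an earlier clustering step.
frame_labels = ["A", "A", "B", "A"]
turns = frames_to_turns(frame_labels)
print(turns)
```

The hard part of speaker diarization, deciding which frames belong to the same voice, is not shown here.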

Speaker Recognition

Speaker Role
In broadcast news, most speech comes from anchors and reporters. The remainder consists of excerpts from quotations or interviews, referred to as sound bites. Detecting the speaker role is important to improve:
– acoustic speech recognition
– information extraction

Punctuation
Some language analysis tasks, such as parsing and entity extraction, need punctuation (periods and commas) in order to work properly.

Structural Segmentation
Sentence, paragraph, and story segmentation are important for speech understanding applications, down to basic tasks such as parsing and information extraction. This problem is absent in text processing but has to be solved in speech processing.

Topic Spotting
Aims to identify the topic of a speech signal. It is useful for adapting the different components of the system as well as for adding meta tags to a speech signal.
Example: La belle ferme le voile
– La: the, her
– Belle: beautiful, beauty
– Ferme: farm, closes
– Le: the, his
– Voile: veil, blocks the view
Two hypothetical translations:
– The veil is closed by the beauty
– The beautiful farm blocks his view
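A very small sketch of topic spotting, assuming a hand-made keyword list per topic. Real systems typically rely on statistical models; the topics, keywords and the `spot_topic` helper here are hypothetical and only show the idea of scoring a transcript against candidate topics.

```python
def spot_topic(words, topic_keywords):
    """Return the topic whose keyword set overlaps most with the words, plus all scores."""
    tokens = {w.lower() for w in words}
    scores = {topic: len(tokens & keywords) for topic, keywords in topic_keywords.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical keyword lists, chosen only for this example.
topic_keywords = {
    "agriculture": {"farm", "harvest", "cattle"},
    "fashion": {"veil", "dress", "beauty"},
}

topic, scores = spot_topic("the harvest on the farm was late".split(), topic_keywords)
print(topic, scores)
```

Once a topic is chosen, it could be used to prefer one reading of an ambiguous sentence over another, as in the "ferme" (farm or closes) example.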

How to improve Information Extraction from speech? By improving ASR Components

Automatic Speech Recognizer
Performance drops with:
– Out-of-vocabulary words (lexical models)
– Multiple users (acoustic models)
– Multiple microphones (acoustic models)
– Multiple topics (language models)
– Overlapping speech, or cross-talk (all models)
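The out-of-vocabulary problem can be quantified with a simple rate: the fraction of spoken words that are missing from the recognizer's lexicon. The sketch below assumes a reference transcript and a lexicon given as a set of words; both are invented for the example.

```python
def oov_rate(transcript_words, lexicon):
    """Fraction of transcript words that do not appear in the lexicon."""
    words = [w.lower() for w in transcript_words]
    if not words:
        return 0.0
    missing = [w for w in words if w not in lexicon]
    return len(missing) / len(words)


# Hypothetical lexicon and reference transcript.
lexicon = {"the", "prime", "minister", "visited"}
reference = "The prime minister visited Fukushima".split()

rate = oov_rate(reference, lexicon)
print(rate)
```

A word that is not in the lexicon cannot be recognized at all, so every out-of-vocabulary word causes at least one error, whatever the quality of the acoustic and language models.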

How to improve Information Extraction from speech?
More data are better data. More similar data are better data.
Similar in terms of:
– Topic
– Time period: coming from the same period, specifically more recent. Example: Japan
– Prediction of what will happen and who will speak
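One common way to make use of more similar data is to interpolate a language model trained on recent, on-topic text with a general one. The slides do not say which method the project uses, so the sketch below is only an assumption: it mixes two unigram models with a hypothetical weight `lam`, using toy texts.

```python
from collections import Counter


def unigram_probs(text):
    """Relative frequency of each lowercased word in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_general, p_recent, lam=0.5):
    """Mix two unigram models: lam * recent + (1 - lam) * general."""
    vocab = set(p_general) | set(p_recent)
    return {
        w: lam * p_recent.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


# Toy corpora standing in for general text and recent, on-topic text.
p_general = unigram_probs("the news the weather")
p_recent = unigram_probs("japan earthquake japan news")
mixed = interpolate(p_general, p_recent, lam=0.5)
print(mixed["japan"])
```

Words that only occur in the recent data, such as "japan" here, receive a non-zero probability in the mixed model, which is the effect the slide points to.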

More data are better data
Use the huge amount of information available on the web.
Use supercomputer infrastructure in order to model it in a reasonable time:
– Compute Canada infrastructure: CLUMEQ
– Cluster of university computers

More similar data are better data
Exploiting redundancies in different media information:
– Anchor speech is predominant.
– Reporters often appear at specific times, day after day.
– Advertisements appear (and repeat) near specific time slots, day after day.
– The same news is often reused from one medium to another.
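The repetition of advertisements and reused news items can be detected, in a very simplified form, by looking for segments whose text occurs on more than one day. The sketch below assumes segments are already transcribed and compares them by exact normalized text; the data and the `repeated_segments` helper are hypothetical, and a real system would more likely match audio or tolerate recognition errors.

```python
from collections import defaultdict


def repeated_segments(days):
    """Return normalized segment texts seen on more than one day, with the days they occur."""
    seen = defaultdict(set)
    for day, segments in days.items():
        for text in segments:
            normalized = " ".join(text.lower().split())
            seen[normalized].add(day)
    return {text: sorted(d) for text, d in seen.items() if len(d) > 1}


# Hypothetical transcribed segments for two broadcast days.
broadcasts = {
    "mon": ["Buy Acme soap", "Storm hits coast"],
    "tue": ["buy  acme soap", "Election results"],
}

repeats = repeated_segments(broadcasts)
print(repeats)
```

Segments flagged this way could be transcribed once and reused, or skipped entirely if they turn out to be advertising.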

Exploiting redundancies in different media information

And then …
– ASR
– Parsing
– Indexation
– Audio Diarization
– Speaker Diarization
– Speaker Recognition
– Speaker Role
– Punctuation
– Structural Segmentation
– Topic Recognition