ESWC 2005, Crete, Greece Semantically Enhanced Television News through Web and Video Integration Multimedia and the Semantic Web workshop Borislav PopovMike.

Presentation transcript:

Semantically Enhanced Television News through Web and Video Integration
Multimedia and the Semantic Web workshop, ESWC 2005, Crete, Greece
Borislav Popov, Mike Dowman, Valentin Tablan, Cristian Ursu, Hamish Cunningham

Motivation
- Pile of media material: broadcasters produce lots of material (the BBC has 8 TV and 11 radio national channels, and more are coming).
- Need for rapid recycling: some of this material can be reused in new productions.
- Quickly finding the needle in the haystack: access to archive material is provided by some form of semantic annotation and indexing.
- Helping documentarists: manual annotation is time-consuming and expensive. Currently some 90% of the BBC's output is only annotated at a very basic level.

RichNews
- Aims high: a prototype addressing the automation of semantic annotation for multimedia material.
- Well, not that high: not aiming to reach performance comparable to that of human documentarists.
- Fully automatic.
- Targets news material: further extensions are possible. TV and radio news broadcasts from the BBC were used during development and testing.

Overview
Input: multimedia file
Output: OWL/RDF descriptions of content
- Headline (short summary)
- List of entities (Person/Location/Organization/…)
- Related web pages
- Segmentation

Key Problems
Obtaining a good transcript:
- Speech recognition produces poor-quality transcripts with many mistakes (error rate ranging from 10 to 90%).
- More reliable sources (subtitles/closed captions) are not always available.
Automatic broadcast segmentation:
- A news broadcast contains several stories. How do we work out where one starts and another one stops?

Architecture
[Diagram. Components shown: Media File; THISL Speech Recogniser; C99 Topical Segmenter; TF.IDF Key Phrase Extraction; Web-Search and Document Matching; KIM Information Extraction; Degraded Text Information Extraction; Entity Validation; Manual Annotation (Optional); Semantic Index]

Using ASR Transcripts
- ASR is performed by the THISL system.
- Based on the ABBOT connectionist speech recognizer.
- Optimized specifically for use on BBC news broadcasts.
- Average word error rate of 29%.
- Error rate of up to 90% for out-of-studio recordings.

ASR error examples
ASR output: he was suspended after his arrest [SIL] but the process were set never to have lost confidence in him
Correct: he was suspended after his arrest [SIL] but the Princess was said never to have lost confidence in him
ASR output: and other measures weapons inspectors have the first time entered one of saddam hussein's presidential palaces
Correct: United Nations weapons inspectors have for the first time entered one of saddam hussein's presidential palaces
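The word error rate figures quoted for the recogniser can be made concrete with a small calculation. The following is a minimal sketch, assuming the usual definition of word error rate (word-level edit distance divided by the number of words in the correct transcript); the function name `word_error_rate` is introduced here for illustration and is not part of THISL or RichNews.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One of the example pairs from the slide (correct transcript vs. ASR output).
reference = ("united nations weapons inspectors have for the first time "
             "entered one of saddam hussein's presidential palaces")
hypothesis = ("and other measures weapons inspectors have the first time "
              "entered one of saddam hussein's presidential palaces")
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A single pair like this only illustrates the measure; the 29% average reported on the slides is a figure over whole broadcasts.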

Topical Segmentation
Uses the C99 segmenter:
- Removes common words from the ASR transcripts.
- Stems the other words to get their roots.
- Then looks to see in which parts of the transcripts the same words tend to occur. (These parts will probably report the same story.)
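The three steps above can be illustrated with a much simpler stand-in than C99 itself. The sketch below is not the C99 algorithm; it only shows the underlying idea of lexical cohesion by placing a boundary wherever adjacent sentences share almost no stemmed content words. The stop-word list, the crude suffix-stripping stemmer, and the threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used by C99).
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was",
              "for", "on", "has", "have"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Remove common words, stem the rest, and count the stems."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(stem(w) for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two stem-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find_boundaries(sentences, threshold=0.1):
    """Return each index i where a new story is assumed to start at sentence i."""
    vectors = [preprocess(s) for s in sentences]
    return [i for i in range(1, len(vectors))
            if cosine(vectors[i - 1], vectors[i]) < threshold]


sentences = [
    "Weapons inspectors entered the presidential palace",
    "The inspectors searched the palace for weapons",
    "Heavy rain flooded roads in the north",
    "More rain and floods are expected",
]
print(find_boundaries(sentences))
```

Comparing only neighbouring sentences is fragile on noisy ASR output, which is one reason a real segmenter looks at similarity across larger regions of the transcript.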

Key Phrase Extraction
Term frequency inverse document frequency (TF.IDF):
- Chooses sequences of words that tend to occur more frequently in the story than they do in the language as a whole.
- Any sequence of up to three words can be a phrase.
- Phrases that occurred at least twice in a story are extracted.
- Up to four phrases are extracted per story.
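The selection rules on this slide (phrases of up to three words, at least two occurrences, at most four phrases per story) can be sketched as follows. The background collection used to estimate how common a phrase is in the language as a whole, the smoothed IDF formula, and the function names are assumptions for illustration; the slides do not specify the exact weighting RichNews used.

```python
import math
import re
from collections import Counter


def ngrams(text, max_len=3):
    """Yield every word sequence of length 1..max_len in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def key_phrases(story, background_docs, max_phrases=4, min_count=2):
    """Rank candidate phrases by term frequency times a smoothed IDF."""
    tf = Counter(ngrams(story))
    doc_phrase_sets = [set(ngrams(doc)) for doc in background_docs]
    n_docs = len(doc_phrase_sets)
    scores = {}
    for phrase, count in tf.items():
        if count < min_count:
            continue  # the phrase must occur at least twice in the story
        df = sum(phrase in phrases for phrases in doc_phrase_sets)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed to avoid division by zero
        scores[phrase] = count * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:max_phrases]


story = ("weapons inspectors entered the palace "
         "the weapons inspectors left the palace")
background = ["the cat sat on the mat",
              "the weather in the north",
              "the palace garden"]
print(key_phrases(story, background))
```

Frequent function words such as "the" pass the two-occurrence filter but receive a low IDF, so they fall below the four-phrase cut-off in this example.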

Web Search and Document Matching
- The key phrases are searched for with Google on the BBC, Times, Guardian and Telegraph newspaper websites, to find web pages reporting the stories in the broadcast.
- Searches are restricted to the day of broadcast.
- Parallel combined searches: multiple searches using different combinations of the extracted key phrases.
- Transcript2WebDocument matching: the text of the returned web pages is compared to the text of the transcript to find matching stories.
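The combined searches and the transcript-to-page comparison can be sketched without calling any search engine. In the sketch below, `news.example.org` is a placeholder domain, the `site:` restriction is written as a plain query string, and word-set overlap (Jaccard similarity) with an arbitrary threshold stands in for whatever text comparison RichNews actually used. Restricting results to the day of broadcast is left out.

```python
import re
from itertools import combinations


def build_queries(key_phrases, site, min_size=1):
    """Build one query per combination of key phrases, largest combinations first."""
    queries = []
    for size in range(len(key_phrases), min_size - 1, -1):
        for combo in combinations(key_phrases, size):
            quoted = " ".join(f'"{phrase}"' for phrase in combo)
            queries.append(f"{quoted} site:{site}")
    return queries


def overlap_score(transcript, page_text):
    """Jaccard similarity between the word sets of transcript and page."""
    t = set(re.findall(r"[a-z']+", transcript.lower()))
    p = set(re.findall(r"[a-z']+", page_text.lower()))
    if not t or not p:
        return 0.0
    return len(t & p) / len(t | p)


def best_match(transcript, pages, threshold=0.2):
    """Return the URL of the best-matching page, or None if nothing is close enough.

    pages maps a URL to the plain text of that page.
    """
    scored = [(overlap_score(transcript, text), url) for url, text in pages.items()]
    if not scored:
        return None
    score, url = max(scored)
    return url if score >= threshold else None


print(build_queries(["weapons inspectors", "presidential palaces"], "news.example.org"))

pages = {
    "u1": "weapons inspectors entered the presidential palace",
    "u2": "rain flooded the roads",
}
print(best_match("inspectors entered the palace", pages))
```

Trying the largest combinations first favours precise queries, with smaller combinations as a fallback when misrecognised phrases make the precise query return nothing.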

Using the Web Pages
The web pages contain:
- A headline, summary and section for each story.
- Good-quality text that is readable and contains correctly spelt proper names.
- More in-depth coverage of the stories.

Semantic Annotation of NEs
A semantic annotation of the named entities (NEs) in a text includes:
- recognition of the type of the entities in the text, out of a rich taxonomy of classes (not a flat set of 10 types);
- identification of the entities, which is also a reference to their semantic description.
The traditional (IE-style) NE recognition approach results in: Lama Ole Nydahl
The Semantic Annotation of NEs results in: Lama Ole Nydahl

Semantic Annotation: Example
KIM – Knowledge and Information Management
Example text: "XYZ was established on 03 November 1978 in London. It opened a plant in Bulgaria in …"
[Diagram: Ontology & KB. Node labels: Company, City, Country, Location, XYZ, London, UK, Bulgaria, "03/11/1978". Edge labels: type, HQ, establOn, partOf.]
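One plausible reading of this example is a small set of subject-predicate-object statements linking the entities in the text to the ontology. The sketch below uses plain Python tuples rather than any RDF library, and the way nodes and edges are connected is an assumption reconstructed from the labels on the slide, not a copy of the KIM knowledge base.

```python
# Hypothetical triples reconstructed from the slide's node and edge labels.
triples = [
    ("XYZ", "type", "Company"),
    ("XYZ", "HQ", "London"),
    ("XYZ", "establOn", "03/11/1978"),
    ("London", "type", "City"),
    ("London", "partOf", "UK"),
    ("UK", "type", "Country"),
    ("Bulgaria", "type", "Country"),
]


def objects(subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow HQ, then partOf, to find the country of the company's headquarters.
hq_country = [country
              for city in objects("XYZ", "HQ")
              for country in objects(city, "partOf")]
print(hq_country)
```

The point of annotating against such a structure is that a mention of "XYZ" in the text becomes a reference to an identified entity whose type and relations can be queried.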

Entity Matching
A dual process, through which confidence scores are assigned to the entities found:
- ASR2Web: entities from the ASR transcript are matched against the Web entities (these receive the highest confidence) and, if not found, against the text of the matching Web document.
- Web2ASR: remaining Web entities are matched against the ASR transcript and, if not found, confidence scores are given depending on how often they occur inside the Web document. The more often they occur in the Web document, the more likely they are to appear in the broadcast.
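The two passes can be sketched as a single scoring function. The numeric confidence values below (1.0, 0.8, 0.6, and a frequency-based score capped at 0.5) are invented for illustration, as is the use of simple lower-cased substring matching; the slides give only the relative ordering of the cases. ASR entities found nowhere on the Web side are left unscored here, since the slides do not say how they are treated.

```python
def score_entities(asr_entities, web_entities, asr_text, web_text):
    """Assign illustrative confidence scores following the two passes on the slide."""
    asr_text, web_text = asr_text.lower(), web_text.lower()
    asr_entities = [e.lower() for e in asr_entities]
    web_entities = [e.lower() for e in web_entities]
    scores = {}

    # ASR2Web: entities recognised in the ASR transcript.
    for entity in asr_entities:
        if entity in web_entities:
            scores[entity] = 1.0   # confirmed by a Web entity: highest confidence
        elif entity in web_text:
            scores[entity] = 0.8   # found only in the raw Web page text

    # Web2ASR: Web entities not already scored.
    for entity in web_entities:
        if entity in scores:
            continue
        if entity in asr_text:
            scores[entity] = 0.6   # present in the transcript text
        else:
            # More occurrences in the Web page give a higher (capped) score.
            scores[entity] = min(0.5, 0.1 * web_text.count(entity))
    return scores


scores = score_entities(
    asr_entities=["Saddam Hussein", "Baghdad"],
    web_entities=["Saddam Hussein", "United Nations", "Iraq"],
    asr_text="and other measures weapons inspectors entered one of "
             "saddam hussein's palaces in baghdad iraq",
    web_text="United Nations inspectors entered a palace of Saddam Hussein "
             "in Baghdad. The United Nations team arrived on Tuesday.",
)
for entity, confidence in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{entity}: {confidence:.1f}")
```

In this example "United Nations" is misrecognised in the transcript, so it is recovered only through its frequency in the Web page and receives the lowest score.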


Search for Entities

Story Retrieval

Evaluation
Success in finding matching web pages was investigated:
- Evaluation based on 66 news stories from 9 half-hour news broadcasts.
- Web pages were found for 40% of stories.
- 7% of pages reported a closely related story, instead of that in the broadcast.
- Results are based on an earlier version of the system, which used only BBC web pages.

Future Improvements
- Use teletext subtitles (closed captions) when they are available.
- Better story segmentation through visual cues and latent semantic analysis.
- Use for content augmentation for interactive media consumption.

Acknowledgments
This work has been supported by European Union grants under the Sixth Framework Program projects PrestoSpace (FP ) / and SEKT (EU IST IP ).
More Information
Thank you! phew