Multimedia Semantic Analysis in the PrestoSpace Project Valentin Tablan, Hamish Cunningham, Cristian Ursu NLP Research Group University of Sheffield Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK

LREC 2006, Genoa, Italy – Crossing Media Workshop

Project Mission
The 20th century was the first with an audiovisual record, and audiovisual media became the new form of cultural expression. These historical, cultural and commercial assets are now entirely at risk from deterioration. PrestoSpace aims to provide technical devices and systems for the digital preservation of all types of audiovisual collections.

The Partners
IP, 34 partners.
Steering Board:
- Institut National de l’Audiovisuel, INA (France)
- British Broadcasting Corporation, BBC (UK)
- Radiotelevisione Italiana, RAI (Italy)
- Joanneum Research, JRS (Austria)
- Netherlands Institute for Sound and Vision - Beeld en Geluid, B&G (The Netherlands)
- Oesterreichischer Rundfunk, ORF (Austria)
- University of Sheffield, USFD (UK)

Project Organisation

Semantic Analysis – Motivation
- Archives are sizeable, and new material is produced daily (the BBC has 8 TV and 11 radio national channels).
- Some of this material can be reused in new productions.
- Semantic annotation and indexing can provide access to archive material, but manual annotation is time-consuming (up to 10x real time) and expensive.
- Archive budgets alone cannot support the digitisation effort.

English SA - RichNews
- A prototype addressing the automation of semantic annotation for multimedia material.
- Not aiming to match the performance of human annotators.
- Fully automatic.
- Aimed at news material; further extensions are envisaged.
- Developed and tested on TV and radio news broadcasts from the BBC.

Overview
Input: multimedia file.
Output: OWL/RDF descriptions of content:
- Headline (short summary)
- List of entities (Person/Location/Organization/…)
- Related web pages
- Segmentation
Multi-source Information Extraction system, drawing on:
- Automatic speech transcript
- Subtitles/closed captions
- Related web pages
- Legacy metadata

Using ASR Transcripts
- ASR is performed by the THISL system, which is based on the ABBOT connectionist speech recognizer.
- Optimized specifically for use on BBC news broadcasts.
- Average word error rate of 29%.
- Error rate of up to 90% for out-of-studio recordings.
- No capitalisation, which limits IE capability.

ASR error examples
ASR output: he was suspended after his arrest [SIL] but the process were set never to have lost confidence in him
Correct text: he was suspended after his arrest [SIL] but the Princess was said never to have lost confidence in him

ASR output: and other measures weapons inspectors have the first time entered one of saddam hussein's presidential palaces
Correct text: United Nations weapons inspectors have for the first time entered one of saddam hussein's presidential palaces
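
Word error rate is conventionally computed as the word-level edit distance between the recognizer output and the correct text, divided by the number of words in the correct text. The sketch below applies that standard definition to the first example pair above. It is an illustration only, not the scoring tool used in the project; the function name is hypothetical, and dropping the [SIL] marker and lowercasing the text are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far (excluding the current one) and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# First example pair, with the [SIL] marker removed (an assumption).
reference = ("he was suspended after his arrest but the princess was said "
             "never to have lost confidence in him")
hypothesis = ("he was suspended after his arrest but the process were set "
              "never to have lost confidence in him")

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")
```

Three of the 18 reference words are substituted here, so this single sentence scores well below the 29% average reported for the system.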

Architecture
Components shown in the architecture diagram:
- THISL Speech Recogniser
- C99 Topical Segmenter
- TF.IDF Key Phrase Extraction
- Media File
- Manual Annotation (Optional)
- Entity Validation
- Semantic Index
- Web-Search and Document Matching
- KIM Information Extraction
- Degraded Text Information Extraction

Search for Related Pages
- The ASR transcript is segmented using C99.
- Key phrases are found for each segment using TF.IDF. Any sequence of up to three words can be a phrase; up to four phrases are extracted per story.
- The key phrases are used to search the BBC, Times, Guardian and Telegraph newspaper websites.
- Searches are restricted to the day of broadcast or the day after.
- The text of the returned web pages is compared to the text of the transcript to find matching stories.
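
The key-phrase step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RichNews implementation: the tokenizer, the log(N/df) weighting, the alphabetical tie-breaking and both function names are choices made for the example. Only the limits of three words per phrase and four phrases per story come from the slide.

```python
import math
import re
from collections import Counter


def candidate_phrases(text, max_words=3):
    """Return every sequence of 1..max_words consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [
        " ".join(words[start:start + length])
        for length in range(1, max_words + 1)
        for start in range(len(words) - length + 1)
    ]


def key_phrases(segments, max_words=3, top_k=4):
    """Rank each segment's phrases by TF.IDF and keep the top_k."""
    counts = [Counter(candidate_phrases(s, max_words)) for s in segments]
    # Document frequency: in how many segments does each phrase occur?
    doc_freq = Counter()
    for segment_counts in counts:
        doc_freq.update(segment_counts.keys())
    n_segments = len(segments)
    results = []
    for segment_counts in counts:
        scores = {
            phrase: tf * math.log(n_segments / doc_freq[phrase])
            for phrase, tf in segment_counts.items()
        }
        # Highest score first; ties are broken alphabetically (an assumption).
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        results.append(ranked[:top_k])
    return results


# Toy segments standing in for C99 output; the text is invented.
segments = [
    "weapons inspectors entered the presidential palaces",
    "the princess never lost confidence in him",
    "the budget was announced today",
]
phrases = key_phrases(segments)
for story, story_phrases in zip(segments, phrases):
    print(story, "->", story_phrases)
```

Phrases that occur in every segment, such as "the" in the toy data, receive a score of zero and drop out of the ranking, which is the effect the IDF term is meant to have.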

Using the Web Pages
The web pages contain:
- A headline, summary and section for each story.
- Good-quality, readable text with correctly spelt proper names.
- More in-depth coverage of the stories.

Semantic Annotation
The KIM knowledge management system can semantically annotate the text derived from the web pages:
- KIM identifies people, organizations, locations, etc.
- KIM performs well on the web page text, but very poorly when run directly on the transcripts.
This allows ontology-aided semantic searches for stories about particular people, locations and so on:
- For example, we could search for people called Sydney, which would be difficult with a text-based search.
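
The "people called Sydney" example can be illustrated with a toy index keyed on entity type as well as name. This is a sketch of the general idea only: it does not use the KIM API, and the Annotation class, the function names, the story identifiers and the sample entities are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    story_id: str
    text: str
    entity_type: str  # e.g. "Person", "Location", "Organization"


def build_index(annotations):
    """Map (entity type, lowercased name token) to the stories mentioning it."""
    index = defaultdict(set)
    for annotation in annotations:
        for token in annotation.text.lower().split():
            index[(annotation.entity_type, token)].add(annotation.story_id)
    return index


def search(index, entity_type, name):
    """Return the ids of stories with an entity of this type and name."""
    return sorted(index.get((entity_type, name.lower()), set()))


# Invented annotations; a real system would take these from the annotator.
annotations = [
    Annotation("story-1", "Sydney", "Location"),
    Annotation("story-2", "Sydney Smith", "Person"),
    Annotation("story-3", "Sydney", "Location"),
]
index = build_index(annotations)

print(search(index, "Person", "Sydney"))
print(search(index, "Location", "Sydney"))
```

A plain keyword search for "Sydney" would return all three stories; restricting the lookup to the Person type returns only the one about a person.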

Entity Matching

Evaluation
- Evaluation was based on 66 news stories from 9 half-hour news broadcasts.
- Web pages were found for 40% of stories.
- 7% of pages reported a closely related story instead of the one in the broadcast.
- Lenient recall: 47%; precision: 100%.
- Results are based on an earlier version of the system that used only BBC web pages.
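
One plausible reading of these figures, offered as an assumption rather than the authors' stated method, is that strict recall counts only stories matched to a page about the same story, while lenient recall also counts pages about a closely related story. The sketch below shows that arithmetic. The counts are hypothetical, chosen over a round 100 stories so that they reproduce the reported percentages; they are not the actual counts from the 66-story evaluation.

```python
def recall(matched: int, total_stories: int) -> float:
    """Fraction of all stories for which an acceptable page was found."""
    return matched / total_stories


def precision(acceptable: int, returned: int) -> float:
    """Fraction of returned pages that were acceptable."""
    return acceptable / returned


# Hypothetical counts, not the real evaluation data.
total_stories = 100
exact_matches = 40      # page reports the same story
related_matches = 7     # page reports a closely related story
unrelated_pages = 0     # page reports something else entirely

returned = exact_matches + related_matches + unrelated_pages
strict_recall = recall(exact_matches, total_stories)
lenient_recall = recall(exact_matches + related_matches, total_stories)
lenient_precision = precision(exact_matches + related_matches, returned)

print(f"strict recall:     {strict_recall:.0%}")
print(f"lenient recall:    {lenient_recall:.0%}")
print(f"lenient precision: {lenient_precision:.0%}")
```

Under this reading, a precision of 100% means the system never returned a page that was unrelated to the story, at the cost of returning nothing for roughly half of the stories.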

Future Improvements
- Use teletext subtitles (closed captions) when they are available.
- Better story segmentation through visual cues.
- Use for different domains and languages.

Thank you! More information: