WP5.4 - Introduction: Knowledge Extraction from Complementary Sources


WP5.4 - Introduction
• Knowledge Extraction from Complementary Sources
• This activity is concerned with augmenting the semantic multimedia metadata basis by analysis of complementary textual, speech and semi-structured data
• Focus in first 12 months
  • Joint work between DFKI, UEP and DCU on aligning event extraction from textual football match reports with event recognition in video coverage of the same match
• Focus in following 12 months
  • Joint work between DFKI, UEP and DCU on the extension of the event alignment work towards cross-media feature extraction (aligning low-level image/video features with events extracted in aligned textual and semi-structured data)
  • Joint work between DFKI, UEP, TUB and GET (cross-WP cooperation with WP3.3) on analyzing textual metadata in primary sources (OCR applied to text detected in images)

Text-Video Mapping in the Football Domain
• Cooperation: DFKI, UEP, DCU
• Resources:
  • DFKI: SmartWeb Data Set (textual and tabular match reports)
  • DFKI/UEP: Additional minute-by-minute textual match reports ('tickers') from other web resources
  • DCU: Video Detectors (Crowd image detector, Speech-Band Audio Activity, On-Screen Graphics Tracking, Motion activity measure, Field Line orientation, Close-up)
• Textual and semi-structured data (tabular, XML files) are exploited as background knowledge for filtering the video analysis results, and may also help to improve the corresponding video analysis algorithms
• Alignment of events extracted from unstructured textual data, and of events provided by the semi-structured tabular data in the SmartWeb corpus (DFKI), with events detected by the video analysis (DCU)

Resources
• The SmartWeb Data Set as provided by DFKI is an experimental data set for ontology-based information extraction and ontology learning from text that has been compiled for the SmartWeb project.
• The data set consists of:
  • An ontology on football (soccer) that is integrated with foundational (DOLCE), general (SUMO) and task-specific (discourse, navigation) ontologies.
  • A corpus of semi-structured and textual match reports (German and English documents) that are derived from freely available web sources. The bilingual documents are not translations, but are aligned on the level of a particular match (i.e. they are about the same match).
  • A knowledge base of events and entities in the world cup domain that have been automatically extracted from the German documents.
• For the purposes of the experiment described here we were mostly interested in the events that are described by the semi-structured data.

SmartWeb Data Example

DCU: Video Analysis Data
• Framework for event detection in broadcast video of multiple different field sports as provided by DCU
• Video detectors used by DCU
  • Crowd image detector
  • Speech-Band Audio Activity
  • On-Screen Graphics Tracking
  • Motion activity measure
  • Field Line orientation
  • Close-up

[Figure: video detector output; labels: crowd, audio, confidence, Visual_motion]

DFKI/UEP: Extraction of Tickers
• Minute-by-minute reports from different Web resources: Ard.de, Ligalive.de, bild.de

Information Extraction from Text: Information Extraction with DFKI Tool "SProUT"
• Shallow Processing with Unification and Typed Feature Structures (SProUT): a tool for multilingual shallow text processing and information extraction
• A SProUT Java web service takes the minute-by-minute reports as input, parses them and produces a new XML file for each minute of a particular match

Information Extraction Results (SProUT): Aligning and Aggregation of Textual Events
• Alignment of events from various tickers (example: minute 40)
• Inputs: minute-by-minute reports, plus tabular reports and video event detection data (features) from DCU
• Video-textual data time alignment
• Cross-media feature extraction
• Data aggregation for later use
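The alignment of events from several tickers can be illustrated with a minimal sketch. It assumes each ticker has already been reduced to (minute, event type) pairs; the ticker names, event labels and the `align_tickers` helper are hypothetical and are not taken from the SProUT output format.

```python
from collections import defaultdict


def align_tickers(tickers):
    """Group events from several tickers by match minute.

    tickers: dict mapping a ticker name to a list of (minute, event_type) pairs.
    Returns a dict: minute -> {event_type: sorted list of tickers reporting it}.
    """
    by_minute = defaultdict(lambda: defaultdict(set))
    for source, events in tickers.items():
        for minute, event_type in events:
            by_minute[minute][event_type].add(source)
    # Convert to plain dicts, ordered by minute, with sorted source lists.
    return {
        minute: {etype: sorted(sources) for etype, sources in types.items()}
        for minute, types in sorted(by_minute.items())
    }


# Hypothetical sample data for illustration only.
tickers = {
    "ticker_a": [(40, "freekick"), (41, "corner")],
    "ticker_b": [(40, "freekick"), (40, "foul")],
}
aligned = align_tickers(tickers)
print(aligned[40])
```

An event type reported by more than one ticker in the same minute (here the free kick in minute 40) could be treated as better supported than one reported by a single source.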

Match vs Video Time
• Freekick evaluation
• Time differences tracking
• Possible OCR on video
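One way to picture the match-time versus video-time problem is the sketch below. It assumes the video offsets of both kick-offs are known (in seconds) and that a ticker minute m covers match time from m-1 to m minutes; stoppage time is ignored. The function names and all numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean

HALF_LENGTH_MIN = 45  # regular length of one half, in minutes


def video_window(minute, first_half_kickoff_s, second_half_kickoff_s):
    """Map a 1-based ticker minute to a (start, end) window in video seconds."""
    if minute < 1:
        raise ValueError("ticker minutes are 1-based")
    if minute <= HALF_LENGTH_MIN:
        start = first_half_kickoff_s + (minute - 1) * 60
    else:
        start = second_half_kickoff_s + (minute - HALF_LENGTH_MIN - 1) * 60
    return start, start + 60


def mean_time_difference(reference_events, first_half_kickoff_s, second_half_kickoff_s):
    """Average gap between observed video time and the centre of the predicted window.

    reference_events: list of (ticker_minute, observed_video_second) pairs,
    e.g. free kicks that were located in the video.
    """
    diffs = []
    for minute, observed_s in reference_events:
        start, end = video_window(minute, first_half_kickoff_s, second_half_kickoff_s)
        diffs.append(observed_s - (start + end) / 2)
    return mean(diffs)


# Hypothetical kick-off offsets and reference events.
print(video_window(40, 300, 3900))
print(mean_time_difference([(40, 2700), (46, 3960)], 300, 3900))
```

A consistently positive or negative mean difference would suggest correcting the assumed kick-off offsets before aligning further events.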

Cross-media Features
• Purpose: Cross-media features describe information that occurs in textual/semi-structured data as well as in video data and can therefore be used as additional support in video analysis.
• Goal: Use video detectors aligned with events extracted from text/semi-structured data as cross-media features
• Example:
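As a rough illustration of a cross-media feature, the sketch below averages video detector values over all video windows aligned with events of the same type. Averaging is an assumption made here for simplicity, and the detector names and values are hypothetical; the slides do not state how the descriptors were actually computed.

```python
from collections import defaultdict
from statistics import mean


def cross_media_descriptors(aligned_events):
    """Average detector values per event type.

    aligned_events: list of (event_type, {detector_name: value}) pairs, where
    each dict holds the detector readings for the video window aligned with
    one textual event.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for event_type, features in aligned_events:
        for detector, value in features.items():
            grouped[event_type][detector].append(value)
    return {
        event_type: {det: mean(values) for det, values in detectors.items()}
        for event_type, detectors in grouped.items()
    }


# Hypothetical aligned events with made-up detector readings.
aligned_events = [
    ("goal", {"crowd": 1.0, "audio": 0.75}),
    ("goal", {"crowd": 0.5, "audio": 0.25}),
    ("corner", {"crowd": 0.25, "audio": 0.5}),
]
descriptors = cross_media_descriptors(aligned_events)
print(descriptors["goal"])
```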

Summary
• Extracted: 1200 events, 45 event types
• After alignment: 850 events describing five matches from World Cup 2006 Final
• 170 events per game on average
• Cross-media descriptors for every event type

Future plans
• In WP5.4.1, continue work on mapping between results of video analysis and complementary resource analysis in the following way:
  • Use image descriptors extracted from training data (video + aligned text extraction) to classify fine-grained events in test data (i.e. other videos), all based on minute-by-minute alignment
  • Cooperate with TUB on video OCR to help with the time alignment of video and text
• WP5.4.2: Images and text as mutually complementary resources
• WP5.4.3: Image retrieval based on enhanced query processing and complementary resource analysis
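The planned classification of events in unseen video could, for example, compare a new descriptor with per-event-type descriptors learned from the aligned training data. The sketch below uses a nearest-centroid rule; this choice of classifier is an assumption made purely for illustration (the slides do not name one), and the centroids and event labels are hypothetical.

```python
import math


def classify(descriptor, centroids):
    """Return the event type whose centroid is closest to the descriptor.

    descriptor: tuple of detector values for one video minute.
    centroids: dict mapping an event type to a tuple of the same length.
    """
    return min(centroids, key=lambda etype: math.dist(descriptor, centroids[etype]))


# Hypothetical centroids, e.g. (crowd, audio) averages per event type.
centroids = {
    "goal": (0.9, 0.8),
    "corner": (0.3, 0.4),
}
print(classify((0.8, 0.7), centroids))
```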

Mining over Football Match Data: Seeking Associations among Explicit and Implicit Events
• Apart from identifying individual events, it might be useful to find general statistical dependencies (associations) among types of events
• Initial experiments carried out on a single type of resource: structured data
• In the future, events extracted from text and video could be considered as well
• Use of the LISp-Miner tool (UEP)
• The data mining procedure 4ft-Miner mines for various types of association rules and conditional association rules
• Potential application: discovering new relationships to be inserted into the domain ontology or knowledge base
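Association rules of this kind are usually evaluated on a four-fold contingency table (a, b, c, d) of antecedent versus succedent. The sketch below builds such a table and applies a "founded implication" style test (confidence a/(a+b) at least p, with at least `base` supporting rows). It does not call LISp-Miner itself; the attribute names, data and thresholds are hypothetical.

```python
def four_fold_table(rows, antecedent, succedent):
    """Count rows by truth of antecedent and succedent.

    Returns (a, b, c, d): a = both true, b = antecedent only,
    c = succedent only, d = neither.
    """
    a = b = c = d = 0
    for row in rows:
        ant, suc = antecedent(row), succedent(row)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(a, b, p, base):
    """True if the rule has confidence >= p and at least `base` supporting rows."""
    return a >= base and (a + b) > 0 and a / (a + b) >= p


# Hypothetical match records: did a team get a red card, and did it lose?
rows = [
    {"red_card": True, "lost": True},
    {"red_card": True, "lost": True},
    {"red_card": True, "lost": False},
    {"red_card": False, "lost": True},
    {"red_card": False, "lost": False},
    {"red_card": False, "lost": False},
]
table = four_fold_table(rows, lambda r: r["red_card"], lambda r: r["lost"])
print(table, founded_implication(table[0], table[1], p=0.6, base=2))
```

A rule that passes the test on real data would be a candidate relationship for review before it is added to the domain ontology or knowledge base.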

Joint Work Example