Carnegie Mellon © Copyright 2000 Michael G. Christel and Alexander G. Hauptmann 1 Informedia 03/12/97.

Presentation transcript:

1. Informedia 03/12/97

2. Multilingual Informedia: Innovations
Robust Indexing and Retrieval
– Spanish Speech Recognition
– Searchable User Annotations
– Data Extraction for Further Analysis
Multilingual Document Access
– English or Spanish Queries
– English or Spanish Broadcast Video

3. Extending the Informedia Digital Video Library
Original Informedia Goal
– Full content search and retrieval from digital video, audio and text libraries
Technology
– Integrated speech, image and language processing for automated library creation (indexing, segmentation, abstraction, summarization)

4. Building on the Informedia Infrastructure
Video and Audio Segmentation
– Improved segmentation algorithms
– Extend to multiple languages
Presentation, Reuse and Interoperability
– Abstractions and video summarization (skims)
– “Cut and paste” for presentations and reports
– Annotations: initially typed, later spoken; incrementally indexed for immediate retrieval

5. Multilingual Integration
– Spanish News Broadcast
– Digitized from PAL to MPEG-1
– Speech Recognition/Alignment by Sphinx-III
– Simple Phrase-based Translation
– Processed Automatically into the Informedia Digital Video Library

6. Multilingual Demo
– Running prototype demo
– Demonstration of current technologies

7. Title Generation for Informedia News Stories
Informedia, a multimedia digital library, stores television broadcast news stories.
An extractive summary feature currently locates snippets in news-story transcripts to use as story titles.
GOAL: An improved, non-extractive title-generation feature for Informedia.

8. KNN-based Topic Detection
Build training index with pre-labeled topics
– 45000 Broadcast News stories
With new document:
– Search for top 10 related stories in training index
– Look up topics for related stories
– Re-weight topics by story relevance (select top 5)
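The topic-detection steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how relatedness is computed, so raw term-frequency cosine similarity stands in for the actual Informedia search engine, and the names `detect_topics`, `tf_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    # Bag-of-words term counts (assumption: whitespace tokenization).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_topics(doc, training, k=10, top_n=5):
    """training is a list of (story_text, [topic labels]) pairs."""
    query = tf_vector(doc)
    # Step 1: rank training stories by similarity to the new document.
    scored = [(cosine(query, tf_vector(text)), topics) for text, topics in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Steps 2 and 3: look up topics of the top-k stories and weight each
    # topic by the similarity of the stories that carry it.
    weights = defaultdict(float)
    for similarity, topics in scored[:k]:
        for topic in topics:
            weights[topic] += similarity
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, weight in ranked[:top_n] if weight > 0]


training = [
    ("simpson civil trial jury verdict", ["crime", "courts"]),
    ("stock market falls sharply", ["economy"]),
    ("simpson trial photos shoes", ["crime"]),
]
print(detect_topics("new photos in simpson trial", training, k=2))
```

With the slide's settings the call would use k=10 and top_n=5 over the 45000-story training index; the toy data above uses k=2 only to keep the example small.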

9. Basic Idea for Better Titles
Train a statistical model on a corpus of documents with human-assigned titles.
Compare title generation methods:
– Extractive Titles
– Naïve Bayes, EM
– KNN

10. Extractive Summarization
MS Word 2000 AutoSummarize
– Extracts sentences/fragments as summaries
– Similar performance to TF.IDF implementation at CMU
– Does not use our training corpus

11. Naïve Bayes
Train a statistical model on a corpus of documents with human-assigned titles.
Title need not be a snippet from the document (contrasts with extractive-summarization techniques).
Suggested by Witbrock & Mittal: $P(w_{\mathrm{Title}} \mid w_{\mathrm{Doc}})$
– Works better if $w_{\mathrm{Title}} = w_{\mathrm{Doc}}$
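A minimal sketch of the restricted case mentioned on this slide, where a title word must be the same word as a document word. It estimates, for each word, the fraction of training documents containing it whose title also contains it, then picks the highest-scoring words of a new document. This is an assumption-laden illustration, not the authors' or Witbrock & Mittal's implementation; the names `train` and `title_words` are hypothetical, and the sketch selects words only, without ordering them.

```python
from collections import Counter


def train(pairs):
    """pairs is a list of (document_text, human_title) strings."""
    in_doc = Counter()   # number of documents containing the word
    in_both = Counter()  # number of documents where the word is also in the title
    for doc, title in pairs:
        doc_words = set(doc.lower().split())
        title_set = set(title.lower().split())
        in_doc.update(doc_words)
        in_both.update(doc_words & title_set)
    # Estimate P(word in title | word in document) by relative frequency.
    return {w: in_both[w] / in_doc[w] for w in in_doc}


def title_words(doc, model, n=3):
    # Rank the new document's words by estimated title probability;
    # words never seen in training score 0. Ties break alphabetically.
    candidates = set(doc.lower().split())
    ranked = sorted(candidates, key=lambda w: (-model.get(w, 0.0), w))
    return ranked[:n]


pairs = [
    ("the simpson trial resumes in court", "simpson trial"),
    ("the market falls in asia", "market falls"),
    ("the trial of the century", "trial"),
]
model = train(pairs)
print(title_words("the simpson trial continues", model, n=2))
```

Frequent function words such as "the" appear in many documents but rarely in titles, so their estimated probability stays low without an explicit stop list.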

12. (K) Nearest Neighbor
Index a corpus of documents with human-assigned titles.
Find the document in the training corpus closest to the current document.
Use that title (k=1).
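The k=1 procedure can be sketched as follows. The slide does not define "closest", so this example assumes TF.IDF-weighted cosine similarity with a smoothed IDF; that choice and the helper names (`idf_weights`, `tfidf`, `nearest_title`) are illustrative assumptions rather than the Informedia implementation.

```python
import math
from collections import Counter


def idf_weights(docs):
    # Smoothed inverse document frequency over the training corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(text, idf):
    # Words unseen in training get weight 0 and so do not affect similarity.
    tf = Counter(text.lower().split())
    return {w: count * idf.get(w, 0.0) for w, count in tf.items()}


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_title(doc, corpus):
    """corpus is a list of (document_text, human_title); returns the title
    of the single most similar training document (k=1)."""
    idf = idf_weights([text for text, _ in corpus])
    query = tfidf(doc, idf)
    best = max(corpus, key=lambda pair: cosine(query, tfidf(pair[0], idf)))
    return best[1]


corpus = [
    ("simpson civil trial resumes monday", "O. J. SIMPSON TRIAL RESUMES"),
    ("stocks fall on wall street", "MARKETS SLIDE"),
]
print(nearest_title("jurors return to the simpson civil trial", corpus))
```

Because the returned title is copied whole from a training story, it is always fluent but may describe a different event than the new document.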

13. Evaluation of Title Accuracy
Apply to unseen documents:
$F_1 = \dfrac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
Precision = Correct / Retrieved
Recall = Correct / All Possible Correct
– Only measured word selection, not order
– Should try String Edit Distance (DTW), or Maximal Substring
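As a sketch of the word-level measure above, the following Python function compares a generated title against the human-assigned one as bags of words, so word order is ignored. Counting repeated words as a multiset, lower-casing, and whitespace tokenization are assumptions; the slide does not specify them, and the name `title_f1` is hypothetical.

```python
from collections import Counter


def title_f1(generated, reference):
    """Return (precision, recall, F1) over title words, ignoring order."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    # "Correct" words are those shared by both titles (multiset intersection).
    correct = sum((gen & ref).values())
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / sum(gen.values())  # Correct / Retrieved
    recall = correct / sum(ref.values())     # Correct / All Possible Correct
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(title_f1("simpson civil trial photo", "simpson trial photos surface"))
print(title_f1("trial simpson", "simpson trial"))
```

The second call scores a perfect F1 even though the words are reversed, which illustrates why the slide suggests an order-sensitive measure such as string edit distance as a follow-up.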

14. Multi-Lingual Experiment
– TV news stories with titles from 1998 Broadcast News CD-ROM
– Tested on 1000 held-out stories
– Evaluated on titles
– Using SYSTRAN (Babelfish.altavista.com), translated English-French-English
– Vocabulary overlap was about 70% (need) ???

15. Example English-French-English
Title: CONTINUING COVERAGE OF O. J. SIMPSON CIVIL TRIAL: MORE PHOTOS OF SIMPSON IN MAGLI SHOES MAY SURFACE
Translation: AGAIN THE SHOES OF SIMPSON OF O J BECOME A FOCAL POINT IN SA CIVIL TEST. AND AGAIN A PHOTOGRAPH EAST A CRUCIAL PART OF THE IMAGE. FELDMAN OF CHARLES OF C N N EXPLAINS. THE SOURCES INDICATE C N N THAT THE LAWYERS FOR THE FAMILIES CONTINUING SIMPSON OF O J HAVE NOW ACCESS TO SEVERAL PHOTOGRAPHS ALLEGEDLY LATELY CLEARLY DISCOVERED TO SHOW SIMPSON CARRYING A PAIR OF SHOES OF BRUNO MAGLI OF SWEDEN. AN EXPERT AS REGARDS F B I A TESTIFIED WITH THE CIVIL TEST TO SIMPSON TO THAT SUCH A PAIR A LEFT TO THE COPIES TO SHOE BEHIND TO THE SCENE TO MURDER THE FORMER WIFE TO SIMPSON AND HIS GOLDMAN TO RON TO FRIEND. THE AGENT FOR THE FAMILIES OF VICTIMS A PRESENTED IN THE OBVIOUSNESS A PHOTOGRAPH TAKEN BY THE OAR OF HARRY OF PHOTOGRAPHER BY AND PUBLISHED INSIDE QUOTE THE QUOTATION MARK NATIONALS OF INVESTIGATOR. A TESTIFIED EXPERT A THAT PHOTO A SHOWN SIMPSON CARRYING THE SHOES...

Method Name | Original Machine Generated Title | Machine Generated Title after Translation
NBL | simpson civil trial simpson's estate news | murder investigation simpson civil search victims
NBF | continuing coverage simpson civil trial verdict | continuing coverage simpson civil trial president
KNN | O. J. SIMPSON TRIAL RESUME MONDAY | DAY BACK COURT JURORS SIMPSON CIVIL TRIAL
TF.IDF | simpson civil trial photo magli shoes | continuing coverage simpson civil trial president

16. Multilingual Results

17. Effect of Word Order