TRECVID 2004 Search Task by NUS PRIS Tat-Seng Chua, et al. National University of Singapore.



Outline
- Introduction and Overview
- Query Analysis
- Multi-Modality Analysis
- Fusion and Pseudo Relevance Feedback
- Evaluations
- Conclusions

Introduction
Our emphasis is three-fold:
- A fully automated pipeline through the use of a generic query analysis module
- The use of query-specific models
- The fusion of multi-modality features such as text, OCR, and visual concepts
Our technique is similar to that employed in text-based definition question-answering approaches.

Overview of our System
(System architecture diagram. Component labels, in the order they appear:)
- Video; Multimedia Query; Output Shots
- Text Query Processing; Query Expansion; Multi-Class Analyzer; Constraints Detection; Query Formulation
- Video Content Processing; Speaker Level Segmentation; Speech Recognition; Speaker Verification; Shot Classification
- Video Query Processing; Video Retrieval; Face Detection and Recognition; Pseudo Relevance Feedback using OCR and ASR
- Feature Database; Shot Boundary; Face Detection; Video OCR; Visual Concepts
- Text Retrieval based on Speaker level information; Ranking of Shots based on Textual features; Ranking of Shots based on Audio Visual features; Fusion of Results; Re-ranking by Pseudo Relevance Feedback

Multi-Modality Features Used
- ASR
- Shot Classes
- Video OCR
- Speaker Identification
- Face Detection and Recognition
- Visual Concepts


Query Analysis
Flow: Query -> NLP analysis (POS, NP, VP, NE) -> query class, key core query terms, and constraints, supported by WordNet and a keywords list.
Morphological analysis to extract:
- Part-of-speech (POS)
- Verb phrase
- Noun phrase
- Named entities
Extract the main core terms (NN and NP).

Query Analysis: 6 Query Classes
- PERSON: queries looking for a person. For example: “Find shots of Boris Yeltsin”
- SPORTS: queries looking for sports news scenes. For example: “Find more shots of a tennis player contacting the ball with his or her tennis racket.”
- FINANCE: queries looking for finance-related shots such as stocks, business mergers and acquisitions, etc.
- WEATHER: queries looking for weather-related shots.
- DISASTER: queries looking for disaster-related shots. For example: “Find shots of one or more buildings with flood waters around it/them”
- GENERAL: queries that do not belong to any of the above categories. For example: “Find one or more people and one or more dogs walking together”
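The six classes above can be illustrated with a minimal keyword-list classifier. This is only a sketch under stated assumptions: the slides mention a keywords list and named-entity extraction but give neither, so `CLASS_KEYWORDS`, `classify_query`, and the `person_names` argument (a stand-in for a named-entity recognizer) are all hypothetical.

```python
import re

# Hypothetical keyword lists; the slides mention a keywords list but do not publish it.
CLASS_KEYWORDS = {
    "SPORTS": {"tennis", "hockey", "basketball", "golf", "soccer", "player"},
    "FINANCE": {"stock", "stocks", "market", "merger", "acquisition"},
    "WEATHER": {"weather", "forecast", "temperature"},
    "DISASTER": {"flood", "fire", "earthquake", "explosion"},
}


def classify_query(query, person_names=()):
    """Return one of the six query classes for a free-text query."""
    lowered = query.lower()
    # PERSON takes priority when a known person name appears in the query
    # (person_names stands in for the output of a named-entity recognizer).
    if any(name.lower() in lowered for name in person_names):
        return "PERSON"
    tokens = set(re.findall(r"[a-z]+", lowered))
    # Otherwise pick the class whose keyword list overlaps the query most;
    # fall back to GENERAL when nothing matches.
    best_class, best_hits = "GENERAL", 0
    for cls, keywords in CLASS_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_class, best_hits = cls, hits
    return best_class
```

With these assumed lists, the Boris Yeltsin example maps to PERSON when his name is supplied in `person_names`, and the dog-walking example falls through to GENERAL.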

Examples of Query Analysis

Topic 0125
  Query: Find shots of a street scene with multiple pedestrians in motion and multiple vehicles in motion somewhere in the shot.
  Constraints: in motion somewhere
  Core terms: street
  Class: GENERAL

Topic 0126
  Query: Find shots of one or more buildings with flood waters around it/them.
  Constraints: with flood waters around it/them
  Core terms: Buildings, flood
  Class: DISASTER

Topic 0128
  Query: Find shots of US Congressman Henry Hyde's face, whole or part, from any angle.
  Constraints: whole or part, from any angle
  Core terms: Henry Hyde
  Class: PERSON

Topic 0130
  Query: Find shots of a hockey rink with at least one of the nets fully visible from some point of view.
  Constraints: one of the nets fully visible
  Core terms: hockey
  Class: SPORTS

Topic 0135
  Query: Find shots of Sam Donaldson's face - whole or part, from any angle, but including both eyes. No other people visible with him
  Constraints: whole or part, from any angle, but including both eyes. No other people visible with him
  Core terms: Sam Donaldson
  Class: PERSON

Corresponding Target Shot Class for Each Query Class

Query class    Target shot category
PERSON         General
SPORTS         Sports
FINANCE        Finance
WEATHER        Weather
DISASTER       General
GENERAL        General

Pre-defined shot classes: General, Anchor-Person, Sports, Finance, Weather

Query Model: Determining the Fusion of Multi-modality Features
Table columns:
- Class
- Weight of NE in expanded terms
- Weight of OCR
- Weight of speaker identification
- Weight of face recognizer
- Weight of visual concepts (10 visual concepts used in total): people, basketball, hockey, water-body, fire, etc.
Surviving cell values per class, in left-to-right order (some cells are missing):
- PERSON: High, Low
- SPORTS: High, Low, High, Low
- FINANCE: Low, High, Low, High, Low
- WEATHER: Low, High, Low, High, Low
- DISASTER: Low, High
- GENERAL: Low, High, Low
Weights are obtained from a labeled training corpus.
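A query model of this kind can be read as a set of per-feature weights used to combine per-feature scores for a shot. The sketch below assumes a normalised weighted linear combination, which the slides do not confirm; `fuse_scores`, the feature names, and the numeric weights in the usage note are illustrative assumptions, since the slides give only High/Low labels.

```python
def fuse_scores(feature_scores, weights):
    """Combine per-feature scores for one shot by a normalised weighted sum.

    feature_scores: dict mapping feature name -> score in [0, 1]
    weights: dict mapping feature name -> weight from the (assumed) query model
    """
    # Only features that actually produced a score contribute to the total.
    total_weight = sum(weights.get(name, 0.0) for name in feature_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        weights.get(name, 0.0) * score for name, score in feature_scores.items()
    )
    return weighted_sum / total_weight
```

For example, with assumed weights `{"text": 0.75, "ocr": 0.25}`, a shot scoring 1.0 on text and 0.0 on OCR receives a fused score of 0.75.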


Text Analysis
- K1: query terms expanded using their synsets (and/or glossary) from WordNet
- K2: ASR terms with high mutual information (MI) from the sample video clips
- K3: Web expansion terms (terms with high MI) from documents retrieved via Google News, in union with K1 and K2
- Weights are assigned to the terms based on the class of the query
- Retrieval is tf.idf-based with weighted terms, over speaker-level segments of the ASR
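The tf.idf retrieval with weighted terms can be sketched as follows. This is a minimal illustration, not the system's actual scoring function: the smoothed idf formula, whitespace tokenisation, and the name `tfidf_rank` are assumptions, and the term weights passed in stand for whatever the query class assigns to original and expanded terms.

```python
import math
from collections import Counter


def tfidf_rank(segments, weighted_terms):
    """Rank text segments by the sum of weight * tf * idf over the query terms.

    segments: list of strings (e.g. ASR text of speaker-level segments)
    weighted_terms: dict mapping lowercase term -> weight
    Returns segment indices, best first.
    """
    tokenized = [seg.lower().split() for seg in segments]
    n = len(tokenized)
    # Document frequency: number of segments containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term, weight in weighted_terms.items():
            if tf[term]:
                idf = math.log((n + 1) / (df[term] + 1)) + 1.0  # smoothed idf (assumed form)
                score += weight * tf[term] * idf
        scores.append((score, idx))
    # Highest score first; ties keep the original segment order.
    return [idx for _, idx in sorted(scores, key=lambda p: (-p[0], p[1]))]
```

Giving expanded terms a lower weight than original query terms is one way to keep noisy expansions from dominating the ranking.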

Other Modalities
- Video OCR: based on features donated by CMU, with error correction using minimum edit distance during matching
- Face Recognition: based on 2DHMM
- Speaker Identification: HMM model using MFCC and log of energy
- Visual Concepts: using our concept-annotation approach for feature extraction
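Matching query terms against noisy OCR output with minimum edit distance can be sketched with the standard Levenshtein algorithm. The length-normalised threshold `max_ratio` and the helper name `ocr_matches` are assumptions for illustration; the slides do not state how the distance was thresholded.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ocr_matches(query_term, ocr_tokens, max_ratio=0.25):
    """Return OCR tokens whose normalised edit distance to the query term is small."""
    q = query_term.lower()
    matches = []
    for token in ocr_tokens:
        t = token.lower()
        distance = edit_distance(q, t)
        # Normalise by the longer string so short words are not over-matched.
        if distance / max(len(q), len(t), 1) <= max_ratio:
            matches.append(token)
    return matches
```

With this threshold, an OCR token such as "Ye1tsin" still matches the query term "Yeltsin", while an unrelated name of the same length does not.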

Fusion of Features
Pseudo Relevance Feedback (PRF):
- Treat the top 10 returned shots as positive instances
- Perform PRF using text features only to extract additional keywords K4
- Similarity-based retrieval of shots using K3 ∪ K4
- Re-rank shots
Note: for features that have low confidence values, their weights are re-distributed to other features.
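The two ideas on this slide, extracting extra keywords from the top-ranked shots and shifting weight away from low-confidence features, can be sketched as below. Both functions are hedged illustrations: the slides do not say how K4 terms are selected (shot frequency is assumed here) or how weights are re-distributed (proportional scaling with a fixed confidence threshold is assumed), and all names are hypothetical.

```python
from collections import Counter


def prf_keywords(ranked_shot_texts, known_terms, top_n=10, num_keywords=5,
                 stopwords=frozenset()):
    """Extract additional keywords (K4) from the text of the top-ranked shots."""
    counts = Counter()
    for text in ranked_shot_texts[:top_n]:
        # Count each term once per shot so one long transcript cannot dominate.
        counts.update(set(text.lower().split()))
    # Drop terms already in the query (K3) and any stopwords.
    for term in set(known_terms) | set(stopwords):
        counts.pop(term, None)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:num_keywords]]


def redistribute_weights(weights, confidences, threshold=0.5):
    """Move the weight of low-confidence features onto the remaining features."""
    kept = {f: w for f, w in weights.items()
            if confidences.get(f, 0.0) >= threshold}
    kept_total = sum(kept.values())
    if kept_total == 0:
        return dict(weights)  # nothing reliable to shift weight to
    # Scale the kept weights so the overall total is preserved.
    scale = sum(weights.values()) / kept_total
    return {f: (w * scale if f in kept else 0.0) for f, w in weights.items()}
```

The keywords returned by `prf_keywords` would be added to the existing term set before the similarity-based retrieval and re-ranking step.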


Evaluations
We submitted 6 runs:
- Run1 (MAP=0.038): text only
- Run2 (MAP=0.071): Run1 + external resources (Web + WordNet)
- Run3 (MAP=0.094): Run2 + OCR, visual concepts, shot classes and speaker detector

Evaluations (2)
- Run4 (MAP=0.119): Run3 + face recognizer
- Run5 (MAP=0.120): Run4 + more emphasis on OCR
- Run6 (MAP=0.124): Run5 + pseudo relevance feedback

Overall Performance
Run6: mean average precision (MAP) of 0.124
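For reference, mean average precision can be computed as in the sketch below, using the standard definition (precision averaged at each relevant hit, divided by the number of relevant items, then averaged over topics). The function names and data layout are assumptions, and this is not the official TRECVID evaluation tool.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_ids, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Relevant shots that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(topics):
    """topics: list of (ranked_ids, relevant_ids) pairs, one per search topic."""
    if not topics:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)
```

A result list that places relevant shots at ranks 1 and 3, with two relevant shots in total, has an average precision of (1/1 + 2/3) / 2.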

Conclusions
- A fully automatic system: we focused on using general-purpose query analysis to analyze queries
- Focused on the use of query classes to associate different retrieval models with different query classes
- Observed successive improvements in performance with the use of more useful features, and with pseudo relevance feedback
- We did a further run (equivalent to Run 5) that used the AQUAINT corpus (news of 1998) to perform feature extraction, which led to some improvement in performance (MAP > 0.123)
- Main findings:
  - Text features are effective in finding the initial ranked list; other modality features help in re-ranking the relevant shots
  - Use of relevant external knowledge is worth exploring

Current/Future Work
- Employ dynamic Bayesian and other graphical models (GM) to perform fusion of multi-modality features, learning of query models, and relevance feedback
- Explore contextual models for concept annotation, face recognition, etc.

Acknowledgments
Participants of this project: Tat-Seng Chua, Shi-Yong Neo, Ke-Ya Li, Gang Wang, Rui Shi, Ming Zhao and Huaxin Xu.
The authors would also like to thank the Institute for Infocomm Research (I2R) for the support of the research project “Intelligent Media and Information Processing” (R ), under which this project is carried out.
