1
Video Search Engines and Content-Based Retrieval Steven C.H. Hoi CUHK, CSE 18-Sept, 2006
2
Outline Video Search Engines Content-Based Video Retrieval
3
Video Search Engines A survey of the state of the art
4
Introduction Who is building video search engines? Top text search engines: 5.6 billion searches (07/2006)
5
Introduction Google
6
Introduction Yahoo
7
Introduction MSN/Live Search
8
Introduction YouTube
9
Business Models Web Advertising Site Volume, or keyword customized Video Ads Disable controls (MSN) Subscription MLB, Real Download to own iTunes, Movie Rental Limited time, number of plays Other Desktop Media Search Media player (jukebox) Media Monitoring Media Asset Management
10
Types of video Sites Content Originators Major Broadcasters Affiliates, Local News Major League Baseball Syndication, Aggregation, “Internet Broadcasters” Rental, purchase, advertising, subscription MSN, Google, iTunes ROO Media, FeedRoom Movie and Video Download Share portals Consumer content, blogs YouTube, Putfile, Vsocial, Google, Akimbo Traditional Search Engines (Crawl) / “RSS” Yahoo, Blinkx Other Public (Internet Archive) Media Monitoring, asset management systems
11
Video Search Challenges
12
Current Video Search Engines Metadata File type and context Media file attributes Size, length Structured global metadata RSS content description Content Content Indexing Search within a video Full text of dialog Image or video content Automated Content Indexing
13
Current Video Search Engines Content Search Engines Keyword search with transcripts from speech recognition
14
Content-Based Video Search Engine Architecture
15
Content-Based Video Search Engine Video Processing
16
Content-Based Video Search Engine Research Challenges Speech Recognition Shot Boundary Detection Video Story Segmentation Concept Detection Multi-modal Fusion for Ranking Text/ASR, Audio/Speech, Visual, etc.
17
Content-Based Retrieval Our Research Problem
Learning to rank video shots for automatic content-based search tasks!
Challenges
- Multi-modal information fusion
- Small-sample learning (a few positive and no negative examples)
- Learning on large-scale datasets
18
Multi-modal and Multi-scale Ranking Framework Main Ideas
- Representing video structures by graphs
- Using semi-supervised learning to address the small labeled sample learning problem
- Fusing multi-modal information by harmonic learning over graphs
- Multi-scale ranking for achieving efficient performance on large-scale datasets
19
Multi-modal and Multi-scale Ranking Framework Graph-based Modeling Story Text Shot
20
Multi-modal and Multi-scale Ranking Framework Semi-Supervised Learning on Graph To find an optimal real-valued function g: V → R on the graph G To minimize a quadratic energy function: Using the Gaussian field and harmonic property of spectral graph theory (J. Zhu's ICML'03), a harmonic function g can be found:
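The equations on this slide did not survive extraction. As a hedged reconstruction, the block below gives the standard energy function and harmonic property from the cited ICML'03 work on Gaussian fields and harmonic functions. The edge weights w_ij, the degree d_j and the unlabeled set U are notation assumed here, and the slide's own notation may have differed.

```latex
E(g) = \frac{1}{2} \sum_{i,j} w_{ij} \bigl( g(i) - g(j) \bigr)^{2},
\qquad
g(j) = \frac{1}{d_j} \sum_{i} w_{ij}\, g(i) \quad \text{for } j \in U,
\qquad
d_j = \sum_{i} w_{ij}
```

On labeled nodes g is fixed to the given labels; the harmonic property says that each unlabeled node takes the weighted average of its neighbours' values.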
21
Multi-modal and Multi-scale Ranking Framework Semi-Supervised Learning on Graph Let The solution of the harmonic function g can be expressed in matrix operations:
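The matrix expression itself is missing from the transcript. A minimal sketch of the usual closed form for a harmonic function, g_u = (D_uu - W_uu)^{-1} W_ul g_l, is given below, assuming a symmetric weight matrix W and a connected graph. The function name `harmonic_solution` and the toy chain graph are hypothetical, and the toy uses a 0 label purely for illustration, even though the talk's setting has no negative examples.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, labels):
    """Solve for harmonic scores on the unlabeled nodes of a weighted graph.

    W           : (n, n) symmetric non-negative weight matrix (assumed input)
    labeled_idx : indices of nodes with known scores
    labels      : known scores for those nodes, in the same order
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # combinatorial graph Laplacian

    # Partition into unlabeled/unlabeled and unlabeled/labeled blocks.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]

    # g_u = (D_uu - W_uu)^{-1} W_ul g_l, solved without forming the inverse.
    g_u = np.linalg.solve(L_uu, W_ul @ np.asarray(labels, dtype=float))
    return unlabeled_idx, g_u

# Toy chain graph 0 - 1 - 2 - 3 with node 0 labeled 1 and node 3 labeled 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
idx, g_u = harmonic_solution(W, [0, 3], [1.0, 0.0])
print(dict(zip(idx.tolist(), g_u.tolist())))
```

On the chain, the unlabeled nodes interpolate linearly between the two labeled ends, which is the behaviour the harmonic property predicts.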
22
Multi-modal and Multi-scale Ranking Framework Multi-Modal Fusion over Graph To combine text information into SSL on visual modality, we consider the text inputs as the attached nodes on the visual graph: Visual - g Text - f
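The fusion formula is not preserved in the transcript. One way to realise "text inputs as attached nodes" is the attached-node construction from the harmonic-function literature: each unlabeled visual node gets an extra neighbour carrying its text score f, reached with probability eta. The sketch below assumes that construction; `harmonic_with_attached_nodes`, `eta` and the toy data are hypothetical, and this is not necessarily the exact formulation used in the talk.

```python
import numpy as np

def harmonic_with_attached_nodes(W, labeled_idx, labels, f_text, eta=0.2):
    """Harmonic scores g on a visual graph, with text scores f attached.

    Assumes every node has at least one edge, so rows of W can be normalised.
    eta is the (assumed) probability of stepping to a node's attached text node.
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    P = W / W.sum(axis=1, keepdims=True)          # row-normalised transitions
    P_uu = P[np.ix_(unlabeled_idx, unlabeled_idx)]
    P_ul = P[np.ix_(unlabeled_idx, labeled_idx)]
    f_u = np.asarray(f_text, dtype=float)[unlabeled_idx]

    # g_u = (I - (1 - eta) P_uu)^{-1} ((1 - eta) P_ul g_l + eta f_u)
    A = np.eye(len(unlabeled_idx)) - (1.0 - eta) * P_uu
    b = (1.0 - eta) * (P_ul @ np.asarray(labels, dtype=float)) + eta * f_u
    return unlabeled_idx, np.linalg.solve(A, b)

# Toy chain graph 0 - 1 - 2 - 3; text scores are made-up values in [0, 1].
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f_text = [1.0, 0.1, 0.9, 0.0]

_, g_visual_only = harmonic_with_attached_nodes(W, [0, 3], [1.0, 0.0], f_text, eta=0.0)
_, g_text_only = harmonic_with_attached_nodes(W, [0, 3], [1.0, 0.0], f_text, eta=1.0)
_, g_fused = harmonic_with_attached_nodes(W, [0, 3], [1.0, 0.0], f_text, eta=0.3)
```

With eta = 0 the result reduces to the purely visual harmonic solution, and with eta = 1 each unlabeled node simply takes its text score, so eta acts as the fusion weight between the two modalities.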
23
Multi-modal and Multi-scale Ranking Framework Challenges Number of examples in database: N is large For example: TRECVID 2005: Rep. Key-Frames N = 45,765 TRECVID 2006: Rep. Key-Frames N = 79,487 How to do Semi-Supervised Learning?!
24
Multi-modal and Multi-scale Ranking Framework Multi-Scale Ranking
Learning ranking through multi-scale reranking
Each stage is associated with a different computational cost
In our solution, the four ranking stages are:
1. Ranking by Text Retrieval using Language Models
2. Re-ranking by NN fusing Text and Visual
3. Re-ranking by SVM fusing Text and Visual
4. Re-ranking by multi-modal Semi-supervised Learning
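The staged re-ranking idea can be sketched as a cascade in which each stage scores only the survivors of the previous, cheaper stage. The sketch below is illustrative only: `multi_scale_rank`, the pool sizes and the toy scores are hypothetical, and the two toy stages stand in for the four stages named on the slide.

```python
def multi_scale_rank(candidates, stages, top_k):
    """Cascade re-ranking.

    stages is a list of (score_pool, keep) pairs ordered from cheapest to most
    expensive. score_pool receives the whole surviving pool and returns a dict
    of scores, so a graph-based stage can score the pool jointly.
    """
    pool = list(candidates)
    for score_pool, keep in stages:
        scores = score_pool(pool)
        # Keep only the best `keep` items for the next, costlier stage.
        pool = sorted(pool, key=lambda item: scores[item], reverse=True)[:keep]
    return pool[:top_k]

# Made-up scores for four shots.
text_score = {"s1": 0.9, "s2": 0.8, "s3": 0.7, "s4": 0.1}
visual_score = {"s1": 0.2, "s2": 0.9, "s3": 0.6, "s4": 0.95}

stages = [
    # Stage 1: cheap text-only ranking over everything, keep 3 shots.
    (lambda pool: {s: text_score[s] for s in pool}, 3),
    # Stage 2: costlier text + visual fusion over the survivors, keep 2 shots.
    (lambda pool: {s: 0.5 * text_score[s] + 0.5 * visual_score[s] for s in pool}, 2),
]
result = multi_scale_rank(text_score, stages, top_k=2)
print(result)
```

Note the trade-off the cascade makes: shot s4 has the highest visual score but is pruned by the text stage before the fusion stage ever sees it, which is the price paid for running expensive stages on small pools.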
25
[Figure: system flowchart. Raw Video Clips / Streams pass through Text Processing, Image Processing and Video Processing to produce Video Stories and Video Shots. Given the User's Query, Multi-modal Fusion and Multi-scale Ranking narrow the candidates in stages: Text (Top M related Stories, Top N1 related Shots), Text + Visual NN (Top N2 related Shots), SVM/KLR (Top N3 related Shots; Supervised Ranking) and SSR (Top N4 related Shots; Semi-Supervised Ranking), then return the top K shots.]
26
Benchmark Evaluations Dataset TRECVID 2005 Test: 140 video clips, 45,765 rep. key frames 24 queries A query example:
27
Benchmark Evaluations Text-only Retrieval No Pseudo-Relevance Feedback (No-PRF) With Pseudo-Relevance Feedback (PRF) Language Models TF-IDF Okapi KL-JM KL-DIR KL-ABS
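As a rough illustration of the language-model family listed above, the sketch below scores a document by query log-likelihood with Jelinek-Mercer smoothing, which is the smoothing that the "JM" in KL-JM refers to. The function name, the smoothing weight `lam` and the toy transcripts are hypothetical; the talk does not state its parameter settings or toolkit.

```python
import math
from collections import Counter

def jm_query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Query log-likelihood under a Jelinek-Mercer smoothed document model.

    p(w | d) = (1 - lam) * count(w, d) / |d| + lam * count(w, C) / |C|
    """
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_col = collection_counts[term] / collection_len
        p = (1.0 - lam) * p_doc + lam * p_col
        if p == 0.0:
            return float("-inf")   # term unseen in the whole collection
        score += math.log(p)
    return score

# Made-up speech-transcript tokens for two shots.
docs = {
    "shot_a": ["tennis", "player", "court"],
    "shot_b": ["news", "anchor", "studio"],
}
collection_counts = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(collection_counts.values())

query = ["tennis", "court"]
scores = {
    name: jm_query_log_likelihood(query, tokens, collection_counts, collection_len)
    for name, tokens in docs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Smoothing with the collection model keeps shot_b's score finite even though it contains none of the query terms, so every document can still be ranked.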
28
Benchmark Evaluations Visual Features
- Color: Grid Color Moment, 3*3 grid, 81 dimensions
- Edge: Edge Direction Histogram, 36 bins + 1, 37 dimensions
- Texture: Gabor Moments, 5*8 = 40, 3 moments, 120 dimensions
- 238 dimensions in total
COREL Benchmark Photos
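To show where the 81 colour dimensions come from (9 grid cells x 3 channels x 3 moments), here is a minimal sketch of a grid colour-moment extractor. The choice of mean, standard deviation and cube-root skewness as the three moments, and the function name, are assumptions; the slide does not say which moments or which colour space were used.

```python
import numpy as np

def grid_color_moments(image, grid=3):
    """Colour moments per grid cell for an (H, W, C) image array.

    For each cell and channel: mean, standard deviation, and the cube root of
    the third central moment (an assumed choice of three moments).
    """
    h, w, c = image.shape
    row_edges = np.linspace(0, h, grid + 1).astype(int)
    col_edges = np.linspace(0, w, grid + 1).astype(int)

    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = image[row_edges[i]:row_edges[i + 1],
                         col_edges[j]:col_edges[j + 1]]
            pixels = cell.reshape(-1, c).astype(float)
            mean = pixels.mean(axis=0)
            std = pixels.std(axis=0)
            skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
            feats.extend([mean, std, skew])
    return np.concatenate(feats)

# Random stand-in for a key frame; a real frame would be loaded from disk.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(60, 90, 3))
features = grid_color_moments(frame)
print(features.shape)
```

Concatenating this vector with the 37 edge and 120 texture dimensions listed above gives the 238-dimensional descriptor.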
29
Benchmark Evaluations Multi-modal Retrieval (Text + Visual) Text-only retrieval Text + NN (Text + Visual) Text + SVM (Text + Visual) MMMS (Text + Visual)
30
Benchmark Evaluations Evaluation Results
Average Performance on TRECVID 2005 Dataset

Method     MAP      Num_Ret   Improvement
Text       0.0903   1669      0%
Text+NN    0.1034   1705      +14.51%
Text+SVM   0.1083   1764      +19.93%
MMMS       0.1157   1764      +28.13%
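The Improvement figures are consistent with a relative MAP gain over the text-only baseline. The short check below recomputes them from the MAP values reported on the slide; the variable names are illustrative.

```python
# MAP values as reported on the slide.
map_scores = {
    "Text": 0.0903,
    "Text+NN": 0.1034,
    "Text+SVM": 0.1083,
    "MMMS": 0.1157,
}
baseline = map_scores["Text"]

# Relative improvement over the text-only baseline, in percent.
improvement = {
    name: round((score / baseline - 1.0) * 100.0, 2)
    for name, score in map_scores.items()
}
for name, pct in improvement.items():
    print(f"{name}: {pct:+.2f}%")
```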
31
Benchmark Evaluations Average performance of 24 queries Comparison with other approaches
32
Related Work IBM Solution SVM + NN + Multiple Instance Learning Columbia solution Information-Theoretical Clustering Approach CMU Solution Query-Class Dependent Weighting Ranking
33
Conclusion
A tutorial on video search engines
Research contributions
- A unified framework of Multi-Modal and Multi-Scale Ranking for video retrieval
- Graph-based modeling of video structures
- Semi-Supervised Learning for multi-modal ranking
- Making SSL practical for large-scale problems
- Promising empirical results…
34
Future Work Research is in progress, with a tough road ahead… Any suggestions or comments?