Video Search Engines and Content-Based Retrieval Steven C.H. Hoi CUHK, CSE 18-Sept, 2006.

Outline  Video Search Engines  Content-Based Video Retrieval

Video Search Engines  A survey of the state of the art

Introduction  Who are doing video search engines? Top text search engines 5.6 billion searches 07/2006

Introduction  Google

Introduction  Yahoo

Introduction  MSN/Live Search

Introduction  YouTube

Business Models  Web Advertising  Site Volume, or keyword customized  Video Ads  Disable controls (MSN)  Subscription  MLB, Real  Download to own  iTunes, Movie  Rental  Limited time, number of plays  Other  Desktop Media Search  Media player (jukebox)  Media Monitoring  Media Asset Management

Types of Video Sites  Content Originators  Major Broadcasters  Affiliates, Local News  Major League Baseball  Syndication, Aggregation, “Internet Broadcasters”  Rental, purchase, advertising, subscription  MSN, Google, iTunes  ROO Media, FeedRoom  Movie and Video Download  Share portals  Consumer content, blogs  YouTube, Putfile, Vsocial, Google, Akimbo  Traditional Search Engines (Crawl) / “RSS”  Yahoo, Blinkx  Other  Public (Internet Archive)  Media Monitoring, asset management systems

Video Search Challenges

Current Video Search Engines Metadata  File type and context  Media file attributes  Size, length  Structured global metadata  RSS content description Content  Content Indexing  Search within a video  Full text of dialog  Image or video content  Automated Content Indexing

Current Video Search Engines  Content Search Engines Keyword search with transcripts from speech recognition
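
Keyword search over speech-recognition transcripts can be pictured as an inverted index from words to time-stamped transcript segments. The following is a minimal sketch under stated assumptions, not how any of the engines named in these slides is implemented: the `transcripts` structure, the function names, and the AND-style matching are all hypothetical.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each lowercased word to a list of (video_id, start_seconds) hits.

    `transcripts` is a hypothetical structure:
    {video_id: [(start_seconds, text), ...]}, standing in for
    time-stamped speech-recognition output.
    """
    index = defaultdict(list)
    for video_id, segments in transcripts.items():
        for start, text in segments:
            # set() so a word repeated within one segment is indexed once
            for word in set(text.lower().split()):
                index[word].append((video_id, start))
    return index

def search(index, query):
    """Return segments that contain every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Toy transcripts, invented for illustration only.
transcripts = {
    "news_01": [(0.0, "good evening here is the news"),
                (12.5, "the election results are in")],
    "game_07": [(3.0, "the pitcher throws a fast ball")],
}
index = build_index(transcripts)
print(search(index, "election results"))
```

Because each hit carries a start time, a result can point into the video at the moment the words are spoken, which is what "search within a video" requires.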

Content-Based Video Search Engine  Architecture

Content-Based Video Search Engine  Video Processing

Content-Based Video Search Engine  Research Challenges  Speech Recognition  Shot Boundary Detection  Video Story Segmentation  Concept Detection  Multi-modal Fusion for Ranking  Text/ASR, Audio/Speech, Visual, etc.
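
As an illustration of one of these challenges, shot boundary detection is often introduced through a simple baseline: compare the intensity histograms of consecutive frames and declare a cut when they differ sharply. The sketch below assumes tiny grayscale frames given as lists of pixel rows; the bin count and threshold are arbitrary illustrative values, and the slides do not say which detector the author used.

```python
def gray_histogram(frame, bins=8):
    """Normalized histogram of 0-255 gray values in a frame (list of pixel rows)."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Indices i where frame i starts a new shot.

    A boundary is declared when the L1 distance between the histograms of
    frame i-1 and frame i exceeds `threshold` (an illustrative value).
    """
    if not frames:
        return []
    boundaries = []
    previous = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries

# Toy 2x2 frames: two dark frames, two bright frames, one dark frame.
dark = [[10, 20], [15, 12]]
bright = [[240, 250], [245, 235]]
frames = [dark, dark, bright, bright, dark]
print(shot_boundaries(frames))  # [2, 4]
```

A fixed threshold like this handles hard cuts only; gradual transitions such as fades and dissolves are part of what makes the problem a research challenge.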

Content-Based Retrieval  Our Research Problem  Learning to rank video shots for automatic content-based search tasks !  Challenges  Multi-Modal Information Fusion  Small Sample Learning (a few pos. & no neg.)  Learning on large-scale datasets

Multi-modal and Multi-scale Ranking Framework  Main Ideas  Representing video structures by graphs  Using semi-supervised learning to address the small labeled-sample learning problem  Fusing multi-modal information by harmonic learning over graphs  Multi-scale ranking for efficient performance on large-scale datasets

Multi-modal and Multi-scale Ranking Framework  Graph-based Modeling Story Text Shot

Multi-modal and Multi-scale Ranking Framework  Semi-Supervised Learning on Graph  To find an optimal real-valued function g: V → R on the graph G  To minimize a quadratic energy function:  Using the Gaussian field and harmonic property from spectral graph theory (Zhu et al., ICML’03), a harmonic function g can be found:
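
The formulas on this slide were images and are not in the transcript. For orientation, the Gaussian-field formulation in the cited ICML'03 paper is commonly written as below, with g held fixed on the labeled nodes. The symbols w_ij (edge weight between nodes i and j) and d_i (weighted degree of node i) are assumptions here, since the slide's own notation cannot be recovered.

```latex
E(g) = \frac{1}{2} \sum_{i,j} w_{ij} \bigl( g(i) - g(j) \bigr)^2,
\qquad
g(i) = \frac{1}{d_i} \sum_{j} w_{ij}\, g(j) \quad \text{for every unlabeled node } i,
\qquad
d_i = \sum_{j} w_{ij}
```

The second expression is the harmonic property: the minimizer's value at each unlabeled node equals the weighted average of its neighbors' values.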

Multi-modal and Multi-scale Ranking Framework  Semi-Supervised Learning on Graph  Let  The solution of the harmonic function g can be expressed in matrix operations:
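
The matrix expression on this slide is not in the transcript. In the standard Gaussian-field formulation, splitting the weight matrix W and degree matrix D into labeled (l) and unlabeled (u) blocks gives g_u = (D_uu - W_uu)^(-1) W_ul g_l. The sketch below reaches the same fixed point by repeated neighbor averaging rather than a matrix inverse, purely to stay dependency-free; the function name, the toy graph, and the iteration count are assumptions and not the author's implementation.

```python
def harmonic_scores(weights, labels, iterations=200):
    """Approximate the harmonic function on a graph by neighbor averaging.

    weights: symmetric matrix (list of lists) of nonnegative edge weights,
             assumed to have a zero diagonal.
    labels:  {node_index: fixed score} for the labeled nodes.
    """
    n = len(weights)
    g = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(iterations):
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped to their given score
            degree = sum(weights[i])
            if degree > 0:
                # harmonic property: weighted average of the neighbors
                g[i] = sum(w * g[j] for j, w in enumerate(weights[i])) / degree
    return g

# Toy chain 0 - 1 - 2 - 3 with unit weights; node 0 is relevant (1.0),
# node 3 is treated as non-relevant (0.0).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = harmonic_scores(chain, {0: 1.0, 3: 0.0})
print([round(s, 4) for s in scores])  # [1.0, 0.6667, 0.3333, 0.0]
```

On the chain, the scores interpolate linearly between the two labeled ends, which is the behavior the closed-form solution also gives.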

Multi-modal and Multi-scale Ranking Framework  Multi-Modal Fusion over Graph  To combine text information into SSL on visual modality, we consider the text inputs as the attached nodes on the visual graph: Visual - g Text - f
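
The fusion formula itself is not in the transcript. One way to read "text inputs as attached nodes" is the external-classifier construction from the Gaussian-field literature: each unlabeled visual node gets an extra neighbor clamped to its text score f, reached with weight eta, while the visual neighbors share the remaining 1 - eta. The sketch below implements that reading only; the parameter `eta`, the function name, and the toy data are assumptions and may differ from the author's formulation.

```python
def fused_scores(weights, labels, text_scores, eta=0.3, iterations=200):
    """Harmonic scores on the visual graph with one attached text node per shot.

    weights:     symmetric visual-similarity matrix (list of lists).
    labels:      {node_index: fixed score} for labeled nodes.
    text_scores: per-node text score f, used as the attached node's value.
    eta:         hypothetical mixing weight given to the attached text node.
    """
    n = len(weights)
    g = [labels.get(i, text_scores[i]) for i in range(n)]
    for _ in range(iterations):
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped
            degree = sum(weights[i])
            if degree > 0:
                visual = sum(w * g[j] for j, w in enumerate(weights[i])) / degree
            else:
                visual = g[i]  # isolated node: only the text node pulls on it
            g[i] = (1 - eta) * visual + eta * text_scores[i]
    return g

# Toy chain 0 - 1 - 2; node 0 is a labeled positive example.
chain = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
text = [1.0, 0.0, 1.0]
g = fused_scores(chain, {0: 1.0}, text, eta=0.5)
print([round(s, 4) for s in g])  # [1.0, 0.4286, 0.7143]
```

Setting `eta` to 0 recovers the purely visual harmonic function, and setting it to 1 returns the text scores unchanged, so the parameter trades off the two modalities.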

Multi-modal and Multi-scale Ranking Framework  Challenges  Number of examples in database: N is large  For example:  TRECVID 2005: Rep. Key-Frames N = 45,765  TRECVID 2006: Rep. Key-Frames N = 79,487  How can semi-supervised learning be done at this scale?
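
One way to see why N matters: graph-based learning over all key frames needs an N x N similarity matrix. The arithmetic below assumes a dense matrix of 8-byte floats, which is an assumption for illustration; the slides do not state how the graph was stored.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Memory needed for a dense n x n matrix (8-byte floats assumed)."""
    return n * n * bytes_per_entry

for name, n in [("TRECVID 2005", 45765), ("TRECVID 2006", 79487)]:
    gigabytes = dense_matrix_bytes(n) / 1e9
    print(f"{name}: N = {n}, dense N x N matrix = {gigabytes:.1f} GB")
# TRECVID 2005: N = 45765, dense N x N matrix = 16.8 GB
# TRECVID 2006: N = 79487, dense N x N matrix = 50.5 GB
```

Solving a linear system of that size is costlier still, which motivates running the graph-based step only on a small, pre-filtered candidate set.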

Multi-modal and Multi-scale Ranking Framework  Multi-Scale Ranking  Learning ranking through multi-scale reranking  Each stage is associated with different computational costs  In our solution, the four ranking stages are:  Ranking by Text Retrieval using Language Models  Re-ranking by NN fusing Text and Visual  Re-ranking by SVM fusing Text and Visual  Re-ranking by multi-modal Semi-supervised Learning
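
The staged re-ranking described here can be sketched as a cascade in which each stage re-scores only the survivors of the previous one, so the most expensive scorer sees the fewest shots. The code is a schematic illustration: the stage functions, cut-off sizes, and toy scores are hypothetical stand-ins for the text, NN, SVM, and semi-supervised stages.

```python
def multi_scale_rank(candidates, stages):
    """Run a re-ranking cascade.

    stages: list of (score_function, keep) pairs, ordered from the cheapest
            scorer to the most expensive; each stage keeps its top `keep`.
    """
    ranked = list(candidates)
    for score, keep in stages:
        ranked = sorted(ranked, key=score, reverse=True)[:keep]
    return ranked

# Toy shots with precomputed text and visual scores (invented values).
shots = [
    {"id": "a", "text": 0.9, "visual": 0.1},
    {"id": "b", "text": 0.8, "visual": 0.9},
    {"id": "c", "text": 0.7, "visual": 0.5},
    {"id": "d", "text": 0.1, "visual": 1.0},
    {"id": "e", "text": 0.6, "visual": 0.8},
]
stages = [
    (lambda s: s["text"], 4),                # cheap: text only, keep 4
    (lambda s: s["text"] + s["visual"], 2),  # costlier: text + visual, keep 2
]
top = multi_scale_rank(shots, stages)
print([s["id"] for s in top])  # ['b', 'e']
```

A shot discarded by an early stage (here "d") is never seen by later stages, so the early cut-offs have to be generous enough not to lose relevant shots.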

[Figure: system pipeline. Labels: Raw Video Clips / Streams; Video Processing; Image Processing; Text Processing; Video Stories; Video Shots; User’s Query; Multi-modal Fusion; Multi-scale Ranking; Top M related Stories; Top N1, N2, N3, N4 related Shots; Text; Text + Visual; NN; SVM/KLR; SSR; Supervised Ranking; Semi-Supervised Ranking; return top K shots]

Benchmark Evaluations  Dataset  TRECVID 2005  Test: 140 video clips, 45,765 rep. key frames  24 queries  A query example:

Benchmark Evaluations  Text-only Retrieval  No Pseudo-Relevance Feedback (No-PRF)  With Pseudo-Relevance Feedback (PRF) Language Models  TF-IDF  Okapi  KL-JM  KL-DIR  KL-ABS
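
The labels KL-JM and KL-DIR presumably denote KL-divergence retrieval with Jelinek-Mercer and Dirichlet smoothing, and KL-ABS absolute discounting; that reading is an assumption based on common toolkit naming. The sketch below shows the first two smoothing rules in query-likelihood form, which ranks documents the same way as KL divergence when the query model is the raw word frequency of the query. Parameter values and the toy documents are illustrative only.

```python
import math
from collections import Counter

def collection_model(docs):
    """Background word probabilities p(w | C) over the whole collection."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def score_jm(query, doc, background, lam=0.5):
    """Jelinek-Mercer: p(w|d) = (1 - lam) * tf/|d| + lam * p(w|C)."""
    tf = Counter(doc)
    return sum(
        math.log((1 - lam) * tf[w] / len(doc) + lam * background[w])
        for w in query if w in background  # skip words unseen in the collection
    )

def score_dirichlet(query, doc, background, mu=2000.0):
    """Dirichlet prior: p(w|d) = (tf + mu * p(w|C)) / (|d| + mu)."""
    tf = Counter(doc)
    return sum(
        math.log((tf[w] + mu * background[w]) / (len(doc) + mu))
        for w in query if w in background
    )

# Toy tokenized transcripts, invented for illustration.
docs = [
    ["election", "results", "tonight"],
    ["baseball", "game", "results"],
    ["weather", "tonight"],
]
background = collection_model(docs)
query = ["election", "results"]

by_jm = sorted(range(len(docs)), key=lambda i: score_jm(query, docs[i], background), reverse=True)
by_dir = sorted(range(len(docs)), key=lambda i: score_dirichlet(query, docs[i], background, mu=10.0), reverse=True)
print(by_jm, by_dir)  # [0, 1, 2] [0, 1, 2]
```

Smoothing is what lets a document that lacks a query word still receive a finite score, which matters for noisy speech transcripts.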

Benchmark Evaluations  Visual Features  Color: Grid Color Moment (3 × 3 grid, 81 dimensions)  Edge: Edge Direction Histogram (36 bins + 1, 37 dimensions)  Texture: Gabor Moments (5 × 8 = 40, 3 moments, 120 dimensions)  238 dimensions in total  COREL Benchmark Photos
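
The dimension counts add up as 81 + 37 + 120 = 238. For the color feature, 81 is consistent with 9 grid cells x 3 channels x 3 moments. The sketch below assumes the three moments are mean, standard deviation, and skewness, and that pixels are 3-channel tuples; the slide does not specify the moments or the color space.

```python
import math

def color_moments(values):
    """Mean, standard deviation, and skewness of one channel in one cell.

    Skewness here is the signed cube root of the third central moment,
    a common convention for color moments (assumed, not from the slides).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return [mean, variance ** 0.5, skew]

def grid_color_moments(image, grid=3):
    """image: list of rows, each pixel a 3-tuple. Returns grid*grid*3*3 values."""
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        rows = range(gy * height // grid, (gy + 1) * height // grid)
        for gx in range(grid):
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            for channel in range(3):
                cell = [image[y][x][channel] for y in rows for x in cols]
                features.extend(color_moments(cell))
    return features

# Toy 6x6 image filled with a single color.
image = [[(10, 20, 30)] * 6 for _ in range(6)]
features = grid_color_moments(image)
print(len(features))  # 81
```

The image must be at least `grid` pixels in each direction so that no cell is empty.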

Benchmark Evaluations  Multi-modal Retrieval (Text + Visual)  Text-only retrieval  Text + NN (Text + Visual)  Text + SVM (Text + Visual)  MMMS (Text + Visual)

Benchmark Evaluations  Evaluation Results

Average Performance on TRECVID 2005 Dataset

Method     MAP      Num_Ret   Improvement
Text       0.0903   1669      0%
Text+NN    0.1034   1705      +14.51%
Text+SVM   0.1083   1764      +19.93%
MMMS       0.1157   1764      +28.13%
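
The Improvement figures are relative gains in MAP over the text-only baseline, which can be checked directly from the MAP values. The sketch also shows non-interpolated average precision for a single query, assuming that is the per-query measure behind MAP here; the function names are illustrative.

```python
def average_precision(ranked_relevance, total_relevant):
    """Non-interpolated average precision for one ranked list.

    ranked_relevance: booleans, True where the shot at that rank is relevant.
    total_relevant:   number of relevant shots for the query in the collection.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant if total_relevant else 0.0

def relative_improvement(score, baseline):
    """Percentage gain of `score` over `baseline`."""
    return (score - baseline) / baseline * 100.0

baseline_map = 0.0903  # text-only MAP from the results table
for name, value in [("Text+NN", 0.1034), ("Text+SVM", 0.1083), ("MMMS", 0.1157)]:
    print(f"{name}: {relative_improvement(value, baseline_map):+.2f}%")
# Text+NN: +14.51%
# Text+SVM: +19.93%
# MMMS: +28.13%
```

MAP is the mean of `average_precision` over all queries, 24 in this benchmark.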

Benchmark Evaluations Average performance of 24 queries  Comparison with other approaches

Related Work  IBM Solution  SVM + NN + Multiple Instance Learning  Columbia solution  Information-Theoretical Clustering Approach  CMU Solution  Query-Class Dependent Weighting Ranking

Conclusion  A tutorial of video search engines  Research contributions  A Unified framework of Multi-Modal and Multi- Scale Ranking for video retrieval  Graph-based Modeling of video structures  Semi-Supervised Learning for Multimodal Ranking  Making SSL practical for large-scale problems  Promising empirical results…

Future Work  Research is in progress, tough ahead…  Any suggestions or comments?
