Multimedia Retrieval. Outline Audio Retrieval Spoken information Music Document Image Analysis and Retrieval Video Retrieval.

Presentation transcript:

Multimedia Retrieval

Outline
- Audio Retrieval
  - Spoken information
  - Music
- Document Image Analysis and Retrieval
- Video Retrieval

A Taxonomy of Audio
Sound: Music, Speech, Other?
Music: Classical, Country, Disco, Hip Hop, Jazz, Rock
Speech: Sports Announcer, Female, Male
Classical: Orchestra, String Quartet, Choir, Piano, ?

Spoken Document Retrieval

Speech Recognition Knowledge Sources
- Acoustic Modeling: describes the sounds that make up speech
- Lexicon: describes which sequences of speech sounds make up valid words
- Language Model: describes the likelihood of various sequences of words being spoken

Speech Recognition in Brief
Speech -> Signal Processing -> Phonetic Probability Estimator (Acoustic Model) -> Decoder (Language Model) -> Words
The decoder also draws on the Pronunciation Lexicon and the Grammar.
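The slides do not give a formula for how these knowledge sources fit together. A common way to write it, assumed here rather than taken from the presentation, is the noisy-channel decoding rule, where A is the acoustic signal and W a candidate word sequence (both symbols are introduced for illustration):

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid A)
        \;=\; \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        \;=\; \arg\max_{W} P(A \mid W)\, P(W)
```

Under this reading, P(A | W) is supplied by the acoustic model together with the pronunciation lexicon, P(W) by the language model, and the decoder performs the search over W. P(A) can be dropped because it does not depend on W.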

Hints For Better Recognition
Goal: improve the estimate p(word | acoustic_signal)
Main idea: p(word | acoustic_signal) -> p(word | acoustic_signal, X)
What could be X?
- Topical information
- News of the day
- Image information
  - Lip reading
  - Video Optical Character Recognition (VOCR)

Speech Recognition Accuracy Word Error Rate
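Word error rate is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the whitespace tokenization are assumptions made for illustration, not something the slides specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c d")` returns 0.25: one substitution over four reference words. Because insertions are counted, the value can exceed 1.0 for very poor output.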

Information Retrieval Precision vs. Speech Accuracy
[Figure: relative precision (% of text IR) plotted against word error rate]
Source: Hauptmann, A., Wactlar, H., Indexing and Search of Multimodal Information, Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP-97), Munich, Germany, April
Retrieval degrades only slightly when the word error rate is below 30%.

Spoken Document Retrieval
Segmentation issue: continuous speech data without story boundaries
Typical segmentation approaches
- Overlapping windows (30 sec for each segment)
- Automatic detection of speaker changes
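The overlapping-window approach can be sketched as follows. The 30-second window comes from the slide; the 15-second step (that is, 50% overlap), the `(start_time, word)` transcript format, and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

Word = Tuple[float, str]  # (start time in seconds, recognized word)


def overlapping_windows(words: List[Word], window: float = 30.0,
                        step: float = 15.0) -> List[Tuple[float, List[str]]]:
    """Cut a time-stamped transcript into fixed-length, overlapping segments."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not words:
        return []

    last_start = max(t for t, _ in words)
    segments = []
    start = 0.0
    while start <= last_start:
        # A word belongs to every window whose time span covers its start time.
        text = [w for t, w in words if start <= t < start + window]
        if text:
            segments.append((start, text))
        start += step
    return segments
```

Each segment is then indexed as its own pseudo-document. The overlap means a story that straddles a window boundary still appears whole in at least one segment, at the cost of indexing each word more than once.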

Spoken Document Retrieval: Document Expansion
Motivation: documents are erroneous
Goal: apply expansion techniques to reduce the impact of recognition errors in spoken documents
Similar to query expansion
- Treat each speech document (the speech-recognized transcript) as a query
- Find clean documents (e.g., web docs) that are relevant to the speech document
- Expand each speech document with the common words in the top-ranked clean documents
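A minimal sketch of these three steps is shown below. It ranks clean documents by raw term overlap and treats a word as "common" when it occurs in at least `min_docs` of the top-ranked documents. Both choices, along with all names and parameters, are simplifying assumptions for illustration; they are not the term weighting of any particular published system.

```python
from collections import Counter
from typing import Dict, List


def expand_document(transcript: List[str], clean_docs: Dict[str, List[str]],
                    top_k: int = 2, min_docs: int = 2,
                    n_terms: int = 3) -> List[str]:
    """Expand a recognized transcript with words common to its top-ranked clean docs."""
    query = Counter(transcript)

    def overlap(doc: List[str]) -> int:
        # Step 1: the transcript acts as the query; score = shared term count.
        counts = Counter(doc)
        return sum(min(query[w], counts[w]) for w in query)

    # Step 2: rank clean documents, keeping only those that share some term.
    ranked = sorted(clean_docs, key=lambda name: (-overlap(clean_docs[name]), name))
    top = [clean_docs[name] for name in ranked[:top_k] if overlap(clean_docs[name]) > 0]

    # Step 3: add words that several top documents agree on and that the
    # transcript does not already contain.
    doc_freq = Counter(w for doc in top for w in set(doc))
    candidates = sorted(
        (w for w, df in doc_freq.items() if df >= min_docs and w not in query),
        key=lambda w: (-doc_freq[w], w),
    )
    return transcript + candidates[:n_terms]
```

If the recognizer splits "clinton" into "clin ton", clean news text about the same event is likely to contain the correct word, so the expansion can put "clinton" back into the indexed document.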

Document Expansion (Singhal & Pereira, 1999)

Music Information Retrieval

Music Retrieval
A textual retrieval approach
- Using metadata: titles, artists, genres, …
Content-based music retrieval
- Query by audio
- Query by score document/segment

Content-based Music Retrieval
On-line processing: Microphone signal input -> Sampling (11 kHz) -> Center Clipping -> Short-term Autocorrelation -> Note Segmentation -> Mid-level Representation (MIDI representation) -> Similarity Comparison -> Query results (ranked song list)
Off-line processing: Songs Database -> MIDI message extraction -> Mid-level Representation
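The center-clipping and short-term autocorrelation stages can be sketched for a single frame as below. The clipping ratio of 0.3, the 80 to 1000 Hz search range, the 11000 Hz sampling rate used in the example, and all function names are assumptions made for illustration; the slide names the stages but gives no parameters.

```python
import math
from typing import List


def center_clip(frame: List[float], ratio: float = 0.3) -> List[float]:
    """Zero out low-amplitude samples so that only the main peaks remain."""
    limit = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > limit:
            clipped.append(x - limit)
        elif x < -limit:
            clipped.append(x + limit)
        else:
            clipped.append(0.0)
    return clipped


def estimate_pitch(frame: List[float], sample_rate: float,
                   fmin: float = 80.0, fmax: float = 1000.0) -> float:
    """Return the frequency whose lag maximizes the short-term autocorrelation."""
    clipped = center_clip(frame)
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(clipped[n] * clipped[n + lag] for n in range(len(clipped) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag == 0:
        raise ValueError("no periodicity found in frame")
    return sample_rate / best_lag


def to_midi_note(frequency: float) -> int:
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(frequency / 440.0))
```

A 220 Hz sine sampled at 11000 Hz repeats every 50 samples, so `estimate_pitch` picks lag 50 and `to_midi_note` maps the result to note 57 (the A below middle C). Running this per frame and merging runs of equal note numbers is one simple way to approach the note segmentation stage.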

Content-based Music Retrieval
[Figure: note sequences converted to an N-gram representation]
N-gram representation -> a vector representation for each music document -> a typical information retrieval problem
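One way to turn a note sequence into such a vector is sketched below. It assumes the n-grams are built over pitch intervals (differences between consecutive MIDI note numbers) and that documents are compared by cosine similarity; the slide's figure is not recoverable, so both choices and all names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def interval_ngrams(midi_notes: List[int], n: int = 3) -> Counter:
    """Count n-grams of pitch intervals; the counts form a sparse term vector."""
    intervals = [b - a for a, b in zip(midi_notes, midi_notes[1:])]
    grams: Iterable[Tuple[int, ...]] = (
        tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)
    )
    return Counter(grams)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items() if gram in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0
```

Because intervals ignore absolute pitch, a query hummed in a different key from the stored song still produces the same n-grams. Once every song is a bag of n-gram "terms", standard text retrieval machinery such as inverted indexes and term weighting applies unchanged.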

Document Image Analysis and Retrieval

Document Image Analysis
Recognize text (OCR)
- convert page images to Unicode
- machine-printed, handwritten
Analyze page layout geometry
- a 2-D problem (unlike speech, text)
- good 'language-free' algorithms
Capture logical structure
- output marked-up text (XML, etc.)
- exploit non-textual clues

Video/Image OCR Block Diagram
Video or Image -> Text Area Detection -> Text Area Preprocessing -> Commercial OCR -> UTF-8 Text

Text Detection

Video OCR
- Low resolution (as low as 10 pixels of height per character), limited by the NTSC (352x248)/PAL/SECAM TV standards
- Complex background
- Character hue and brightness similar to the background

VOCR Preprocessing Problems

Video Frames (1/2 s intervals) -> Filtered Frames -> AND-ed Frames
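The idea behind AND-ing frames is that caption text stays fixed while the background behind it changes, so only pixels that look like text in every sampled frame survive. The sketch below stands in for the filtering step with a plain brightness threshold and assumes bright text on grayscale frames stored as nested lists; the threshold value and function names are illustrative assumptions, not the filter used in the original system.

```python
from typing import List

Frame = List[List[int]]  # grayscale image, rows of 0-255 pixel values


def binarize(frame: Frame, threshold: int = 128) -> Frame:
    """Mark pixels bright enough to be candidate text as 1, everything else as 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in frame]


def and_frames(frames: List[Frame], threshold: int = 128) -> Frame:
    """Keep only the pixels that are candidate text in every frame."""
    if not frames:
        raise ValueError("at least one frame is required")
    result = binarize(frames[0], threshold)
    for frame in frames[1:]:
        mask = binarize(frame, threshold)
        result = [[a & b for a, b in zip(result_row, mask_row)]
                  for result_row, mask_row in zip(result, mask)]
    return result
```

A bright background pixel that appears in only one frame is removed, while pixels belonging to a static caption remain set in the output mask that is passed on to OCR.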

OCR Document Retrieval
Task: find OCR-recognized documents relevant to an information need
Challenge: the documents are erroneous -> retrieval needs to handle word errors

OCR Document Retrieval
Correction-based approaches
- Find potential word errors and replace each with the most likely correct one
Partial matching approaches
- Word -> a set of n-grams
- Word matches -> n-gram matches
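The partial matching approach can be sketched with character trigrams. The boundary marker `#`, the choice of n = 3, and the Dice coefficient as the match score are assumptions made for illustration; the slide only states that words are replaced by sets of n-grams.

```python
from typing import Set


def char_ngrams(word: str, n: int = 3) -> Set[str]:
    """Split a word into its set of character n-grams, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two words' n-gram sets (1.0 = identical sets)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

For example, `ngram_similarity("director", "directar")` is 0.625: a single OCR character error destroys only the three trigrams that touch it, so the two words still share five of their eight trigrams. An exact word match would have scored this pair as a complete miss.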

Video Retrieval

Video Retrieval - Application of Diverse Technologies
- Speech understanding for automatically derived transcripts
- Image understanding for video "paragraphing"; face, text and other object recognition
- Natural language for query expansion, topic detection and content summarization
- Human computer interaction for video display, navigation and reuse
Integration overcomes the limitations of each

Introduction to TREC Video Retrieval Track
- NIST TREC Video Track web site: nlpir.nist.gov/projects/trecvid/
- Video Retrieval Track started in 2001
- Investigation of content-based retrieval from digital video
- Focus on the shot as the unit of information retrieval rather than the scene or story/segment/clip

The TRECVID Collections
- hours, 74 queries, 8000 shots
- hours, 25 queries, shots
  - Video from the Internet Archive between the '50s and '70s
  - Advertising, educational, industrial and amateur films
  - Common shot boundaries
- 2003: 56 hours, 25 queries, shots
  - 1998 Broadcast News (CNN, ABC, C-SPAN)
  - Common Speech Recognition
  - Common Annotations
- 2004: 61 hours, 24 queries, shots
  - More 1998 Broadcast News

Sample Query and Target
Query: Find pictures of Harry Hertz, Director of the National Quality Program, NIST
Speech: We're looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …
OCR: H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director

System Architecture (TREC Video Track 2001)
Combine video, audio and text retrieval scores
Query -> Retrieval Agents (Text, Image, Audio) -> Text Score, Image Score, Audio Score -> Final Score
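The slide does not say how the agent scores are merged into the final score. One common option, assumed here purely for illustration, is to min-max normalize each agent's scores and take a weighted linear sum; the weights, shot identifiers, and function names below are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # shot id -> raw score from one retrieval agent


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one agent's scores to [0, 1] so that agents are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {shot: 0.0 for shot in scores}
    return {shot: (s - low) / (high - low) for shot, s in scores.items()}


def combine_scores(agent_scores: Dict[str, Scores],
                   weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Weighted sum of normalized agent scores, returned as a ranked shot list."""
    final: Dict[str, float] = {}
    for agent, scores in agent_scores.items():
        weight = weights.get(agent, 0.0)
        for shot, s in min_max_normalize(scores).items():
            # A shot that an agent did not return contributes 0 for that agent.
            final[shot] = final.get(shot, 0.0) + weight * s
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing first matters because the text, image, and audio agents generally produce scores on different scales; without it, the agent with the largest raw values would dominate the sum regardless of the weights.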

Results for TREC01

Retrieval method    ARR       Recall
ASR Transcripts     1.84%     13.2%
VOCR                5.93%     7.52%
Image Retrieval     14.99%    24.45%
Combine             18.9%     28.25%