Star Challenge – multimedia search competition 2008
NUS.SIGIR group: Luong Minh Thang & Zhao Jin
WING group meeting – 12 Sep, 2008
10/1/2015

Agenda
About Star Challenge
Approaches
– Audio system
– Video system
Results

Let’s start with a clip on Tai Chi!

The Star Challenge
International competition organized by Singapore's A*STAR
Focus: multimedia search by voice and video
Prize:
– Free trip to Singapore (blah!)
– USD 100,000 (!!!)

The Tasks
Voice search
– AT1: search by IPA (International Phonetic Alphabet)
– AT2: search by example
– AT3: search for recurrent voice segments
Video search
– VT1: search by (single) query image
– VT2: search by video shot
– VT3: scene/event categorization
AT3 and VT3 were replaced by integrated search in the end

Timeline
Mar 31: registration deadline
– Registered as adMIRer
– 5 members from NUS-SIGIR
– 56 teams registered in total
June 18: 1st knockout round
– AT1 + AT2
– 8 teams qualified

Timeline
July 18: 2nd knockout round
– VT1 + VT2
– 7 teams qualified
September 4: qualifying race
– All four tasks with integrated search
– Only 5 teams would qualify
October 23: grand final
– On-site evaluation

Audio system – general approach
Use MFCCs (mel-frequency cepstral coefficients), which reflect speech well
Use local alignment to align the two sequences (test audio and query)
Use the spectrogram to cut long audio into small segments for better matching
Short demo
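The slide names local alignment of query and test MFCC sequences but gives no scoring details. A minimal sketch, assuming Smith-Waterman-style local alignment in which a frame pair scores its cosine similarity minus a threshold; `threshold`, `gap` and the function names are hypothetical, and MFCC extraction itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two frame vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def local_alignment_score(query: List[Vector], test: List[Vector],
                          threshold: float = 0.8, gap: float = 0.5) -> Tuple[float, int]:
    """Smith-Waterman local alignment over two frame sequences.

    A frame pair contributes (similarity - threshold): positive for similar
    frames, negative otherwise. Skipping a frame costs `gap`. Returns the best
    local score and the index in `test` where that best alignment ends (-1 if none).
    """
    n, m = len(query), len(test)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_end = 0.0, -1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = h[i - 1][j - 1] + cosine(query[i - 1], test[j - 1]) - threshold
            # Clamping at 0 is what makes the alignment local rather than global
            h[i][j] = max(0.0, match, h[i - 1][j] - gap, h[i][j - 1] - gap)
            if h[i][j] > best:
                best, best_end = h[i][j], j - 1
    return best, best_end
```

A side benefit of segmenting long audio first is that each table stays small, since it has one cell per (query frame, test frame) pair.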

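Audio system – system overview
Test audio files → speech recognizer → test text → Lucene indexing → index data
Test audio files → audio feature extractor → test MFCC vectors
Query audio files → audio feature extractor → query MFCC vectors
Query MFCC vectors vs. test MFCC vectors → query-test similarity matrix → alignment & matching
Query text → Lucene matching against the index data
Alignment & matching results + Lucene matching results → heuristic fusion → results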
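The overview ends with a heuristic fusion of the Lucene text results and the alignment results, without saying which heuristic. One plausible sketch, assuming min-max normalisation of each score list followed by a weighted sum; `alpha` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, every document gets 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(text_scores: Dict[str, float], audio_scores: Dict[str, float],
         alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Weighted sum of normalised scores; a document missing from one list scores 0 there."""
    t, a = min_max(text_scores), min_max(audio_scores)
    fused = {d: alpha * t.get(d, 0.0) + (1 - alpha) * a.get(d, 0.0) for d in set(t) | set(a)}
    # Highest fused score first; ties broken by document id for a stable order
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Normalising first matters because Lucene scores and alignment scores live on unrelated scales.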

Audio system – handling IPA queries
IPA query: "i n t r ^ s t r ei t"
Translate to CMU phonemes: IH N T R AH S T R EY T
– INTEREST: IH N T R AH S T
– RATE: R EY T
Query text: fed directly to the text module
Synthesized to an audio file for the audio module
Mapping table excerpt (IPA → CMU): au → AW, ai → AY, b → B, tS → CH, d → D, TH → DH, e → EH, e: → ER, ei → EY
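The IPA handling can be sketched as two steps: map IPA symbols to CMU phonemes, then split the phoneme string into dictionary words. The symbol table and three-word lexicon below are hypothetical stand-ins built only from the slide's example (a real system would use a full mapping table and the CMU Pronouncing Dictionary), and preferring the fewest words is an assumed tie-break.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Tuple

# Hypothetical IPA-to-CMU table covering only the symbols in the slide's example
IPA_TO_CMU: Dict[str, str] = {"i": "IH", "n": "N", "t": "T", "r": "R", "^": "AH", "s": "S", "ei": "EY"}

# Tiny hypothetical pronunciation lexicon in CMU phoneme notation
LEXICON: Dict[str, Tuple[str, ...]] = {
    "INTEREST": ("IH", "N", "T", "R", "AH", "S", "T"),
    "RATE": ("R", "EY", "T"),
    "IN": ("IH", "N"),
}

def ipa_to_cmu(query: str) -> List[str]:
    """Translate a space-separated IPA query into CMU phonemes."""
    return [IPA_TO_CMU[sym] for sym in query.split()]

def segment(phones: List[str]) -> Optional[List[str]]:
    """Split a phoneme sequence into lexicon words, preferring the fewest words."""
    target = tuple(phones)

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(target):
            return ()
        options = []
        for word, pron in LEXICON.items():
            end = start + len(pron)
            if target[start:end] == pron:
                rest = best(end)
                if rest is not None:
                    options.append((word,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None
```

The word sequence can then serve as query text, while the phoneme string can be synthesized for the audio module.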

Audio system – overall performance
Complete statistics are not available yet, but AT2 (query by example) reaches roughly 30-40% MAP and AT1 roughly 10%
Let's listen to a few queries …
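The figures above are MAP (mean average precision) values. For reference, the standard non-interpolated definition is sketched below; the competition's official scorer is not described here, so treat this as an illustration of the metric rather than the evaluation code.

```python
from typing import Dict, List, Set

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    # Relevant items never retrieved still count in the denominator
    return total / len(relevant)

def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Average of per-query AP over every query that has relevance judgements."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)
```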

Video system – VT1 categories
1. Crowd (>10 people)
2. Building with sky as backdrop, clearly visible
3. Mobile devices, including handphone/PDA
4. Flag
5. Electronic chart, e.g. stock charts, airport departure chart
6. TV chart overlay, including graphs, text, PowerPoint style
7. Person using computer, both visible
8. Track and field, sports
9. Company trademark, including billboard, logo
10. Badminton court
11. Swimming pool, sports
12. Close-up of hand, e.g. using mouse, writing, etc.
13. Business meeting (>2 people), mostly seated, table visible
14. Natural scene, e.g. mountain, trees, sea, no people
15. Food on dishes, plates
16. Face close-up, occupying about 3/4 of screen, frontal or side
17. Traffic scene, many cars, trucks, road visible
18. Boat/ship, over sea, lake
19. PC webpages, screen of PC visible
20. Airplane

Video system – examples
16. Face close-up; 2. Building with sky backdrop; 9. Company trademark; 3. Mobile devices

Video system – VT2 categories
1. People entering/exiting a door/car
2. Talking face with introductory caption
3. Fingers typing on a keyboard
4. Inside a moving vehicle, looking outside
5. Large camera movement, tracking an object, person, car, etc.
6. Static or minute camera movement, people walking, legs visible
7. Large camera movement, panning left/right, top/down of a scene
8. Movie ending credits
9. Woman monologue
10. Sports celebratory hug

Video system – general approach
Test files → classifiers → classified category
Category filtering (using the query category) → filtered test files
Matching (against the query file) → matched test files
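The approach classifies each test file, keeps only files whose predicted category equals the query's, and then matches the query against the survivors. The slide does not say which similarity the matcher uses; this sketch assumes histogram intersection on normalised colour histograms, and `filter_and_match` and its data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima; 1.0 means identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def filter_and_match(query_hist: Sequence[float], query_category: str,
                     test_files: Dict[str, Tuple[str, List[float]]],
                     top_k: int = 10) -> List[Tuple[str, float]]:
    """test_files maps a file name to (predicted category, colour histogram)."""
    # Category filtering: drop files whose predicted category differs from the query's
    candidates = {name: hist for name, (cat, hist) in test_files.items() if cat == query_category}
    # Matching: rank the remaining files by similarity to the query
    scored = [(name, histogram_intersection(query_hist, hist)) for name, hist in candidates.items()]
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:top_k]
```

Filtering first shrinks the set that the (usually more expensive) matcher has to score, at the cost of losing any relevant file the classifier mislabels.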

Video system – training data size
Dev = 10% of labelled data, Train = 90% of labelled data
Size varies significantly across categories
[Table: development data statistics, size per category]
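Only the 10%/90% proportions are given. Because category sizes vary a lot, one reasonable way to produce such a split is per category, so small categories still get some dev examples; this stratified variant, the rounding rule and the fixed seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Labelled = List[Tuple[str, str]]  # (key frame id, category)

def split_per_category(labelled: Labelled, dev_fraction: float = 0.1,
                       seed: int = 0) -> Tuple[Labelled, Labelled]:
    """Split (item, category) pairs into train/dev, taking dev_fraction of each category."""
    by_cat: Dict[str, Labelled] = defaultdict(list)
    for item, cat in labelled:
        by_cat[cat].append((item, cat))
    rng = random.Random(seed)
    train: Labelled = []
    dev: Labelled = []
    for cat in sorted(by_cat):
        items = by_cat[cat][:]
        rng.shuffle(items)
        # At least one dev example per category, unless the category has a single item
        n_dev = max(1, round(len(items) * dev_fraction)) if len(items) > 1 else 0
        dev.extend(items[:n_dev])
        train.extend(items[n_dev:])
    return train, dev
```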

Video system – classifier training
Train key frames + categories → feature extractors:
– Color extractor: color histogram (HSV, RGB)
– Layout extractor: segmentation info
– Face detector: number of faces, size, positions
– Edge extractor: edge histogram
Multi-class SVM training → color, face, edge and layout classifiers
Dev key frames + categories → recall per category for each classifier (color, layout, face, edge), used as weights
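The color classifier is trained on color histograms (HSV, RGB). A minimal sketch of a joint HSV histogram using Python's standard `colorsys` module; the 8×3×3 bin layout is a hypothetical choice, and pixels are passed as plain RGB tuples rather than loaded through an image library.

```python
import colorsys
from typing import List, Sequence, Tuple

def hsv_histogram(pixels: Sequence[Tuple[int, int, int]], h_bins: int = 8,
                  s_bins: int = 3, v_bins: int = 3) -> List[float]:
    """Joint HSV histogram of 8-bit RGB pixels, normalised to sum to 1."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1]
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # min() keeps values of exactly 1.0 in the last bin
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    total = len(pixels)
    return [c / total for c in hist] if total else hist
```

The resulting fixed-length vector is the kind of input a multi-class SVM expects; normalising by pixel count makes frames of different resolutions comparable.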

[Chart: classifier recall per category for the face, edge, layout, HSV, RGB and LAB classifiers]
Recall per category is used as a weight when fusing the different classifiers
No mirror analysis or n-fold testing yet
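Per-category recall on the dev data serves as each classifier's weight during fusion. The fusion rule itself is not given; the sketch below assumes a weighted vote in which a classifier's vote for a category counts as much as its dev recall on that category. Classifier and category names are hypothetical.

```python
from typing import Dict, List

def per_category_recall(true: List[str], pred: List[str]) -> Dict[str, float]:
    """Recall of each category on the dev set: correct predictions / true instances."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for t, p in zip(true, pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in totals.items()}

def fuse_predictions(predictions: Dict[str, str],
                     recalls: Dict[str, Dict[str, float]]) -> str:
    """predictions: classifier -> predicted category; recalls: classifier -> category -> dev recall."""
    votes: Dict[str, float] = {}
    for clf, cat in predictions.items():
        votes[cat] = votes.get(cat, 0.0) + recalls[clf].get(cat, 0.0)
    # Iterating in sorted order makes ties resolve to the alphabetically first category
    return max(sorted(votes), key=lambda c: votes[c])
```

With recall weights, a classifier that is reliable on one category (e.g. a face classifier on face close-ups) can outvote several weaker classifiers on that category.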

Video system – category filtering & matching
Test video → test key frames
Feature extractors:
– Color extractor: color histogram (HSV, RGB)
– Layout extractor: segmentation info
– Face detector: number of faces, size, positions
– Edge extractor: edge histogram
– Motion extractor: motion histogram; camera & object motion
Color, face, edge and layout classifiers → classifier merger (weights from dev data)
Category filtering with the query category → filtered key frames
Heuristic category filtering → filtered video
Matching against the query video/frames → results
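The step from filtered key frames to a filtered video is described only as heuristic. One simple heuristic, assumed here for illustration, keeps a video when at least a fraction `min_fraction` (a hypothetical parameter) of its key frames were classified as the query category.

```python
from typing import Dict, List

def filter_videos(keyframe_labels: Dict[str, List[str]], query_category: str,
                  min_fraction: float = 0.3) -> List[str]:
    """Keep videos whose key frames are classified as the query category often enough."""
    kept = []
    for video, labels in sorted(keyframe_labels.items()):
        # Videos with no key frames are skipped rather than divided by zero
        if labels and labels.count(query_category) / len(labels) >= min_fraction:
            kept.append(video)
    return kept
```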

Video system – motion (1)
Examples: camera panning left; camera panning up; object motion: moving; object motion: static

Video system – motion (2)
Check whether most motion vectors are ≈ 0 → static
Otherwise, filter out all small motion vectors
Categorize the remaining motion vectors into angular (circle) bins → histogram, plus the main motion vector
If the main motion vector dominates → camera motion: panning left, right, up or down
To detect zooming, find a focus block/point
Object motion is derived after removing camera motion
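The motion steps above can be sketched as follows for a single frame's block motion vectors. The thresholds, the four axis-centred bins and the sign convention (vectors give how content moved, in image coordinates with y pointing down, so content moving right means the camera panned left) are all assumptions; zoom detection and object motion after camera-motion removal are not covered.

```python
import math
from typing import List, Tuple

# Assumed convention: each (dx, dy) is how image content moved between two frames,
# in image coordinates (x to the right, y downward). Content moving right means the
# camera panned left, content moving down means it tilted up, and so on.
PAN_FOR_BIN = ["left", "up", "right", "down"]  # bins centred on 0°, 90°, 180°, 270°

def classify_motion(vectors: List[Tuple[float, float]], min_mag: float = 1.0,
                    static_ratio: float = 0.8, dominance: float = 0.6) -> str:
    """Label a frame's motion vectors as static, a camera pan, or object motion."""
    if not vectors:
        return "static"
    # Step 1: if most vectors are ~0, the shot is static
    moving = [(dx, dy) for dx, dy in vectors if math.hypot(dx, dy) >= min_mag]
    if len(vectors) - len(moving) >= static_ratio * len(vectors):
        return "static"
    # Steps 2-3: small vectors are already filtered out; bin the rest by angle
    hist = [0] * 4
    for dx, dy in moving:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int((angle + math.pi / 4) // (math.pi / 2)) % 4] += 1
    main = max(range(4), key=lambda b: hist[b])
    # Step 4: a dominant direction is read as camera motion
    if hist[main] >= dominance * len(moving):
        return "camera: panning " + PAN_FOR_BIN[main]
    return "object motion"
```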

Conclusion
We built a fully functional system within a short time, in an ad-hoc manner
There is plenty of room for performance improvement and detailed analysis

Q & A? Thank you!!!