Star Challenge – multimedia search competition 2008 NUS.SIGIR group Luong Minh Thang & Zhao Jin WING group meeting – 12 Sep, 2008 10/1/20151.


1 Star Challenge – multimedia search competition 2008
NUS.SIGIR group: Luong Minh Thang & Zhao Jin
WING group meeting – 12 Sep, 2008

2 Agenda
About Star Challenge
Approaches
  – Audio system
  – Video system
Results

3 Let’s start with a clip on Tai Chi!

4 The Star Challenge
International competition organized by Singapore's A*STAR
Focus on multimedia search by voice and video
Prize:
  – Free trip to Singapore (blah!)
  – USD 100,000 (!!!)

5 The Tasks
Voice Search
  – AT1: Search by IPA (International Phonetic Alphabet)
  – AT2: Search by Example
  – AT3: Search for recurrent voice segments
Video Search
  – VT1: Search by (single) Query Image
  – VT2: Search by Video Shot
  – VT3: Scene/Event Categorization
AT3 and VT3 were replaced by integrated search in the end

6 Timeline
Mar 31: Registration Deadline
  – Registered as adMIRer
  – 5 members from NUS-SIGIR
  – 56 teams registered in total
June 18: 1st Knockout Round
  – AT1 + AT2
  – 8 teams qualified

7 Timeline
July 18: 2nd Knockout Round
  – VT1 + VT2
  – 7 teams qualified
September 4: Qualifying Race
  – All four tasks with Integrated Search
  – Only 5 teams would qualify
October 23: Grand Final
  – On-site evaluation

8 Audio system – general approach
Use MFCC (mel-frequency cepstral coefficient) features, which represent speech well
Use local alignment to align the two feature sequences (audio and query)
Using the spectrogram, cut long audio into small segments for better matching
→ Short demo
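A minimal sketch of the local-alignment step, assuming a Smith-Waterman-style recurrence over per-frame feature vectors. The slides do not say how frames are compared or how gaps are penalised, so the cosine similarity, the `threshold` and the `gap` penalty below are illustrative assumptions, and short toy vectors stand in for real MFCC frames.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_align(query, test, threshold=0.8, gap=1.0):
    """Smith-Waterman-style local alignment of two vector sequences.

    A frame pair contributes (similarity - threshold), so similar frames
    raise the score and dissimilar frames lower it; `gap` penalises a
    skipped frame on either side. Both parameters are assumptions.
    Returns (best_score, end_index_in_query, end_index_in_test), with
    1-based end indices.
    """
    cols = len(test)
    prev = [0.0] * (cols + 1)
    best = (0.0, 0, 0)
    for i in range(1, len(query) + 1):
        curr = [0.0] * (cols + 1)
        for j in range(1, cols + 1):
            match = prev[j - 1] + cosine(query[i - 1], test[j - 1]) - threshold
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0.0, match, prev[j] - gap, curr[j - 1] - gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best
```

Ranking test segments by `best_score` for a given query would give one possible matching order; the slide's own scoring may differ.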

9 Audio system – system overview
[Diagram labels: test audio files; query audio files; speech recognizer; audio feature extractor; test text; query text; test MFCC vectors; query MFCC vectors; Lucene indexing; index data; Lucene matching; alignment & matching; query-test similarity matrix; heuristic fusion; results]
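The overview ends in a "heuristic fusion" of the Lucene (text) results and the alignment (audio) results, but the slides do not describe the heuristic. The sketch below shows one hypothetical option: min-max normalise each score list and rank by a weighted sum. The function names and the `text_weight` parameter are assumptions made for illustration.

```python
def normalise(scores):
    """Min-max scale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(text_scores, audio_scores, text_weight=0.5):
    """Rank documents by a weighted sum of normalised text and audio scores.

    A document missing from one list contributes 0 for that list.
    Ties are broken by document id so the ranking is deterministic.
    """
    text, audio = normalise(text_scores), normalise(audio_scores)
    fused = {
        doc: text_weight * text.get(doc, 0.0)
        + (1.0 - text_weight) * audio.get(doc, 0.0)
        for doc in set(text) | set(audio)
    }
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```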

10 Audio system – handling IPA
IPA query: "i n t r ^ s t r ei t"
Translate to CMU phonemes: IH N T R AH S T R EY T
  – INTEREST: IH N T R AH S T
  – RATE: R EY T
Query text:
  – input to the text module directly
  – synthesized to an audio file for the audio module
IPA-to-CMU mapping:
  IPA: i   n   t   r   ^   s   ei  a:  @   o:
  CMU: IH  N   T   R   AH  S   EY  AA  AE  AO
  IPA: au  ai  b   tS  d   TH  e   e:  ei  au
  CMU: AW  AY  B   CH  D   DH  EH  ER  EY  AW
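The translation on this slide can be sketched as a table lookup followed by a lexicon-based split into words. The mapping below is read from the slide's flattened table, and the two-word lexicon holds only the slide's own example; the function names and the recursive segmentation strategy are assumptions, not the team's implementation.

```python
# IPA-to-CMU pairs as read from the slide's mapping table.
IPA_TO_CMU = {
    "i": "IH", "n": "N", "t": "T", "r": "R", "^": "AH", "s": "S",
    "ei": "EY", "a:": "AA", "@": "AE", "o:": "AO", "au": "AW",
    "ai": "AY", "b": "B", "tS": "CH", "d": "D", "TH": "DH",
    "e": "EH", "e:": "ER",
}

# Toy pronunciation lexicon containing only the slide's example words.
LEXICON = {
    "INTEREST": ["IH", "N", "T", "R", "AH", "S", "T"],
    "RATE": ["R", "EY", "T"],
}


def ipa_to_cmu(ipa_query):
    """Map a space-separated IPA query to a list of CMU phonemes."""
    return [IPA_TO_CMU[symbol] for symbol in ipa_query.split()]


def segment(phonemes, lexicon):
    """Split a phoneme list into lexicon words; None if no split exists."""
    if not phonemes:
        return []
    for word, pron in lexicon.items():
        n = len(pron)
        if n and phonemes[:n] == pron:
            rest = segment(phonemes[n:], lexicon)
            if rest is not None:
                return [word] + rest
    return None
```

For the slide's query, `segment(ipa_to_cmu("i n t r ^ s t r ei t"), LEXICON)` yields the words INTEREST and RATE, which is the query text passed on to the text module.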

11 Audio system – overall performance
Complete statistics are not available yet, but AT2 (query by example) reaches roughly 30-40% MAP (mean average precision) and AT1 roughly 10%
Let's listen to a few queries …
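For reference, a short sketch of how MAP is commonly computed, assuming the standard definition (average precision per query, averaged over queries). The slides do not spell out the competition's exact scoring rules, so treat this as the textbook formula rather than the official evaluation.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list (no duplicate ids)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant hit
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```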

12 Video system – VT1 categories
1. Crowd (>10 people)
2. Building with sky as backdrop, clearly visible
3. Mobile devices including handphone/PDA
4. Flag
5. Electronic chart, e.g. stock charts, airport departure chart
6. TV chart overlay, including graphs, text, PowerPoint style
7. Person using computer, both visible
8. Track and field, sports
9. Company trademark, including billboard, logo
10. Badminton court
11. Swimming pool, sports
12. Closeup of hand, e.g. using mouse, writing, etc.
13. Business meeting (> 2 people), mostly seated, table visible
14. Natural scene, e.g. mountain, trees, sea, no people
15. Food on dishes, plates
16. Face closeup, occupying about 3/4 of screen, frontal or side
17. Traffic scene, many cars, trucks, road visible
18. Boat/ship, over sea, lake
19. PC webpages, screen of PC visible
20. Airplane

13 Video system – examples
[Example images: 16. Face closeup; 2. Building with sky backdrop; 9. Company trademark; 3. Mobile devices]

14 Video system – VT2 categories
1. People entering/exiting door/car
2. Talking face with introductory caption
3. Fingers typing on a keyboard
4. Inside a moving vehicle, looking outside
5. Large camera movement, tracking an object, person, car, etc.
6. Static or minute camera movement, people walking, legs visible
7. Large camera movement, panning left/right, top/down of a scene
8. Movie ending credit
9. Woman monologue
10. Sports celebratory hug

15 Video system – general approach
[Diagram] Test files → classifiers → classified category → category filtering (using the query category) → filtered test files → matching (against the query file) → matched test files

16 Video system – training data size
Development data statistics:
  Category  Size    Category  Size
  1         15      11        21
  2         47      12        130
  3         33      13        11
  4         6       14        21
  5         5       15        3
  6         18      16        42
  7         30      17        6
  8         3       18        10
  9         23      19        43
  10        3       20        3
Dev = 10% of the labelled data, Train = 90% of the labelled data
Size varies significantly across categories

17 Video system – classifier training
[Diagram] Train key frames + categories pass through four feature extractors, each feeding a classifier trained with multi-class SVM:
  – Color extractor → color histogram (HSV, RGB) → color classifier
  – Face detector → number of faces, size, positions → face classifier
  – Edge extractor → edge histogram → edge classifier
  – Layout extractor → segmentation info → layout classifier
Dev key frames give each classifier's recall per category (color, face, edge, layout), which is used as weights
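As an illustration of the color feature named on this slide, here is a minimal HSV histogram sketch using only the Python standard library. The bin counts and the pixel format (RGB tuples in 0..255) are assumptions; the slides do not give the histogram resolution or the image library used.

```python
import colorsys


def hsv_histogram(pixels, h_bins=8, s_bins=2, v_bins=2):
    """Normalised HSV histogram of an iterable of (r, g, b) pixels (0..255).

    Returns a flat list of h_bins * s_bins * v_bins values summing to 1,
    or all zeros if there are no pixels. Bin counts are illustrative.
    """
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a component equal to 1.0 falls in the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    return [x / count for x in hist] if count else hist
```

A vector like this, computed per key frame, is the kind of input a multi-class SVM could be trained on.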

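18 Classifier recall/categories
Classifiers (table columns): face, edge, layout, hsv, rgb, lab
Recall values per category, in row order (empty cells omitted; the column each value belongs to is not recoverable from this transcript):
  1:  0.02
  2:  0.21, 0.61
  3:  1, 0.15
  4:  (none)
  5:  (none)
  6:  0.77, 0.55, 0.41, 0.33
  7:  0.26, 0.83
  8:  (none)
  9:  0.26, 0.17, 0.4, 0.5
  10: 0.33, 1
  11: 0.76, 0.71, 0.81, 0.35
  12: 0.3, 0.57
  13: 0.27, 0.33
  14: 0.14, 0.52, 0.25, 0.18
  15: (none)
  16: 0.23, 0.16, 0.11, 0.14
  17: (none)
  18: 0.18
  19: 0.34, 0.48, 0.34, 0.28
  20: (none)
Used as weights when fusing all the different classifiers
No miror analysis & n-fold testing yet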
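The slide says per-category recall is used as the weight when the classifiers are fused, without giving the exact rule. One plausible reading, sketched below as an assumption, is weighted voting: each classifier votes for its predicted category with a weight equal to its dev-set recall on that category. The recall numbers in the usage note are made up for illustration.

```python
def fuse_predictions(predictions, recall):
    """Pick a category by recall-weighted voting.

    predictions: {classifier_name: predicted_category}
    recall:      {classifier_name: {category: dev-set recall}}
    A missing recall entry counts as weight 0. Ties go to the smallest
    category id. Returns None when there are no predictions.
    """
    votes = {}
    for name, category in predictions.items():
        weight = recall.get(name, {}).get(category, 0.0)
        votes[category] = votes.get(category, 0.0) + weight
    if not votes:
        return None
    return max(sorted(votes), key=lambda c: votes[c])
```

With hypothetical weights `{"face": {16: 0.9}, "edge": {2: 0.6}, "hsv": {2: 0.5}}` and predictions face=16, edge=2, hsv=2, category 2 wins because its combined weight (1.1) exceeds that of category 16 (0.9).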

19 Video system – category filtering & matching
[Diagram]
  – Test key frames → color extractor, face detector, edge extractor, layout extractor → color histogram (HSV, RGB); number of faces, size, positions; edge histogram; segmentation info → color, face, edge and layout classifiers → classifier merger (weights from dev data) → category filtering (using the query category) → filtered key frames
  – Test video → motion extractor → motion histogram; camera & object motion → heuristic category filtering → filtered video
  – Filtered key frames and filtered video → matching (against the query video/frames) → results

20 Video system – motion 1
[Example frames: camera panning left; camera panning up; object motion: moving; object motion: static]

21 Video system – motion 2
Check whether most vectors are ~ 0 → static motion
Otherwise, filter out all small motion vectors
Categorize the motion vectors into circular bins → histogram + main motion vector
If the main motion vector dominates → camera motion → panning left, right, up, down
To detect zooming, find a focus block/point
Object motion is derived after removing camera motion
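The first four steps above can be sketched as follows. All thresholds (`min_mag`, `static_ratio`, `dominance`) and the number of direction bins are assumptions, since the slides give no values; zoom detection and the subtraction of camera motion are left out, so the "object" label here only means that no single direction dominates.

```python
import math


def classify_motion(vectors, min_mag=1.0, static_ratio=0.8,
                    dominance=0.6, bins=8):
    """Label a frame's motion from its block motion vectors [(dx, dy), ...].

    Returns (label, histogram): label is "static", "camera" or "object";
    histogram counts the non-small vectors per direction bin, where bin 0
    is centred on the +x direction and bins advance with atan2's angle.
    """
    hist = [0] * bins
    if not vectors:
        return "static", hist
    # Keep only vectors that are not ~0.
    moving = [(dx, dy) for dx, dy in vectors if math.hypot(dx, dy) >= min_mag]
    if len(vectors) - len(moving) >= static_ratio * len(vectors):
        return "static", hist
    width = 2 * math.pi / bins
    for dx, dy in moving:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int((angle + width / 2) / width) % bins] += 1
    # One dominant direction suggests global (camera) motion such as a pan.
    if max(hist) >= dominance * len(moving):
        return "camera", hist
    return "object", hist
```

The index of the largest histogram bin gives the main motion direction, which could then be mapped to a pan direction.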

22 Conclusion
We have built a fully functional system within a short time and in an ad-hoc manner
There is plenty of room for performance improvement and detailed analysis

23 Q & A? Thank you!!!

