Published by Simon Short. Modified over 9 years ago.
2
Sound Detection
Derek Hoiem
Rahul Sukthankar (mentor)
August 24, 2004
3
Objective
Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds.
Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.
4
Applications
“Tell me if you hear a gunshot.” (monitoring)
“Get me video clips containing dogs barking.” (search and retrieval)
“What’s going on?” (scene understanding)
5
Why it's difficult
Sound classes have large variations
Sounds are often ambiguous without context
Overlaid “noise” obscures the sound
6
Sound or not?
Car horn, laser gun, dog bark
Which of these sounds are not from their named classes?
7
Previous work
Sound classification (Wold 1996, Casey 2001, etc.)
  Categorize short sound clips
  Reasonable accuracy (5-20% error)
Sound detection (Dufaux 2000, Piamsa-nga 1999)
  Localize and recognize sound objects in long clips
  Poor performance or assumption of unrealistic conditions (e.g., very quiet background)
8
Detection via Windowed Search
Break the long audio track into short overlapping clips (Clip 1, Clip 2, …, Clip N)
Independently classify each short clip as object or non-object with a clip classifier
Return the locations of the detected sound object
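The windowed search can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the window and hop sizes, and the toy amplitude-based classifier are all assumptions made for the example.

```python
def sliding_windows(n_samples, win, hop):
    """Yield (start, end) sample indices of overlapping clips."""
    start = 0
    while start + win <= n_samples:
        yield start, start + win
        start += hop


def detect(track, win, hop, classify, threshold=0.0):
    """Classify each clip independently; return (start, end, score) for hits."""
    hits = []
    for start, end in sliding_windows(len(track), win, hop):
        score = classify(track[start:end])
        if score > threshold:
            hits.append((start, end, score))
    return hits


def toy_classifier(clip):
    """Hypothetical stand-in for a learned clip classifier:
    mean absolute amplitude minus 0.5."""
    return sum(abs(x) for x in clip) / len(clip) - 0.5


# A silent track with a loud burst at samples 8..11.
track = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = detect(track, win=4, hop=2, classify=toy_classifier)
print(hits)
```

Because neighbouring clips overlap, a real detector would probably merge adjacent hits into a single reported location; that step is left out here.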
9
Representation
Raw representation (examples shown: meows, phone rings)
  Time-frequency analysis: windowed Fourier transform
  Extract the percentage of power in each band over time and the total power over time
Features
  Compute features used for classification
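One way to picture the raw representation is the sketch below. It assumes a Hann window, a naive DFT, and equal-width frequency bands; none of these choices is stated on the slide, and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each non-negative frequency bin of one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def band_power_over_time(signal, frame_len, hop, n_bands):
    """Return (per-frame band power fractions, per-frame total power)."""
    fractions, totals = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = power_spectrum(signal[start:start + frame_len])
        bands = [0.0] * n_bands
        for k, power in enumerate(spectrum):
            # Assumption: bins are split into equal-width bands.
            bands[min(k * n_bands // len(spectrum), n_bands - 1)] += power
        total = sum(bands)
        totals.append(total)
        fractions.append([b / total if total > 0 else 0.0 for b in bands])
    return fractions, totals


# A low-frequency tone: nearly all power should land in the lowest band.
tone = [math.sin(2 * math.pi * 2 * t / 32) for t in range(64)]
fractions, totals = band_power_over_time(tone, frame_len=32, hop=16, n_bands=4)
print(fractions[0])
```

The naive DFT keeps the example dependency-free; a practical version would use an FFT routine instead.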
10
Classification Features
Diverse feature set: different sound classes are distinctive in different ways
Means and standard deviations of power at different frequencies
Bandwidth, peaks, loudness, etc.
138 features in all
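As a small illustration of the mean and standard-deviation features, the sketch below summarizes a clip's per-frame band powers. The feature names and the two-band toy input are assumptions; the actual 138 features are not enumerated in the slides.

```python
from statistics import fmean, pstdev


def summarize_bands(band_power):
    """band_power: list of frames, each a list of per-band power fractions.
    Returns the mean and standard deviation over time for every band."""
    features = {}
    for b in range(len(band_power[0])):
        column = [frame[b] for frame in band_power]
        features[f"band{b}_mean"] = fmean(column)
        features[f"band{b}_std"] = pstdev(column)
    return features


# Three frames, two bands (toy values).
clip = [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
feats = summarize_bands(clip)
print(feats)
```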
11
Classification by Decision Trees
Try to find simple rules that discriminate object from non-object
Each decision is based on a threshold on a feature value
Assign a confidence at each leaf node based on the likelihood of the data for the object and non-object classes
Diagram labels: decision nodes, leaf nodes
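A one-split tree on a single feature shows the idea of a thresholded decision with a confidence at each leaf. The half log-ratio of object to non-object weight used for the confidence is an assumption (a common choice in confidence-rated boosting); the slides do not give the exact formula.

```python
import math


def fit_stump(values, labels, weights=None, eps=1e-6):
    """Fit a single threshold on one feature.
    labels are +1 (object) or -1 (non-object).
    Returns (threshold, confidence_below, confidence_above)."""
    n = len(values)
    weights = weights or [1.0 / n] * n
    candidates = sorted(set(values))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2
        # Weighted mass of each class on each side of the threshold.
        mass = {(side, y): 0.0 for side in (0, 1) for y in (1, -1)}
        for v, y, w in zip(values, labels, weights):
            mass[(int(v > thr), y)] += w
        # Weighted error if each leaf predicts its majority class.
        err = sum(min(mass[(s, 1)], mass[(s, -1)]) for s in (0, 1))
        if best is None or err < best[0]:
            conf = [0.5 * math.log((mass[(s, 1)] + eps) / (mass[(s, -1)] + eps))
                    for s in (0, 1)]
            best = (err, thr, conf[0], conf[1])
    return best[1:]


def stump_confidence(stump, value):
    thr, below, above = stump
    return above if value > thr else below


# Toy feature: objects have high values, non-objects low values.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [-1, -1, -1, 1, 1, 1]
stump = fit_stump(values, labels)
print(stump)
```

A positive confidence votes for the object class and a negative one for non-object; the magnitude reflects how pure the leaf is.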
12
Boosted Trees
Problem: one decision tree by itself may not be a great classifier
Solution: use several trees, with each one focusing on the mistakes of previously learned trees
AdaBoost:
  Weight training data uniformly
  Learn a decision tree classifier on weighted data
  Re-weight data, giving more weight to incorrectly classified examples
  Final classification is based on a linear combination of confidences from all learned decision trees
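The AdaBoost steps listed above can be sketched with one-split trees as the weak learners. This is a hedged illustration using the confidence-rated variant of AdaBoost; the split criterion, the smoothing constant `eps`, and the toy data are assumptions, and the trees in the presentation are larger than a single split.

```python
import math


def fit_stump(X, y, w, eps=1e-6):
    """Best single-feature threshold under weights w.
    Returns (feature, threshold, confidence_below, confidence_above)."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            thr = (lo + hi) / 2
            pos, neg = [0.0, 0.0], [0.0, 0.0]
            for row, yi, wi in zip(X, y, w):
                side = int(row[f] > thr)
                if yi > 0:
                    pos[side] += wi
                else:
                    neg[side] += wi
            # Lower z means purer leaves under the current weights.
            z = 2 * sum(math.sqrt(pos[s] * neg[s]) for s in (0, 1))
            if best is None or z < best[0]:
                conf = [0.5 * math.log((pos[s] + eps) / (neg[s] + eps))
                        for s in (0, 1)]
                best = (z, f, thr, conf[0], conf[1])
    return best[1:]


def stump_score(stump, row):
    f, thr, below, above = stump
    return above if row[f] > thr else below


def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform weights
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)         # learn on weighted data
        stumps.append(stump)
        # Examples the stump gets wrong (or is unsure about) gain weight.
        w = [wi * math.exp(-yi * stump_score(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps


def ensemble_score(stumps, row):
    """Linear combination (here a plain sum) of the stump confidences."""
    return sum(stump_score(s, row) for s in stumps)


# Objects occupy the middle of the feature range, so no single threshold works.
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]
stumps = adaboost(X, y, rounds=3)
predictions = [1 if ensemble_score(stumps, row) > 0 else -1 for row in X]
print(predictions)
```

On this toy data a single stump cannot separate the classes, while the sum of a few stumps can, which is the motivation stated on the slide.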
13
Examples of Decision Trees
Low percentage of power in low frequencies in mid-time of sound
Very high power amplitude range
Meow
Gunshot
High power amplitude range
More complex tree that focuses on examples misclassified by tree above
Gunshot
14
Cascade of Classifiers
Goal: eliminate false positives with few false negatives in early stages
Advantages:
  Allows use of a large set of negative training examples
  Improves classification speed
Danger: cannot recover from false negatives
Diagram: Sound Clip → Stage 1 → Stage 2 → Stage 3 → Pass; a clip that fails any stage is rejected (Fail). Pass rates labeled on the diagram: 5%, 2%, 0.005%.
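The control flow of a cascade is simple to sketch: a clip is accepted only if every stage passes it, and a rejection at any stage is final. The stage scoring functions, feature names, and thresholds below are hypothetical placeholders, not the classifiers from the presentation.

```python
def cascade_classify(clip_features, stages):
    """stages: list of (score_fn, threshold) pairs, cheapest first.
    Returns (accepted, number_of_stages_evaluated)."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(clip_features) < threshold:
            return False, i        # rejected here; later stages never run
    return True, len(stages)


# Hypothetical stages, each reading one precomputed feature.
stages = [
    (lambda f: f["loudness"], 0.2),
    (lambda f: f["peak"], 0.5),
    (lambda f: f["bandwidth"], 0.7),
]

quiet_clip = {"loudness": 0.1, "peak": 0.9, "bandwidth": 0.9}
candidate = {"loudness": 0.9, "peak": 0.9, "bandwidth": 0.9}

print(cascade_classify(quiet_clip, stages))
print(cascade_classify(candidate, stages))
```

The early exit is where the speed advantage comes from: most non-object clips are discarded by the first stage. It is also why a false negative in an early stage cannot be recovered later.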
15
Results: Classification Error
Best Performance    Worst Performance

               stage 1          stage 2          stage 3
               pos     neg      pos     neg      pos     neg
meow           0.0%    1.4%     0.0%    1.2%     2.2%    0.8%
phone          0.0%    0.4%     4.3%    0.1%     5.9%    0.0%
car horn       0.0%    3.9%     0.6%    2.2%     3.6%    1.3%
door bell      1.4%    2.1%             0.4%     6.3%    0.1%
swords         6.1%    1.3%     6.7%    0.1%     6.7%    0.0%
scream         0.3%    5.5%     2.7%    1.4%     5.3%    1.1%
dog bark       0.7%    1.0%     6.0%    0.3%     7.7%    0.2%
laser gun      0.0%    6.8%     4.4%    5.1%     6.7%    0.9%
explosion      4.1%    5.2%     7.5%    1.5%    12.0%    0.5%
light saber    4.8%    6.8%     9.7%    1.0%    13.9%    0.2%
gunshot        8.1%    6.1%    12.5%    2.3%    14.5%    1.1%
close door     7.9%    7.8%    14.5%    4.8%    17.6%    2.3%
male laugh     4.3%   14.7%     9.5%    9.7%    13.3%    7.0%
average        2.9%    4.4%     6.0%    2.2%     8.5%    1.1%
16
Results: ROC curves
Note: to approximate the negative error rate, divide FP by 25,000
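The conversion in the note is a single division. The sketch below assumes that FP on the ROC axis is a raw count of false positives and that 25,000 approximates the number of negative clips; the slide gives only the divisor, and the count of 50 is a made-up example value.

```python
NEGATIVE_CLIPS = 25_000  # divisor from the slide; its interpretation is an assumption


def negative_error_rate(false_positives, n_negatives=NEGATIVE_CLIPS):
    """Approximate fraction of negative clips wrongly accepted."""
    return false_positives / n_negatives


rate = negative_error_rate(50)  # hypothetical count read off a curve
print(f"{rate:.2%}")
```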
17
Results: Anecdotal
Gunshots, Female Laugh, Male Laugh, Swords, Scream