Cascaded Classification Models


1 Cascaded Classification Models
Combining Models for Holistic Scene Understanding
Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Stanford University
NIPS 2008, December 11, 2008

2 Outline
Understanding Scene Understanding
Related Work
CCM Framework
Results

3 Human View of a “Scene”
Scene labels: BUILDING, PEOPLE, BUS, CAR, ROAD
“A car passes a bus on the road, while people walk past a building.”

4 Computer View of a “Scene”
Scene labels: BUILDING, ROAD; scene category: STREET SCENE.
Can we integrate all of these subtasks, so that the whole is greater than the sum of its parts?

5 Related Work
Intrinsic Images [Barrow and Tenenbaum, 1978], [Tappen et al., 2005].
Hoiem et al., “Closing the Loop in Scene Interpretation”, 2008.
We want to focus more on “semantic” classes, and to be flexible enough to use outside models.
Limitations of Hoiem et al.: 1) required every output to be in the form of an image; 2) used only models the authors had developed themselves over the preceding years; 3) at joint learning time, only “surfaces” and “edges/occlusions” were learned; the other models were pre-trained ahead of time.

6 How Should We Integrate?
Option 1: a single joint model over all variables.
  Pros: tighter interactions, more designer control.
  Cons: requires expertise in each of the subtasks.
Option 2: a simple, flexible combination of existing models.
  Pros: state-of-the-art components, easier to extend.
  Requires: only a limited “black-box” interface to the components.
  Cons: misses some of the modeling power.
Component models: DETECTION [Dalal & Triggs, 2006], REGION LABELING [Gould et al., 2007], DEPTH RECONSTRUCTION [Saxena et al., 2007].

7 Other Opportunities for Integration
Audio signals: source separation, speaker recognition, speech recognition.
Text understanding: “Mr. Obama sent himself an important reminder.”
  Part-of-speech tagging: noun, verb, adj.
  Semantic role identification: Verb: sent; Sender: Mr. Obama; Receiver: himself; Content: reminder.
  Anaphora resolution.

8 Outline
Understanding Scene Understanding
Related Work
CCM Framework
Results

9 Cascaded Classification Models
Tasks: object detection (DET), region labeling (REG), 3D reconstruction (REC).
Image features fDET, fREG, fREC feed the independent models DET0, REG0, REC0; their outputs in turn feed the context-aware models DET1, REG1, REC1.

10 Integrated Model for Scene Understanding
Components: object detection, region labeling, depth reconstruction, scene categorization. I’ll show you these.

11 Basic Object Detection
Sliding-window detection: compute a score for each detection window W; report a detection when Score(W) > 0.5.
Object classes: car, person, motorcycle, boat, sheep, cow.
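The sliding-window loop on this slide can be sketched as follows. This is an illustrative stand-in, not the talk's actual detector: `score_fn` is a hypothetical scoring function (the real one would be a trained classifier such as a HOG detector).

```python
# Sliding-window detection sketch: enumerate windows, score each one,
# keep those with Score(W) > 0.5 as on the slide.

def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) for every window that fits inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

def detect(score_fn, img_w, img_h, win_w=64, win_h=128, stride=16, thresh=0.5):
    """Return every window whose score exceeds the threshold."""
    return [w for w in sliding_windows(img_w, img_h, win_w, win_h, stride)
            if score_fn(w) > thresh]
```

In practice the same scan is repeated over an image pyramid to handle scale; that is omitted here for brevity.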

12 Context-Aware Object Detection
From the scene category: MAP category and marginals (e.g., scene type: urban scene).
From the region labels: how much of each label is in a window adjacent to W (e.g., % of “building” above W).
From the depths: mean and variance of depths (e.g., within W), and an estimate of the object’s “true” size.
Final classifier: P(Y) = Logistic(Φ(W)).
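The final classifier P(Y) = Logistic(Φ(W)) can be sketched by stacking the context cues named on this slide into one feature vector. The specific features and weights below are illustrative assumptions, not the learned model from the talk.

```python
import math

def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

def context_features(base_score, pct_building_above, depth_var_in_window,
                     scene_is_urban):
    """Phi(W): the independent detector score plus context cues."""
    return [1.0,                            # bias term
            base_score,                     # independent detector score
            pct_building_above,             # % of "building" above W
            depth_var_in_window,            # variance of depths inside W
            1.0 if scene_is_urban else 0.0] # MAP scene category indicator

def p_object(phi, weights):
    """P(Y = 1 | W) as a logistic function of the context features."""
    return logistic(sum(w * f for w, f in zip(weights, phi)))
```

With a positive weight on the base score, a stronger independent detection yields a higher context-aware probability, which is the intended monotone behavior.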

13 Region Labeling CRF Model
Label each pixel as one of {‘grass’, ‘road’, ‘sky’, ...}.
Conditional random field (CRF) over superpixels:
  Singleton potentials: log-linear function of boosted detector scores for each class.
  Pairwise potentials: affinity of classes appearing together, conditioned on (x, y) location within the image.
[Gould et al., IJCV 2007]
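The energy this CRF minimizes can be sketched on a toy graph. The score tables below are made-up stand-ins for the slide's log-linear singleton potentials, and the pairwise term here is a plain label-affinity table (the slide additionally conditions it on image location, omitted for brevity).

```python
import itertools

LABELS = ["grass", "road", "sky"]

def energy(labeling, singleton, pairwise, edges):
    """Negative log-probability (up to a constant) of one labeling:
    singleton cost per superpixel plus pairwise cost per adjacent pair."""
    e = sum(singleton[i][lab] for i, lab in enumerate(labeling))
    e += sum(pairwise[(labeling[i], labeling[j])] for i, j in edges)
    return e

def map_labeling(n, singleton, pairwise, edges):
    """Brute-force MAP over all assignments (fine only for tiny graphs)."""
    return min(itertools.product(LABELS, repeat=n),
               key=lambda lab: energy(lab, singleton, pairwise, edges))
```

Real implementations use graph cuts or message passing instead of brute force; the enumeration here only makes the objective concrete.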

14 Context-Aware Region Labeling
Where is the grass? Additional feature: the relative location map.

15 Depth Reconstruction CRF
Label each pixel with its distance from the camera.
Conditional random field (CRF) over superpixels with continuous variables: models depth as a linear function of features, with pairwise smoothness constraints.
[Saxena et al., PAMI 2008]

16 Depth Reconstruction with Context
Context constraints: grass is horizontal; sky is far away (GRASS, SKY regions).
Treat the depth model as a black box: find d*, then re-optimize the depths with the new constraints:
  dCCM = argmin α||d - d*|| + β||d - dCONTEXT||
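If the two penalties in the objective above are read as squared L2 norms (an assumption; the slide does not spell this out), the problem decouples per pixel and the minimizer is simply a weighted average of the black-box depths d* and the context-implied depths:

```python
def refit_depths(d_star, d_context, alpha, beta):
    """Per-pixel minimizer of alpha*||d - d*||^2 + beta*||d - d_context||^2,
    which is the weighted average (alpha*d* + beta*d_context) / (alpha + beta)."""
    return [(alpha * a + beta * b) / (alpha + beta)
            for a, b in zip(d_star, d_context)]
```

Raising β pulls the solution toward the context constraints (grass horizontal, sky far); raising α keeps it close to the original reconstruction.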

17 Training
Notation: I: image; f: image features; Ŷ: output labels (D, S, Z index the detection, segmentation, and depth tasks; Ŷ* denotes ground truth).
Training regimes: Independent (each model trained on image features alone) and Ground (second-level models trained with the ground-truth outputs Ŷ* of the other tasks as context input).

18 CCM Training Regime
Later models can learn to correct the mistakes of previous models.
Training realistically emulates the testing setup.
Allows disjoint datasets.
K-CCM: a CCM with K levels of classifiers.
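The data flow of this training regime can be sketched as a loop over cascade levels: level-0 models see only image features, and each later level also sees the previous level's predictions for the other tasks. Here `fit` and `predict` are placeholders for the black-box component models (detector, region labeler, depth estimator), not the actual learners from the talk.

```python
# K-CCM training sketch: train each task's model per level, passing the
# previous level's predictions of the OTHER tasks as context features.

def train_ccm(feats, labels, fit, predict, k_levels=2):
    """feats/labels: dicts mapping task name -> data.
    Returns one dict of trained models per cascade level."""
    tasks = list(feats)
    levels = []
    prev_preds = {t: None for t in tasks}  # level 0 has no context yet
    for _ in range(k_levels):
        models = {}
        for t in tasks:
            # Context = previous-level predictions of all other tasks.
            ctx = {u: prev_preds[u] for u in tasks if u != t}
            models[t] = fit(feats[t], labels[t], ctx)
        # Predictions made here feed the next level as context.
        prev_preds = {t: predict(models[t], feats[t]) for t in tasks}
        levels.append(models)
    return levels
```

Because each model only needs its own features and labels plus the other tasks' predictions, the tasks can be trained from disjoint datasets, as the slide notes.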

19 Experiments
DS1: 422 images, fully labeled. Tasks: categorization, detection, multi-class segmentation. Evaluation: 5-fold cross-validation.
DS2: 1745 images, disjoint labels. Tasks: detection, multi-class segmentation, 3D reconstruction. Split: 997 train, 748 test.

20 CCM Results – DS1
[Plots: categorization accuracy, region-label accuracy, and detection results for pedestrian, car, motorbike, and boat.]

21 CCM Results – DS2

Detection and depth error:
           Car    Person  Bike   Boat   Sheep  Cow    Depth
  INDEP    0.357  0.267   0.410  0.096  0.319  0.395  16.7m
  2-CCM    0.364  0.272   0.212  0.289  0.415         15.4m

Region labels:
           Tree   Road   Grass  Water  Sky    Building  FG
  INDEP    0.541  0.702  0.859  0.444  0.924  0.436     0.828
  2-CCM    0.581  0.692  0.860  0.565  0.930  0.489     0.819

Boats & water confusion (rows: true class, columns: predicted class):
  INDEP:  True Road: Road 4946, Water 251;  True Water: Road 1150, Water 2144
  2-CCM:  True Road: Road 4878, Water 322;  True Water: Road 820, Water 2730
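The boats-and-water improvement the slide highlights can be read off the confusion counts above by computing per-class recall (true positives over all true instances of the class):

```python
def recall(true_pos, false_neg):
    """Per-class recall from confusion-matrix counts."""
    return true_pos / (true_pos + false_neg)

# Water recall from the confusion counts on this slide.
indep_water = recall(2144, 1150)  # INDEP: water correctly labeled water
ccm_water = recall(2730, 820)     # 2-CCM
```

This works out to roughly 0.65 water recall for the independent model versus roughly 0.77 for the 2-CCM, at the cost of slightly more road pixels labeled water.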

22 Example Results
Side by side: INDEPENDENT vs. CCM.

23 Example Results
Shown: independent objects, independent regions, CCM objects, CCM regions.

24 CCM Summary
The various subtasks of computer vision do indeed interact through context cues.
A simple framework can allow off-the-shelf, black-box methods to improve each other.
Future directions: can we train in more sophisticated ways? Downstream models re-training upstream ones; something like EM for missing labels; other applications.

25 Thanks!

