Urban Scene Analysis
James Elder & Patrick Denis, York University
2 Phase IV Objectives
–Single-View 3D Reconstruction
–Scene Dynamics
–Scene Segmentation and Labelling
Single-View Reconstruction
5 Ultimate Goal Our ultimate goal is to automate this process!
6 Immediate Goal Automatic estimation of the three vanishing points corresponding to the “Manhattan directions”.
7 Manhattan Frame Geometry An edge is aligned to a vanishing point if its interpretation plane normal is orthogonal to the vanishing point vector on the Gaussian sphere (i.e., their dot product is 0).
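The orthogonality test above can be sketched in a few lines. This is only an illustration under stated assumptions: a pinhole camera with known focal length, image coordinates measured relative to the principal point, and hypothetical helper names (`interpretation_normal`, `alignment_error`) that do not come from the slides.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def interpretation_normal(p1, p2, focal):
    """Unit normal of the plane through the camera centre and an image edge.

    p1, p2: edge endpoints (x, y), relative to the principal point (assumed).
    focal:  focal length in the same units as the image coordinates (assumed).
    """
    ray1 = (p1[0], p1[1], focal)   # viewing ray through the first endpoint
    ray2 = (p2[0], p2[1], focal)   # viewing ray through the second endpoint
    return unit(cross(ray1, ray2))

def alignment_error(normal, direction):
    """|n . v|: zero when the edge is consistent with vanishing direction v."""
    return abs(dot(normal, unit(direction)))

# A vertical image edge is consistent with the vertical scene direction,
# a horizontal image edge is not.
vertical_edge = interpretation_normal((100.0, -50.0), (100.0, 50.0), 500.0)
horizontal_edge = interpretation_normal((-50.0, 100.0), (50.0, 100.0), 500.0)
up = (0.0, 1.0, 0.0)
print(alignment_error(vertical_edge, up), alignment_error(horizontal_edge, up))
```

In practice the dot product is never exactly zero for noisy edges, so it serves as an error term rather than a hard test.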
8 Mixture Model
Each edge E_ij in the image is generated by one of four possible kinds of scene structure:
–m_1 to m_3: a line in one of the three Manhattan directions
–m_4: non-Manhattan structure
The observable properties of each edge E_ij are:
–position
–angle
The likelihoods of these observations are co-determined by:
–The causal process (m_1 to m_4)
–The rotation Ψ of the Manhattan frame relative to the camera
[Figure: graphical model with nodes Ψ and m_i, and image edges E_11, E_12, E_21, E_22]
9 Mixture Model Our goal is to estimate the Manhattan frame Ψ from the observable data E_ij. [Figure: graphical model with nodes Ψ and m_i, and image edges E_11, E_12, E_21, E_22]
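One way to write the estimation problem down, as a sketch rather than the authors' exact formulation: assume the edges are conditionally independent given Ψ and that the mixing probabilities p(m_k) do not depend on the edge. Both assumptions, and the choice of a maximum-likelihood estimate, are ours.

```latex
% Likelihood of one edge: marginalise over the four causal processes
p(E_{ij} \mid \Psi) = \sum_{k=1}^{4} p(m_k)\, p(E_{ij} \mid m_k, \Psi)

% Estimate of the Manhattan frame under the independence assumption
\hat{\Psi} = \arg\max_{\Psi} \sum_{i,j} \log p(E_{ij} \mid \Psi)
```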
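10 E-M Algorithm
E Step
–Given an estimate of the Manhattan coordinate frame, update the mixture probabilities for each edge
M Step
–Given estimates of the mixture probabilities for each edge, update our estimate of the Manhattan coordinate frame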
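The alternation can be illustrated with a deliberately simplified toy. Everything here is an assumption made for the sketch: the frame is reduced to a single roll angle about the optical axis (the real Ψ has three degrees of freedom), the error for a Manhattan class is the dot product between the edge's interpretation plane normal and that direction with Gaussian noise, non-Manhattan edges get a uniform density, and the M step is a grid search over candidate angles. The constants and function names are hypothetical.

```python
import math

SIGMA = 0.1                      # assumed std. dev. of the alignment error
PRIORS = (0.3, 0.3, 0.3, 0.1)    # assumed mixing probabilities for m_1..m_4
OUTLIER_DENSITY = 0.5            # uniform density over an error in [-1, 1]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def manhattan_dirs(theta):
    """Three orthogonal directions for a frame rolled by theta (toy model)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)]

def e_step(normals, theta):
    """Mixture probabilities of each edge given the current frame estimate."""
    dirs = manhattan_dirs(theta)
    norm_const = 1.0 / (SIGMA * math.sqrt(2.0 * math.pi))
    resp = []
    for n in normals:
        w = [PRIORS[k] * norm_const * math.exp(-0.5 * (dot(n, d) / SIGMA) ** 2)
             for k, d in enumerate(dirs)]
        w.append(PRIORS[3] * OUTLIER_DENSITY)      # non-Manhattan class
        total = sum(w)
        resp.append([x / total for x in w])
    return resp

def m_step(normals, resp, candidates):
    """Frame update: pick the candidate angle with the lowest weighted
    squared alignment error (equivalent to maximising the expected
    Gaussian log-likelihood when SIGMA and PRIORS are held fixed)."""
    def cost(theta):
        dirs = manhattan_dirs(theta)
        return sum(r[k] * dot(n, d) ** 2
                   for n, r in zip(normals, resp)
                   for k, d in enumerate(dirs))
    return min(candidates, key=cost)

def run_em(normals, theta0, candidates, iters=10):
    """Alternate E and M steps for a fixed number of iterations."""
    theta = theta0
    for _ in range(iters):
        theta = m_step(normals, e_step(normals, theta), candidates)
    return theta
```

A grid-search M step is used only to keep the sketch short; it corresponds loosely to the coarse search options on the design slide, not to the Quasi-Newton or Quasi-EM variants.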
12 Design Criteria
–Accuracy
–Speed
13 Design Decisions
Features
–Dense gradient map
–Sparse sub-pixel localized edges
Measurement Space
–Image
–Gauss Sphere
Search Method
–Coarse-to-Fine (Coughlan & Yuille 2001)
–Quasi-Newton
–EM
–Quasi-EM
19 Speed [Table: Method vs. Time (sec). Methods listed: MW; Edge-Based Coarse-to-Fine, MW Params; Edge-Based Newton, MW Params; Edge-Based Newton; Edge-Based EM; Edge-Based Quasi-EM; Edge-Based Quasi-EM GS]
21 Single-View Reconstruction
Potential Research Objectives for Phase IV
–Recover connected Manhattan cuboids
  Connected, labelled line segments
  Connected, labelled rectangular facets
–Estimate scale factor
  From pedestrian, vehicle traffic
  From building features whose size is approximately known (e.g., doors)
–Integrate with other data sources
  Existing 3D models on coarser scale
  3D models from cameras with overlapping fields of view
22 Projects: Pre-Attentive and Attentive Sensing [Figure labels: foveal image, wide-field image, pan, tilt]
23 Statistical Integration of Weak Cues [Figure panels: Motion Region Log Likelihood Ratio; Foreground Region Log Likelihood Ratio; Skin Region Log Likelihood Ratio; Joint Region Log Likelihood Ratio]
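A common way to obtain a joint log likelihood ratio from several weak cues is to sum the per-cue ratios, which is valid if the cues are conditionally independent given the presence or absence of a person. The slides do not state how the joint map is computed, so the sum below is an assumption, and the function names and toy 2x2 maps are hypothetical.

```python
import math

def joint_llr(*cue_maps):
    """Element-wise sum of per-cue log likelihood ratio maps.

    Assumes the cues are conditionally independent (naive Bayes), so
    likelihood ratios multiply and their logs add.
    """
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*cue_maps)]

def posterior(llr, prior=0.5):
    """Posterior probability of a person from a joint LLR and a prior."""
    log_odds = llr + math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Toy maps: each cell holds log p(cue | person) - log p(cue | background)
motion     = [[0.5, -1.0], [2.0, 0.0]]
foreground = [[0.5, -0.5], [1.0, 0.0]]
skin       = [[-1.0, -0.5], [1.5, 0.0]]

joint = joint_llr(motion, foreground, skin)
print(joint)                   # [[0.0, -2.0], [4.5, 0.0]]
print(posterior(joint[1][0]))  # strongest evidence is in the lower-left cell
```

Each cue alone is weak, but cells where all three agree accumulate a large joint ratio.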
24 Attentive Feedback Loop [Diagram labels: random sampler, posterior, prior, non-max suppression, gaze command, likelihood, gaze control, attentive sensor, high-resolution face detection, confirmed face location, motion kernel, motion kernel, mean body indicator, spatial prior]
25 Wide-Field Person Detection
26 Attentive High-Res Video Surveillance
27 Attentive Snapshots
28 Automatically Confirmed High-Resolution Faces
29 Pose-Invariant Face Recognition (with Simon Prince, UCL)
30 Projects: 3D Facial Estimation and Modelling
31 Scene Dynamics
Potential Research Objectives for Phase IV
–Person re-identification
–Individuation (counting) in crowds
32 Using Prior Knowledge: Example
33 Experimental Results
34 Experimental Results: SS vs. MSRC vs. MSEJ vs. MS [Plot: mean relative errors; axis: Relative Error]
35 Scene Segmentation
Potential Research Objectives for Phase IV
–Application to urban scenes
  Scene layout
  –Ground plane
  –Buildings
  –Vegetation
  –Sky
  Material recognition
  Integrated text recognition