Vehicle Segmentation and Tracking in the Presence of Occlusions
Neeraj K. Kanhere, Dr. Stanley T. Birchfield, Dr. Wayne A. Sarasua
Clemson University
Introduction
Traffic parameters such as volume, speeds, turning counts and classification are fundamental for:
  Transportation planning
  Traffic impact of land use
  Transportation engineering applications (e.g. signal timing)
  Intelligent Transportation Systems (ITS)
Why computer vision?
Different types of sensors can be used to gather data:
  Radar or laser based sensors
  Inductive loop detectors
  Video camera (with computer vision techniques)
    No traffic disruption for installation and maintenance
    Covers a wide area with a single camera
    Provides rich visual information for manual inspection
Why tracking?
Current systems use localized detection within detection zones, which is prone to errors when camera placement is not ideal.
  Tracking enables prediction of a vehicle’s location in consecutive frames
  Can provide more accurate estimates of traffic volumes and speeds
  Potential to count turn-movements at intersections
  Detect traffic incidents
Related research
  Region/contour based
    Computationally efficient
    Good results when vehicles are well separated
  3D model based
    Large number of vehicle models needed
    Limited experimental results
  Markov random field
    Good results on low-angle sequences
    Accuracy drops by 50% when the sequence is processed in true order
  Feature tracking based
    Handles partial occlusions
    Good accuracy when sufficient features are tracked from entry region to exit region
Factors to be considered
  High-angle
    Planar motion assumption
    Well-separated vehicles
    Relatively easy
  Mid-angle
    More depth variation
    Occlusions
    A difficult problem
Overview of the approach
[Diagram: offline calibration and a background model feed the processing of each frame-block (#1, #2, #3). Each block goes through feature tracking, estimation of 3-D location, and grouping, yielding segmented blocks #1 to #3. Block correspondence and post processing then produce counts, speeds and classification.]
Background model and calibration
  Adaptive time-domain median filtering for the background model
  Calibration provides a mapping from scene to image
  Use scene features to estimate correspondences
    Lane widths
    Truck heights
  Approximate calibration is good enough for counts
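A minimal sketch of a time-domain median background model, assuming grayscale frames stored as NumPy arrays. The sliding window is one possible reading of "adaptive"; the function names and the `max_len` parameter are illustrative and do not come from the slides.

```python
import numpy as np

def median_background(frames):
    """Per-pixel temporal median of a list of equally sized grayscale frames."""
    stack = np.stack(frames, axis=0)  # shape: (T, H, W)
    return np.median(stack, axis=0)

def update_window(window, new_frame, max_len):
    """Append a frame, drop the oldest beyond max_len, and return the new median.

    Keeping only recent frames lets the model follow slow scene changes
    (an assumption about how the adaptation could work).
    """
    window.append(new_frame)
    if len(window) > max_len:
        window.pop(0)
    return median_background(window)
```

Because a moving vehicle covers any given pixel in only a minority of frames, the median at that pixel stays at the road intensity.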
Processing a frame-block
  Consecutive blocks (block #n and block #n+1) share overlap frames
  Multiple frames are needed for motion information
  Tradeoff between number of features and amount of motion
  Typically 5-15 frames yield good results
[Plot: number of features in block vs. number of frames in block]
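As an illustration of overlapping frame-blocks, the sketch below splits a sequence into fixed-length blocks that share frames with their neighbours. The block length and overlap values are assumptions chosen for the example; the slides only state that 5-15 frames per block typically work well.

```python
def frame_blocks(num_frames, block_len, overlap):
    """Return (start, end) frame index pairs, end exclusive, for overlapping blocks."""
    if not 0 <= overlap < block_len:
        raise ValueError("overlap must satisfy 0 <= overlap < block_len")
    step = block_len - overlap
    blocks = []
    start = 0
    while start + block_len <= num_frames:
        blocks.append((start, start + block_len))
        start += step
    return blocks

# 25 frames, 10-frame blocks, 5 frames shared between neighbours
print(frame_blocks(25, 10, 5))  # [(0, 10), (5, 15), (10, 20), (15, 25)]
```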
Frame differencing
  Partially occluded vehicles appear as a single blob
  Effectively segments well-separated vehicles
  Goal is to get filled connected components
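The behaviour described on this slide can be reproduced with a toy example. The sketch assumes the difference is taken against the background model and uses a fixed threshold with 4-connectivity; these choices, and the function names, are assumptions rather than details from the slides.

```python
from collections import deque

def foreground_mask(frame, background, threshold):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def count_blobs(mask):
    """Count 4-connected components of True pixels using breadth-first search."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            blobs += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blobs
```

Two vehicles separated by background pixels give two components, while two vehicles whose image regions touch merge into one, which is why the later grouping step is needed.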
Estimation using single frame
  Box-model for vehicles
  Road projection using foreground mask
  Works for orthogonal surfaces
[Diagram: camera, vehicle, road plane]
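To make the road projection concrete, here is a sketch of back-projecting a pixel onto a horizontal plane. It assumes calibration yields a 3x4 projection matrix `P` and that the road is the world plane z = 0; both are assumptions, since the slides do not specify the calibration model.

```python
import numpy as np

def backproject_to_plane(P, u, v, height=0.0):
    """Intersect the viewing ray of pixel (u, v) with the world plane z = height.

    P is a 3x4 matrix mapping homogeneous world points to homogeneous pixels.
    """
    M, p4 = P[:, :3], P[:, 3]
    center = -np.linalg.solve(M, p4)                        # camera center (world)
    direction = np.linalg.solve(M, np.array([u, v, 1.0]))   # ray direction (world)
    if direction[2] == 0.0:
        raise ValueError("viewing ray is parallel to the plane")
    s = (height - center[2]) / direction[2]
    return center + s * direction
```

Calling the function with `height=0.0` gives the road point for a pixel, and calling it with the assumed maximum vehicle height gives the upper bound along the same ray.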
Selecting stable features
  Shadows and partial occlusions will result in wrong estimates
  Planar motion assumption is violated more for features higher up
  Select stable features, which are closer to the road
  Use stable features to re-estimate world coordinates of other features
Estimation using motion
  Rigid body under translation
  Estimate coordinates with respect to each stable feature
  Select the coordinates minimizing the weighted sum of Euclidean distance and trajectory error
Notation:
  P : Feature with unknown coordinates
  Q : Stable feature
  R : Backprojection on road
  H : Backprojection at maximum height
  0 : First frame of the block
  t : Last frame of the block
  Δ : Translation of corresponding point
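One way to read the notation above is sketched below; it is a reconstruction under stated assumptions, not necessarily the authors' exact formulation. Assume P lies on the segment between R and H, so P = R + α(H - R), with the same α in the first and last frame because the motion is parallel to the road. Requiring P_t - P_0 = Δ, where Δ is the translation of the stable feature Q, gives a least-squares value for α. Clamping α to [0, 1] is an additional assumption.

```python
import numpy as np

def estimate_with_stable_feature(R0, Rt, H0, Ht, delta):
    """Place P along its viewing ray so that it translates by delta over the block.

    R0, Rt: road backprojections in the first and last frame.
    H0, Ht: backprojections at maximum height in the first and last frame.
    delta:  world translation of the stable feature Q over the block.
    Returns (alpha, P0), where P0 = R0 + alpha * (H0 - R0).
    """
    R0, Rt, H0, Ht, delta = (np.asarray(a, dtype=float)
                             for a in (R0, Rt, H0, Ht, delta))
    dR = Rt - R0
    dH = Ht - H0
    direction = dH - dR
    denom = float(direction @ direction)
    if denom == 0.0:
        raise ValueError("no relative motion between R and H; alpha is undetermined")
    alpha = float(direction @ (delta - dR)) / denom
    alpha = min(max(alpha, 0.0), 1.0)  # assumed: P lies between road and max height
    return alpha, R0 + alpha * (H0 - R0)
```

The sketch covers only the estimate relative to a single stable feature. Choosing among the estimates from several stable features, using the weighted sum of Euclidean distance and trajectory error, is not shown.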
Affinity matrix
  Each element represents the similarity between corresponding features
  Three quantities contribute to the affinity matrix: Euclidean distance (A_D), trajectory error (A_E) and background content (A_B)
  Normalized cut is used for segmentation (Shi, Malik)
  Number of cuts is not known
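The sketch below builds a toy affinity matrix and performs a single two-way normalized cut using the second-smallest generalized eigenvector, following the standard Shi and Malik relaxation. The Gaussian kernels, the elementwise product used to combine the terms, and the `sigma` parameters are assumptions; the background-content term is omitted because the slides do not define it.

```python
import numpy as np

def affinity(points, traj_error, sigma_d=1.0, sigma_e=1.0):
    """Combine distance and trajectory-error similarities (assumed Gaussian kernels)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    A_D = np.exp(-dist2 / (2.0 * sigma_d ** 2))
    A_E = np.exp(-np.asarray(traj_error, dtype=float) ** 2 / (2.0 * sigma_e ** 2))
    return A_D * A_E  # assumed combination rule

def two_way_normalized_cut(A):
    """Split nodes by the sign of the second-smallest generalized eigenvector.

    Solves (D - A) y = lambda * D y through the symmetric normalized Laplacian.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]       # map back to the generalized eigenvector
    return y > 0                      # boolean group label per feature
```

Thresholding at zero is only one of the splitting rules discussed for normalized cuts; it keeps the example short.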
Incremental normalized cuts
  We apply normalized cut to the initial A with an increasing number of cuts
  For each successive cut, segmented groups are analyzed until valid groups are found
  Valid group: meets dimensional criteria
  Elements corresponding to valid groups are removed from A and the process is repeated starting from a single cut
  Avoids specifying a threshold for the number of cuts
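The control flow of this incremental procedure can be sketched independently of the cut itself. In the sketch, `partition` and `is_valid` are hypothetical callbacks: the first stands in for normalized cut applied to the remaining elements of A with a given number of cuts, the second for the dimensional criteria. The `max_cuts` limit is also an assumption, added so the loop always terminates.

```python
def incremental_segmentation(items, partition, is_valid, max_cuts=10):
    """Peel off valid groups, restarting from a single cut after each removal.

    partition(items, cuts) -> list of groups (hypothetical callback)
    is_valid(group) -> bool                  (hypothetical callback)
    Returns (accepted_groups, unassigned_items).
    """
    remaining = list(items)
    accepted = []
    while remaining:
        found = []
        for cuts in range(1, max_cuts + 1):
            groups = partition(remaining, cuts)
            found = [g for g in groups if is_valid(g)]
            if found:
                break
        if not found:
            break  # no valid group at any cut count; leave the rest unassigned
        accepted.extend(found)
        taken = {x for g in found for x in g}
        remaining = [x for x in remaining if x not in taken]
    return accepted, remaining
```

A toy run with 1-D positions, splitting at the largest gaps and accepting groups of at least two points no wider than 3 units, recovers three groups from `[0, 1, 2, 10, 11, 12, 30, 31]` without fixing the number of cuts in advance.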
Correspondence over blocks
  Formulated as a problem of finding a maximum weight graph
  Nodes represent segmented groups
  Edge weights represent the number of features common over two blocks
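The edge weights can be illustrated with feature IDs stored per group. The greedy matching in the sketch is a simplified stand-in chosen for brevity, not the graph formulation named on the slide, and the function names are illustrative.

```python
def common_feature_weights(groups_a, groups_b):
    """Edge weight (i, j) = number of feature IDs shared by group i and group j."""
    weights = {}
    for i, ga in enumerate(groups_a):
        for j, gb in enumerate(groups_b):
            shared = len(set(ga) & set(gb))
            if shared:
                weights[(i, j)] = shared
    return weights

def greedy_match(weights):
    """Take edges in decreasing weight, using each group at most once."""
    used_a, used_b, matches = set(), set(), []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if i not in used_a and j not in used_b:
            matches.append((i, j, w))
            used_a.add(i)
            used_b.add(j)
    return matches

block_n = [[1, 2, 3], [4, 5]]          # feature IDs per group in block n
block_n1 = [[4, 5, 9], [2, 3, 8]]      # feature IDs per group in block n+1
print(greedy_match(common_feature_weights(block_n, block_n1)))
# [(0, 1, 2), (1, 0, 2)]
```

Features can only be shared between blocks when the blocks overlap in time, which is what the overlap frames provide.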
Results
Conclusion
  A novel approach based on feature point tracking
  Key part of the technique is estimation of 3-D coordinates
  Results demonstrate the ability to correctly segment vehicles even under severe partial occlusions
  Vehicle count, speeds and classification (car or heavy vehicle) data can be easily obtained for tracked vehicles
Future work
  Robust block-correspondence
  Tracking vehicles at intersections
  Automatic calibration by detecting lane markings
  Explicit shadow suppression
Questions?
Thank you!