The University of Texas at Austin
Vision-Based Pedestrian Detection for Driving Assistance
Marco Perez
Background
A set of vehicle capabilities is emerging around the notion of driver assistance. These capabilities rely on sensor-based systems that continuously evaluate the surroundings of a vehicle, display relevant information to the driver, and sometimes take control of the vehicle. The objective of these systems is to increase the safety, convenience, and efficiency of driving.
Different sensors may provide the “eyes” and “ears” of driver assistance systems: video cameras, radar sensors, laser scanners, and ultrasound devices.
Vision systems in the visible spectrum seem the most attractive solution. Video sensors (cameras) provide texture information at very fine angular resolution, allowing the high degree of discrimination necessary for object recognition (lanes, vehicles, pedestrians, traffic signs, traffic lights). The human visual system is the best example of the performance that may be achieved with such sensors, if only the appropriate processing is added.
Background
“Visual detection of pedestrians from a moving vehicle”
The objective is to detect dangerous situations involving pedestrians ahead of time.
This is a challenging problem for the following reasons:
Pedestrians appear against highly cluttered, uncontrolled backgrounds.
Because the camera is moving, common background subtraction methods cannot be applied to obtain foreground regions of interest containing pedestrians.
Pedestrians have a wide range of appearances (body size, pose, clothing, lighting conditions).
Pedestrians are sometimes far away from the camera and therefore appear small in the image (at low resolution).
Key Paper #1 (Zhao & Thorpe, 2000)
The system runs in real time.
It employs a stereo vision system to provide range information for foreground/background segmentation.
Only objects close to the vehicle are of concern, so detected background objects are eliminated from the disparity map by range thresholding.
Small regions are eliminated through size thresholding. The size range of a normal person is obtained from statistical data.
Sub-images are converted to intensity gradient images to encode shape information.
The intensity gradient images are the inputs to a trained (back-propagation) neural network for pedestrian recognition. The training set contains 5318 samples: 1012 of pedestrians and 4306 of other objects.
Experiments were performed on a large number of urban street scenes. Detection rate: 85.2%. False alarm rate: 3.1%.
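The segmentation steps summarized for Key Paper #1 (range thresholding of the disparity map, size thresholding of the remaining regions, and conversion to an intensity gradient) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 4-connectivity, the central-difference gradient, and the threshold values are all assumptions made for illustration, and plain Python lists stand in for real images. It relies only on the fact that a larger disparity corresponds to a closer object.

```python
from collections import deque


def range_threshold(disparity, min_disparity):
    """Keep pixels with disparity >= min_disparity (larger disparity = closer)."""
    return [[d >= min_disparity for d in row] for row in disparity]


def connected_regions(mask):
    """Return the 4-connected foreground regions as lists of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def size_threshold(regions, min_pixels):
    """Discard regions with fewer than min_pixels pixels (hypothetical criterion)."""
    return [region for region in regions if len(region) >= min_pixels]


def gradient_magnitude(image):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

A typical use would chain the steps: `size_threshold(connected_regions(range_threshold(disparity, t)), n)` yields candidate regions, and `gradient_magnitude` would then be applied to the sub-image of each candidate before classification. The paper bounds the size by the range of a normal person, whereas this sketch uses only a lower pixel count to stay short.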
Key Paper #2 (Gavrila, Giebel & Munder, 2004)
Template matching based on contour features is used to find candidate solutions.
Shape matching is based on distance transforms.
A verification method based on a radial basis function is used to dismiss false positives.
Experimental results are reported for pedestrian detection both off-line and in real time.
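Shape matching on a distance transform, as mentioned for Key Paper #2, can be sketched as follows. This is an illustrative chamfer-style matcher, not the method from the paper: the city-block metric, the exhaustive search over offsets, and all function names are assumptions. The idea is that the distance transform stores, for every pixel, the distance to the nearest edge pixel, so a contour template fits well wherever the average transform value under its points is low.

```python
def distance_transform(edges):
    """Two-pass city-block distance from each pixel to the nearest edge pixel."""
    rows, cols = len(edges), len(edges[0])
    inf = rows + cols  # larger than any city-block distance inside the image
    dist = [[0 if edges[y][x] else inf for x in range(cols)] for y in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(rows):
        for x in range(cols):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(rows - 1, -1, -1):
        for x in range(cols - 1, -1, -1):
            if y < rows - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < cols - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def chamfer_score(dist, template_points, offset):
    """Mean distance-transform value under the template placed at offset."""
    oy, ox = offset
    total = sum(dist[oy + ty][ox + tx] for ty, tx in template_points)
    return total / len(template_points)


def best_match(dist, template_points, template_size):
    """Exhaustively search all offsets; return (lowest score, (row, col))."""
    rows, cols = len(dist), len(dist[0])
    height, width = template_size
    best = None
    for oy in range(rows - height + 1):
        for ox in range(cols - width + 1):
            score = chamfer_score(dist, template_points, (oy, ox))
            if best is None or score < best[0]:
                best = (score, (oy, ox))
    return best
```

Here `template_points` is a list of `(row, col)` contour coordinates relative to the template's top-left corner. A score of zero means every template point lies exactly on an image edge; in practice a threshold on the score would select the candidates that are passed on to verification.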
Key Paper #3 (Bertozzi, Broggi, Fascioli, Tibaldi, Chapuis & Chausse, 2004)
The system recognizes pedestrians in different environments and localizes them using a Kalman filter estimator configured as a tracker.
Pedestrians are first recognized through the use of edge density and symmetry maps.
This information is then passed on to the tracker module, which reconstructs an interpretation of the pedestrian positions in the scene.
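To make the role of a Kalman filter configured as a tracker more concrete, the sketch below tracks a single image coordinate of one pedestrian with a constant-velocity model. The slide does not specify the state vector, motion model, or noise settings used in Key Paper #3, so everything here (the class name, the one-dimensional state, the diagonal process noise, the default variances) is an assumption chosen for illustration.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over state [position, velocity] with position measurements."""

    def __init__(self, position, velocity=0.0, process_var=1e-2, meas_var=1.0):
        self.x = [position, velocity]      # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance (assumed initial value)
        self.q = process_var               # process noise added to the diagonal of P
        self.r = meas_var                  # measurement noise variance

    def predict(self, dt=1.0):
        """Propagate the state with F = [[1, dt], [0, 1]] and P = F P F^T + Q."""
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r           # innovation variance
        k0, k1 = a / s, c / s    # Kalman gain
        y = z - self.x[0]        # innovation (measurement residual)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # Covariance update P = (I - K H) P.
        self.P = [
            [(1.0 - k0) * a, (1.0 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]
        return self.x[0]
```

In a tracker of this kind, each frame would call `predict()` to estimate where the pedestrian should appear and then `update(z)` with the position reported by the detector, for example the centre column of a region found from the edge density and symmetry maps. A full tracker would use a two-dimensional state and associate detections with tracks, which is omitted here.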