ICCV 2009 — Learning to Predict Where Humans Look. Tilke Judd, Krista Ehinger, Frédo Durand, Antonio Torralba
Outline
◦ Introduction
◦ Database of eye tracking data
◦ Learning a model of saliency
◦ Applications
◦ Conclusion
Bottom-up control of selective attention
− Stimulus salience (defined by color, contrast, and orientation)
− Saliency map
Current saliency models do not accurately predict human fixations.
Top-down control of selective attention
− Scene schema guides fixations (more likely to land on meaningful areas)
− Task goals guide fixations toward objects relevant to the task
Contributions
◦ First, a large database of eye tracking experiments with labels and analysis
◦ Second, a supervised learning model of saliency that combines bottom-up, image-based saliency cues with top-down, semantics-dependent cues
Goal: predict where users look without eye tracking hardware.
Data gathering protocol
◦ 1003 random images were collected from Flickr and LabelMe (779 landscape and 228 portrait images); eye tracking data was recorded from 15 users who free-viewed these images.
Data gathering protocol Gaze tracking paths and fixation locations are recorded for each viewer
Data gathering protocol
◦ Left: saliency map, obtained by convolving the fixation locations with a Gaussian filter. Right: the most salient 20 percent of the image.
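This step can be sketched as follows (a minimal sketch, assuming fixations arrive as (row, col) pixel coordinates; the function names and the sigma value are illustrative, not the paper's exact parameters):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, height, width, sigma=25):
    """Build a continuous saliency map from discrete fixation points
    by blurring them with a Gaussian filter."""
    m = np.zeros((height, width))
    for r, c in fixations:
        m[int(r), int(c)] += 1.0
    m = gaussian_filter(m, sigma)  # spread each point fixation into a smooth bump
    return m / m.max() if m.max() > 0 else m

def top_percent_mask(saliency, percent=20):
    """Binary mask covering the most salient `percent` of pixels."""
    thresh = np.percentile(saliency, 100 - percent)
    return saliency >= thresh
```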
Analysis of dataset ◦ a strong bias for human fixations to be near the center of the image [19][23]
Analysis of dataset
◦ How well does a saliency map built from human fixations predict the fixations of other viewers? (ground-truth fixations vs. the saliency map used as a classifier)
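Using a saliency map as a classifier of fixated pixels amounts to an ROC computation: sweep a threshold over the map, and at each level treat the above-threshold pixels as "predicted fixated". Below is a simplified sketch (the function name and the uniform threshold sweep are illustrative, not the paper's exact protocol):

```python
import numpy as np

def roc_auc(saliency, fixation_points, n_thresh=100):
    """Score a saliency map as a binary classifier of fixations.

    At each threshold, true positive rate = fraction of human fixations
    landing in the above-threshold region; false positive rate =
    fraction of all image pixels in that region.
    """
    sal = saliency.ravel()
    fix_vals = np.array([saliency[r, c] for r, c in fixation_points])
    tpr, fpr = [], []
    for t in np.linspace(sal.min(), sal.max(), n_thresh):
        tpr.append((fix_vals >= t).mean())
        fpr.append((sal >= t).mean())
    # integrate TPR over FPR (trapezoid rule); the curve runs from
    # (fpr=1, tpr=1) down to the strictest threshold
    auc = 0.0
    for i in range(1, n_thresh):
        auc += 0.5 * (tpr[i] + tpr[i - 1]) * (fpr[i - 1] - fpr[i])
    return auc
```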
Analysis of dataset
◦ Objects of interest and the size of regions of interest
Features used for machine learning
◦ Low-level features, e.g. color, orientation, intensity
◦ Mid-level features, e.g. the horizon
◦ High-level features, e.g. face detector output
◦ Center prior: distance of each pixel to the image center
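The center prior and the per-pixel feature matrix fed to the learner could be assembled along these lines (a sketch; the paper's actual feature channels are richer, and the function names here are hypothetical):

```python
import numpy as np

def center_prior(height, width):
    """Distance-to-center feature: each pixel's distance from the image
    center, normalized so the center scores 1 and the corners score 0."""
    rows, cols = np.mgrid[0:height, 0:width]
    d = np.hypot(rows - (height - 1) / 2, cols - (width - 1) / 2)
    return 1 - d / d.max()

def feature_stack(image, extra_maps):
    """Stack per-pixel feature maps into an (H*W, F) matrix for training.

    `extra_maps` stands in for the low-, mid-, and high-level channels
    (color, orientation, horizon, face detections, ...); here they are
    just arbitrary 2-D arrays of the same size as the image.
    """
    h, w = image.shape[:2]
    maps = list(extra_maps) + [center_prior(h, w)]
    return np.stack([m.ravel() for m in maps], axis=1)
```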
Training sample selection
◦ 903 training images and 100 testing images
◦ 10 positively labeled pixels sampled randomly from the top 20% most salient locations, and 10 negatively labeled pixels from the bottom 70%
Training
◦ Used the liblinear support vector machine to train a model
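The sampling protocol and the linear SVM training might look like this (a sketch using scikit-learn's `LinearSVC`, which wraps liblinear, as a stand-in; the function names and the feature layout are assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC

def sample_pixels(human_saliency, n_pos=10, n_neg=10, rng=None):
    """Per-image sampling: positives from the top 20% most salient
    pixels, negatives from the bottom 70%."""
    rng = np.random.default_rng(rng)
    order = np.argsort(human_saliency.ravel())  # ascending salience
    n = order.size
    neg_pool = order[: int(0.7 * n)]            # bottom 70%
    pos_pool = order[int(0.8 * n):]             # top 20%
    idx = np.concatenate([rng.choice(pos_pool, n_pos, replace=False),
                          rng.choice(neg_pool, n_neg, replace=False)])
    labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    return idx, labels

def train_saliency_svm(feature_matrices, saliency_maps):
    """Pool sampled pixels from all images and fit a linear SVM.
    Each feature matrix is (H*W, F), aligned with its saliency map."""
    X, y = [], []
    for feats, sal in zip(feature_matrices, saliency_maps):
        idx, labels = sample_pixels(sal, rng=0)
        X.append(feats[idx])
        y.append(labels)
    return LinearSVC().fit(np.vstack(X), np.concatenate(y))
```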
Comparison of saliency maps
Performance on testing images
1. Outperforms the other models
2. Reaches 88% of the way to human performance
3. Does not merely benefit from the strong bias of fixations toward the image center
4. Overall performance for the object detector model is low
Performance on testing samples (scored as the average of the true positive and true negative rates)
1. Performs only as well as chance on the other subsets of samples
2. The latter model performs more robustly across all subsets of samples
3. Performance is better on the subsets containing people, cars, and faces
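The per-sample score used here (average of the true positive and true negative rates) is balanced accuracy, which can be computed directly:

```python
def balanced_accuracy(y_true, y_pred):
    """Average of the true positive and true negative rates: a score
    that is insensitive to the positive/negative class balance."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)
```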
Applications
◦ Using eye tracking data to decide how to render a photograph with differing levels of detail [4].
[4] D. DeCarlo and A. Santella. Stylization and abstraction of photographs. ACM Transactions on Graphics.
Contributions
◦ Developed the largest eye tracking database of natural images, permitting large-scale quantitative analysis of fixation points and gaze paths.
◦ Used machine learning to train a combined bottom-up, top-down model of saliency that outperforms several existing models.
Future work
◦ Understanding the impact of framing, cropping, and scaling images on fixations.