Robust Tracking and Remapping of Eye Appearance with Passive Computer Vision Presented by Jason Moore
1. Introduction This paper discusses a single-camera iris-tracking and remapping approach based on passive computer vision. In other words: Using a camera to track the gaze of an individual specifically for the purpose of computer control.
Gaze Estimation Applications: ophthalmology, psychology, neurology, marketing and advertising, HCI, aids for the disabled.
Pupil & Iris The pupil and iris are monitored for gaze estimation.
Gaze Estimation Techniques Categorized by: degree of intrusiveness, technology employed, active vs. passive, cost, target application domains.
Intrusive Techniques Require equipment to be put in physical contact with the user. Examples of equipment: electrodes, contact lenses, head-mounted devices.
Nonintrusive Techniques Use cameras to capture images of the eye. Most commercial devices use IR light reflected by the eye. These systems are fairly accurate but require special and expensive hardware. They retain a degree of intrusiveness because of active light emission, and they can perform poorly in bad lighting conditions or if the user is wearing glasses.
More IR problems Most IR-based systems require the user’s head to remain still. This limits the degree of usability. IR-based systems that do not require the user’s head to remain still do not yield great accuracy.
Active Vs. Passive Active approaches rely on light emission to track the eye. Passive approaches rely only on natural light and can use off-the-shelf hardware to perform iris localization and tracking. The iris is well suited to tracking because of its nearly circular shape and its contrast with the sclera.
Gaze Tracking for HCI Is Difficult The remapping transformation from pupil position to the computer screen is time-dependent and changes whenever the user moves their head. Although difficult, it is an interesting problem because of its potential social and commercial impact.
2. Iris Tracking Composed of three states: (1) Iris Localization: iris candidates are selected and passed to the tracing state. (2) Iris Tracing: the iris is searched for; if it is found, the tracker waits for the next frame, otherwise it goes back to iris localization; if the eye is closed, it goes to the Wait state. (3) Wait: this state handles both voluntary and involuntary eye blinks.
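The three states above can be sketched as a small state machine. This is only an illustrative sketch: the names `State` and `next_state` are hypothetical, and the exit from the Wait state (returning to localization once the eye reopens) is an assumption, since the slide does not say where the tracker resumes.

```python
from enum import Enum, auto


class State(Enum):
    LOCALIZATION = auto()
    TRACING = auto()
    WAIT = auto()


def next_state(state, iris_found, eye_closed):
    """Return the tracker state to use for the next frame."""
    if state is State.LOCALIZATION:
        # Candidates are always handed over to the tracing state.
        return State.TRACING
    if state is State.TRACING:
        if eye_closed:
            return State.WAIT
        # Keep tracing while the iris is found, otherwise re-localize.
        return State.TRACING if iris_found else State.LOCALIZATION
    # WAIT: stay here during the blink. Resuming at LOCALIZATION once the
    # eye reopens is an assumption, not something stated on the slide.
    return State.WAIT if eye_closed else State.LOCALIZATION
```

For example, `next_state(State.TRACING, iris_found=False, eye_closed=False)` sends the tracker back to localization.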
Iris Tracking
Iris Localization In this state, the current frame is analyzed to generate some initial guesses on the position of the iris. The image is filtered to enhance the contrast between the iris and the sclera. The potential iris locations are selected based on image intensity.
Iris Localization One point of interest has been identified on the x-axis. Two points of interest have been identified on the y-axis. Both hypotheses will be passed to the tracing state.
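One way to picture intensity-based candidate selection along the x- and y-axes is with mean-intensity projections. The sketch below is an assumption about how such a step could look, not the method from the paper: the image is a plain list of rows of grayscale values, and `projection`, `dark_minima`, `iris_candidates`, and the `max_level` threshold are all hypothetical names.

```python
def projection(image, axis):
    """Mean intensity per column (axis='x') or per row (axis='y')."""
    if axis == "y":
        return [sum(row) / len(row) for row in image]
    n_rows = len(image)
    return [sum(row[c] for row in image) / n_rows for c in range(len(image[0]))]


def dark_minima(profile, max_level):
    """Indices that are local minima of the profile and darker than max_level."""
    picks = []
    for i, v in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("inf")
        if v < max_level and v < left and v <= right:
            picks.append(i)
    return picks


def iris_candidates(image, max_level=100):
    """Combine dark points of interest on both axes into (x, y) hypotheses."""
    xs = dark_minima(projection(image, "x"), max_level)
    ys = dark_minima(projection(image, "y"), max_level)
    return [(x, y) for x in xs for y in ys]
```

With one point of interest on the x-axis and two on the y-axis, this yields two hypotheses, matching the situation described on the slide.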
Iris Tracing Before considering the hypotheses presented by the Iris Localization state, search for the iris based on its last location. If the iris is not found near the last position, then consider the hypotheses presented by the Iris Localization state.
Iris Tracing The estimated iris position after initial failure to find the iris based on its previous location.
The RANSAC Algorithm RANSAC stands for Random Sample Consensus. It is a popular algorithm for model fitting in a data set containing both inliers and outliers. Here the RANSAC algorithm has been used to fit a line to a set of data points despite the large number of outliers.
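The line-fitting example can be sketched in a few lines. This is a minimal, generic RANSAC loop rather than the presenter's code; the function name `fit_line_ransac` and the parameters `n_iter`, `tol`, and `seed` are illustrative assumptions.

```python
import random


def fit_line_ransac(points, n_iter=200, tol=1.0, seed=0):
    """Fit a line a*x + b*y + c = 0 (unit normal) to points with outliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        # 1. Draw a minimal sample: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # identical points, no line defined
        c = -(a * x1 + b * y1)
        # 2. Collect the consensus set: points within tol of the line.
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= tol]
        # 3. Keep the hypothesis with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_model = (a / norm, b / norm, c / norm)
            best_inliers = inliers
    return best_model, best_inliers
```

The slope of the recovered line is `-a / b` when `b` is nonzero. A common refinement, omitted here, is to refit the model by least squares on the final inlier set.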
C-RANSAC RANSAC modified to have more knowledge about the tracing task. This knowledge concerns the range of possible ellipse dimensions. The left image shows the failure of standard RANSAC to find the iris properly. The image on the right shows the success of C-RANSAC.
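The idea of constraining RANSAC with a plausible size range can be illustrated as follows. Note the simplifications: this sketch fits a circle instead of an ellipse to keep the geometry short, and the names `circle_from_3`, `constrained_ransac_circle`, `r_min`, and `r_max` are hypothetical. It is not the C-RANSAC implementation from the paper.

```python
import math
import random


def circle_from_3(p1, p2, p3):
    """Circle (cx, cy, r) through three points, or None if they are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def constrained_ransac_circle(points, r_min, r_max, n_iter=500, tol=1.0, seed=0):
    """RANSAC circle fit that rejects hypotheses with an implausible radius."""
    rng = random.Random(seed)
    best, best_count = None, 0
    for _ in range(n_iter):
        model = circle_from_3(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r = model
        # Constraint step: skip shapes outside the expected iris size range
        # before they can compete on inlier count.
        if not (r_min <= r <= r_max):
            continue
        count = sum(1 for x, y in points
                    if abs(math.hypot(x - cx, y - cy) - r) <= tol)
        if count > best_count:
            best, best_count = model, count
    return best, best_count
```

The design point is that a distractor such as an eyelid edge may gather more edge points than the iris itself, so rejecting hypotheses by size before scoring prevents it from winning the consensus vote.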
C-RANSAC First pair of images: the left image shows success in finding the iris while the user is wearing glasses; the right image shows success in low-light conditions. Second pair of images: the left image shows success in finding the iris while the eyebrow is in the interest window; the right image shows success when the iris is in a lateral position.
Eye Blink Detection Blinking is detected by a vertical shift in the cumulative histogram. The eyelashes have an intensity level similar to that of the iris, but they are not in the same vertical position.
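The slide does not spell out how the cumulative histogram is computed, so the following is only one possible reading: count dark pixels per image row, accumulate them top to bottom, and compare the row where the cumulative share reaches one half against the value seen with the eye open. All names (`median_dark_row`, `is_blink`) and thresholds (`dark_level`, `min_shift`) are assumptions made for illustration.

```python
def median_dark_row(image, dark_level=80):
    """Row index at which the cumulative count of dark pixels reaches 50%."""
    counts = [sum(1 for p in row if p < dark_level) for row in image]
    total = sum(counts)
    if total == 0:
        return None  # no dark pixels at all in the interest window
    running = 0
    for i, c in enumerate(counts):
        running += c
        if running / total >= 0.5:
            return i


def is_blink(image, open_eye_row, min_shift=3, dark_level=80):
    """Flag a blink when the dark mass has shifted vertically by min_shift rows."""
    row = median_dark_row(image, dark_level)
    if row is None:
        return False  # arbitrary choice for this sketch
    return abs(row - open_eye_row) >= min_shift
```

Because eyelashes are about as dark as the iris, the amount of dark pixels alone does not change much when the eye closes; it is the vertical position of that dark mass that moves, which is what this sketch measures.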
3. Remapping Remapping iris position to screen position is somewhat math intensive. Involves an initial calibration phase. Takes head motion into consideration.
4. Experimental Results Hardware: digital camera with 12x optical zoom and 640x480 image resolution; 19” computer screen with a resolution of 1024x768; standard laptop with a 1.73 GHz processor.
User / Hardware Positions
Tracking Results 594 frames (approx. 23.8 s at 25 fps) were recorded with the user looking at several different points on the screen. Iris position and shape were manually annotated. The manual annotations were then compared to the results from RANSAC and C-RANSAC.
RANSAC vs. Ground Truth - Tracking accuracy for the y-coordinate of the ellipse center. - Notice the two spikes due to ocular occlusions.
C-RANSAC vs. Ground Truth - Tracking accuracy for the y-coordinate of the ellipse center. - C-RANSAC does not spike when the eye is occluded.
RANSAC vs. C-RANSAC: Y - RANSAC experiences more error on the y-axis than C-RANSAC.
RANSAC vs. C-RANSAC: X - RANSAC and C-RANSAC do not differ much on their x-axis error.
RANSAC vs. C-RANSAC - The distributions are similar, but C-RANSAC appears to have less error when attempting to find the y-position of the iris’ center.
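A comparison of this kind needs a per-axis error between the tracked ellipse centers and the manual annotations. The slides do not name the error measure, so the sketch below assumes mean absolute error per coordinate; the function name `mean_abs_error` is hypothetical.

```python
def mean_abs_error(estimates, ground_truth):
    """Mean absolute error of (x, y) centers, returned separately per axis."""
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(estimates)
    err_x = sum(abs(e[0] - g[0]) for e, g in zip(estimates, ground_truth)) / n
    err_y = sum(abs(e[1] - g[1]) for e, g in zip(estimates, ground_truth)) / n
    return err_x, err_y
```

Running it once on the RANSAC centers and once on the C-RANSAC centers, both against the same annotations, gives the x and y figures that the comparison slides contrast.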
Calibration User looks at eight points on the screen, staring at each point for four seconds. The points are arranged as follows: four at the corners of the screen, and four (with a rhomboidal layout) at its center. One hundred measurements are collected for the iris center for each calibration point.
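To make the calibration step concrete, here is a sketch that averages the measurements collected at each calibration point and fits a map from iris-center coordinates to screen coordinates. The affine model and the plain least-squares fit are assumptions chosen for brevity; the slides do not state the mapping model, and this sketch does not include head-motion compensation. All function names are hypothetical.

```python
def cluster_center(samples):
    """Average the iris-center measurements collected at one calibration point."""
    n = len(samples)
    return sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("degenerate calibration layout")
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x


def fit_affine(iris_pts, screen_pts):
    """Least-squares affine map (u, v) -> (X, Y) via the normal equations."""
    rows = [(u, v, 1.0) for u, v in iris_pts]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):  # k = 0 fits screen X, k = 1 fits screen Y
        Atb = [sum(r[i] * s[k] for r, s in zip(rows, screen_pts)) for i in range(3)]
        coeffs.append(solve3(AtA, Atb))
    return coeffs


def remap(coeffs, u, v):
    """Map an iris-center position to screen coordinates."""
    (a, b, c), (d, e, f) = coeffs
    return a * u + b * v + c, d * u + e * v + f
```

In use, `cluster_center` would be applied to the measurements for each of the eight points, and the eight resulting centers passed to `fit_affine` together with the known on-screen positions of the calibration points.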
Compensated vs. Uncompensated Head Movement Circle/Red represents uncompensated movement. Square/Green represents compensated movement. The compensated clusters are more compact and more reasonably arranged.
Remapping Calibration Clusters to the Screen - The crosses represent the center of the calibration circles.
Remapping Without Feedback Twenty random screen points denoted by crosses. Results obtained shortly after calibration represented by circles. Results obtained ten minutes after calibration represented by stars.
Remapping With Feedback - Results obtained with head compensation are represented by circles; results obtained without head compensation are represented by stars. With visual feedback, the results without head compensation are actually better.
Line Tracing With Visual Feedback
Conclusion A robust, single-camera, real-time eye-tracking algorithm is presented. An eye blink detector works equally well for both voluntary and involuntary eye closures. A constrained RANSAC approach for iris tracking is proposed that performs better than standard RANSAC in the presence of distracters and occlusions in the image sequence. The on-screen remapping method is capable of compensating for small head movements. Experiments outlined the importance of providing visual feedback to the user and the benefit gained from performing head compensation, especially during image-to-screen map calibration.
Future Work Further improve the image-to-screen mapping model by explicitly taking into account the spherical shape of the eyeball. Relax the “neutral expression” constraint set for head compensation. Generalize the approach to passive interaction surfaces such as books, newspapers, and paintings. Extend the framework to the problem of determining the 3D coordinates of a location pointed at in space.