UNIVERSITY OF MURCIA (SPAIN) ARTIFICIAL PERCEPTION AND PATTERN RECOGNITION GROUP Estimating 3D Facial Pose in Video with Just Three Points Ginés García Mateos, Alberto Ruiz García Dept. de Informática y Sistemas P.E. López-de-Teruel, A.L. Rodríguez, L. Fernández Dept. Ingeniería y Tecnología de Computadores University of Murcia - SPAIN
ESTIMATING 3D FACIAL POSE IN VIDEO WITH JUST THREE POINTS G. García A. Ruiz P.E. López A.L. Rodríguez L. Fernández 3DFP’2008 ANCHORAGE JUNE,
Introduction (1/3)
Main objective: to develop a new method to estimate the 3D pose of the head of a human user:
–Estimation through a video sequence
–Working with the minimum necessary information: a 2D location of the face
–A very simple method, without training, running in real time: fast processing
–Under realistic conditions: robust to facial expressions, light, movements
–Robustness preferred to accuracy
Introduction (2/3)
3D pose estimation using 3D tracking…
[Figure panels: Active Appearance Model; Shape & texture models; Cylindrical models; 3D morphable mesh]
Introduction (3/3)
In short, we want to obtain something like this:
The result is a 3D location (x, y, z) and a 3D orientation (roll, pitch, yaw): 6 D.O.F.
Index of the presentation
Overview of the proposed method
–2D facial detection and location
–2D face tracking
3D facial pose estimation
–3D position
–3D orientation
Experimental results
Conclusions
Overview of the Proposed Method
The key idea: separate the problems of 2D tracking and 3D pose estimation.
By introducing some assumptions and simplifications, pose is extracted from very little information.
The proposed 3D pose estimator could use any 2D facial tracker.
Pipeline: 2D face detection → 2D face tracking → 3D pose estimation
2D Face Detection, Location and Tracking Using I.P.
We use a method based on integral projections (I.P.), which is simple and fast.
Definition of I.P.: average of the gray levels of an image $i(x, y)$ along rows and columns.
Vertical projection: $PV_i : [y_{min}, \ldots, y_{max}] \to \mathbb{R}$, given by $PV_i(y) := \overline{i(\cdot, y)}$
Horizontal projection: $PH_i : [x_{min}, \ldots, x_{max}] \to \mathbb{R}$, given by $PH_i(x) := \overline{i(x, \cdot)}$
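The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is taken to be a non-empty list of equal-length rows of gray levels, and the function name `integral_projections` is hypothetical, not part of the authors' software.

```python
def integral_projections(image):
    """Return (pv, ph) for a grayscale image given as a list of rows.

    pv[y] is the mean gray level of row y (vertical projection PV).
    ph[x] is the mean gray level of column x (horizontal projection PH).
    Assumption: `image` is non-empty and all rows have the same length.
    """
    rows = len(image)
    cols = len(image[0])
    # PV(y): average of i(x, y) over all x, for each row y
    pv = [sum(row) / cols for row in image]
    # PH(x): average of i(x, y) over all y, for each column x
    ph = [sum(image[y][x] for y in range(rows)) / rows for x in range(cols)]
    return pv, ph


if __name__ == "__main__":
    pv, ph = integral_projections([[0, 2], [4, 6]])
    print(pv, ph)
```

Each projection reduces the 2D image to a 1D signal, which is what keeps the later detection, alignment and tracking steps cheap.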
2D Face Detection with I.P.
Global view of the I.P. face detector:
Input image
Step 1. Vertical projections by strips
Step 2. Horizontal projection of the candidates
Step 3. Grouping of the candidates
Final result
[Figure labels: PVface, PHeyes]
2D Face Detection with I.P.
To improve the results, we combine two face detectors into a combined detector.
Face Detector 1. Look for candidates: Haar + AdaBoost [Viola and Jones, 2001]
Face Detector 2. Verify face candidates: Integral Projections [Garcia et al, 2007]
Final detection result
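The two-stage scheme on this slide can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: `find_candidates` and `verify` are hypothetical callables standing in for the two detectors, and the `(x, y, width, height)` box layout is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical box layout: (x, y, width, height)
Box = Tuple[int, int, int, int]


def combined_detector(
    image: object,
    find_candidates: Callable[[object], List[Box]],
    verify: Callable[[object, Box], bool],
) -> List[Box]:
    """Keep only the candidates of detector 1 that detector 2 accepts."""
    return [box for box in find_candidates(image) if verify(image, box)]


if __name__ == "__main__":
    # Stub detectors for illustration only
    candidates = lambda img: [(0, 0, 10, 10), (5, 5, 40, 40)]
    big_enough = lambda img, box: box[2] >= 20
    print(combined_detector(None, candidates, big_enough))
```

The verification stage can only remove candidates, so the combination trades some detections for fewer false positives.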
2D Face Detection with I.P.
[Figure: detection results, from Garcia et al, 2007]
2D Face Location with I.P.
Global view of the 2D face locator:
Input image and face
Step 1. Orientation estimation
Step 2. Vertical alignment
Step 3. Horizontal alignment
Final result
[Plot labels: MVface(y), PV’face(y) and PVface(y) against y]
2D Face Location with I.P.
Location accuracy of the 2D face locator
Av. time (Pentium IV, 2.6 GHz): IntProj 1.7 ms; NeuralNet 323.6 ms; EigenFeat 20.5 ms
2D Face Tracking with I.P.
2D Face Tracking with I.P.
Sample result of the proposed tracker. 320x240 pixels, 312 frames at 25 fps, laptop webcam.
(e1x, e1y) = location of the left eye; (e2x, e2y) = location of the right eye; (mx, my) = location of the mouth
3D Facial Pose Estimation
In theory, 3 points should be enough to solve for the 6 degrees of freedom (if the focal length and the face geometry are known).
But… location errors in the mouth are high for non-frontal faces.
Some assumptions are introduced to avoid the effect of this error.
3D Facial Pose Estimation
Fixed-body assumption: the user's body is fixed and only the head moves.
3D position is estimated in the first frame; 3D orientation in the following frames.
A simple perspective projection model is used to estimate 3D position.
3D Position Estimation
f: focal length (known)
(cx, cy): tracked center of the face, with cx = (e1x+e2x+mx)/3 and cy = (e1y+e2y+my)/3
p = (px, py, pz): 3D position of the face; the origin is (0,0,0)
3D Position Estimation
We have: cx/f = px/pz ; cy/f = py/pz
Where: cx = (e1x+e2x+mx)/3 ; cy = (e1y+e2y+my)/3
So: px = (e1x+e2x+mx)/3 · pz/f ; py = (e1y+e2y+my)/3 · pz/f
The depth of the face, pz, is computed with: pz = f·t/r, where r is the apparent face size* and t is the real size.
* For more information, see the paper.
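The position equations can be put together in a short Python sketch. It is only an illustration under stated assumptions: image coordinates are taken relative to the optical center, the apparent size `r` (pixels) and real size `t` are passed in directly because their exact definition is left to the paper, and the function name `estimate_position` is hypothetical.

```python
def estimate_position(e1, e2, m, f, t, r):
    """Estimate p = (px, py, pz) from the two eyes and the mouth.

    e1, e2, m: (x, y) image points, assumed relative to the optical center.
    f: focal length in pixels; t: real face size; r: apparent face size in pixels.
    The result is expressed in the units of t.
    """
    # Tracked center of the face: mean of the three points
    cx = (e1[0] + e2[0] + m[0]) / 3.0
    cy = (e1[1] + e2[1] + m[1]) / 3.0
    # Depth from the ratio of real size to apparent size
    pz = f * t / r
    # Back-project the center with the perspective model cx/f = px/pz
    px = cx * pz / f
    py = cy * pz / f
    return px, py, pz


if __name__ == "__main__":
    print(estimate_position((-30, -10), (30, -10), (0, 50), f=500, t=10, r=100))
```

Under the fixed-body assumption this would be evaluated once, on the first frame, and the resulting pz reused when estimating orientation.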
Estimation of Roll Angle
Roll angle can be approximately associated with the 2D rotation of the face in the image:
$$\mathrm{roll} = \arctan\frac{e_{2y} - e_{1y}}{e_{2x} - e_{1x}}$$
This equation is valid in most practical situations, but it is not precise in all cases.
Sample results: roll = -43.7º, roll = -2.8º, roll = 15.9º, roll = 34.6º
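A minimal Python sketch of the roll formula follows. Note one deliberate substitution: it uses `math.atan2` instead of a plain arctangent of the quotient, which gives the same value when e2x > e1x and avoids a division by zero when the eyes are vertically aligned. The function name `estimate_roll` is hypothetical.

```python
import math


def estimate_roll(e1, e2):
    """Roll angle in degrees from the left eye e1 and right eye e2, both (x, y).

    Uses atan2 (a swapped-in variant of the slide's arctan) so that
    e2x == e1x does not cause a division by zero.
    """
    return math.degrees(math.atan2(e2[1] - e1[1], e2[0] - e1[0]))


if __name__ == "__main__":
    print(estimate_roll((100, 100), (200, 100)))
    print(estimate_roll((100, 100), (200, 200)))
```

The sign of the result depends on the direction of the image y axis, which the slides do not state.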
Estimation of Pitch and Yaw
The head-neck system can be modeled as a robotic arm with 3 rotational DOF (roll, pitch, yaw).
[Figure: orthographic, top and front views of the head-neck model, labeled with the lengths a, b and c]
In this model, any point of the head lies on a sphere, and its projection is related to pitch and yaw.
[Figure: image-plane circle of radius r_i, with the initial point (dx0, dy0) and the current point (dxt, dyt)]
Estimation of Pitch and Yaw
$r_w$: radius of the sphere on which the center of the eyes lies, $r_w = \sqrt{a^2 + c^2}$
$r_i$: radius of the circle onto which that sphere is projected, $r_i = r_w \cdot f / p_z$
(dx0, dy0): initial center of the eyes.
(dxt, dyt): current center of the eyes, computed as ((e1x+e2x)/2, (e1y+e2y)/2)
[Figure: three image-plane circles of radius r_i. Initial frame: (dx0, dy0), pitch = 0, yaw = 0. Instant t = 1: (dx1, dy1). Instant t = 2: (dx2, dy2).]
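The quantities defined on this slide are straightforward to compute. The sketch below is illustrative only: the function names are hypothetical, and `a`, `c` and `pz` are assumed to share one unit while `f` is in pixels, so that the projected radius comes out in pixels.

```python
import math


def projected_radius(a, c, f, pz):
    """Radius r_i (pixels) of the image circle: r_i = sqrt(a^2 + c^2) * f / pz."""
    r_w = math.hypot(a, c)  # radius of the sphere of the eye center
    return r_w * f / pz


def eye_center(e1, e2):
    """Midpoint of the two eyes, used as (dxt, dyt)."""
    return ((e1[0] + e2[0]) / 2.0, (e1[1] + e2[1]) / 2.0)


if __name__ == "__main__":
    print(projected_radius(a=3, c=4, f=500, pz=50))
    print(eye_center((100, 120), (160, 124)))
```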
Estimation of Pitch and Yaw
In essence, we have a problem of computing altitude and latitude for a given point in a circle.
The center of the circle is: (dx0, dy0 − a·f/pz)
So we have:
$$\mathrm{pitch} = \arcsin\left(\frac{dy_t - (dy_0 - a \cdot f/p_z)}{r_i}\right) - \arcsin(a/c)$$
And:
$$\mathrm{yaw} = \arcsin\left(\frac{dx_t - dx_0}{r_i \cdot \cos(\mathrm{pitch} + \arcsin(a/c))}\right)$$
[Figure: image-plane circle of radius r_i, with the initial point (dx0, dy0) and the current point (dxt, dyt)]
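The two formulas can be evaluated as in the following Python sketch. It is a hedged illustration rather than the authors' code: the function name `estimate_pitch_yaw` is hypothetical, angles are returned in radians, and the clamping of the arcsine arguments to [-1, 1] is an added safeguard against tracker noise that the slides do not mention. It assumes |a| <= c and that pitch + arcsin(a/c) stays away from ±90º.

```python
import math


def _clamp(value):
    """Keep an arcsine argument inside [-1, 1] (added safeguard, not in the slides)."""
    return max(-1.0, min(1.0, value))


def estimate_pitch_yaw(d0, dt, r_i, a, c, f, pz):
    """Return (pitch, yaw) in radians.

    d0: initial center of the eyes (dx0, dy0); dt: current center (dxt, dyt).
    r_i: projected radius in pixels; a, c: head-neck model lengths;
    f: focal length in pixels; pz: depth of the face.
    """
    offset = math.asin(a / c)            # arcsin(a/c), requires |a| <= c
    center_y = d0[1] - a * f / pz        # y coordinate of the circle center
    pitch = math.asin(_clamp((dt[1] - center_y) / r_i)) - offset
    yaw = math.asin(_clamp((dt[0] - d0[0]) / (r_i * math.cos(pitch + offset))))
    return pitch, yaw


if __name__ == "__main__":
    pitch, yaw = estimate_pitch_yaw((0, 0), (20, 50), r_i=100, a=0, c=10, f=500, pz=50)
    print(math.degrees(pitch), math.degrees(yaw))
```

Pitch is computed first because the yaw formula depends on it: the horizontal displacement is normalized by the radius of the circle of constant pitch.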
Experimental Results (1/7)
Experiments carried out:
–Off-the-shelf webcams.
–Different individuals.
–Variations in facial expressions and facial elements (glasses).
Studies of robustness and efficiency, and comparison with a projection-based 3D estimation algorithm.
On a Pentium IV at 2.6 GHz: ~5 ms file reading, ~3 ms tracking, ~0.006 ms pose estimation.
Experimental Results (2/7)
Sample input video: bego.a.avi
320x240 pixels, 312 frames at 25 fps, laptop webcam
Experimental Results (3/7)
3D pose estimation results
320x240 pixels, 312 frames at 25 fps, laptop webcam
Experimental Results (4/7)
[Figure: comparison plots, proposed method vs. projection-based method; surviving panel label: Pitch]
Experimental Results (5/7)
Range of working angles: approx. ±20º in pitch and ±40º in yaw.
The 2D tracker is not explicitly prepared for profile faces!
Experimental Results (6/7)
With glasses and without glasses
Experimental Results (7/7)
When the fixed-body assumption does not hold:
Body/shoulder tracking could be used to compensate for body movement.
Conclusions (1/3)
Our purpose was to design a fast, robust, generic and approximate 3D pose estimation method:
–Separation of 2D tracking and 3D pose.
–Fixed-body assumption.
–Robotic head model.
3D position is computed in the first frame. 3D orientation is estimated in the remaining frames.
The estimation process is very simple, and avoids inaccuracies in the 2D tracker.
Conclusions (2/3)
Future work: using the 3D pose estimator in a perceptual interface.
Conclusions (3/3)
The simplifications introduced lead to several limitations of our system, but in general…
The anatomy of the human head/neck system could be used in 3D face trackers.
The human head cannot move independently of the body!
Taking advantage of these anatomical limitations could simplify and improve current trackers.
Last
This work has been supported by the project Consolider Ingenio-2010 CSD , and TIN C
Sample videos:
Grupo PARP web page:
Thank you very much