Cascaded Models for Articulated Pose Estimation


Cascaded Models for Articulated Pose Estimation. Ben Sapp, Alexander Toshev, and Ben Taskar, University of Pennsylvania.

Human Pose Estimation. Goal: image -> stick figure, i.e., the 2D locations of anatomical parts from a single image. We're interested in human pose estimation: the goal is to take a single, monocular image as input and recover the locations of body parts as output, with efficient inference.

Human Pose Estimation: It's Hard. Challenges include pose variation, intrinsic scale variation, lighting variation, background clutter, and foreshortening. I shouldn't have to convince you that this is a hard problem: the individual parts are very difficult to detect, especially the lower arms, and the joint configuration of all parts is highly variable.

Articulated Pose and Pictorial Structures. A popular choice for (articulated) parts-based models. A non-exhaustive timeline: Fischler & Elschlager, "The Representation and Matching of Pictorial Structures" (1973); Felzenszwalb & Huttenlocher, "Pictorial Structures for Object Recognition" (2005); Ramanan, "Learning to Parse Images of Articulated Objects"; Fergus et al., ICCV Short Course; Felzenszwalb et al., "A Discriminatively Trained, Multiscale, Deformable Part Model" (2008); Ferrari et al., "Progressive Search Space Reduction..."; Eichner & Ferrari, "Better Appearance Models for Pictorial Structures" (2009); Andriluka et al., "Pictorial Structures Revisited" (2009); Sapp et al., "Adaptive Pose Priors for Pictorial Structures" (2010). Pictorial structures is one of the primary tools used to tackle this problem; as the timeline shows, it has been extremely popular over the past five years, applied to both rigid and articulated objects.

Background: How PS Works. Let's review how pictorial structures works. Pictorial structures models each part's location in the image as a variable (head, torso, left/right upper arm, left/right lower arm), with state l_i = (x, y, ω) for part i. The joint configuration of all parts is scored as a log-linear combination of unary and pairwise terms. The unary terms can be thought of as individual part detectors (detection maps), expressing the affinity for a part being at any location and orientation in the image. The pairwise terms are a simple function of the geometric displacement between neighboring parts; this geometric cost is typically independent of the image, and hence is referred to as a geometric prior. The PS scoring function can be optimized with standard inference techniques: max-product inference recovers the most likely configuration of parts, and sum-product inference yields marginal distributions over locations for each part.
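A minimal sketch of this scoring function, using a hypothetical two-part chain with made-up detector scores (not the talk's actual model): the score is a sum of unary terms and an image-independent quadratic geometric prior, maximized exhaustively over all state pairs.

```python
import numpy as np

# Hypothetical toy state space: 4 candidate (x, y) locations per part.
locations = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Unary scores: each part's affinity for each state (think: detector outputs).
unary_torso = np.array([2.0, 0.5, 1.0, 0.1])
unary_head = np.array([0.2, 1.5, 0.3, 2.4])

# Pairwise score: image-independent geometric prior, here a quadratic
# penalty on deviation from an ideal offset (head one unit above torso).
ideal_offset = np.array([0.0, 1.0])

def pairwise(li, lj):
    d = (locations[lj] - locations[li]) - ideal_offset
    return -float(np.dot(d, d))

# Exhaustive max-product inference: O(n^2) in the number of states per
# part, which is exactly the bottleneck the talk discusses next.
best_score, best_pair = -np.inf, None
for li in range(len(locations)):          # torso state
    for lj in range(len(locations)):      # head state
        s = unary_torso[li] + unary_head[lj] + pairwise(li, lj)
        if s > best_score:
            best_score, best_pair = s, (li, lj)
# best: torso at locations[0], head at locations[1], score 3.5
```

With a geometric prior this simple, the double loop is wasteful; the distance-transform trick reviewed below collapses the inner maximization to linear time.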

Background: The Complexity of PS. While the number of variables in this PS model is small (head, torso, left/right upper arm, left/right lower arm), the state space for each variable, l_i = (x, y, ω), is huge: in a typical discretization used in human pose estimation, there are more than 150,000 locations and orientations per part (n > 150,000 states). Standard inference in a tree graphical model is O(n^2). The bottleneck is that each state must be compared against at least a fraction of the states of a neighboring part; in a realistic setting this means (80 x 80 x 24) x (80/5 x 80/5 x 24) ≈ 1 billion state pairs for every pair of parts, which must be checked for both feature generation and inference.

Background: The Complexity of PS (continued). The score for a part-state pair decomposes as unary_i + unary_j + pairwise_{i,j}. If the pairwise term is a simple function of geometry alone, efficient inference tricks can be used [Felzenszwalb & Huttenlocher, 2005]: max-product with a unimodal deformation cost runs in linear time via the distance transform, and sum-product with a linear-filter cost runs in O(n log n) via convolution. This tremendous speed-up has made pictorial structures practical, and all state-of-the-art systems use this restriction. In summary, the current PS is efficient as long as it simply pieces together individual part detector scores with geometric consistency. Q: Are we losing too much expressivity for this gain in efficiency?
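The distance-transform trick can be sketched with the classic O(n) lower-envelope algorithm from Felzenszwalb & Huttenlocher for a 1-D quadratic cost, d[i] = min_j f[j] + (i - j)^2. (For max-product one negates the scores to turn maximization into minimization, and the 2-D transform is built from two 1-D passes; this sketch shows only the 1-D core.)

```python
def dt1d(f):
    """O(n) distance transform: d[i] = min_j f[j] + (i - j)**2,
    computed via the lower envelope of n parabolas."""
    n = len(f)
    d = [0.0] * n
    v = [0] * n            # indices of parabolas in the lower envelope
    z = [0.0] * (n + 1)    # boundaries between envelope parabolas
    k = 0
    z[0], z[1] = -float('inf'), float('inf')
    for q in range(1, n):
        # intersection of parabola q with the rightmost envelope parabola
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:   # parabola q dominates; pop the old one
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = float('inf')
    k = 0
    for q in range(n):     # read off the envelope
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d
```

Each parabola is pushed and popped at most once, so the whole pass is linear, versus the quadratic cost of the naive min-over-j.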

Goal: Integrating Richer Pairwise Terms. For example, we'd like to incorporate image evidence into the pairwise terms, which is not possible in the standard PS model. A simple and intuitive cue along these lines is the distance in color distribution between neighboring parts.

Computation Example: color histogram χ² distance computation between all pairs of part hypotheses. Let's try to compute this cue and see how long it takes. At a very coarse state space (20x20 grid, 24 angles), we need 3.7 million histogram comparisons, which take under a second of CPU time.

Computation Example (continued). As we scale up to a finer resolution (40x40 grid, 24 angles), the count grows to 59 million comparisons and the processing time grows to 20 seconds.

Computation Example (continued). At the standard resolution (80x80 grid, 24 angles), this simple feature computation requires 1 billion comparisons and takes 5 minutes of CPU time.

Computation Example (continued). And if we want to scale up beyond the standard PS representation to model a scale for each part (80x80 grid, 24 angles, 3 scales), this is prohibitively expensive: 2 hours of CPU time!

Computation Example: storage. Similarly, if you wanted to store this feature for learning or analysis, you would have to buy a lot of hard drives: 70 MB at the 20x20 grid, 1 GB at 40x40, 18 GB at 80x80, and 375 GB with 3 scales.
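A rough sketch of the cue and its combinatorics. The χ² helper and the state-count formula below are illustrative, not the paper's code; the pair counts assume, as the earlier complexity slide does, that each state of one part is compared against a neighbor's states on a 5x spatially subsampled grid.

```python
import numpy as np

def chi2(h1, h2, eps=1e-10):
    # chi-squared distance between two (normalized) color histograms
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def n_states(grid, angles=24):
    # states per part at a given spatial grid resolution
    return grid * grid * angles

# Pair counts matching the talk's 3.7M / 59M / 1B progression:
for grid in (20, 40, 80):
    pairs = n_states(grid) * n_states(grid // 5)
    print(f"{grid}x{grid} grid: {pairs:,} comparisons")
```

The quadratic growth in pair count is what turns a sub-millisecond histogram comparison into minutes or hours of total work.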

Exhaustive Inference Clearly, exhaustive inference is not going to work.

Our Contribution: Coarse-to-Fine Structured Inference. Our solution is to focus inference on promising states by learning a coarse-to-fine cascade of structured models, built on two principles. Safety: no groundtruth left behind, i.e., avoid pruning the correct answer. Efficiency: prune wrong states as early as possible. This enables richer features and better results, beating the state of the art in both accuracy and efficiency.

Inspiration: Cascade of Classifiers [Viola & Jones 2001; Fleuret & Geman 2000]. For inspiration, let's turn to one of the most successful pruning strategies in computer vision: a cascade of classifiers, which runs from a simple model at level 1 to a complex model at level N, rejecting candidates at every stage (e.g., for face detection). The cascade throws out easy-to-reject portions of the image with only a few feature computations and focuses more computational effort on areas of the image that are harder to disambiguate. This works well for binary classification, but how do we generalize it to parts-based models?

Generalizing Classifier Cascades. Naïve solution: filter states based on part detector scores, running an independent cascade for each part (head, torso, left/right upper arm, left/right lower arm). Each part's state space is pruned individually, and the reduced state spaces are combined at the end into a richer, more expressive PS model to make the final prediction.

Generalizing Classifier Cascades (continued). Let's see an example of this in action. The detector heatmap of the image shows the likelihood of the left lower arm at every location and orientation. If we prune this heatmap down to a reasonable number of states (800), each surviving state represented as a joint location and direction vector, we are left with lower-arm states scattered all over the image, and in fact we miss the correct left elbow joint location.

Generalizing Classifier Cascades (continued). The fundamental problem with this approach is that it scores locally and prunes locally: it takes only local scores into account and typically misses correct locations with weak signal. We instead want information from other parts to help; for example, a strong belief in an upper-arm location should save a lower-arm hypothesis that would otherwise be pruned.

Generalizing Classifier Cascades (Our Take). Better: prune based on a cascade of pictorial structures, from PS model 1 through PS model N, moving from a coarse state space (10x10x12) to a fine one (80x80x24), then predict. The motto is "score globally, prune locally": (0) start with a coarse, efficiently computable state space; (1) compute a global scoring measure (to be explained); (2) throw away low-scoring states; (3) refine the resolution, refine the model, and repeat.

Generalizing Classifier Cascades (continued). An illustration of our approach, shown for the torso, left upper arm (shoulder), and left lower arm (elbow): we start with a coarse 10x10 grid and an exhaustive set of locations and angles, then prune. The torso and upper arm are much easier to detect than the lower arm, so more of their states get pruned initially. We then refine (10x10 -> 20x20 -> 40x40 -> 80x80) and prune again, repeating until we reach standard resolution, where we make a prediction from the states that remain.

Global vs. Local Pruning (elbow joint). Pruning from the original 150K states down to 800 with our global score, the correct answer is kept unpruned. By comparison, on the same image, naïve local detector-score pruning eliminates the correct answer.

Computing a Global Pruning Score. Define s★ as the score of the most likely configuration of parts (a.k.a. the MAP score, or Viterbi score); here s★ = 27.85. Running inference tells us that all the parts of the best configuration are likely, but it says nothing about, for example, the lower arm at some other state: should that state be pruned? To score such a hypothesis, fix the lower arm there and re-run inference; this yields s★_llarm(x=20, y=80, ω=-π/2) = 24.76. The new score is a global measure, directly comparable to the MAP score and to the scores of other parts. We call this quantity the max-marginal score for part i at location l_i, and it is the key ingredient of our pruning.

Computing a Global Pruning Score (continued). We can continue placing the lower arm at all possible locations in the image to obtain a max-marginal score s★_i(x, y, ω) for each (e.g., s★_llarm = 24.76, 14.28, 7.10, ...), and do the same for every part at every location (s★_ruarm = 17.89, 11.61, -3.67, -8.21, ...; s★_head = 25.55, 13.19, 6.31, 0.85, ...). The important thing to remember is that this score is a global quantity, so all scores are on the same scale.
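On a chain, max-marginals for every part and state come from one forward and one backward pass of max-product messages, at the same asymptotic cost as a single MAP computation. A minimal sketch on a hypothetical 3-part chain with random scores (not the paper's model or features):

```python
import numpy as np

# Hypothetical 3-part chain (e.g. torso - upper arm - lower arm), k states each.
rng = np.random.default_rng(0)
k = 5
unary = rng.normal(size=(3, k))      # unary[i, l]: detector score, part i, state l
pair = rng.normal(size=(2, k, k))    # pair[i]: score between part i and part i+1

# Forward messages: fwd[i, l] = best score of parts 0..i-1 given part i in state l.
fwd = np.zeros((3, k))
for i in range(1, 3):
    fwd[i] = np.max(fwd[i - 1][:, None] + unary[i - 1][:, None] + pair[i - 1],
                    axis=0)

# Backward messages: bwd[i, l] = best score of parts i+1..2 given part i in state l.
bwd = np.zeros((3, k))
for i in (1, 0):
    bwd[i] = np.max(pair[i] + unary[i + 1][None, :] + bwd[i + 1][None, :],
                    axis=1)

# Max-marginal: score of the best full configuration with part i fixed at l.
mm = fwd + unary + bwd

# Global consistency: the best max-marginal of EVERY part equals the MAP score,
# so max-marginals of different parts really are on the same scale.
map_score = mm[0].max()
```

For trees the same idea holds with messages passed to and from a root; the talk's O(n^2) cost per level comes from the message computations over state pairs.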

Max-Marginals. We can collect all scores for a single part and view them as a heatmap s★_i at the coarsest level, and compare this with the original part-detector heatmap s_i (shown here for the lower arm). Even though the correct position is locally not very promising, the max-marginal at that location indicates it should not be pruned.

Learning to Prune. Goal: "No true pose left behind!" We want to learn models optimized for the task of pruning, and the goal of the learning procedure is to never prune the groundtruth. On training data, we formalize this by requiring that the max-marginals of the groundtruth pose be above average; at test time, we then have a guarantee that the max-marginals of the groundtruth are above average with high probability (see D. Weiss & B. Taskar, "Structured Prediction Cascades," AISTATS 2010 for details). In the max-marginal histogram, states scoring below the threshold are pruned away and states above it, including the true pose, are kept. Cascaded learning: each level is trained from the output states of the previous level, from coarse to fine.
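The pruning rule itself can be sketched as a thresholding step. Following the structured prediction cascades formulation, the threshold below is a convex combination of the best and the mean max-marginal; alpha = 0 recovers the slide's "prune below average." The function and its signature are illustrative, not the reference implementation.

```python
import numpy as np

def prune(max_marginals, alpha=0.0):
    """Keep the states whose max-marginal clears the cascade threshold.

    max_marginals: dict mapping part name -> 1-D array of max-marginal
                   scores over that part's states.
    alpha:         pruning aggressiveness; 0.0 prunes everything below
                   the mean max-marginal, values near 1.0 keep only
                   near-MAP states.
    Returns a dict mapping part name -> indices of surviving states.
    """
    scores = np.concatenate(list(max_marginals.values()))
    threshold = alpha * scores.max() + (1.0 - alpha) * scores.mean()
    return {part: np.nonzero(s >= threshold)[0]
            for part, s in max_marginals.items()}
```

Because all max-marginals are on the same global scale, a single threshold is shared across parts, which is what lets a confident upper arm rescue a weakly detected lower arm.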

Learning One Cascade Level. Let l^t denote the true pose for training example t, let s(l^t, x^t) be the score of the true pose on example t, and let s̄★ be the average max-marginal score. We represent the unary and pairwise scores as linear combinations of features, with parameters θ and features φ. The learning problem minimizes a regularized objective subject to the convex constraint that the score of the true pose be above the average max-marginal score: s(l^t, x^t) ≥ s̄★(x^t). This gives safe pruning: it implies that the max-marginals of the true pose are above average too.

Stochastic Sub-gradient Learning. We solve the optimization problem with a simple stochastic sub-gradient update: pick a random training example t, then adjust the parameters by the difference between the features of the true pose and the average of the features used in computing the max-marginals (with regularization λ and step size η). The averaged features cost O(n^2) to obtain, along with the max-marginals themselves. It is interesting to compare this with the standard structured perceptron update, which uses the difference between the features of the groundtruth and those of the highest-scoring non-truth: the complexity of the update is essentially the same, but the perceptron tries to separate the truth from the second best, whereas we just try to keep the truth near the top.
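One step of this update can be sketched as follows. The function name and hyperparameter defaults are illustrative; a full implementation would apply the feature-difference term only when the example's margin constraint is violated.

```python
import numpy as np

def subgradient_step(theta, phi_true, phi_witness_avg, lam=1e-3, eta=0.1):
    """One stochastic sub-gradient step on a random training example.

    phi_true:        feature vector of the ground-truth pose
    phi_witness_avg: average feature vector over the argmax configurations
                     used to compute the max-marginals (O(n^2) to obtain,
                     alongside the max-marginals themselves)
    lam, eta:        L2 regularization strength and step size
    """
    # Same shape as the structured perceptron update, but the truth is
    # pushed above the *average* max-marginal rather than the 2nd best.
    grad = (phi_true - phi_witness_avg) - lam * theta
    return theta + eta * grad
```

Averaging over witnesses is what makes the cascade objective about safe pruning rather than exact prediction: the truth only needs to stay in the surviving set, not win outright.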

Recap: At Test Time. Now that we've specified all the details of the model, we can review the processing that takes place at test time. We start with an exhaustive coarse state space, compute max-marginals, and prune all states below average. (We show only the left elbow joint here; it has a lot of uncertainty, so not much is pruned at this stage for this particular part.) We then refine the resolution, compute max-marginals with a more refined model, and repeat, finally predicting from the surviving states.

Coarse-to-Fine Pruning Results. On the Buffy upper body pose dataset:

cascade level    state space    % reduction in state space    % true arms closely matched    cumulative cascade CPU time*
0                10x10x12       -                             100.0                          1.1 s
1                10x10x24       52.5                          76.6                           1.5 s
3                20x20x24       95.6                          72.3                           2.6 s
5                40x40x24       98.3                          70.5                           3.6 s
7                80x80x24       99.7                          68.4                           5.2 s
detector pruning                                              58.6

* additional time after computing unary scores

We quantify how well our cascade works in practice. The cascade is 8 stages long, doubling one of the dimensions of the state space at each level, and successfully prunes down to fewer than 500 states per part. Pruning with local detector scores down to the same number of states matches 10% fewer lower arms: the cascade outperforms naïve local pruning while running in only 5 additional seconds.

Richer Features. We still need to make a final prediction, but now we can afford to include rich features and a complex pairwise cost function: texture, geometry, color, shape from regions, and shape from contours.

Features: Texture. The standard part-detector cue: HoG features with an AdaBoost lower-arm detector [Andriluka et al. 2009].

Features: Geometry. Standard geometric cues: displacement in x, y and angle, measured in a part-relative coordinate frame [Felzenszwalb & Huttenlocher, 2005].

Richer Features: Color (new). Unary: image-adaptive skin and clothing color compatibility via face and torso color models [Eichner & Ferrari, 2009]. Pairwise: quantize color into 8 bins and compute the histogram difference between neighboring part hypotheses.

Richer Features: Shape from Regions (new). Measure shape moments of the superpixels supporting a part hypothesis, using an NCut oversegmentation; hypotheses can have good or weak region support.

Richer Features: Shape from Contours (new). Extract long contours from the segment boundaries of an NCut segmentation, assign each limb pair to the single contour that aligns well with both, and use the alignment score as a feature.

Experiments. Two challenging, real-world upper-body human pose estimation datasets: Buffy Stickmen v2.1 (from television) and ETHZ PASCAL Stickmen v1.0 (from Flickr). [Provided by the ETH Zurich CALVIN research lab: www.vision.ee.ethz.ch/~calvin/datasets.html]

End-System Results. Note: the numbers here have changed since the talk to exactly match the publicly available reference implementation at http://vision.grasp.upenn.edu/video/. Lower arms are the most challenging parts.

Buffy v2.1, PCP0.5       torso    head    upper arms    lower arms    total
Andriluka et al. 2009     98.3    95.7    86.6          52.8          78.8
Eichner et al. 2009       98.7    97.9    82.8          59.8          80.3
Sapp et al. 2010         100      -       91.1          65.7          85.9
CPS (this paper)          99.6    -       91.9          64.5          85.2

PASCAL, PCP0.5           torso    head    upper arms    lower arms    total
Eichner et al. 2009       97.2    88.6    73.8          41.5          69.3
Sapp et al. 2010         100      98.0    83.9          54.0          79.0
CPS (this paper)          99.2    -       81.5          53.9          78.3

Sapp et al. 2010 takes 10 minutes per image*; CPS (this paper) takes 1.5 minutes*. (* not counting part detector & segmentation time)

Results: Us vs. Local Pruning. Percent of correctly matched arms on Buffy: our cascade of PS models versus the naïve approach that scores locally and prunes locally.

Results: Feature Analysis. Percent of correct parts for geometry alone, geometry + regions (new), geometry + contours (new), geometry + color (new), and all features (new).

Summary. A learned cascade of coarse-to-fine PS models: "score globally, prune locally." It overcomes the state-space explosion to enable complex pairwise scores, and extends naturally to higher-order cliques, richer state spaces (e.g., occlusion, scale), and full-frame, temporal modeling of human pose. See our upcoming NIPS 2010 paper: D. Weiss, B. Sapp and B. Taskar, "Tracking Complex Dynamics with Structured Prediction Cascades."

Thanks! Code available soon at http://vision.grasp.upenn.edu/video