Su-A Kim, 3rd June 2014. Danhang Tang, Tsz-Ho Yu, Tae-kyun Kim, Imperial College London, UK. Real-time Articulated Hand Pose Estimation using Semi-supervised.


1 Su-A Kim, 3rd June 2014
Danhang Tang, Tsz-Ho Yu, Tae-kyun Kim, Imperial College London, UK
Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests
Outline: Introduction, Methodology, Experiments
※ These slides excerpt parts of the authors' oral presentation at ICCV 2013.

2 Challenges for Hand? (Su-A Kim, 3rd June 2014, @CVLAB)
» Viewpoint changes and self-occlusions
» The discrepancy between synthetic and real data is larger than for the human body
» Labeling is difficult and tedious!

3 Method
» Viewpoint changes and self-occlusions → Hierarchical Hybrid Forest
» The discrepancy between synthetic and real data is larger than for the human body → Transductive Learning
» Labeling is difficult and tedious! → Semi-supervised Learning

4 Existing Approaches
Generative approach: use explicit hand models to recover the hand pose
» Optimization-based: optimizing the current hypothesis relies on the previous result
Discriminative approach: learn a mapping from visual features to the target parameter space, such as joint labels or joint coordinates (i.e. hand poses), from a labelled training dataset
» Classification, regression, ...
» Each frame is processed independently, which allows error recovery
Works shown on the slide: Oikonomidis et al. ICCV 2011; De La Gorce et al. PAMI 2010; Hamer et al. ICCV 2009; Ballan et al. ECCV 2012 (motion capture); Xu and Cheng ICCV 2013; Wang et al. SIGGRAPH 2009; Stenger et al. IVC 2007; Keskin et al. ECCV 2012

5 Discriminative Approach
Has achieved great success in human body pose estimation.
» Efficient: real-time
» Accurate: works on a per-frame basis and does not rely on tracking
» Requires a large dataset to cover many poses
» Trains on synthetic data but is tested on real data

6 Hierarchical Hybrid Forest
STR forest quality function: Q_apv = αQ_a + (1-α)βQ_p + (1-α)(1-β)Q_v
» Q_a, viewpoint classification quality (information gain): evaluates the classification performance over all the viewpoint labels in the dataset

7 Hierarchical Hybrid Forest
STR forest quality function: Q_apv = αQ_a + (1-α)βQ_p + (1-α)(1-β)Q_v
» Q_a, viewpoint classification quality (information gain)
» Q_p, finger joint label classification quality (information gain): measures the performance of classifying individual patches

8 Hierarchical Hybrid Forest
STR forest quality function: Q_apv = αQ_a + (1-α)βQ_p + (1-α)(1-β)Q_v
» Q_a, viewpoint classification quality (information gain)
» Q_p, finger joint label classification quality (information gain)
» Q_v, pose regression quality: compactness of voting vectors (determinant of covariance trace)
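The compactness idea behind the pose regression term can be illustrated with a small sketch. The slide does not give the exact formula, so the form below is an assumption: the spread of each child node is taken as the trace of the covariance of its vote vectors, the two children are weighted by their sample counts, and the result is inverted so that tighter votes score higher. The function names are hypothetical.

```python
import numpy as np


def covariance_trace(votes):
    """Trace of the covariance of vote vectors with shape (n, d).

    The trace equals the sum of the per-dimension variances.
    """
    votes = np.asarray(votes, dtype=float)
    if len(votes) == 0:
        return 0.0
    return float(votes.var(axis=0).sum())


def compactness_quality(left_votes, right_votes):
    """Assumed form of a compactness score for one candidate split.

    Returns a value in (0, 1]; 1.0 means both children have identical votes.
    """
    n_left, n_right = len(left_votes), len(right_votes)
    n = n_left + n_right
    if n == 0:
        raise ValueError("a split needs at least one vote")
    spread = (n_left / n) * covariance_trace(left_votes) \
        + (n_right / n) * covariance_trace(right_votes)
    return 1.0 / (1.0 + spread)
```

A split whose children each contain identical votes scores 1.0, and the score drops toward 0 as the votes inside the children scatter.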

9 Hierarchical Hybrid Forest
STR forest quality function: Q_apv = αQ_a + (1-α)βQ_p + (1-α)(1-β)Q_v
» Q_a, viewpoint classification quality (information gain)
» Q_p, finger joint label classification quality (information gain)
» Q_v, pose regression quality: compactness of voting vectors (determinant of covariance trace)
» (α, β), margin measures of the viewpoint labels and joint labels
Using all three terms together is slow.
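As a rough illustration of how the three terms combine, the sketch below computes an entropy-based information gain for a label split (the measure named for Q_a and Q_p) and then applies the weighting from the slide. How α and β are derived from the label margins is not shown on the slide, so they are plain parameters here; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting parent into left and right."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def combined_quality(q_a, q_p, q_v, alpha, beta):
    """Q_apv = alpha*Q_a + (1-alpha)*beta*Q_p + (1-alpha)*(1-beta)*Q_v."""
    return (alpha * q_a
            + (1 - alpha) * beta * q_p
            + (1 - alpha) * (1 - beta) * q_v)
```

The three weights always sum to 1. With α = 1 only the viewpoint term counts, with α = 0 and β = 1 only the joint term counts, and with α = β = 0 only the regression term counts, which is one way to read the hierarchy from viewpoint to joint to pose.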

10 Transductive Learning
Training data D = {R_l, R_u, S}:
Target space (realistic data R):
» Captured from a PrimeSense depth sensor
» A small part of R, R_l, is labeled manually; the rest is the unlabeled set R_u
Source space (synthetic data S):
» Generated from an articulated hand model; all labeled

11 Transductive Learning
Training data D = {R_l, R_u, S}:
Source space (synthetic data S):
» Generated from an articulated hand model, where |S| >> |R|
Target space (realistic data R):
» Captured from a PrimeSense depth sensor
» A small part of R, R_l, is labeled manually; the rest is the unlabeled set R_u

12 Transductive Term Q_t
Training data D = {R_l, R_u, S}:
» Similar data points in R_l and S are paired (nearest neighbours across the target space R and the source space S); a split function that separates a pair is penalized
» Q_t is the ratio of associations preserved after a split
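The ratio described on this slide can be sketched in a few lines. The slide does not say in which space the nearest neighbours are found, so the sketch assumes each sample is represented by a vector and uses Euclidean distance; the function names and the boolean "goes left" encoding of a split are hypothetical.

```python
import numpy as np


def pair_nearest(real, synthetic):
    """For each labeled real sample, index of its nearest synthetic sample."""
    real = np.asarray(real, dtype=float)            # shape (n_real, d)
    synthetic = np.asarray(synthetic, dtype=float)  # shape (n_synth, d)
    dists = np.linalg.norm(real[:, None, :] - synthetic[None, :, :], axis=2)
    return dists.argmin(axis=1)


def transductive_quality(real_goes_left, synth_goes_left, pair_idx):
    """Fraction of real/synthetic pairs that land in the same child node."""
    real_goes_left = np.asarray(real_goes_left, dtype=bool)
    synth_goes_left = np.asarray(synth_goes_left, dtype=bool)
    if len(pair_idx) == 0:
        return 1.0  # no associations, so none can be broken
    return float(np.mean(real_goes_left == synth_goes_left[pair_idx]))
```

For example, with two real samples paired to two synthetic ones, a split that keeps both pairs together scores 1.0 and a split that separates one of them scores 0.5.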

13 Semi-supervised Term Q_u
Training data D = {R_l, R_u, S}:
» Similar data points in R_l and S are paired; a split function that separates a pair is penalized
» Q_u evaluates the appearance similarity of all realistic patches R within a node

14 Kinematic Refinement
1. For each joint, fit a GMM to the votes and measure the Euclidean distance between the two Gaussian modes.
2. Classify each joint as high confidence or low confidence.
3. Use the high-confidence joints to query a large joint position database, then choose the uncertain joint positions that are close to the result of the query.
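The three steps above can be sketched as follows, under several assumptions that the slide does not state: the two GMM mode means per joint are already available (with the first mode taken as the dominant one), a joint counts as high confidence when its two modes lie within a threshold of each other, and the database is a plain array of stored poses searched by summed joint distance. The function name and array layout are hypothetical.

```python
import numpy as np


def kinematic_refine(modes, database, threshold):
    """Pick one position per joint from two candidate modes.

    modes:     array (J, 2, 3), two GMM mode means per joint
    database:  array (N, J, 3), stored joint positions for N poses
    threshold: maximum mode distance for a joint to count as confident
    """
    modes = np.asarray(modes, dtype=float)
    database = np.asarray(database, dtype=float)

    # Step 1: Euclidean distance between the two modes of each joint.
    gap = np.linalg.norm(modes[:, 0] - modes[:, 1], axis=1)
    # Step 2: assumed rule, nearly coinciding modes mean high confidence.
    confident = gap <= threshold

    pose = modes[:, 0].copy()  # default to the first mode
    if confident.any() and not confident.all():
        # Step 3: query the database using only the confident joints.
        diff = database[:, confident] - modes[confident, 0]
        nearest = database[np.linalg.norm(diff, axis=2).sum(axis=1).argmin()]
        # For each uncertain joint, keep the mode closest to the query result.
        for j in np.flatnonzero(~confident):
            d = np.linalg.norm(modes[j] - nearest[j], axis=1)
            pose[j] = modes[j, d.argmin()]
    return pose, confident
```

The function returns the refined joint positions together with the confidence mask, so a caller can see which joints were taken directly from the votes and which were resolved through the database lookup.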

15 Experimental Settings
Training data:
» Synthetic data (337.5K images)
» Real data (81K images, <1.2K labeled)
Evaluation data: three different testing sequences
1. Sequence A: single viewpoint (450 frames)
2. Sequence B: multiple viewpoints, with slow hand movements (1000 frames)
3. Sequence C: multiple viewpoints, with fast hand movements (240 frames)

16 Self comparison experiment
» The graph shows the joint classification accuracy on Sequence A.
» The realistic and synthetic baselines produced similar accuracies.
» Using the transductive term is better than simply augmenting the real data with synthetic data.
» All terms together achieve the best results.


18 References
[1] Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture, CVPR, 2014
[2] A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, 2010
[3] Motion Capture of Hands in Action using Discriminative Salient Points, ECCV, 2012

