Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests
Danhang Tang, Tsz-Ho Yu, Tae-kyun Kim — Imperial College London, UK
Presented by Su-A Kim, 3rd June 2014 @CVLAB
Outline: Introduction, Methodology, Experiments
※ These slides excerpt parts of the authors' oral presentation at ICCV 2013.
Challenges for Hands
» Viewpoint changes and self-occlusions
» The discrepancy between synthetic and real data is larger than for the human body
» Labeling is difficult and tedious!
Method
Each challenge is addressed by a component of the method:
» Viewpoint changes and self-occlusions → Hierarchical Hybrid Forest
» Synthetic-to-real discrepancy → Transductive Learning
» Difficult, tedious labeling → Semi-supervised Learning
Existing Approaches
Generative approach: use explicit hand models to recover the hand pose by optimization; each hypothesis depends on the previous frame's result, so the method relies on tracking.
» Oikonomidis et al. ICCV 2011; De La Gorce et al. PAMI 2010; Hamer et al. ICCV 2009
» Motion capture: Ballan et al. ECCV 2012; Xu and Cheng ICCV 2013
Discriminative approach: learn a mapping from visual features to the target parameter space, such as joint labels or joint coordinates (i.e. hand poses), from a labelled training dataset, via classification, regression, etc.; each frame is handled independently, which enables error recovery.
» Wang et al. SIGGRAPH 2009; Stenger et al. IVC 2007; Keskin et al. ECCV 2012
Discriminative Approach
Has achieved great success in human body pose estimation.
» Efficient: real-time
» Accurate: works frame by frame, does not rely on tracking
» Requires a large dataset to cover many poses
» Trained on synthetic data, tested on real data
Hierarchical Hybrid Forest
STR forest split quality: Q_apv = αQ_a + (1-α)βQ_P + (1-α)(1-β)Q_V
» Viewpoint classification Q_a: information gain evaluating the classification performance over all viewpoint labels in the dataset
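Q_a is a standard information gain over the discrete viewpoint labels. A minimal sketch of how such a gain is computed (function names are illustrative, not from the paper's code):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label list."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """Information gain of a candidate split: parent entropy minus the
    size-weighted entropy of the two children."""
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child
```

A split that cleanly separates two viewpoint labels, e.g. `information_gain([0, 0, 1, 1], [0, 0], [1, 1])`, scores the maximum gain of 1 bit.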
Hierarchical Hybrid Forest (cont.)
» Finger joint classification Q_P: information gain measuring the performance of classifying individual patches into joint labels
Hierarchical Hybrid Forest (cont.)
» Pose regression Q_V: compactness of the voting vectors, measured from the covariance of the votes
Hierarchical Hybrid Forest (cont.)
» (α, β): margin measures over the viewpoint labels and joint labels, used to weight the three terms
» Using all three terms together everywhere is slow; α and β let the forest emphasise different terms at different levels
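The blended quality can be written directly from the slide's formula; a one-line sketch (parameter names are mine) showing how α gates the viewpoint term and β arbitrates between the joint and regression terms:

```python
def hybrid_quality(q_a, q_p, q_v, alpha, beta):
    """Q_apv = alpha*Q_a + (1-alpha)*beta*Q_P + (1-alpha)*(1-beta)*Q_V.
    With alpha near 1 the split optimises viewpoint classification (Q_a);
    as alpha falls, beta trades joint classification (Q_P) against vote
    compactness (Q_V)."""
    return alpha * q_a + (1 - alpha) * beta * q_p + (1 - alpha) * (1 - beta) * q_v
```

Note the three weights α, (1-α)β, and (1-α)(1-β) always sum to 1, so Q_apv stays a convex combination of the three criteria.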
Transductive Learning
Training data D = {R_l, R_u, S}
» Source space, synthetic data S: generated from an articulated hand model; all labeled
» Target space, realistic data R: captured with a PrimeSense depth sensor; only a small subset R_l is labeled manually (the rest, R_u, is unlabeled)
Transductive Learning (cont.)
» Note |S| >> |R|: the synthetic set is far larger than the realistic set
Transductive Term Q_t
» Similar data points in R_l and S are paired by nearest neighbour; a split function that separates a pair is penalised
» Q_t is the ratio of associations preserved after a split
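A sketch of the transductive term, under the assumption (mine) that associations are stored as ID pairs and the candidate split assigns every patch to one side:

```python
def transductive_quality(pairs, goes_left):
    """Q_t: fraction of realistic/synthetic nearest-neighbour pairs that a
    candidate split keeps on the same side of the node.
    pairs     -- iterable of (real_id, synth_id) associations
    goes_left -- mapping patch id -> True if the split sends it left"""
    pairs = list(pairs)
    preserved = sum(goes_left[r] == goes_left[s] for r, s in pairs)
    return preserved / len(pairs)
```

A split that routes a real patch and its synthetic partner to different children lowers Q_t, which is exactly the penalty the slide describes.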
Semi-supervised Term Q_u
» Q_u evaluates the appearance similarity of all realistic patches R (labeled and unlabeled) within a node
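The slide does not give the exact similarity measure; as a stand-in, one can score a node by how tightly its realistic patches cluster in feature space (a compactness proxy, my assumption, not the paper's formula):

```python
import numpy as np

def semisupervised_quality(real_patch_features):
    """Q_u proxy: negative total variance of the realistic patches (labeled
    and unlabeled) reaching a node. Tighter appearance clusters score
    higher (closer to zero). Input: (n, d) feature array."""
    cov = np.atleast_2d(np.cov(real_patch_features, rowvar=False))
    return float(-np.trace(cov))
```

Because the term uses only appearance, the unlabeled set R_u contributes to tree growth even without pose annotations, which is the point of the semi-supervised term.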
Kinematic Refinement
1. For each joint, the voting vectors are modeled with a GMM; measure the Euclidean distance between the two Gaussian modes.
2. Classify each joint as high confidence or low confidence based on that distance.
3. Query a large joint-position database with the high-confidence joints, then replace the uncertain joint positions with those closest to the query result.
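The confidence test of steps 1–2 can be sketched as follows; a crude 2-means pass stands in for the 2-mode GMM fit, and the threshold value is illustrative (both are my assumptions, not the paper's settings):

```python
import numpy as np

def joint_confidence(votes, threshold=20.0):
    """Classify one joint as high confidence when its position votes form a
    single tight cluster: the distance between the two cluster centres
    plays the role of the Euclidean distance between the Gaussian modes.
    votes: (n, 3) array of 3-D vote vectors."""
    votes = np.asarray(votes, dtype=float)
    c0, c1 = votes[0], votes[-1]          # seed the two modes
    for _ in range(10):
        d0 = np.linalg.norm(votes - c0, axis=1)
        d1 = np.linalg.norm(votes - c1, axis=1)
        mask = d0 <= d1
        if mask.all() or (~mask).all():   # every vote in one mode: agreement
            return True
        c0, c1 = votes[mask].mean(axis=0), votes[~mask].mean(axis=0)
    return float(np.linalg.norm(c0 - c1)) < threshold
```

When the votes split into two distant modes (an occluded or ambiguous joint), the gap exceeds the threshold and the joint falls back to the database-driven refinement of step 3.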
Experimental Settings
Evaluation data: three different testing sequences
1. Sequence A: single viewpoint (450 frames)
2. Sequence B: multiple viewpoints, slow hand movements (1000 frames)
3. Sequence C: multiple viewpoints, fast hand movements (240 frames)
Training data:
» Synthetic data (337.5K images)
» Real data (81K images, <1.2K labeled)
Self-comparison Experiment
» The graph shows the joint classification accuracy on Sequence A
» The realistic and synthetic baselines produce similar accuracies
» Using the transductive term is better than simply augmenting real data with synthetic data
» All terms together achieve the best results
References
[1] Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture, CVPR 2014.
[2] A Survey on Transfer Learning, IEEE Transactions on Knowledge and Data Engineering, 2010.
[3] Motion Capture of Hands in Action using Discriminative Salient Points, ECCV 2012.