Sun Yat-sen University

Slides:



Advertisements
Similar presentations
Evidential modeling for pose estimation Fabio Cuzzolin, Ruggero Frezza Computer Science Department UCLA.
Advertisements

Face Alignment with Part-Based Modeling
Structured Hough Voting for Vision-based Highway Border Detection
Patch to the Future: Unsupervised Visual Prediction
Real-Time Accurate Stereo Matching using Modified Two-Pass Aggregation and Winner- Take-All Guided Dynamic Programming Xuefeng Chang, Zhong Zhou, Yingjie.
1.Introduction 2.Article [1] Real Time Motion Capture Using a Single TOF Camera (2010) 3.Article [2] Real Time Human Pose Recognition In Parts Using a.
Outline  Facial Attributes Analysis  Animated Pose Templates(APT) for Modeling and Detecting Human Actions  Unsupervised Structure Learning of Stochastic.
Real Time Motion Capture Using a Single Time-Of-Flight Camera
Robust Object Tracking via Sparsity-based Collaborative Model
Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo.
Student: Yao-Sheng Wang Advisor: Prof. Sheng-Jyh Wang ARTICULATED HUMAN DETECTION 1 Department of Electronics Engineering National Chiao Tung University.
Interactive Matting Christoph Rhemann Supervised by: Margrit Gelautz and Carsten Rother.
Machine Learning and having it deep and structured
Human tracking and counting using the KINECT range sensor based on Adaboost and Kalman Filter ISVC 2013.
Face Alignment Using Cascaded Boosted Regression Active Shape Models
Mining Discriminative Components With Low-Rank and Sparsity Constraints for Face Recognition Qiang Zhang, Baoxin Li Computer Science and Engineering Arizona.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
#MOTION ESTIMATION AND OCCLUSION DETECTION #BLURRED VIDEO WITH LAYERS
A Novel Local Patch Framework for Fixing Supervised Learning Models Yilei Wang 1, Bingzheng Wei 2, Jun Yan 2, Yang Hu 2, Zhi-Hong Deng 1, Zheng Chen 2.
1 University of Texas at Austin Machine Learning Group 图像与视频处理 计算机学院 Motion Detection and Estimation.
IIIT Hyderabad Learning Semantic Interaction among Graspable Objects Swagatika Panda, A.H. Abdul Hafez, C.V. Jawahar Center for Visual Information Technology,
Tell Me What You See and I will Show You Where It Is Jia Xu 1 Alexander G. Schwing 2 Raquel Urtasun 2,3 1 University of Wisconsin-Madison 2 University.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
Skeleton Based Action Recognition with Convolutional Neural Network
Gaussian Conditional Random Field Network for Semantic Segmentation
NLP&CC 2012 报告人:许灿辉 单 位:北京大学计算机科学技术研究所 Integration of Text Information and Graphic Composite for PDF Document Analysis 基于复合图文整合的 PDF 文档分析 Integration of.
Comparing TensorFlow Deep Learning Performance Using CPUs, GPUs, Local PCs and Cloud Pace University, Research Day, May 5, 2017 John Lawrence, Jonas Malmsten,
CNN-RNN: A Unified Framework for Multi-label Image Classification
Demo.
WP3 INERTIA Local Control and Automation Hub
Jacob R. Lorch Microsoft Research
Guillaume-Alexandre Bilodeau
Summary of “Efficient Deep Learning for Stereo Matching”
Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek
C. Canton1, J.R. Casas1, A.M.Tekalp2, M.Pardàs1
学术报告 欢迎各位师生参加! 报告题目: Efficient Context-aware Scene Labeling with
Utilizing AI & GPUs to Build Cloud-based Real-Time Video Event Detection Solutions Zvika Ashani CTO.
Compositional Human Pose Regression
Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie1, Shenglin Zhao2, Zibin Zheng3, Jieming Zhu2 and Michael.
Synthesis of X-ray Projections via Deep Learning
Recovery from Occlusion in Deep Feature Space for Face Recognition
Real-Time Human Pose Recognition in Parts from Single Depth Image
Efficient Deep Model for Monocular Road Segmentation
Unsupervised Face Alignment by Robust Nonrigid Mapping
Deep Reconfigurable Models with Radius-Margin Bound for
Adversarially Tuned Scene Generation
Video Summarization via Determinantal Point Processes (DPP)
Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science
Layer-wise Performance Bottleneck Analysis of Deep Neural Networks
Learning to See in the Dark
Two-Stream Convolutional Networks for Action Recognition in Videos
Xiaodan Liang Sun Yat-Sen University
Oral presentation for ACM International Conference on Multimedia, 2014
Introduction to Text Generation
MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.
Image to Image Translation using GANs
Outline Background Motivation Proposed Model Experimental Results
Good View Hunting: Learning Photo Composition from Dense View Pairs Zijun Wei1, Jianming Zhang2, Xiaohui Shen2, Zhe Lin2, Radomír Měch2, Minh Hoai1, Dimitris.
Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation Binghui Wang, Jinyuan Jia, and Neil.
Introduction to Object Tracking
Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen
Heterogeneous convolutional neural networks for visual recognition
Human-object interaction
Presented by: Anurag Paul
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Learning and Memorization
Deep Structured Scene Parsing by Learning with Image Descriptions
Introduction Face detection and alignment are essential to many applications such as face recognition, facial expression recognition, age identification,
Shengcong Chen, Changxing Ding, Minfeng Liu 2018
Presentation transcript:

Sun Yat-sen University Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning Keze Wang kezewang@gmail.com Sun Yat-sen University 2016-10-19 尊敬的各位评委老师,大家下午好!我是中山大学林倞。我汇报的题目是视觉计算与模式分析。

Outline Background Motivation Framework Results Conclusion 首先简要介绍下个人基本情况。

Outline Background Motivation Framework Results Conclusion 首先简要介绍下个人基本情况。

Background Objective: Estimate human pose from a still depth image.

Challenging Issues Unusual poses/views: Current solvers often fail in handling exceptional and unusual cases. This is due to the bias of training data and errors introduced by sensor noise. The user may appear in diverse views or poses in real applications

Challenging Issues Self-occlusions: The depth data captured from one single sensor inevitably contains body part occlusions, especially in playing complex gestures.

Outline Background Motivation Framework Results Conclusion 首先简要介绍下个人基本情况。

Motivation To handle unusual pose/views and self-occlusions, Explicitly represent the human pose in a coarse-to-fine manner Constrain the human pose under 3D kinematic tree structure. To achieve towards real-time performance Incorporates the dynamic programming inference into the deep neural network architecture.

Outline Background Motivation Framework Results Conclusion 首先简要介绍下个人基本情况。

Framework: two cascaded tasks First Task: generating human part proposals Second Task: optimal configuration inference

Body Part Proposal Generation Our model generates body part proposals by design a fully convolutional network (FCN), which produces a dense heat map output for each body part. Generate body part proposal

Optimal Configuration Inference We propose to model the human pose by the sum of unary and pairwise terms according to the 3D kinematic tree structure: Appearance Compatibility Contextual Geometry Relationship

Tailored MatchNet Part Templates: Offline Clustered via the ground truth Figure 4: The illustration of the inference built-in MatchNet for a fast part configuration inference.

Inference Built-in MatchNet Fast Inference via dynamic programming Head Figure 4: The illustration of the inference built-in MatchNet for a fast part configuration inference. Neck L-Shoulder R-Shoulder …

Multi-task Learning We employ the widely used batch Stochastic Gradient Descent (SGD) algorithm to train FCN and tailored MatchNet, respectively. We employ a latent learning algorithm extended from the CCCP framework.

Multi-task Learning

Outline Background Motivation Framework Results Conclusion 首先简要介绍下个人基本情况。

Dataset Description Kinect2 Human Gesture Dataset (K2HGD): 100K depth images with various human poses under challenging scenarios. 19 body joints 30 subjects 10 challenging scenes. Evaluation Metric: Percent of Detected Joints (PDJ)

Experimental Comparisons Quantitative Comparison with state-of-the-art approaches Efficiency Comparison on intel 3.4GHz CPU + NVIDIA TITAN X GPU

Experimental Comparisons Qualitative Comparison with state-of-the-art approaches

Visual Comparison with commercial products

Experimental Comparisons Component Analysis

Outline Background Motivation Framework Results Conclusion 首先简要介绍下个人基本情况。

Conclusions We proposed a novel deep inference-embedded multi-task learning framework for predicting human pose from depth data. Detecting a batch of body part proposals via a fully convolutional network (FCN); Searching for the optimal configuration of body parts based on the body part proposals via a fast inference step (i.e., dynamic programming) We developed a inference built-in MatchNet to incorporate the single term of appearance cost and the pairwise 3D kinematic constraint.

Any Questions?

Thank You!