Real-Time Human Pose Recognition in Parts from Single Depth Image


1 Real-Time Human Pose Recognition in Parts from Single Depth Image
Zihang Huang CS2310 seminar

2 References
Main paper
[1] Jamie Shotton, Toby Sharp, Alex Kipman, Andrew Fitzgibbon, Mark Finocchio, Andrew Blake, Mat Cook, Richard Moore. Real-time human pose recognition in parts from single depth images. Communications of the ACM, v.56 n.1, January 2013.
Sub paper
[2] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, and A. Blake. Efficient human pose estimation from single depth images. PAMI.
[3] Thomas B. Moeslund, Adrian Hilton, Volker Krüger. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, v.104 n.2, November 2006.
[4] Ronald Poppe. Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, v.108 n.1-2, p.4-18, October 2007.

3 1. Motivation 2. Approach 3. Experiments 4. Conclusion 5. Reference

4 Introduction
What is articulated body pose estimation?
It recovers the pose of an articulated body, which consists of joints and parts, from image-based observations.
Research in pose recognition has been ongoing for 20+ years.
Earlier methods rely on many assumptions: multiple cameras, manual initialization, controlled or simple backgrounds.

5 Application

6 Application
Surveillance cameras are ubiquitous in public places.
More than 60 million CCTV cameras in China.
In London, a person is captured on camera about 300 times per day on average.
Current CCTV systems mainly record the video stream without understanding the human actions and events in it.
Understanding human activity is important for intelligent surveillance systems.

7 Challenges
Human pose is deformable.
Need a way to recognize many poses across different body shapes, scales, and clothing types.
Decide how to identify parts of the body and how detailed the labeling should be.
Need invariance to background, light level, color, and texture.

8 Main Idea

9 Previous Approaches
Using conventional intensity cameras:
Learn an initial pose, then learn variations of that pose.
Estimate the locations of body segments, from which the body is built.
Using depth cameras:
Build 3D models, divide the model into parts, then search for the parts of the body.
CPU expensive.

10 This Paper's Approach
Two steps:
1. Find body parts: classify each pixel with randomized decision forests over depth-difference features.
2. Compute joint positions: find modes of the per-pixel body part distributions (mean shift).
Large and varied dataset: both synthetic and motion-captured data.
Object recognition approach: an intermediate body part representation reduces pose estimation to per-pixel classification.
Create scored proposals of body joints.

11 Data Gathering
Benefits of depth images:
Color and texture invariance
Good performance at low light levels
Accurate scale estimation
Easier background subtraction
Reduced silhouette ambiguity

12 Data Types

13 Synthetic Data
Goals: realism and variety.
A randomized rendering pipeline produces fully labeled samples that can be used for training.
The classifier learns invariance to camera position, body pose, body size, and body shape from this variation.
Other slight variations are height, weight, mocap frame, camera noise, clothing, hairstyle, etc.

14 Real Data (Motion Capture Data)
A large database is built from motion capture of human actions related to the target application (dancing, running, etc.).
The classifier is expected to generalize to unseen poses.
The goal is a wide range of poses rather than all possible combinations.
Many redundant poses are discarded from the initial data using furthest neighbor clustering.
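The pruning of redundant poses can be sketched as a greedy furthest-neighbor selection. This is a minimal illustration, not the authors' code: the function names, the pose layout (a list of 3D joint positions), the max-over-joints pose distance, and the `min_dist` threshold are all assumptions made for the example.

```python
import math


def pose_distance(p, q):
    """Maximum Euclidean distance over corresponding joints of two poses (assumed metric)."""
    return max(math.dist(a, b) for a, b in zip(p, q))


def furthest_neighbor_subset(poses, min_dist):
    """Greedily keep pose indices so that no two kept poses are closer than min_dist."""
    if min_dist <= 0:
        raise ValueError("min_dist must be positive")
    if not poses:
        return []
    kept = [0]  # seed with the first pose
    # nearest[i] = distance from pose i to its closest kept pose
    nearest = [pose_distance(p, poses[0]) for p in poses]
    while True:
        idx = max(range(len(poses)), key=lambda i: nearest[i])
        if nearest[idx] < min_dist:
            break  # every remaining pose is redundant
        kept.append(idx)
        for i, p in enumerate(poses):
            nearest[i] = min(nearest[i], pose_distance(p, poses[idx]))
    return kept
```

Each kept pose is at least `min_dist` away from every pose kept before it, so near-duplicate mocap frames collapse to a single representative.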

15 The Rendering Pipeline
Necessary to account for pose variation and model variation.
Start with a base character and pose.
Apply randomized transformations: rotation and translation, hair and clothing, weight and height variations, camera position and orientation, camera noise.
Add the transformed renderings to the dataset.

16 Different Renderings

17 System Overview
From a single input depth image, a per-pixel body part distribution is inferred. (Colors indicate the most likely part label at each pixel and correspond to those in the joint proposals.) Local modes of this signal are estimated to give high-quality proposals for the 3D locations of body joints, even for multiple users. Finally, the joint proposals are input to skeleton fitting, which outputs the 3D skeleton for each user.

18 Joint Position Proposal
Density estimates are depth invariant.
Depending on the application, inferred body parts can be pre-accumulated.
The mean shift algorithm finds modes efficiently.
Final confidence of a joint proposal: the sum of the pixel weights reaching that mode.
Multiple body parts covering the same area can be merged to localize a single joint.
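The mode finding described on this slide can be sketched with a weighted Gaussian mean shift over the 3D positions of the pixels assigned to one body part. This is a simplified illustration under stated assumptions: the function names, the fixed `bandwidth`, and the iteration limits are hypothetical, and the pixel weight follows the commonly cited form of part probability times squared depth, which is what makes the density depth invariant.

```python
import math


def pixel_weight(part_probability, depth):
    """Weight = P(part | pixel) * depth^2; distant pixels cover more surface area."""
    return part_probability * depth ** 2


def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Shift `start` uphill to a local mode of a weighted Gaussian kernel density."""
    current = tuple(start)
    for _ in range(iters):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for p, w in zip(points, weights):
            dist2 = sum((a - b) ** 2 for a, b in zip(p, current))
            k = w * math.exp(-dist2 / bandwidth ** 2)  # kernel response of this pixel
            total += k
            for i in range(3):
                acc[i] += k * p[i]
        if total == 0.0:
            break  # no pixel is close enough to contribute
        new = tuple(a / total for a in acc)
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, current)))
        current = new
        if moved < tol:
            break
    return current
```

Starting the iteration from several pixels and summing the weights of those that converge to the same mode gives a confidence score per joint proposal.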

19 Body Part Labeling
The body is broken down into an intermediate representation of 31 body parts (object by parts).
Observations:
Parts should be small enough to localize the different body joints.
Parts should be few enough that classifier capacity is not wasted.

20 Depth Image Features
Features are weak individually.
Solution: combine them in a decision forest.
The solution is efficient:
- one feature reads at most 3 image pixels
- performs at most 5 arithmetic operations
- can be implemented on the GPU
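A depth comparison feature of this kind can be sketched as the difference between two probe depths, with the probe offsets scaled by the inverse depth at the pixel so the response is depth invariant. The helper names, the list-of-rows image layout, and the `BACKGROUND` constant are assumptions made for this example.

```python
BACKGROUND = 1e6  # assumed large depth for background or out-of-image probes


def depth_at(image, x, y):
    """Return the depth at integer pixel (x, y), or BACKGROUND outside the image."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return BACKGROUND


def depth_feature(image, x, y, u, v):
    """Compute d(x + u/d(x)) - d(x + v/d(x)) for a foreground pixel with depth > 0."""
    d = depth_at(image, x, y)  # pixel read 1
    ux, uy = x + int(round(u[0] / d)), y + int(round(u[1] / d))
    vx, vy = x + int(round(v[0] / d)), y + int(round(v[1] / d))
    # pixel reads 2 and 3, then a single subtraction
    return depth_at(image, ux, uy) - depth_at(image, vx, vy)
```

A probe that lands off the body returns a very large depth, so the feature gives a strong response near silhouette edges.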

21 Depth Image Features
3D body joints can be obtained from Kinect depth images in real time using the random forest algorithm.
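Per-pixel classification with a forest can be sketched as follows: each tree routes the pixel from the root to a leaf by comparing a feature response to a threshold, and the leaf distributions over body parts are averaged across trees. The dictionary-based tree layout, the key names, and the "less than goes left" convention are assumptions of this sketch.

```python
def classify_pixel(forest, feature_fn):
    """Average the leaf distributions reached in each tree for one pixel.

    forest: list of trees; a split node is
            {"feature": ..., "threshold": ..., "left": ..., "right": ...}
            and a leaf is {"leaf": {label: probability}}.
    feature_fn: maps a node's feature parameters to the response at this pixel.
    """
    totals = {}
    for tree in forest:
        node = tree
        while "leaf" not in node:
            value = feature_fn(node["feature"])
            node = node["left"] if value < node["threshold"] else node["right"]
        for label, prob in node["leaf"].items():
            totals[label] = totals.get(label, 0.0) + prob / len(forest)
    return totals
```

Because every pixel is classified independently, the loop over pixels parallelizes naturally, which is what makes a GPU implementation attractive.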

22 Number of decision trees

23 Experiments
Test data:
Set of 5000 synthesized depth images
Real dataset of 8808 frames from more than 15 different subjects
28 depth image sequences ranging from short motions to full actions
Parameters: 3 trees, depth of 20, 300k training images/tree, 2000 training example pixels/image, 2000 candidate features, 50 candidate thresholds/feature
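The candidate features and thresholds in the parameter list are the options each tree node chooses among during training. A common selection criterion is information gain, sketched below for a single feature; the function names and the `(feature_value, label)` sample layout are assumptions, and the sample list is assumed non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels; 0.0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(samples, threshold):
    """Gain from splitting (feature_value, label) samples at `threshold`."""
    left = [lab for val, lab in samples if val < threshold]
    right = [lab for val, lab in samples if val >= threshold]
    n = len(samples)
    parent = entropy([lab for _, lab in samples])
    return parent - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)


def best_threshold(samples, candidates):
    """Pick the candidate threshold with the highest information gain."""
    return max(candidates, key=lambda t: information_gain(samples, t))
```

In a full trainer the same search would run over every candidate feature as well, keeping the (feature, threshold) pair with the largest gain before recursing into the two child nodes.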

24 Results

25 Conclusions
No temporal information: estimation is frame by frame.
Local pose estimate of parts: each pixel and each body joint is treated independently.
Very fast (super real-time estimation): simple depth image features, decision forest classifier, limited compute budget.

26 THANKS! Any questions?

