Published by Georgia Barrett. Modified over 9 years ago.
1
Training Kinect
Mihai Budiu, Microsoft Research, Silicon Valley
UCSD CNS 2012 Research Review, February 8, 2012
2
Label body parts in depth map
Parallelizing the Training of the Kinect Body Parts Labeling Algorithm. Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio. Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011.
3
Solution: Learn from Data
Training examples → Machine learning → Classifier
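Labeling body parts amounts to per-pixel classification: the learned classifier assigns each depth pixel a body-part label. Here is a minimal Python sketch of how a decision-forest classifier could perform that inference. It assumes each internal node compares one precomputed feature of the pixel against a threshold and each leaf stores a probability distribution over labels. The dictionary node layout, the example trees, and the names classify_tree and classify_forest are hypothetical, not the Kinect implementation.

```python
def classify_tree(node, features):
    """Walk one tree from the root to a leaf and return the leaf's label distribution."""
    while "dist" not in node:  # internal nodes carry a feature index and a threshold
        if features[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["dist"]


def classify_forest(trees, features):
    """Average the leaf distributions of all trees; return (best label, averaged distribution)."""
    averaged = {}
    for tree in trees:
        for label, prob in classify_tree(tree, features).items():
            averaged[label] = averaged.get(label, 0.0) + prob / len(trees)
    best = max(averaged, key=averaged.get)
    return best, averaged


# Two tiny hypothetical trees over two body parts.
tree_a = {"feature": 0, "threshold": 0.5,
          "left": {"dist": {"head": 0.9, "hand": 0.1}},
          "right": {"dist": {"hand": 1.0}}}
tree_b = {"feature": 1, "threshold": 2.0,
          "left": {"dist": {"head": 0.6, "hand": 0.4}},
          "right": {"dist": {"hand": 0.8, "head": 0.2}}}

label, dist = classify_forest([tree_a, tree_b], [0.2, 1.0])
```

This sketch combines trees by averaging their distributions, which is one common choice. The slides do not say how the Kinect classifier combines its trees.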
4
Big data
- 1M training examples
- 300,000 pixels/image
- 100,000 features
- < 2^20 tree nodes/tree
- 31 body parts
- 3 trees
Training examples → Dryad / DryadLINQ (decision forest inference) → Classifier
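The figures on this slide multiply quickly. The following back-of-the-envelope sketch assumes the naive worst case, in which every pixel of every training image is tested against every candidate feature in one pass over the data. The real training may sample pixels or features, so treat these numbers as upper bounds rather than measured costs.

```python
# Figures copied from the "Big data" slide.
TRAINING_IMAGES = 1_000_000
PIXELS_PER_IMAGE = 300_000
CANDIDATE_FEATURES = 100_000

# Naive upper bound: every pixel of every image is a training sample...
total_pixels = TRAINING_IMAGES * PIXELS_PER_IMAGE
# ...and each sample is tested against every candidate feature.
naive_evaluations = total_pixels * CANDIDATE_FEATURES


def complete_tree_nodes(levels: int) -> int:
    """Number of nodes in a complete binary tree with `levels` levels."""
    return 2 ** levels - 1


print(f"{total_pixels:.1e} training pixels")            # prints "3.0e+11 training pixels"
print(f"{naive_evaluations:.1e} feature tests per pass")  # prints "3.0e+16 feature tests per pass"
```

A complete binary tree with 20 levels has 2^20 - 1 nodes, so the slide's "< 2^20 tree nodes/tree" is consistent with trees of at most 20 levels.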
5
Data-Parallel Computation
Layers: Application, Language, Execution, Storage

Execution          | Storage            | Language
Parallel Databases | -                  | SQL
Map-Reduce         | GFS, BigTable      | Sawzall, FlumeJava
Dryad              | Cosmos, Azure, HPC | DryadLINQ
Hadoop             | HDFS, S3           | Pig, Hive

Query-language styles shown: SQL, ≈SQL, LINQ, Sawzall, Java
6
Dryad = 2-D Piping
Unix pipes (1-D): grep | sed | sort | awk | perl
Dryad (2-D): grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
(In the Dryad pipeline, the exponent is the number of parallel instances of that stage.)
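Here is a toy, in-process model of the 2-D pipeline idea in Python, written under stated assumptions. Each stage is a function over one partition of records plus an instance count. Records are redistributed round-robin between stages, and threads stand in for machines. The names run_2d_pipeline, repartition, grep, and sed are hypothetical. A real system would run the instances on separate machines and might hash- or range-partition the data instead.

```python
from concurrent.futures import ThreadPoolExecutor


def repartition(partitions, width):
    """Round-robin every record from the previous stage into `width` partitions."""
    records = [r for part in partitions for r in part]
    return [records[i::width] for i in range(width)]


def run_2d_pipeline(records, stages):
    """Run stages in order; each stage is (function over a partition, instance count)."""
    partitions = [list(records)]
    with ThreadPoolExecutor() as pool:
        for fn, width in stages:
            partitions = repartition(partitions, width)
            # `width` instances of the same stage, one per partition, in parallel.
            partitions = list(pool.map(fn, partitions))
    return partitions


def grep(part):
    """Keep lines that mention 'depth' (stands in for `grep depth`)."""
    return [line for line in part if "depth" in line]


def sed(part):
    """Rewrite each line (stands in for `sed s/depth/DEPTH/`)."""
    return [line.replace("depth", "DEPTH") for line in part]


lines = ["depth map", "rgb frame", "depth noise", "audio"]
# grep^4 | sed^2 | sort^1
result = run_2d_pipeline(lines, [(grep, 4), (sed, 2), (sorted, 1)])
```

The final sort runs as a single instance because sorting partitions independently does not produce a global order. A many-instance sort, like the slide's sort^1000, needs range partitioning first.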
7
Virtualized 2-D Pipelines
11
Virtualized 2-D Pipelines
- 2-D DAG
- multi-machine
- virtualized
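One way to read "virtualized" is that the job DAG can contain more vertices than there are machines. In that reading, a scheduler starts each vertex once all of its inputs are ready, on whichever machine is free. The following minimal simulation rests on that assumption. It uses unit-time rounds and Python's graphlib.TopologicalSorter (Python 3.9+). The schedule function and the example graph are hypothetical, and the whole thing is far simpler than Dryad's actual job manager.

```python
from graphlib import TopologicalSorter


def schedule(dag, machines):
    """Simulate running a job DAG on `machines` workers in unit-time rounds.

    dag maps each vertex to the set of vertices whose output it consumes.
    Returns a list of rounds; each round lists the vertices run concurrently.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    rounds, ready = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())   # vertices whose inputs are all complete
        batch, ready = ready[:machines], ready[machines:]
        rounds.append(batch)               # at most `machines` vertices per round
        for vertex in batch:
            sorter.done(vertex)            # outputs now available downstream
    return rounds


# Hypothetical job: two readers feed two sorters, which feed one merger.
job = {
    "sort1": {"read1", "read2"},
    "sort2": {"read1", "read2"},
    "merge": {"sort1", "sort2"},
}
two_machines = schedule(job, machines=2)
one_machine = schedule(job, machines=1)
```

With two machines, the job finishes in three rounds. With one machine, the same five vertices simply take five rounds, and the program itself does not change.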
12
Fault Tolerance
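A Dryad-style system can tolerate failures by re-executing a failed vertex on the same inputs. This is safe as long as vertex programs are deterministic and their inputs are not modified. Here is a minimal sketch of that retry policy. The names run_with_retries and FlakyVertex, and the limit of three attempts, are hypothetical.

```python
def run_with_retries(vertex, inputs, max_attempts=3):
    """Re-run a deterministic vertex on the same inputs until it succeeds.

    Returns (output, number of attempts used).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return vertex(inputs), attempt
        except Exception as err:  # a machine or process failure, simulated as an exception
            last_error = err
    raise RuntimeError(f"vertex failed {max_attempts} times") from last_error


class FlakyVertex:
    """Fails on its first call, then computes a deterministic result."""

    def __init__(self):
        self.calls = 0

    def __call__(self, inputs):
        self.calls += 1
        if self.calls == 1:
            raise IOError("simulated machine failure")
        return sum(inputs)


output, attempts = run_with_retries(FlakyVertex(), [1, 2, 3])
```

Because the vertex is deterministic, the second attempt yields exactly the output the first attempt would have produced.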
13
LINQ
Dryad => DryadLINQ
14
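LINQ = .Net + Queries

    // Declarations from the slide (bodies omitted)
    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    // Keep records with legal keys; project each to (hash of key, value).
    // An anonymous-type member built from a method call needs an explicit name.
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };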
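The query is a filter (where) followed by a projection (select). A rough Python counterpart follows, written as a comprehension. The Record type, the sample data, and the stand-ins is_legal and hash_key are hypothetical illustrations of the slide's IsLegal and Hash, whose bodies are not given.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: int


def is_legal(key: str) -> bool:
    """Hypothetical stand-in for IsLegal: reject keys starting with an underscore."""
    return not key.startswith("_")


def hash_key(key: str) -> str:
    """Hypothetical stand-in for Hash: a short, stable hex digest of the key."""
    return hashlib.md5(key.encode()).hexdigest()[:8]


collection = [Record("alice", 1), Record("_tmp", 2), Record("bob", 3)]

# from c in collection where IsLegal(c.key) select new { hash = Hash(c.key), c.value }
results = [(hash_key(c.key), c.value) for c in collection if is_legal(c.key)]
```

LINQ queries use deferred execution, so a generator expression (parentheses instead of brackets) would mirror the original's evaluation order more closely.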
15
DryadLINQ Data Model
A Collection is divided into Partitions; each Partition holds .Net objects.
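The slide does not say how objects are assigned to partitions. Hash partitioning on a key is one common choice, sketched here in Python. The name hash_partition and the sample data are hypothetical. The code uses zlib.crc32 rather than Python's built-in hash() because str hashing is randomized per process, and partitions that live on different machines need a stable assignment.

```python
import zlib


def hash_partition(objects, key, num_partitions):
    """Split a collection into partitions by a stable hash of each object's key."""
    partitions = [[] for _ in range(num_partitions)]
    for obj in objects:
        index = zlib.crc32(key(obj).encode()) % num_partitions
        partitions[index].append(obj)
    return partitions


words = ["head", "hand", "head", "foot", "knee", "hand"]
parts = hash_partition(words, key=lambda w: w, num_partitions=3)
```

All objects with the same key land in the same partition, which lets a later per-key operation (such as grouping) run on each partition independently.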
16
DryadLINQ = LINQ + Dryad

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };

The query over the C# collection is compiled into a query plan (a Dryad job) plus C# vertex code; the job runs over the data and produces the results collection.
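For a query built only from per-record operators such as Where and Select, the vertex code can run independently on each partition, and the outputs can simply be concatenated. Here is a minimal Python sketch of that idea, with threads standing in for cluster machines. The names vertex_code, run_query, is_legal, and hash_key are hypothetical and are not DryadLINQ's generated code.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def is_legal(key):
    """Hypothetical stand-in for IsLegal."""
    return not key.startswith("_")


def hash_key(key):
    """Hypothetical stand-in for Hash: stable 8-digit hex CRC32."""
    return format(zlib.crc32(key.encode()), "08x")


def vertex_code(partition):
    """Per-partition body of: where IsLegal(key) select (Hash(key), value)."""
    return [(hash_key(k), v) for k, v in partition if is_legal(k)]


def run_query(partitions):
    """Run one vertex per partition in parallel, then concatenate outputs in order."""
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(vertex_code, partitions))
    return [row for out in outputs for row in out]


partitions = [[("alice", 1), ("_tmp", 2)], [("bob", 3)]]
parallel = run_query(partitions)
sequential = vertex_code([r for p in partitions for r in p])
```

Operators such as GroupBy or OrderBy do not have this property. They need data exchanged between partitions, which is where a query plan has more than one stage.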
17
Kinect Training Pipeline
- Upload data
- Distribute; replicate 20x
- Preprocessing: add background, inject noise, overlay players
- Normalization
- Train one tree level: compute histograms
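Computing histograms parallelizes well over images because adding counts is associative and commutative. Each machine can count labels over its share of the samples, and the partial histograms can then be summed. The following Python sketch assumes that approach. The (node, test, branch, label) key layout, the candidate-test format, the sample data, and the names partial_histograms and merge_histograms are hypothetical.

```python
from collections import Counter


def partial_histograms(samples, tests):
    """Count body-part labels per (node, test, branch) over one partition of samples.

    samples: iterable of (node_id, feature_vector, label) tuples.
    tests: list of candidate (feature_index, threshold) splits.
    """
    counts = Counter()
    for node, features, label in samples:
        for t, (feature, threshold) in enumerate(tests):
            branch = "left" if features[feature] < threshold else "right"
            counts[(node, t, branch, label)] += 1
    return counts


def merge_histograms(histograms):
    """Sum partial histograms; the order in which partitions arrive does not matter."""
    total = Counter()
    for h in histograms:
        total.update(h)
    return total


tests = [(0, 0.5), (1, 2.0)]
partition_1 = [(0, [0.2, 1.0], "head"), (0, [0.9, 3.0], "hand")]
partition_2 = [(0, [0.1, 2.5], "head")]

merged = merge_histograms(partial_histograms(p, tests) for p in (partition_1, partition_2))
```
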
18
Query plan for one tree layer
Partial tree + images + features → split → new partial tree
Parallelize on: features, images, tree nodes
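The slide does not say how the split at each node is chosen. The sketch assumes entropy-based information gain, a common decision-tree criterion, computed from the left and right label histograms of each candidate test. Each node's choice is independent of the others, which is what allows parallelizing over tree nodes. The names entropy, information_gain, and best_split, and the candidate data, are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a label histogram."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def information_gain(left, right):
    """Entropy reduction achieved by splitting a node's samples into left and right."""
    parent = left + right
    n = sum(parent.values())
    children = sum(
        sum(side.values()) / n * entropy(side) for side in (left, right) if side
    )
    return entropy(parent) - children


def best_split(candidates):
    """candidates maps a test name to its (left, right) label histograms at one node."""
    return max(candidates, key=lambda test: information_gain(*candidates[test]))


candidates = {
    "f0 < 0.5": (Counter(head=4), Counter(hand=4)),
    "f1 < 2.0": (Counter(head=2, hand=2), Counter(head=2, hand=2)),
}
chosen = best_split(candidates)
```

In this example, the first test separates the two labels perfectly and gains 1 bit. The second test leaves both sides as mixed as the parent and gains nothing.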
19
High cluster utilization
[Figure: cluster utilization plot; axes: Time, Machine]
20
CONCLUSIONS
21
Huge Commercial Success
22
Tremendous Interest from Developers
23
Consumer Technologies Push the Envelope
Price: $6,000 vs. Price: $150
24
Unique Opportunity for Technology Transfer
25
I can finally explain to my son what I do for a living…
26
BACKUP SLIDES
27
Training efficiency
28
Cluster usage for one tree
[Figure: cluster usage over time; axes: Time (s), Machine (235). Phases: Preprocess, Normalize, then Tree levels 1-19; level 19 appears twice, the first attempt marked "(failed)". One phase is annotated "14400 processes".]
Totals: 18.3 hours, 137.2 CPU days, 107,421 processes, 29.56 TB of data, average parallelism = 140.
29
DryadLINQ Language Summary
Where, Select, GroupBy, OrderBy, Aggregate, Join
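For readers more familiar with Python, the following sketch gives rough counterparts of the six operators listed, applied to a small hypothetical dataset of (image id, body-part label) pairs. These are analogies for intuition, not DryadLINQ semantics. In particular, itertools.groupby groups only consecutive items, so its input is sorted first, whereas LINQ's GroupBy has no such requirement.

```python
from functools import reduce
from itertools import groupby

# Hypothetical data: (image id, body-part label) pairs and a per-image lookup table.
pixels = [("img1", "head"), ("img1", "hand"), ("img2", "head")]
images = {"img1": "player A", "img2": "player B"}

where = [p for p in pixels if p[1] == "head"]                  # Where
select = [label for _, label in pixels]                        # Select
order_by = sorted(pixels, key=lambda p: p[1])                  # OrderBy (stable sort)
group_by = {                                                   # GroupBy
    label: list(group)
    for label, group in groupby(order_by, key=lambda p: p[1])  # needs sorted input
}
aggregate = reduce(lambda count, _: count + 1, pixels, 0)      # Aggregate (here: a count)
join = [                                                       # Join on image id
    (img, label, images[img]) for img, label in pixels if img in images
]
```
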