Training Kinect. Mihai Budiu, Microsoft Research, Silicon Valley. UCSD CNS 2012 Research Review, February 8, 2012.
Label body parts in depth map. Reference: Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio, "Parallelizing the Training of the Kinect Body Parts Labeling Algorithm," Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011.
Solution: Learn from Data. Training examples -> Machine learning -> Classifier.
Big data: 1M training examples; 300,000 pixels/image; 100,000 features; <2^20 tree nodes/tree; 31 body parts; 3 trees. [Diagram labels: Dryad, DryadLINQ, Decision forest inference, Classifier]
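To make "decision forest inference" concrete, here is a minimal sketch of how a forest of 3 trees could label one pixel with one of 31 body parts. It assumes each pixel arrives as a precomputed feature vector, that internal nodes compare one feature to a threshold, and that the forest averages the leaf distributions of its trees; the names `Node`, `leaf`, `stump`, and `classify_with_forest` are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

NUM_BODY_PARTS = 31  # from the slide


@dataclass
class Node:
    # Internal node: route on one feature compared with a threshold.
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: probability distribution over body parts.
    distribution: Optional[List[float]] = None


def leaf(part: int) -> Node:
    """Hypothetical helper: a leaf that is certain about one body part."""
    dist = [0.0] * NUM_BODY_PARTS
    dist[part] = 1.0
    return Node(distribution=dist)


def stump(feature: int, threshold: float, left_part: int, right_part: int) -> Node:
    """Hypothetical helper: a one-level tree, enough for illustration."""
    return Node(feature, threshold, leaf(left_part), leaf(right_part))


def classify_with_tree(root: Node, features: Sequence[float]) -> List[float]:
    # Walk from the root to a leaf, going left when the feature is below the threshold.
    node = root
    while node.distribution is None:
        node = node.left if features[node.feature] < node.threshold else node.right
    return node.distribution


def classify_with_forest(trees: Sequence[Node], features: Sequence[float]) -> Tuple[int, List[float]]:
    # Average the per-tree distributions, then pick the most likely body part.
    total = [0.0] * NUM_BODY_PARTS
    for tree in trees:
        for part, p in enumerate(classify_with_tree(tree, features)):
            total[part] += p
    averaged = [p / len(trees) for p in total]
    best = max(range(NUM_BODY_PARTS), key=averaged.__getitem__)
    return best, averaged


# Three toy trees, matching the "3 trees" figure on the slide.
trees = [stump(0, 0.5, 3, 7), stump(1, 0.5, 3, 9), stump(0, 0.2, 5, 7)]
best, averaged = classify_with_forest(trees, [0.3, 0.1])
```

Real trees are far deeper (up to 2^20 nodes each); the traversal loop is the same.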
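Data-Parallel Computation (stack layers: Application, Language, Execution, Storage)
  Parallel databases: language SQL
  Map-Reduce stack: language Sawzall, Java (Sawzall, FlumeJava); execution Map-Reduce; storage GFS, BigTable
  Hadoop stack: language ≈SQL (Pig, Hive); execution Hadoop; storage HDFS, S3
  Dryad stack: language LINQ (DryadLINQ); execution Dryad; storage Cosmos, Azure, HPC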
Dryad = 2-D Piping
  Unix pipes (1-D): grep | sed | sort | awk | perl
  Dryad (2-D): grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
  (each exponent is the number of parallel instances of that stage)
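The 2-D idea can be sketched on one machine: every stage runs several instances side by side, one per data partition, and the data is repartitioned when the next stage has a different width. This is only an illustration under simplifying assumptions. Threads stand in for cluster machines, the widths are 4, 2, and 1 instead of 1000, 500, and 50, and the names `run_stage`, `repartition`, and the stage functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Partition = List[str]


def run_stage(fn: Callable[[Partition], Partition], partitions: List[Partition]) -> List[Partition]:
    """Run one pipeline stage: one instance of `fn` per partition, in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        return list(pool.map(fn, partitions))


def repartition(partitions: List[Partition], width: int) -> List[Partition]:
    """Redistribute all lines round-robin into `width` partitions."""
    flat = [line for part in partitions for line in part]
    return [flat[i::width] for i in range(width)]


def grep_stage(part: Partition) -> Partition:
    return [line for line in part if "error" in line]


def sed_stage(part: Partition) -> Partition:
    return [line.replace("error", "ERROR") for line in part]


def sort_stage(part: Partition) -> Partition:
    return sorted(part)


def run_pipeline(lines: List[str]) -> List[str]:
    parts = run_stage(grep_stage, repartition([lines], 4))  # grep^4
    parts = run_stage(sed_stage, repartition(parts, 2))     # sed^2
    parts = run_stage(sort_stage, repartition(parts, 1))    # sort^1
    return parts[0]


result = run_pipeline(["b error", "ok", "a error", "c fine", "d error"])
```

The second dimension is the stage width: the stage sequence matches the Unix pipe, but each stage is replicated across partitions.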
Virtualized 2-D Pipelines: 2-D, DAG, multi-machine, virtualized
Fault Tolerance
LINQ: Dryad => DryadLINQ
LINQ = .NET + Queries
    Collection<T> collection;
    bool IsLegal(Key k);      // predicate used by the where clause
    string Hash(Key k);       // projection applied to each surviving key
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };
DryadLINQ Data Model: a collection is divided into partitions, and each partition holds .NET objects.
DryadLINQ = LINQ + Dryad
    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);
    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };
  [Diagram labels: C#, collection, results, C# vertex code, query plan (Dryad job), data]
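One way to picture "LINQ + Dryad" is that the same filter-and-project query becomes vertex code that runs once per partition, and the per-partition outputs together form the result collection. The Python sketch below imitates that shape on a single machine. It is not DryadLINQ's API: `is_legal`, `hash_key`, `vertex`, and `run_query` are hypothetical stand-ins, and the predicate and hash function are arbitrary choices.

```python
import hashlib
from typing import List, Tuple

Record = Tuple[str, int]  # (key, value)


def is_legal(key: str) -> bool:
    # Hypothetical predicate, standing in for IsLegal on the slide.
    return key.isalnum()


def hash_key(key: str) -> str:
    # Hypothetical hash, standing in for Hash on the slide.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]


def vertex(partition: List[Record]) -> List[Tuple[str, int]]:
    """Vertex code: the whole query, applied to one partition."""
    return [(hash_key(k), v) for k, v in partition if is_legal(k)]


def run_query(partitions: List[List[Record]]) -> List[Tuple[str, int]]:
    """Run the vertex on every partition and concatenate the outputs.

    A real Dryad job would schedule the vertices on different machines;
    here they simply run one after another.
    """
    return [row for part in partitions for row in vertex(part)]


partitions = [[("alice", 1), ("bad key!", 2)], [("bob", 3)]]
results = run_query(partitions)
```

Because the query has no cross-partition dependency, each vertex needs only its own partition, which is what makes this query shape easy to parallelize.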
Kinect Training Pipeline: Train one tree level; Normalization; Compute histograms; Preprocessing; Overlay players; Inject noise; Add background; Upload data; Distribute; Replicate 20x
Query plan for one tree layer: Partial tree, Images, Features -> split -> New partial tree. Parallelize on: features, images, tree nodes.
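A rough sketch of one layer of training, under stated assumptions: every training pixel already knows which frontier node of the partial tree it reaches, each data partition builds label histograms for every (node, candidate split, side) combination, the partial histograms are summed, and each node takes the candidate with the lowest weighted child entropy (equivalently, the highest information gain). This sketch partitions only on the data, whereas the slide also lists features and tree nodes. All names here are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Sample = Tuple[int, Sequence[float], int]  # (frontier node id, feature values, body-part label)
Candidate = Tuple[int, float]              # (feature index, threshold)
HistKey = Tuple[int, int, bool]            # (node id, candidate index, goes_left)
Histograms = Dict[HistKey, Counter]


def partial_histograms(samples: Iterable[Sample], candidates: List[Candidate]) -> Histograms:
    """Per-partition work: count labels on each side of every candidate split."""
    hist: Histograms = defaultdict(Counter)
    for node, feats, label in samples:
        for c, (feature, threshold) in enumerate(candidates):
            hist[(node, c, feats[feature] < threshold)][label] += 1
    return hist


def merge(hists: Iterable[Histograms]) -> Histograms:
    """Reduction step: sum the histograms produced by all partitions."""
    total: Histograms = defaultdict(Counter)
    for h in hists:
        for key, counts in h.items():
            total[key].update(counts)  # Counter.update adds counts
    return total


def entropy(counts: Counter) -> float:
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)


def best_splits(hist: Histograms, candidates: List[Candidate]) -> Dict[int, Candidate]:
    """Pick, for every frontier node, the candidate with the lowest weighted child entropy."""
    result = {}
    for node in {key[0] for key in hist}:
        def cost(c: int) -> float:
            left = hist.get((node, c, True), Counter())
            right = hist.get((node, c, False), Counter())
            n_left, n_right = sum(left.values()), sum(right.values())
            return (n_left * entropy(left) + n_right * entropy(right)) / (n_left + n_right)
        result[node] = candidates[min(range(len(candidates)), key=cost)]
    return result


def train_layer(partitions: List[List[Sample]], candidates: List[Candidate]) -> Dict[int, Candidate]:
    return best_splits(merge(partial_histograms(p, candidates) for p in partitions), candidates)


candidates = [(0, 0.5), (1, 0.5)]
partitions = [
    [(0, [0.1, 0.9], 0), (0, [0.2, 0.1], 0), (1, [0.3, 0.2], 2)],
    [(0, [0.8, 0.8], 1), (0, [0.9, 0.2], 1), (1, [0.4, 0.9], 3)],
]
splits = train_layer(partitions, candidates)
```

Only the small histograms cross partition boundaries, not the pixels themselves, which helps when there are 1M images of 300,000 pixels each.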
High cluster utilization [Plot axes: Time, Machine]
CONCLUSIONS
Huge Commercial Success
Tremendous Interest from Developers
Consumer Technologies Push The Envelope. Price: $6000 versus price: $150.
Unique Opportunity for Technology Transfer
I can finally explain to my son what I do for a living…
BACKUP SLIDES
Training efficiency
Cluster usage for one tree [Plot axes: Time (s), Machine (235); phases: Preprocess (failed), Normalize, Tree]. Reported totals [values not recoverable]: hours, CPU days, processes, TB data, average parallelism (in processes).
DryadLINQ Language Summary: Where, Select, GroupBy, OrderBy, Aggregate, Join
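For readers who do not know LINQ, the six operators have close single-machine counterparts in Python. The sketch below shows only what each operator computes. The toy records and field names are invented for illustration, and none of this reflects how DryadLINQ distributes the work.

```python
from collections import defaultdict
from functools import reduce

# Invented toy data: labeled pixels and the images they came from.
pixels = [
    {"image": 1, "part": "head", "depth": 2.0},
    {"image": 1, "part": "hand", "depth": 1.5},
    {"image": 2, "part": "head", "depth": 2.5},
]
images = [{"image": 1, "player": "A"}, {"image": 2, "player": "B"}]

# Where: keep the elements that satisfy a predicate.
near = [p for p in pixels if p["depth"] < 2.2]

# Select: project each element to a new value.
parts = [p["part"] for p in near]

# GroupBy: collect elements that share a key.
by_part = defaultdict(list)
for p in pixels:
    by_part[p["part"]].append(p["depth"])

# OrderBy: sort by a key.
ordered = sorted(pixels, key=lambda p: p["depth"])

# Aggregate: fold all elements into one value.
total_depth = reduce(lambda acc, p: acc + p["depth"], pixels, 0.0)

# Join: match elements of two collections on a common key
# (a hash join that assumes image ids are unique in `images`).
player_of = {i["image"]: i["player"] for i in images}
joined = [(p["part"], player_of[p["image"]]) for p in pixels if p["image"] in player_of]
```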