Training Kinect Mihai Budiu Microsoft Research, Silicon Valley UCSD CNS 2012 RESEARCH REVIEW February 8, 2012.

Slides:



Advertisements
Similar presentations
Distributed Data-Parallel Programming using Dryad Andrew Birrell, Mihai Budiu, Dennis Fetterly, Michael Isard, Yuan Yu Microsoft Research Silicon Valley.
Advertisements

Cluster Computing with Dryad Mihai Budiu, MSR-SVC LiveLabs, March 2008.
MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
Introduction to Data Center Computing Derek Murray October 2010.
The DryadLINQ Approach to Distributed Data-Parallel Computing
Machine Learning in DryadLINQ Kannan Achan Mihai Budiu MSR-SVC, 1/30/
Distributed Data-Parallel Computing Using a High-Level Programming Language Yuan Yu Michael Isard Joint work with: Andrew Birrell, Mihai Budiu, Jon Currey,
Cluster Computing with DryadLINQ
MapReduce Online Created by: Rajesh Gadipuuri Modified by: Ying Lu.
Introduction to Advanced Computing Platforms for Data Analysis Ruoming Jin.
C# and LINQ Yuan Yu Microsoft Research Silicon Valley.
The Kinect body tracking pipeline Oliver Williams, Mihai Budiu Microsoft Research, Silicon Valley With slides contributed by Johnny Lee, Jamie Shotton.
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica Spark Fast, Interactive,
Data-Intensive Computing with MapReduce/Pig Pramod Bhatotia MPI-SWS Distributed Systems – Winter Semester 2014.
Big Data Platforms Mihai Budiu, Oct My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer.
DryadLINQ A System for General-Purpose Distributed Data-Parallel Computing Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep.
Nectar: Efficient Management of Computation and Data in Data Centers Lenin Ravindranath Pradeep Kumar Gunda, Chandu Thekkath, Yuan Yu, Li Zhuang.
PARALLELIZING LARGE-SCALE DATA- PROCESSING APPLICATIONS WITH DATA SKEW: A CASE STUDY IN PRODUCT-OFFER MATCHING Ekaterina Gonina UC Berkeley Anitha Kannan,
Optimus: A Dynamic Rewriting Framework for Data-Parallel Execution Plans Qifa Ke, Michael Isard, Yuan Yu Microsoft Research Silicon Valley EuroSys 2013.
Cluster Computing with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Cloud computing: Infrastructure, Services, and Applications UC Berkeley,
DryadLINQ A System for General-Purpose Distributed Data-Parallel Computing Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep.
Kinect Case Study CSE P 576 Larry Zitnick
Monitoring and Debugging Dryad(LINQ) Applications with Daphne Vilas Jagannath, Zuoning Yin, Mihai Budiu University of Illinois, Microsoft Research SVC.
Distributed Computations
From LINQ to DryadLINQ Michael Isard Workshop on Data-Intensive Scientific Computing Using DryadLINQ.
Distributed computing using Dryad Michael Isard Microsoft Research Silicon Valley.
Dryad / DryadLINQ Slides adapted from those of Yuan Yu and Michael Isard.
Cluster Computing with DryadLINQ Mihai Budiu, MSR-SVC PARC, May
Tools and Services for Data Intensive Research Roger Barga Nelson Araujo, Tim Chou, and Christophe Poulain Advanced Research Tools and Services Group,
Cloud Computing Systems Lin Gu Hong Kong University of Science and Technology Oct. 3, 2011 Hadoop, HDFS and Microsoft Cloud Computing Technologies.
Advanced Topics: MapReduce ECE 454 Computer Systems Programming Topics: Reductions Implemented in Distributed Frameworks Distributed Key-Value Stores Hadoop.
MapReduce.
Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory.
Cluster Computing with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Intel Research Berkeley, Systems Seminar Series October 9, 2008.
Presenters: Abhishek Verma, Nicolas Zea.  Map Reduce  Clean abstraction  Extremely rigid 2 stage group-by aggregation  Code reuse and maintenance.
Joe Hummel, PhD Visiting Researcher: U. of California, Irvine Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago Materials:
Microsoft DryadLINQ --Jinling Li. What’s DryadLINQ? A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. [1]
Programming clusters with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Association of C and C++ Users (ACCU) Mountain View, CA, April 13, 2011.
MapReduce – An overview Medha Atre (May 7, 2008) Dept of Computer Science Rensselaer Polytechnic Institute.
1 Dryad Distributed Data-Parallel Programs from Sequential Building Blocks Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly of Microsoft.
Jockey Guaranteed Job Latency in Data Parallel Clusters Andrew Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca.
An Introduction to HDInsight June 27 th,
SALSASALSASALSASALSA Design Pattern for Scientific Applications in DryadLINQ CTP DataCloud-SC11 Hui Li Yang Ruan, Yuduo Zhou Judy Qiu, Geoffrey Fox.
Harp: Collective Communication on Hadoop Bingjing Zhang, Yang Ruan, Judy Qiu.
MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.
Artemis Logs Database View Data Collectio n GUI Dryad Overview Data collection Distributed system Plug-ins GUI Plug-ins Hunting for Bugs with Artemis System.
MapReduce Kristof Bamps Wouter Deroey. Outline Problem overview MapReduce o overview o implementation o refinements o conclusion.
Hung-chih Yang 1, Ali Dasdan 1 Ruey-Lung Hsiao 2, D. Stott Parker 2
Dryad and DryaLINQ. Dryad and DryadLINQ Dryad provides automatic distributed execution DryadLINQ provides automatic query plan generation Dryad provides.
MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.
Parallel Applications And Tools For Cloud Computing Environments CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.
MapReduce: Simplified Data Processing on Large Clusters By Dinesh Dharme.
Large-scale Machine Learning using DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Ambient Intelligence: From Sensor Networks to Smart Environments.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Google Cloud computing techniques (Lecture 03) 18th Jan 20161Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
Image taken from: slideshare
CS239-Lecture 3 DryadLINQ Madan Musuvathi Visiting Professor, UCLA
Some slides adapted from those of Yuan Yu and Michael Isard
Distributed Programming in “Big Data” Systems Pramod Bhatotia wp
CSCI5570 Large Scale Data Processing Systems
Parallel Computing with Dryad
Real-Time Human Pose Recognition in Parts from Single Depth Image
Introduction to Spark.
Assignment 0 (5 points; Due Jan. 15, 2017)
湖南大学-信息科学与工程学院-计算机与科学系
Cse 344 May 4th – Map/Reduce.
Declarative Transfer Learning from Deep CNNs at Scale
DryadInc: Reusing work in large-scale computations
Lecture 29: Distributed Systems
Analysis of Structured or Semi-structured Data on a Hadoop Cluster
Presentation transcript:

Training Kinect Mihai Budiu Microsoft Research, Silicon Valley UCSD CNS 2012 RESEARCH REVIEW February 8, 2012

Label body parts in depth map 2 Parallelizing the Training of the Kinect Body Parts Labeling Algorithm Parallelizing the Training of the Kinect Body Parts Labeling Algorithm Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011

Solution: Learn from Data 3 Classifier Training examples Machine learning

Big data 4 1M Training examples 300,000 pixels/image 100,000 features <2 20 tree nodes/tree 31 body parts 3 trees Dryad DryadLINQ Decision forest inference Classifier

Execution Application Data-Parallel Computation 5 Storage Language Parallel Databases Map- Reduce GFS BigTable Cosmos Azure HPC Dryad DryadLINQ Sawzall,FlumeJava Hadoop HDFS S3 Pig, Hive SQL≈SQLLINQSawzall, Java

Dryad = 2-D Piping Unix Pipes: 1-D grep | sed | sort | awk | perl Dryad: 2-D grep 1000 | sed 500 | sort 1000 | awk 500 | perl 50 6

Virtualized 2-D Pipelines 7

8

9

10

Virtualized 2-D Pipelines 11 2D DAG multi-machine virtualized

Fault Tolerance 12

LINQ 13 Dryad => DryadLINQ

14 LINQ =.Net+ Queries Collection collection; bool IsLegal(Key); string Hash(Key); var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

DryadLINQ Data Model 15 Partition Collection.Net objects

Collection collection; bool IsLegal(Key k); string Hash(Key); var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value}; 16 DryadLINQ = LINQ + Dryad C# collection results C# Vertex code Query plan (Dryad job) Data

Kinect Training Pipeline 17 Train one tree level NormalizationCompute histograms Preprocessing Overlay playersInject noiseAdd background Upload Data DistributeReplicate 20x

Partial tree Images Features split New partial tree Query plan for one tree layer 18 Parallelize on: Features Images Tree nodes

High cluster utilization 19 Time Machine

CONCLUSIONS 20

Huge Commercial Success 21

Tremendous Interest from Developers 22

Consumer Technologies Push The Envelope 23 Price: 6000$ Price: 150$

Unique Opportunity for Technology Transfer 24

I can finally explain to my son what I do for a living… 25

BACKUP SLIDES 26

Training efficiency 27

Cluster usage for one tree 28 Time (s) Machine (235) Preprocess (failed) hours, CPU days, processes, TB data, average parallelism= processes Normalize Tree

DryadLINQ Language Summary 29 Where Select GroupBy OrderBy Aggregate Join