Large-scale Machine Learning using DryadLINQ. Mihai Budiu, Microsoft Research, Silicon Valley. Ambient Intelligence: From Sensor Networks to Smart Environments and Social Media Workshop, Stanford, June 11, 2019.
Goal of DryadLINQ
Software Stack (diagram layers): Windows Server; cluster services; cluster storage; Dryad; DryadLINQ; applications; .NET + LINQ.
Dryad = Execution Layer: a job (application) running on Dryad over a cluster ≈ a pipeline running in a Unix shell on one machine.
LINQ Data Model: a Collection<T> of .NET objects of type T.
LINQ Language Summary: operators applied to an input collection: Where (filter), Select (map), GroupBy, OrderBy (sort), Aggregate (fold), Join.
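The operators listed above can be sketched with standard LINQ to Objects. This is only an illustration on in-memory collections, not DryadLINQ code; the class name `LinqOperatorsSketch` and its method names are hypothetical and chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; names are illustrative, not from the talk.
public static class LinqOperatorsSketch
{
    // Where (filter) + Select (map) + Aggregate (fold):
    // sum of the squares of the even numbers in the input.
    public static int SumOfEvenSquares(IEnumerable<int> input)
    {
        return input
            .Where(x => x % 2 == 0)
            .Select(x => x * x)
            .Aggregate(0, (acc, x) => acc + x);
    }

    // GroupBy + OrderBy (sort): number of words per word length,
    // ordered from the shortest length to the longest.
    public static List<KeyValuePair<int, int>> CountByLength(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => w.Length)
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Count()))
            .ToList();
    }

    // Join: pair each name with its score through a shared integer id.
    public static List<string> JoinNamesToScores(
        IEnumerable<KeyValuePair<int, string>> names,
        IEnumerable<KeyValuePair<int, int>> scores)
    {
        return names
            .Join(scores, n => n.Key, s => s.Key, (n, s) => n.Value + ":" + s.Value)
            .ToList();
    }
}
```

For instance, `LinqOperatorsSketch.SumOfEvenSquares(new[] { 1, 2, 3, 4 })` keeps 2 and 4, squares them, and folds the squares into 20.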
LINQ + Dryad => DryadLINQ
DryadLINQ Data Model: a collection of .NET objects, split into partitions.
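One way to picture a partitioned collection is as a list of in-memory lists, where each partition is processed independently and the partial results are merged. The sketch below assumes round-robin partitioning and a simple count; it does not use or imitate the DryadLINQ API, and `PartitionSketch`, `Partition`, and `CountWhere` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of a partitioned collection; not DryadLINQ code.
public static class PartitionSketch
{
    // Split a collection into `parts` partitions by round-robin assignment
    // (an assumption for this sketch; real systems may partition differently).
    public static List<List<T>> Partition<T>(IEnumerable<T> items, int parts)
    {
        if (parts <= 0) throw new ArgumentOutOfRangeException(nameof(parts));
        var result = Enumerable.Range(0, parts).Select(_ => new List<T>()).ToList();
        int i = 0;
        foreach (var item in items)
        {
            result[i % parts].Add(item);
            i++;
        }
        return result;
    }

    // Count matching elements per partition, then combine the partial counts.
    public static int CountWhere<T>(List<List<T>> partitions, Func<T, bool> predicate)
    {
        return partitions
            .Select(p => p.Count(predicate)) // independent per-partition work
            .Sum();                          // merge the partial results
    }
}
```

Because each partition is counted on its own, the per-partition step could run on different machines; here everything runs sequentially in one process.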
DryadLINQ = LINQ + Dryad
Collection<T> collection;
static bool IsLegal(Key c);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
(Diagram: the C# code over a C# collection becomes a Dryad job over the data; the results come back as a C# collection.)
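The query on this slide can be tried locally with LINQ to Objects. The sketch below is a stand-in under stated assumptions: the `Record` type, the use of `string` for keys, and the bodies of `IsLegal` and `Hash` are all placeholders invented for illustration, since the slide does not define them. It runs in memory and does not start a Dryad job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the slide only implies `key` and `value` members.
public class Record
{
    public string key;
    public int value;
}

public static class QuerySketch
{
    // Placeholder predicate (assumption): non-empty keys are legal.
    public static bool IsLegal(string k) { return !string.IsNullOrEmpty(k); }

    // Placeholder hash (assumption): the key length, chosen to be deterministic.
    public static int Hash(string k) { return k.Length; }

    // Same shape as the slide's query: filter by IsLegal, project to (hash, value).
    public static List<KeyValuePair<int, int>> Run(IEnumerable<Record> collection)
    {
        var results = from c in collection
                      where IsLegal(c.key)
                      select new { hash = Hash(c.key), c.value };

        // Copy the anonymous-type results into a concrete type for the caller.
        return results
            .Select(r => new KeyValuePair<int, int>(r.hash, r.value))
            .ToList();
    }
}
```

Note that the anonymous type needs an explicit member name for the computed field (`hash = Hash(c.key)`), while `c.value` can keep its inferred name.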
Example: Natal Training
Natal Problem: recognize players from a depth map, at frame rate, with low resource usage.
Learn from Data (diagram): motion capture (ground truth); rasterize; training examples; machine learning; classifier.
Running on Xbox
Cluster-based training (diagram): training examples; machine learning, run with DryadLINQ on Dryad; classifier.
You can have it! Dryad + DryadLINQ are available for download under an academic license or a commercial evaluation license. Runs on the Windows HPC platform. Dryad is in binary form, DryadLINQ in source. Requires signing a 3-page licensing agreement.
Conclusions