Published by Karley Baptiste. Modified over 10 years ago.
1
Machine Learning in DryadLINQ
Kannan Achan, Mihai Budiu
MSR-SVC, 1/30/2008
2
Goal
3
The Software Stack
[Figure: layered software stack. Labels: Windows Server, Cluster Services, Distributed Filesystem: Cosmos, Dryad, DryadLINQ, Large Vector, Machine learning, Data analysis.]
4
Dryad
5
Dryad Jobs
[Figure: a Dryad job graph. Labels: vertices (processes) marked M, X, and R; stage; channels; input files; output files.]
6
LINQ and C#
7
LINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
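The query on this slide can be tried locally with LINQ to Objects. The following is a minimal sketch, not the deck's code: the tuple element type, the sample data, and the bodies of `IsLegal` and `Hash` are all hypothetical stand-ins, since the slide only gives their signatures. The anonymous-type member built from a method call is given an explicit name (`hash`), which C# requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bodies; the slide declares only the signatures.
bool IsLegal(string key) => !string.IsNullOrEmpty(key);
string Hash(string key) => key.ToUpperInvariant();

// Hypothetical in-memory data; tuples stand in for the unspecified element type.
var collection = new List<(string key, int value)>
{
    ("a", 1), ("", 2), ("b", 3)
};

// Same shape as the slide's query: filter by key, then project.
var results = (from c in collection
               where IsLegal(c.key)
               select new { hash = Hash(c.key), c.value }).ToList();

foreach (var r in results)
    Console.WriteLine($"{r.hash} {r.value}");   // prints "A 1" then "B 3"
```

Here the query runs eagerly in one process over an in-memory list; per the deck, DryadLINQ takes a query of this form and runs it as a Dryad job instead.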
8
DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
[Figure labels: C#, collection, results, vertex code, query plan (Dryad job), data.]
9
Recall: The Software Stack
[Figure: layered software stack. Labels: Windows Server, Cluster Services, Distributed Filesystem: Cosmos, Dryad, DryadLINQ, Large Vector, Machine learning, Data analysis.]
10
Very Large Vector Library
PartitionedVector<T>
Scalar<T>
[Figure: a PartitionedVector drawn as several partitions of T, and a Scalar drawn as a single T.]
11
Operations on Large Vectors: Map 1
[Figure: f is applied to each partition of a vector of T, giving a vector of U.]
Preserves partitioning.
12
Map 2 (Pairwise)
[Figure: f combines corresponding partitions of a vector of T and a vector of U, giving a vector of V.]
13
Map 3 (Vector-Scalar)
[Figure: f combines each partition of a vector with a single scalar, giving a vector of V. Type labels: T, U, V.]
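The three map variants can be sketched in memory as follows. This is an illustration only: a partitioned vector is modeled as an array of arrays (`T[][]`), and the names `Map`, `MapPairwise`, and `MapScalar` are hypothetical, not the Large Vector library's API.

```csharp
using System;
using System.Linq;

// Map 1: apply f to every element; the output keeps the input's partition layout.
U[][] Map<T, U>(T[][] v, Func<T, U> f) =>
    v.Select(part => part.Select(f).ToArray()).ToArray();

// Map 2 (pairwise): combine elements at the same position of two vectors
// that are assumed to share the same partition layout.
V[][] MapPairwise<T, U, V>(T[][] a, U[][] b, Func<T, U, V> f) =>
    a.Zip(b, (pa, pb) => pa.Zip(pb, f).ToArray()).ToArray();

// Map 3 (vector-scalar): combine every element with one shared scalar.
V[][] MapScalar<T, U, V>(T[][] v, U s, Func<T, U, V> f) =>
    v.Select(part => part.Select(e => f(e, s)).ToArray()).ToArray();

// Two toy vectors, each split into partitions of sizes 2 and 1.
var xs = new[] { new[] { 1, 2 }, new[] { 3 } };
var ys = new[] { new[] { 10, 20 }, new[] { 30 } };

var doubled = Map(xs, e => e * 2);                   // {2,4} {6}
var sums = MapPairwise(xs, ys, (p, q) => p + q);     // {11,22} {33}
var shifted = MapScalar(xs, 100, (e, s) => e + s);   // {101,102} {103}

Console.WriteLine(string.Join(" | ", sums.Select(p => string.Join(",", p))));   // 11,22 | 33
```

Because each function works one partition at a time, none of them needs data from another partition, which is consistent with the slides' remark that Map preserves partitioning.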
14
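Reduce (Fold)
[Figure: f folds the elements of each partition of a vector of U, then a final f folds the per-partition results into a single U.]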
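A two-level fold of this kind might look like the sketch below. It again models a partitioned vector as an array of arrays; the name `Reduce` is hypothetical, and the sketch assumes that f is associative and that no partition is empty.

```csharp
using System;
using System.Linq;

// Fold each partition with f, then fold the per-partition results with f.
// Assumes f is associative and every partition is non-empty.
U Reduce<U>(U[][] v, Func<U, U, U> f) =>
    v.Select(part => part.Aggregate(f)).Aggregate(f);

var xs = new[] { new[] { 1, 2 }, new[] { 3, 4 }, new[] { 5 } };
int total = Reduce(xs, (a, b) => a + b);

Console.WriteLine(total);   // 15
```

The per-partition folds are independent of one another, so in a distributed setting only one partial result per partition would have to travel to the final fold.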
15
Linear Algebra
[Formula lost in extraction: T, U, V = ..., ..., ...]
16
Linear Regression
Data: [formula lost in extraction]
Find: [formula lost in extraction]
S.t.: [formula lost in extraction]
17
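Analytic Solution
[Figure: dataflow over partitions X[0], X[1], X[2] and Y[0], Y[1], Y[2]. The Map stage computes X×X^T and Y×X^T per partition; the Reduce stage sums each (Σ); the sum of X×X^T is inverted ([ ]^-1) and multiplied (*) with the sum of Y×X^T to give A.]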
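The formulas on these slides did not survive extraction. The derivation below is a standard least-squares reconstruction that matches the operators named in the diagram and in the deck's regression code. The symbols $x_t$, $y_t$, $N$, $n$, and $m$ are assumptions chosen for this sketch, not taken from the slides.

```latex
% Assumed setup: N data pairs (x_t, y_t) with x_t \in \mathbb{R}^n, y_t \in \mathbb{R}^m.
% Find A \in \mathbb{R}^{m \times n} such that A x_t \approx y_t in the least-squares sense.
\begin{align*}
  E(A) &= \sum_{t=1}^{N} \lVert A x_t - y_t \rVert^2 \\
  \frac{\partial E}{\partial A} &= 2 \sum_{t=1}^{N} (A x_t - y_t)\, x_t^{T} = 0 \\
  A \sum_{t=1}^{N} x_t x_t^{T} &= \sum_{t=1}^{N} y_t x_t^{T} \\
  A &= \Bigl( \sum_{t=1}^{N} y_t x_t^{T} \Bigr) \Bigl( \sum_{t=1}^{N} x_t x_t^{T} \Bigr)^{-1}
\end{align*}
```

The last line assumes that $\sum_t x_t x_t^{T}$ is invertible. Each outer product depends on a single data point (a map) and the two sums are folds (a reduce), which is how the diagram splits the computation.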
18
Linear Regression Code
Matrices xx = x.PairwiseOuterProduct(x);
OneMatrix xxs = xx.Sum();
Matrices yx = y.PairwiseOuterProduct(x);
OneMatrix yxs = yx.Sum();
OneMatrix xxinv = xxs.Map(a => a.Inverse());
OneMatrix A = yxs.Map(xxinv, (a, b) => a.Multiply(b));
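The same sequence of steps (outer products, sums, inverse, multiply) can be checked on a single machine with plain arrays. This is a rough sketch under stated assumptions: the helpers `Outer`, `Add`, `Multiply`, and `Inverse2x2` are hypothetical stand-ins written for this example, not the Large Vector library, and the toy data is made up with two-dimensional x and one-dimensional y.

```csharp
using System;
using System.Linq;

// Outer product a * b^T of two column vectors.
double[,] Outer(double[] a, double[] b)
{
    var m = new double[a.Length, b.Length];
    for (int i = 0; i < a.Length; i++)
        for (int j = 0; j < b.Length; j++)
            m[i, j] = a[i] * b[j];
    return m;
}

// Element-wise sum of two matrices of equal shape.
double[,] Add(double[,] p, double[,] q)
{
    var m = new double[p.GetLength(0), p.GetLength(1)];
    for (int i = 0; i < m.GetLength(0); i++)
        for (int j = 0; j < m.GetLength(1); j++)
            m[i, j] = p[i, j] + q[i, j];
    return m;
}

// Matrix product p * q.
double[,] Multiply(double[,] p, double[,] q)
{
    var m = new double[p.GetLength(0), q.GetLength(1)];
    for (int i = 0; i < p.GetLength(0); i++)
        for (int j = 0; j < q.GetLength(1); j++)
            for (int k = 0; k < q.GetLength(0); k++)
                m[i, j] += p[i, k] * q[k, j];
    return m;
}

// Inverse of a 2x2 matrix; assumes a non-zero determinant.
double[,] Inverse2x2(double[,] s)
{
    double det = s[0, 0] * s[1, 1] - s[0, 1] * s[1, 0];
    return new[,]
    {
        {  s[1, 1] / det, -s[0, 1] / det },
        { -s[1, 0] / det,  s[0, 0] / det }
    };
}

// Made-up data generated from y = 2*x0 + 3*x1, so the fit should be close to [2, 3].
double[][] x = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 1.0 } };
double[][] y = { new[] { 2.0 }, new[] { 3.0 }, new[] { 5.0 } };

var xxs = x.Select(v => Outer(v, v)).Aggregate(Add);            // sum of x x^T (2x2)
var yxs = y.Zip(x, (yt, xt) => Outer(yt, xt)).Aggregate(Add);   // sum of y x^T (1x2)
var A = Multiply(yxs, Inverse2x2(xxs));                         // 1x2 coefficient matrix

Console.WriteLine($"{A[0, 0]:F3} {A[0, 1]:F3}");
```

The `Select` and `Zip` calls play the role of the map stage and the `Aggregate` calls the role of the reduce stage. The final inverse and multiply act on single small matrices, matching the `OneMatrix` steps in the slide code.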
19
Expectation Maximization
160 lines
3 iterations shown
20
Understanding Botnet Traffic using EM
3 GB data
15 clusters
60 computers
50 iterations
9000 processes
50 minutes
21
Conclusions
Dryad simplifies programming large clusters
DryadLINQ = declarative programming for Dryad jobs
The Large Vector library provides simple mathematical primitives on top of DryadLINQ
Matlab-style coding for writing distributed numeric computations
[Figure: the software stack. Labels: Win, Cluster Services, Distributed Filesystem, Dryad, DryadLINQ, Large Vector, ML, Data analysis.]
22
Backup Slides
23
Chaining
[Figure: the analytic-solution dataflow over X[0], X[1], X[2] and Y[0], Y[1], Y[2] (X×X^T, Y×X^T, Σ, [ ]^-1, *, A), drawn with six additional Σ vertices.]
24
EM Structure
[Figure labels: E stage; input size; parameters π, σ, μ; all parameters.]
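The parameters π, σ, and μ suggest a Gaussian mixture model, although the slides do not state the model explicitly. Under that assumption, and simplifying to one-dimensional data, one EM iteration could be written as below. The symbols $\gamma_{tk}$, $N_k$, $K$, and $N$ are notation introduced for this sketch.

```latex
% Assumed model: mixture of K one-dimensional Gaussians over data x_1, ..., x_N.
% E step: responsibility of cluster k for data point x_t.
\[
  \gamma_{tk} = \frac{\pi_k \, \mathcal{N}(x_t \mid \mu_k, \sigma_k^2)}
                     {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_t \mid \mu_j, \sigma_j^2)}
\]
% M step: re-estimate all parameters from the responsibilities.
\begin{align*}
  N_k &= \sum_{t=1}^{N} \gamma_{tk}, &
  \pi_k &= \frac{N_k}{N}, \\
  \mu_k &= \frac{1}{N_k} \sum_{t=1}^{N} \gamma_{tk} \, x_t, &
  \sigma_k^2 &= \frac{1}{N_k} \sum_{t=1}^{N} \gamma_{tk} \, (x_t - \mu_k)^2
\end{align*}
```

In this form the E step is a per-data-point computation that needs every current parameter, and the M step consists of sums over data points, so both fit the map and reduce primitives described earlier in the deck.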