Published by Crystal Booker. Modified over 9 years ago.
1
Programming clusters with DryadLINQ
Mihai Budiu, Microsoft Research, Silicon Valley
Association of C and C++ Users (ACCU), Mountain View, CA, April 13, 2011
2
Goal
3
Design Space
(Figure: Throughput (batch) vs. Latency (interactive); Internet vs. Data center; Data-parallel vs. Shared memory)
4
Data-Parallel Computation
(Figure: systems arranged in four layers: Application, Language, Execution, Storage)
- Execution: Parallel Databases, Map-Reduce, Dryad, Hadoop
- Storage: GFS, BigTable, Cosmos, Azure, SQL Server, HDFS, S3
- Application and language: DryadLINQ, Scope, Sawzall, FlumeJava, Pig, Hive; SQL, ≈SQL, LINQ, Java
5
Software Stack: Talk Outline
(Figure: software stack, bottom to top: Windows Server, Cluster services, Cluster storage, Dryad, DryadLINQ, Applications)
6
DRYAD
7
Dryad
- Continuously deployed since 2006
- Running on >> 10^4 machines
- Sifting through > 10 PB of data daily
- Runs on clusters of > 3000 machines
- Handles jobs with > 10^5 processes each
- Platform for a rich software ecosystem
- Used by >> 100 developers
- Written at Microsoft Research, Silicon Valley
(Image: The Dryad, by Evelyn De Morgan)
8
Dryad = Execution Layer
(Figure: Job (application) / Dryad / Cluster ≈ Pipeline / Shell / Machine. Dryad runs a job on a cluster much as a shell runs a pipeline on one machine.)
9
2-D Piping
Unix pipes: 1-D
grep | sed | sort | awk | perl
Dryad: 2-D (the exponent is the number of parallel instances of each stage)
grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
10
Virtualized 2-D Pipelines
(Slides 10 to 14: animation build of a single figure. The final build labels the pipeline: 2D, DAG, multi-machine, virtualized.)
15
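Dryad Job Structure
(Figure: a job is a DAG of vertices (processes) running grep, sed, sort, awk, and perl. Vertices read input files, write output files, and are connected by channels; vertices running the same program form a stage.)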
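To make the job structure concrete, here is a minimal sketch that models the 2-D pipeline from slide 9 as a list of stages with instance counts, then counts vertices and channels. The helper names (`CountVertices`, `CountChannels`) and the all-to-all wiring between consecutive stages are assumptions for illustration; this is not Dryad's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stages of the 2-D pipeline: program name and number of parallel instances
// (the counts from the 2-D piping slide).
var stages = new List<(string Name, int Instances)>
{
    ("grep", 1000), ("sed", 500), ("sort", 1000), ("awk", 500), ("perl", 50),
};

// Each instance of each stage is one vertex (process).
long CountVertices(List<(string Name, int Instances)> s) =>
    s.Sum(st => (long)st.Instances);

// Assumption: consecutive stages are wired all-to-all, so every upstream
// vertex has one channel to every downstream vertex.
long CountChannels(List<(string Name, int Instances)> s) =>
    s.Zip(s.Skip(1), (up, down) => (long)up.Instances * down.Instances).Sum();

Console.WriteLine($"vertices: {CountVertices(stages)}, channels: {CountChannels(stages)}");
// vertices: 3050, channels: 1525000
```

Under the all-to-all assumption the channel count grows with the product of adjacent stage sizes; a pointwise (one-to-one) connection between two equally sized stages would need only one channel per upstream vertex.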
16
Channels
(Figure: vertex X sends items to vertex M over a channel)
Finite streams of items
- distributed filesystem files (persistent)
- SMB/NTFS files (temporary)
- TCP pipes (inter-machine)
- memory FIFOs (intra-machine)
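A minimal sketch of the intra-machine case: a memory FIFO channel between a writer vertex and a reader vertex, using .NET's `BlockingCollection<T>` as the bounded FIFO. `RunChannel` is a hypothetical helper; Dryad's own channel implementation is not shown on the slide. The point it illustrates is that a channel carries a *finite* stream: the reader stops once the writer closes the channel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// A bounded in-memory FIFO standing in for an intra-machine channel.
List<string> RunChannel(IEnumerable<string> itemsToWrite, int capacity)
{
    using var channel = new BlockingCollection<string>(capacity);

    // Writer vertex: adds every item, then marks the stream complete.
    var writer = Task.Run(() =>
    {
        foreach (var item in itemsToWrite)
            channel.Add(item);          // blocks while the FIFO is full
        channel.CompleteAdding();       // end of the finite stream
    });

    // Reader vertex: consumes items in FIFO order until the writer closes the channel.
    var received = channel.GetConsumingEnumerable().ToList();
    writer.Wait();
    return received;
}

var output = RunChannel(new[] { "a", "b", "c", "d" }, capacity: 2);
Console.WriteLine(string.Join(",", output)); // a,b,c,d
```

The bounded capacity gives back-pressure: a fast writer blocks instead of filling memory, which is the behavior one would want from an in-memory pipe.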
17
Dryad System Architecture
(Figure. Control plane: job manager, job schedule, cluster services (NS, Sched, RE). Data plane: vertices (V) communicating over files, TCP, FIFOs, and the network.)
18
Fault Tolerance
19
DRYADLINQ
20
LINQ
(Figure: LINQ + Dryad => DryadLINQ)
21
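LINQ = .NET + Queries
    // Slide declarations (bodies elided): each element has a `key` of type Key and a `value`.
    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };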
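The slide's query can be run locally with LINQ to Objects. In this sketch the record type, `IsLegal`, and `Hash` are hypothetical stand-ins: a tuple with `key` and `value` fields, a non-empty check, and the key length used as a toy hash.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records standing in for the slide's Collection<T>.
var collection = new List<(string key, int value)>
{
    ("alpha", 1), ("", 2), ("beta", 3),
};

// Hypothetical helpers: empty keys are "illegal"; the key length serves as a toy hash.
bool IsLegal(string k) => !string.IsNullOrEmpty(k);
string Hash(string k) => k.Length.ToString();

// The query from the slide. A member computed by a method call needs an
// explicit name in an anonymous type, hence `hash = ...`.
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

foreach (var r in results)
    Console.WriteLine($"{r.hash} {r.value}"); // "5 1", then "4 3"
```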
22
Collections and Iterators
    class Collection<T> : IEnumerable<T> { ... }
(Figure: a collection holds elements of type T; an iterator points to the current element)
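A short sketch of the iterator protocol behind `IEnumerable<T>`. `Squares` and `Drain` are hypothetical names; the iterator is written with `yield return`, and `Drain` walks it by hand with `MoveNext`/`Current`, which is what `foreach` and the LINQ operators do internally.

```csharp
using System;
using System.Collections.Generic;

// A C# iterator: the compiler turns this into an IEnumerable<int>
// that produces one element each time the consumer asks for the next one.
IEnumerable<int> Squares(int n)
{
    for (int i = 1; i <= n; i++)
        yield return i * i;
}

// Consume the sequence explicitly: MoveNext advances the iterator,
// Current is the current element.
List<int> Drain(IEnumerable<int> source)
{
    var items = new List<int>();
    using IEnumerator<int> it = source.GetEnumerator();
    while (it.MoveNext())
        items.Add(it.Current);
    return items;
}

Console.WriteLine(string.Join(" ", Drain(Squares(4)))); // 1 4 9 16
```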
23
DryadLINQ Data Model
(Figure: a collection of .NET objects divided into partitions)
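A minimal local model of the partitioned data model, assuming round-robin placement of objects into partitions. `Partition` and `AsWhole` are hypothetical helpers, not DryadLINQ APIs; they only show that a partitioned collection is still one logical collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A collection is a list of partitions; each partition is an ordinary
// sequence of .NET objects. Round-robin placement is an assumption.
List<List<T>> Partition<T>(IEnumerable<T> items, int partitionCount)
{
    var parts = Enumerable.Range(0, partitionCount).Select(_ => new List<T>()).ToList();
    int i = 0;
    foreach (var item in items)
        parts[i++ % partitionCount].Add(item);
    return parts;
}

// Viewed as a whole, the collection is the concatenation of its partitions.
IEnumerable<T> AsWhole<T>(List<List<T>> parts) => parts.SelectMany(p => p);

var partitioned = Partition(Enumerable.Range(1, 10), 3);
Console.WriteLine(string.Join(" | ", partitioned.Select(p => string.Join(",", p))));
// 1,4,7,10 | 2,5,8 | 3,6,9
```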
24
DryadLINQ = LINQ + Dryad
    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };
(Figure: the query is compiled into a query plan (a Dryad job) plus C# vertex code; the job runs over the data and the results come back as a C# collection.)
25
Demo
26
Example: counting lines
    var table = PartitionedTable.Get<LineRecord>(file);
    int count = table.Count();
(Plan: each partition vertex runs Parse, Count; one vertex computes the Sum)
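The two-stage plan can be simulated locally, assuming each partition is an in-memory list of lines standing in for a partition of the `PartitionedTable`. `CountLines` is a hypothetical helper that mirrors the plan stages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a partitioned input file: each inner list is one partition of lines.
var file = new List<List<string>>
{
    new List<string> { "the quick brown fox", "jumps over" },
    new List<string> { "the lazy dog" },
};

int CountLines(List<List<string>> partitions)
{
    // Stage 1, one vertex per partition: Parse, Count.
    IEnumerable<int> partialCounts = partitions.Select(p => p.Count);
    // Stage 2, a single vertex: Sum the partial counts.
    return partialCounts.Sum();
}

Console.WriteLine(CountLines(file)); // 3
```

Because counting distributes over a union of partitions, summing per-partition counts gives the same answer as counting the whole table, and only one integer per partition crosses the network.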
27
Example: counting words
    var table = PartitionedTable.Get<LineRecord>(file);
    int count = table.SelectMany(l => l.line.Split(' ')).Count();
(Plan: each partition runs Parse, SelectMany, Count; one vertex computes the Sum)
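The same local simulation extended with `SelectMany`, again assuming in-memory lists of lines as partitions; `CountWords` is a hypothetical helper. As in the slide's query, `Split(' ')` yields an empty string for each extra space, so consecutive spaces would be counted as words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var file = new List<List<string>>
{
    new List<string> { "the quick brown fox", "jumps over" },
    new List<string> { "the lazy dog" },
};

int CountWords(List<List<string>> partitions) =>
    partitions
        // Stage 1, per partition: Parse, SelectMany (split lines into words), Count.
        .Select(p => p.SelectMany(line => line.Split(' ')).Count())
        // Stage 2, a single vertex: Sum the partial counts.
        .Sum();

Console.WriteLine(CountWords(file)); // 9
```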
28
Example: counting unique words
    var table = PartitionedTable.Get<LineRecord>(file);
    int count = table.SelectMany(l => l.line.Split(' '))
                     .GroupBy(w => w)
                     .Count();
(Plan: words are distributed with HashPartition, then each partition runs GroupBy; Count)
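A local sketch of why hash partitioning makes the distinct count exact: every copy of a word lands in the same bucket, so distinct counts per bucket can simply be added. `CountUniqueWords` is hypothetical, and `string.GetHashCode()` is only stable within one process; a distributed plan needs a hash that agrees across machines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var file = new List<List<string>>
{
    new List<string> { "the quick brown fox", "jumps over" },
    new List<string> { "the lazy dog" },
};

int CountUniqueWords(List<List<string>> inputPartitions, int k)
{
    // Stage 1, per input partition: SelectMany, then HashPartition into k buckets.
    var buckets = Enumerable.Range(0, k).Select(_ => new List<string>()).ToList();
    foreach (var part in inputPartitions)
        foreach (var word in part.SelectMany(line => line.Split(' ')))
            // Clear the sign bit so the bucket index is non-negative.
            buckets[(word.GetHashCode() & int.MaxValue) % k].Add(word);

    // Stage 2, per bucket: GroupBy; Count distinct words. Stage 3: Sum.
    return buckets.Sum(bucket => bucket.GroupBy(w => w).Count());
}

Console.WriteLine(CountUniqueWords(file, k: 4)); // 8
```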
29
Example: word histogram
    var table = PartitionedTable.Get<LineRecord>(file);
    var result = table.SelectMany(l => l.line.Split(' '))
                      .GroupBy(w => w)
                      .Select(g => new { word = g.Key, count = g.Count() });
(Plan stages: GroupBy; Count on each partition, HashPartition, then GroupBy; Count)
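A local sketch of the partial-aggregation plan: each input partition first reduces its words to `(word, count)` pairs, the pairs are hash-partitioned by word, and a second grouping adds the partial counts. `Histogram` is a hypothetical helper, and the in-process `GetHashCode()` caveat from the unique-words sketch applies here too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var file = new List<List<string>>
{
    new List<string> { "the quick brown fox", "jumps over" },
    new List<string> { "the lazy dog" },
};

Dictionary<string, int> Histogram(List<List<string>> inputPartitions, int k)
{
    // Stage 1, per input partition: SelectMany, then a local GroupBy; Count
    // (partial aggregation: one pair per distinct word per partition).
    var partials = inputPartitions.Select(p =>
        p.SelectMany(line => line.Split(' '))
         .GroupBy(w => w)
         .Select(g => (word: g.Key, count: g.Count())));

    // HashPartition the (word, count) pairs by word into k buckets.
    var buckets = Enumerable.Range(0, k)
        .Select(_ => new List<(string word, int count)>())
        .ToList();
    foreach (var pair in partials.SelectMany(x => x))
        buckets[(pair.word.GetHashCode() & int.MaxValue) % k].Add(pair);

    // Stage 2, per bucket: GroupBy word again; a word's final count is the
    // sum of its partial counts.
    return buckets
        .SelectMany(b => b.GroupBy(entry => entry.word)
                          .Select(g => (word: g.Key, count: g.Sum(entry => entry.count))))
        .ToDictionary(t => t.word, t => t.count);
}

var hist = Histogram(file, k: 2);
Console.WriteLine(hist["the"]); // 2
```

Partial aggregation means a partition sends one pair per distinct word rather than one record per occurrence, which shrinks the data crossing the HashPartition.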
30
Example: high-frequency words
    var table = PartitionedTable.Get<LineRecord>(file);
    var result = table.SelectMany(l => l.line.Split(' '))
                      .GroupBy(w => w)
                      .Select(g => new { word = g.Key, count = g.Count() })
                      .OrderByDescending(t => t.count)
                      .Take(100);
(Plan: Sort; Take on each partition, then Mergesort; Take)
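A local sketch of the top-k plan, assuming each word's full count already lives in exactly one partition (as after the hash-partitioned grouping). Each partition keeps only its own top n, so the final vertex sees at most n pairs per partition. `TopWords` and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (word, count) partitions, e.g. the output of the histogram query.
var sample = new List<List<(string word, int count)>>
{
    new List<(string word, int count)> { ("dryad", 7), ("linq", 2), ("cluster", 5) },
    new List<(string word, int count)> { ("vertex", 6), ("channel", 1) },
};

List<(string word, int count)> TopWords(List<List<(string word, int count)>> partitions, int n)
{
    // Stage 1, per partition: Sort; Take.
    var localTops = partitions.Select(p => p.OrderByDescending(t => t.count).Take(n));
    // Stage 2, one vertex: merge and Take. A real plan merges the sorted runs;
    // re-sorting their small union gives the same result here.
    return localTops.SelectMany(t => t).OrderByDescending(t => t.count).Take(n).ToList();
}

foreach (var (word, count) in TopWords(sample, 3))
    Console.WriteLine($"{word} {count}"); // dryad 7, vertex 6, cluster 5
```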
31
Example: words by frequency
    var table = PartitionedTable.Get<LineRecord>(file);
    var result = table.SelectMany(l => l.line.Split(' '))
                      .GroupBy(w => w)
                      .Select(g => new { word = g.Key, count = g.Count() })
                      .OrderByDescending(t => t.count);
(Plan: Sample, Histogram, Broadcast, Range-partition, Sort)
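A local sketch of the sampling-based full sort: sample the keys, choose splitters from the sorted sample (the histogram), broadcast them, range-partition every key, and sort each range on its own. For brevity it sorts plain integers in ascending order instead of `(word, count)` records by descending count, and it uses a deterministic every-n-th sample in place of random sampling; `RangePartitionSort` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumes keys is non-empty, k >= 1 and sampleEvery >= 1.
List<List<int>> RangePartitionSort(List<int> keys, int k, int sampleEvery)
{
    // Sample: every sampleEvery-th key, sorted.
    var sample = keys.Where((_, i) => i % sampleEvery == 0).OrderBy(x => x).ToList();

    // Histogram: k - 1 evenly spaced splitters taken from the sorted sample.
    var splitters = Enumerable.Range(1, k - 1)
        .Select(j => sample[j * sample.Count / k])
        .ToList();

    // Broadcast the splitters; Range-partition: a key goes to the range
    // whose index equals the number of splitters <= key.
    var ranges = Enumerable.Range(0, k).Select(_ => new List<int>()).ToList();
    foreach (var key in keys)
        ranges[splitters.Count(s => s <= key)].Add(key);

    // Sort: each range is sorted independently (in parallel, in a real plan).
    return ranges.Select(r => r.OrderBy(x => x).ToList()).ToList();
}

var data = new List<int> { 9, 3, 7, 1, 8, 2, 6, 5, 4, 0 };
var sorted = RangePartitionSort(data, k: 3, sampleEvery: 2);
Console.WriteLine(string.Join(" | ", sorted.Select(r => string.Join(",", r))));
// 0,1,2,3,4,5 | 6,7 | 8,9
```

Because the range index never decreases as the key grows, concatenating the sorted ranges in order yields a globally sorted sequence; the sample only affects how evenly the work is spread.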
32
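Example: Map-Reduce
    // Needs System, System.Collections.Generic, System.Linq and System.Linq.Expressions.
    // Expression<...> parameters keep the calls on the IQueryable operators,
    // which DryadLINQ translates into a Dryad job.
    public static IQueryable<S> MapReduce<T, M, K, S>(
        IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        var map = input.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        var result = group.Select(reducer);
        return result;
    }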
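A self-contained copy of the `MapReduce` helper that runs locally by turning an array into an `IQueryable` with `AsQueryable()`; the word-count usage and sample lines are hypothetical. The mapper calls `Split(new[] { ' ' })` because on recent .NET versions `Split(' ')` can bind to an overload with an optional parameter, which expression trees do not allow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

IQueryable<S> MapReduce<T, M, K, S>(
    IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M, K>> keySelector,
    Expression<Func<IGrouping<K, M>, S>> reducer)
{
    var map = input.SelectMany(mapper);      // map: one input -> many records
    var group = map.GroupBy(keySelector);    // shuffle: group records by key
    return group.Select(reducer);            // reduce: one result per group
}

var lines = new[] { "a b a", "b c" }.AsQueryable();

// Word count expressed with the helper.
var counts = MapReduce(
    lines,
    line => line.Split(new[] { ' ' }),
    w => w,
    g => new { Word = g.Key, Count = g.Count() });

foreach (var c in counts)
    Console.WriteLine($"{c.Word} {c.Count}");
```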
33
Map-Reduce Plan
(Figure: execution plan in three phases. Map: map (M), sort (Q), groupby (G1), reduce (R), distribute (D). Partial aggregation, inserted dynamically: mergesort (MS), groupby (G2), reduce (R). Reduce: mergesort, groupby, reduce, then the consumer (X).)
34
Expectation Maximization
(Figure annotations: 160 lines; 3 iterations shown)
35
Probabilistic Index Maps
(Figure labels: images, features)
36
Language Summary
- Where
- Select
- GroupBy
- OrderBy
- Aggregate
- Join
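A small LINQ to Objects query that uses all six operators from the summary, on hypothetical order and customer data; DryadLINQ queries are written with the same operators.

```csharp
using System;
using System.Linq;

// Hypothetical data: orders (customer id, amount) and customers (id, name).
var orders = new[] { (customer: 1, amount: 30), (customer: 2, amount: 5), (customer: 1, amount: 20), (customer: 3, amount: 50) };
var customers = new[] { (id: 1, name: "Ann"), (id: 2, name: "Bob"), (id: 3, name: "Cy") };

var totals = orders
    .Where(o => o.amount >= 10)                          // Where: drop small orders
    .Join(customers, o => o.customer, c => c.id,         // Join: attach customer names
          (o, c) => (c.name, o.amount))
    .GroupBy(x => x.name)                                // GroupBy: one group per customer
    .Select(g => (name: g.Key,                           // Select + Aggregate: total per group
                  total: g.Aggregate(0, (acc, x) => acc + x.amount)))
    .OrderBy(t => t.name)                                // OrderBy: sort by name
    .ToList();

foreach (var row in totals)
    Console.WriteLine($"{row.name} {row.total}"); // "Ann 50", then "Cy 50"
```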
37
What Is It Good For?
38
What is Kinect?
39
Input device
40
The Innards
(Image source: iFixit)
41
Projected IR pattern
(Image source: www.ros.org)
42
Depth computation
(Image source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html)
43
Kinect video output
- 30 Hz frame rate
- 57° field of view
- 8-bit VGA RGB, 640 x 480
- 11-bit depth, 320 x 240
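A back-of-the-envelope calculation from the figures above shows the real-time budget. It assumes the raw depth stream carries exactly 11 bits per pixel, as listed, with no packing or protocol overhead.

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1}{30\ \text{Hz}} \approx 33.3\ \text{ms} \\
N_{\text{depth}} &= 320 \times 240 = 76{,}800\ \text{pixels per frame} \\
N_{\text{depth}} \times 30\ \text{Hz} &= 2{,}304{,}000\ \text{pixels per second} \\
N_{\text{depth}} \times 11\ \text{bit} \times 30\ \text{Hz} &= 25{,}344{,}000\ \text{bit/s} \approx 25.3\ \text{Mbit/s}
\end{aligned}
```

Everything that processes a depth frame has to finish within roughly 33 ms per frame to keep up with the sensor.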
44
Depth map
(Image source: www.insidekinect.com)
45
Vision Problem: What is a human?
- Recognize players from the depth map
- At frame rate
- With minimal resource usage
46
Xbox 360 Hardware
(Source: http://www.pcper.com/article.php?aid=940&type=expert)
- Triple-core PowerPC 970, 3.2 GHz
- Hyperthreaded, 2 threads/core
- 500 MHz ATI graphics card
- DirectX 9.5
- 512 MB RAM
- 2005 performance envelope
- Must handle real-time vision AND a modern game
47
Why is it hard?
48
Generic Extensible Architecture
(Figure: raw data from the sensor goes to stateless experts 1, 2, and 3, each producing skeleton estimates; a stateful arbiter probabilistically fuses the hypotheses into the final estimate.)
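A minimal sketch of the expert/arbiter split, with heavy simplifications: a skeleton estimate is reduced to one coordinate plus a confidence, the three experts are toy functions, and the fusion rule (a confidence-weighted average blended with the previous frame's estimate) is an assumption for illustration, not the actual probabilistic arbiter. What it does show is where state lives: the experts are pure functions of the current frame, while the arbiter remembers the previous estimate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Arbiter state carried from one frame to the next (the arbiter is stateful).
double? previous = null;

// Arbiter: fuses the experts' hypotheses for one frame. Assumed rule:
// confidence-weighted average, then blended with the previous estimate.
// Requires at least one hypothesis with positive confidence.
double Arbitrate(IReadOnlyList<(double x, double confidence)> hypotheses, double smoothing)
{
    double totalWeight = hypotheses.Sum(h => h.confidence);
    double fused = hypotheses.Sum(h => h.x * h.confidence) / totalWeight;
    double estimate = previous is double prev ? smoothing * prev + (1 - smoothing) * fused : fused;
    previous = estimate;
    return estimate;
}

// Three stateless experts (toy stand-ins): each maps the raw sensor value
// to one hypothesis (position, confidence) without remembering anything.
var experts = new List<Func<double, (double x, double confidence)>>
{
    raw => (raw, 0.5),
    raw => (raw + 1.0, 0.25),
    raw => (raw - 1.0, 0.25),
};

double ProcessFrame(double raw) =>
    Arbitrate(experts.Select(expert => expert(raw)).ToList(), smoothing: 0.5);

Console.WriteLine(ProcessFrame(10.0)); // 10
Console.WriteLine(ProcessFrame(12.0)); // 11
```

Keeping the experts stateless means new experts can be added or run in parallel without coordination; only the arbiter has to reason about time.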
49
One Expert: Pipeline Stages
(Figure stages: Sensor -> Depth map -> Background segmentation -> Player separation -> Body Part Classifier -> Body Part Identification -> Skeleton)
50
Sample test frames
51
The Classifier
- Input: depth map
- Output: body parts
- Runs on the GPU at 320x240
52
Getting the Ground Truth
- Start from ground-truth data: depth paired with body parts
- Train the classifier to work across
  - pose
  - scene position
  - height, body shape
53
- Use synthetic data (3D avatar model)
- Inject noise
54
Suit / sensors
- expensive
- very accurate
- high frame rate
- large space
- calibration
55
Learn from Data
(Figure: training examples -> machine learning -> classifier)
56
Cluster-based training
(Figure: training examples -> machine learning on Dryad/DryadLINQ -> classifier)
- Millions of input frames
- > 10^20 objects manipulated
- Sparse, multi-dimensional data
- Complex datatypes (images, video, matrices, etc.)
57
Highly efficient parallelization
(Figure: time vs. machine)
58
CONCLUSIONS
59
Conclusions
60
I can finally explain to my son what I do for a living…