Published by Victor Knight. Modified over 9 years ago.
1
SYSTEMS SUPPORT FOR GRAPHICAL LEARNING
Ken Birman
CS6410 Fall 2014, 9/18/2014
2
Graphical models and applications
CS5412 Spring 2014 (Cloud Computing: Birman) 2
- Artificial intelligence and machine learning are the core technologies in many modern cloud settings:
  - Supporting social networking mechanisms
  - Creating product placement recommendations
  - Understanding the flow of “influence” within communities
- Graphical processing can also matter inside systems themselves:
  - Understanding what to cache and what not to cache
  - Learning common patterns to optimize
3
What makes this hard?
- The prior generation of solutions was too general:
  - Programming languages can do anything, but they aren’t at all specialized for graph-structured data
  - Database systems are awesome for tabular data but much less optimized for graphical data
- There is also an issue of scale:
  - We’re good at what can be done on one computer
  - But a company like Facebook has billions of users, and its infrastructure runs in massive data centers
4
Today’s papers
- The TAO paper (I’ll start with this) gives a sense of the challenge Facebook confronts:
  - TAO is like an entire distributed operating system
  - But the whole role of the solution is to manage graphical data and support queries against it
  - Massive loads and surreal scale
- Things to notice:
  - How does the architecture of the solution reflect the special environment in which it runs?
  - How did they identify and optimize the critical paths?
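To make “manage graphical data and support queries against it” a little more concrete, here is a toy, single-process sketch of a graph store built from typed objects and typed, timestamped associations (edges), loosely echoing the objects-and-associations data model the TAO paper describes. The class name `GraphStore` and the simplified method signatures are assumptions for illustration; nothing here models TAO’s caching tiers, sharding, or replication, which are what the paper is really about.

```python
from dataclasses import dataclass, field


@dataclass
class GraphStore:
    """Toy store of typed objects and typed, timestamped associations (edges)."""
    objects: dict = field(default_factory=dict)  # id -> (otype, attributes)
    assocs: dict = field(default_factory=dict)   # (id1, atype) -> [(time, id2)], newest first

    def obj_add(self, oid, otype, **attrs):
        self.objects[oid] = (otype, attrs)

    def assoc_add(self, id1, atype, id2, time):
        edges = self.assocs.setdefault((id1, atype), [])
        edges.append((time, id2))
        edges.sort(reverse=True)  # keep the newest association first

    def assoc_count(self, id1, atype):
        return len(self.assocs.get((id1, atype), []))

    def assoc_range(self, id1, atype, pos, limit):
        """Up to `limit` neighbor ids of type `atype`, newest first, from offset `pos`."""
        edges = self.assocs.get((id1, atype), [])
        return [id2 for _, id2 in edges[pos:pos + limit]]


g = GraphStore()
g.obj_add(1, "user", name="alice")
g.obj_add(2, "user", name="bob")
g.obj_add(3, "user", name="carol")
g.assoc_add(1, "friend", 2, time=100)
g.assoc_add(1, "friend", 3, time=200)
print(g.assoc_range(1, "friend", 0, 10))  # [3, 2]
```

Keeping each edge list ordered newest-first reflects the common need to show the most recent edges first; a real store would keep an ordered structure rather than re-sorting on every insert.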
5
Dryad/LINQ
- Here we see two concepts combined
- At Microsoft, LINQ has become very popular
  - It embeds a kind of query processing into C# code
- Dryad takes this one step further
  - Given a LINQ expression, DryadLINQ can run it on Dryad, a distributed “computing engine” of their own design
  - The idea is to obtain massive parallelism
6
Basic LINQ concepts
- LINQ (“Language-Integrated Query”) starts by allowing you to code lambda expressions
  - In-line functions
  - A query built from them is evaluated when its value is needed (when its results are enumerated), not when it is defined
- For example (distance and myloc are assumed to be defined elsewhere, with distance in miles):

      myPets.Select(a => a.name);
      myFriends.Where(f => distance(myloc, f.loc) < 1.0)
               .Select(f => new { f.name, f.loc, f.phone.mobile });
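Python has no LINQ, but generator expressions show a comparable deferred-execution behavior, which helps illustrate what “evaluated when the value is needed” means. The sketch mirrors the myFriends query above; the `Friend` record, the planar `distance` helper, the `near` predicate, and the sample data are all hypothetical, and the `evaluated` list exists only to show when the filter actually runs.

```python
import math
from dataclasses import dataclass


@dataclass
class Friend:
    name: str
    loc: tuple      # (x, y) position in miles on a hypothetical flat map
    mobile: str


def distance(a, b):
    """Straight-line distance in miles (hypothetical planar helper)."""
    return math.dist(a, b)


evaluated = []  # records which friends the filter has examined so far


def near(f, myloc, radius=1.0):
    evaluated.append(f.name)
    return distance(myloc, f.loc) < radius


my_friends = [
    Friend("ann", (0.2, 0.1), "555-0101"),
    Friend("raj", (3.0, 4.0), "555-0102"),
]
myloc = (0.0, 0.0)

# Like a LINQ query, this generator only *describes* the computation.
query = ((f.name, f.loc, f.mobile) for f in my_friends if near(f, myloc))
assert evaluated == []  # the filter has not run yet

results = list(query)  # enumerating the query triggers evaluation
print(results)
```

Deferral is what lets a system like Dryad receive the whole expression before anything runs and decide where to run it.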
7
How Dryad works
- Takes a LINQ expression, unevaluated
- Maps it to a collection of processor nodes that all have access to the same (read-only, unchanging) data files
- This spreads out the work and gains parallelism!
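The core idea of running one unevaluated query on many nodes over shared, read-only data can be sketched on a single machine: split the data into partitions, hand every partition the same query function, and concatenate the partial results. This is a minimal sketch under simplifying assumptions: threads stand in for Dryad’s processor nodes, and the query is a per-record filter and projection that needs no data exchange between partitions. The function names are hypothetical, and this is not how Dryad actually schedules work or moves data.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n):
    """Split a read-only sequence into n roughly equal, contiguous partitions."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_partitioned(query, data, workers=4):
    """Apply the same query to every partition in parallel, then merge results.

    Only correct for queries (like Where/Select) that treat each record
    independently; joins and group-bys would need a data exchange step.
    """
    parts = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: list(query(part)), parts)
        return [row for partial in partials for row in partial]


# The "unevaluated expression": a function that builds a lazy pipeline.
def query(records):
    return (r * r for r in records if r % 3 == 0)


print(run_partitioned(query, list(range(20))))
```

`pool.map` returns results in partition order, so the merged output matches a sequential run. In CPython, threads will not speed up CPU-bound work; the point here is the shape of the computation, which a real deployment would spread across processes or machines.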
8
Basic architecture of Dryad
9
Execution of a LINQ expression
10
A join, done in two ways
11
A join, done in two ways
12
MapReduce in Dryad/LINQ
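MapReduce fits naturally into LINQ-style operators: the map phase is a flattening projection (LINQ’s SelectMany), the shuffle is a GroupBy on the emitted key, and the reduce phase is applied to each group. The Python sketch below composes those three steps for a word count. The function names are illustrative stand-ins for the LINQ operators, not a DryadLINQ API, and keys are assumed to be sortable.

```python
from itertools import chain, groupby
from operator import itemgetter


def map_reduce(source, mapper, reducer):
    """MapReduce as three query operators.

    mapper(record) -> iterable of (key, value) pairs       (like SelectMany)
    grouping the pairs by key                              (like GroupBy)
    reducer(key, values) -> iterable of outputs per group  (like SelectMany)
    """
    mapped = chain.from_iterable(mapper(r) for r in source)
    # groupby only merges adjacent equal keys, so sort by key first
    groups = groupby(sorted(mapped, key=itemgetter(0)), key=itemgetter(0))
    return list(chain.from_iterable(
        reducer(key, [v for _, v in pairs]) for key, pairs in groups))


def word_mapper(line):
    return ((word, 1) for word in line.split())


def count_reducer(word, ones):
    return [(word, sum(ones))]


lines = ["the cat sat", "the dog sat"]
print(map_reduce(lines, word_mapper, count_reducer))
```

In a distributed setting, the sort-and-group step is where data moves between nodes, which is why it dominates the cost of most MapReduce jobs.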
13
Beyond Dryad
- In follow-on work, the same group built a system called Naiad…
- In that paper, they assert that social networking often comes down to finding fixed points of functions on graphs
  - For example, “look for poker players who are physically within a mile of me and are friends of me or of one of my friends”
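Before any fixed-point machinery, the poker-player query can be stated directly: collect my friends and my friends’ friends, keep those who play poker, and keep those within a mile. The social graph, the `plays_poker` set, the flat-map coordinates, and the function name are made-up assumptions for illustration.

```python
import math

# Hypothetical social graph (undirected friendships) and attributes.
friends = {
    "me": {"ana", "bo"},
    "ana": {"me", "cy"},
    "bo": {"me", "dee"},
    "cy": {"ana"},
    "dee": {"bo"},
}
plays_poker = {"ana", "cy", "dee"}
location = {  # (x, y) in miles on a flat map
    "me": (0.0, 0.0), "ana": (0.5, 0.0), "bo": (0.1, 0.1),
    "cy": (0.3, 0.4), "dee": (2.0, 0.0),
}


def nearby_poker_players(user, radius=1.0):
    direct = friends[user]
    friends_of_friends = set().union(*(friends[f] for f in direct))
    candidates = (direct | friends_of_friends) - {user}
    return sorted(
        p for p in candidates
        if p in plays_poker
        and math.dist(location[user], location[p]) <= radius
    )


print(nearby_poker_players("me"))
```

Answering this for one user is cheap; the systems challenge the slides describe is answering it continuously for every user in a very large community.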
14
Social network computations
- They believe that most parallel social networking computations can be re-expressed as fixed points
  - In essence, define a function f(S) on a set S, then iterate S ← f(S) until f(S) = S. That S is the fixed point.
- They want to compute all the fixed points concurrently for some very large community
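Fixed-point iteration itself takes only a few lines: start from a seed set, repeatedly replace S with f(S), and stop when the set no longer changes. In the sketch below, f adds everyone who is a friend of someone already in the set, so from a seed {u} the fixed point is everyone connected to u. The graph and this choice of f are illustrative assumptions. Because this f only ever grows the set and the graph is finite, the loop must terminate; an arbitrary f carries no such guarantee.

```python
def fixed_point(f, seed):
    """Iterate S <- f(S) from `seed` until f(S) == S."""
    current = frozenset(seed)
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}


def expand(s):
    # f(S) = S plus every neighbor of a member of S
    return s | {n for v in s for n in graph[v]}


print(sorted(fixed_point(expand, {1})))  # [1, 2, 3]
```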
15
Can we really find use cases?
- All the vehicles on Highway 101 need to continuously “watch for the vehicles that could cut me off if they change path”
- Define this indirectly too: if truck T changes its trajectory this way, car C might move that way, and then C would cut me off, so include T in the set…
- The idea is to do all such computations at once!
16
Naiad and Dryad
- Then they map Naiad onto Dryad:
  - First, write functions that compute these sets
  - Next, express the fixed-point property over those functions
  - Last, seed the data set and then run Dryad to iterate until all the fixed points are found (or until a time limit is reached, to cover non-convergent functions)
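The three steps above can be sketched as a loop in synchronous rounds: one set function, one state entry per seed, and rounds in which every still-changing entry is advanced at once, stopping when nothing changes or a round limit (standing in for the time limit) is hit. The names `solve_all` and `max_rounds` and the example function are assumptions; in a real system the per-seed work in each round would be spread across machines rather than run in one Python process.

```python
def solve_all(f, seeds, max_rounds=100):
    """Compute f's fixed point from every seed at once, in synchronous rounds.

    Returns (state, converged): state maps each seed s to its set, starting
    from {s}; converged is False if max_rounds ran out first.
    """
    state = {s: frozenset({s}) for s in seeds}
    active = set(state)  # seeds whose sets changed in the last round
    for _ in range(max_rounds):
        if not active:
            break
        # One round: advance every still-changing seed by one application of f.
        updates = {s: frozenset(f(state[s])) for s in active}
        active = {s for s, new in updates.items() if new != state[s]}
        state.update(updates)
    return state, not active


graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}


def expand(s):
    # Hypothetical set function: S plus all friends of members of S.
    return s | {n for v in s for n in graph[v]}


results, converged = solve_all(expand, graph)
print(converged, {s: sorted(v) for s, v in results.items()})
```

Only seeds whose sets changed in the previous round are recomputed, so work shrinks as more of the community converges; a function that never settles is cut off by `max_rounds` and reported as not converged.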
17
Issue?
- By the time Naiad is finished, the style of code is very hard to read, although those who write it find it pretty natural to work this way
- In fact, many companies do use this style of functional programming (Jane Street, for example, is famous for using OCaml for financial analytics)
- But is it systems research?
18
Other major systems in this space
- Check out http://en.wikipedia.org/wiki/Graph_database
  - It lists 50 or so graph databases and processing systems
- Some popular ones in research settings are Pregel (from Google), GraphLab (CMU), and Vowpal Wabbit (“Fast Learning”) (Yahoo)
19
Take away?
- Computer systems need to be responsive to:
  - Styles of use (what our “customers” are doing)
  - Common patterns of load (optimize for this case)
- In today’s major cloud computing settings, graphical data and graphical learning solutions are becoming a highly dominant form of load and focus
- Computer systems need to evolve to track this need