
Deployed Large-Scale Graph Analytics: Use Cases, Target Audiences, and Knowledge Discovery Toolbox (KDT) Technology
Aydin Buluc, LBL; John Gilbert, Adam Lugowski, and Drew Waranis, UCSB; David Alber and Steve Reinhardt, Microsoft

Knowledge Discovery Toolbox enables rapid algorithm development and fast execution for large-scale complex graph analytics

Knowledge Discovery Workflow
1a. Cull relevant historical data (data filtering technologies)
1b. Use streaming data
2. Build input graph (KDT, in memory)
3. Analyze input graph (KDT, in memory)
4. Visualize result graph (graph viz engine)

Agenda
– Use cases and audiences for graph analytics
– Technology
– Next steps

Graph Analytics
Graphs arise from:
– Social networks (human or animal)
– Transaction networks (e.g., Internet, banking)
– Molecular biological interactions (e.g., protein-protein interactions)
Many queries are:
– Ranking
– Clustering
– Matching / aligning
Graphs are not all the same: directed simple graphs, hypergraphs, bipartite graphs, with or without attributes on edges or vertices, …

Use Case: Find Influential People in a Social Network
A warfighter wants to understand a social network (e.g., a village or a terrorist group); see DARPA GUARDDOG. Specifically, the warfighter wants to identify leaders / influencers. A GUI selects the data and calls KDT to identify the top N influencers.

Use Cases
– Homeland security / Understand roles of members of terrorist groups based on known links between them / “Looking just at cell-phone communications, who are the leaders?”
– International banking / Detect money laundering / “Find instances of money being transferred at least 5 times and coming back to its source.”
Common thread: Enabling the knowledge-discovery domain expert to analyze graphs directly gets to the “right” answer faster, and possibly at all. (In the embedded context, the end-user and the KD domain expert are likely different people.)

Audiences
End-users / warfighters
– True end-user GUI not addressed by KDT
Knowledge discovery domain experts
– Are experts in something other than graph analytics
– Have large graphs they need to explore as part of their work
– Want a simple, robust, scalable, flexible package
Graph-analytic researchers
– Are experts in graph analytics, machine learning, etc.
– Want to experiment with new algorithms …
– … and get feedback from users on efficacy on large data
Efficiency-level developers
– Call-backs in C++ currently have a big performance advantage
– Formatting data for ingest

Agenda
– Use cases and audiences for graph analytics
– Technology
– Next steps

Local v. Global Metrics: Degree Centrality v. Betweenness Centrality
[Figure: a small directed graph with two highlighted vertices, A and B.]
Is vertex A or B most central?
– A has directed edges to more vertices (degree centrality)
– B is on more shortest paths between vertex pairs (betweenness centrality)
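To make the contrast concrete, here is a minimal KDT-style sketch, assuming a DiGraph has already been loaded; degree(), centrality('approxBC'), and topK() are the calls that appear elsewhere in this deck, but the exact invocation here is illustrative rather than the definitive API, and the file path is a placeholder.

    import kdt

    G = kdt.DiGraph.load('/file/my/graph/data')   # hypothetical input file

    # Local metric: how many edges touch each vertex
    deg = G.degree()
    localLeaders = deg.topK(10)        # ten highest-degree vertices

    # Global metric: how often a vertex lies on shortest paths between others
    bc = G.centrality('approxBC')
    globalLeaders = bc.topK(10)        # ten most "between" vertices

A vertex can score high on one metric and low on the other, which is exactly the A-versus-B distinction above.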

Algorithms: Insight v. Graph Traversals
[Figure: insight gained vs. graph traversals required (~= execution time), ranging from O(|E|) to O(|E|^2). Plotted algorithms: degree centrality, egocentrality, approximate betweenness centrality, k-betweenness centrality, and exact betweenness centrality (the most traversals, the most insight). The search is for better algorithms along this tradeoff.]

Knowledge Discovery Toolbox (KDT) Overview
Target audiences
– Primarily, (non-graph-expert) domain experts needing to analyze large graphs
– Secondarily, graph-algorithm researchers and developers needing access to highly performant, scalable graph infrastructure
Target use cases
– Broadly, problems needing the detail of algorithms that traverse the graph extensively
– Social-network-based ranking and search
– Homeland security
Current KDT practicalities
– Abstractions are (semantic) directed graphs and sparse and dense vectors, all of which are distributed across a cluster
– Python interface layered on Combinatorial BLAS; delivers the full scaling of CombBLAS with negligible Python overhead for non-semantic graphs
– v0.2 release expected in October
– x86-64 clusters running Windows or Linux
– Open-source code available at kdt.sourceforge.net under the New BSD license
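For a sense of what the Python layer feels like, a minimal session might look like the sketch below; it uses the synthetic Graph500 generator that appears in the benchmark code later in this deck, so no input file is needed. The exact calls should be read as illustrative, not as the complete API.

    import kdt

    G = kdt.DiGraph()
    G.genGraph500Edges(15)      # synthetic RMAT-style graph, scale 15 (~32K vertices)
    print G.nvert()             # basic size metric

    comp = G.connComp()         # label each vertex with its connected component
    parents = G.bfsTree(1)      # breadth-first tree rooted at vertex 1; -1 marks unreached vertices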

Parsimony with New Concepts for Domain Experts
(Semantic) directed graphs
– constructors, I/O
– basic graph metrics (e.g., degree())
– vectors
Clustering: Markov clustering and connected components
Ranking: betweenness centrality, PageRank
Matching: k-cycles
Hypergraphs and sparse matrices
Graph primitives (e.g., bfsTree())
SpMV / SpGEMM on semirings

Example code from the slide, Markov clustering of the largest component (figure captions: Input Graph, Largest Component, Graph of Clusters):

    # bigG contains the input graph
    comp = bigG.connComp()
    giantComp = comp.hist().argmax()
    G = bigG.subgraph(comp == giantComp)
    clus = G.cluster('Markov')
    clusNedge = G.nedge(clus)
    smallG = G.contract(clus)
    # visualize

A second fragment on the slide builds a graph Laplacian and repeatedly applies SpMV:

    [...]
    L = G.toSpParMat()
    d = L.sum(kdt.SpParMat.Column)
    L = -L
    L.setDiag(d)
    M = kdt.SpParMat.eye(G.nvert()) - mu*L
    pos = kdt.ParVec.rand(G.nvert())
    for i in range(nsteps):
        pos = M.SpMV(pos)

Graph API (v0.2)
[Figure: layered API diagram.]
Applications: community detection, network vulnerability analysis, Graph500
Graph problems / algorithms: Ranking (exact and approximate BC, PageRank), Clustering (Markov, connected components), Matching
Graph classes and primitives:
– DiGraph: bfsTree, isBfsTree, plus utilities (e.g., DiGraph, nvert, toParVec, degree, load, UFget, +, *, sum, subgraph, reverseEdges)
– HyGraph: bfsTree, isBfsTree, plus utilities (e.g., HyGraph, nvert, toParVec, degree, load, UFget)
– (Sp)Vec: e.g., +, *, |, &, >, ==, [], abs, max, sum, range, norm, hist, randPerm, scale, topK
– SpMat: e.g., +, *, SpMV, SpGEMM, SpMV_SemiRing
Underneath: CombBLAS (SpMV_SemiRing, SpMM_SemiRing; 64-bit and single-bit elements)
Design notes: separation of interfaces; semantic support (filters, objects)

Semantic Graph Use Case
“Looking just at cell-phone communications, who are the leaders?”

    import kdt

    # user function that converts a (file) record into an edge
    def readRecord(self, sourceV, destV, record):
        sourceV = record[0]
        destV = record[1]
        self.category = record[2]
        self.type = record[3]
        return (sourceV, destV, self)

    G = kdt.DiGraph.load('/file/my/graph/data', readRecord)

    # edges for which the edge-filter returns True will
    # be used in the calculation
    edgeFilter = lambda x: x.category == CellPhone
    G.addEFilter(edgeFilter)

    # calculate leaders via approximate betweenness centrality
    bc = G.centrality('approxBC')
    leaders = bc.topK(10)

Caveat: Currently, expressing the filter in Python (rather than C++) leads to a big performance decrease; reducing/eliminating this decrease is work in progress.

Example Algorithm: Find a breadth-first tree starting from a given vertex
[Figure: a sequence of slides animating BFS as repeated sparse matrix-vector products (A^T x) on a small semantic graph whose edges are typed as cell-phone calls or text messages.]

The Case for Sparse Matrices
Many irregular applications contain coarse-grained parallelism that can be exploited by abstractions at the proper level.

Traditional graph computations | Graphs in the language of linear algebra
Data driven, unpredictable communication | Fixed communication patterns
Irregular and unstructured, poor locality of reference | Operations on matrix blocks exploit memory hierarchy
Fine-grained data accesses, dominated by latency | Coarse-grained parallelism, bandwidth limited
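To make the right-hand column concrete, here is a small self-contained sketch, in plain Python with SciPy rather than KDT/CombBLAS, of one way to express BFS as repeated sparse matrix-vector products; it is an illustration of the idea, not code from the toolbox.

    import numpy as np
    from scipy.sparse import csr_matrix

    # Tiny directed graph: edges 0->1, 0->2, 1->3, 2->3
    rows = np.array([0, 0, 1, 2])
    cols = np.array([1, 2, 3, 3])
    A = csr_matrix((np.ones(4), (rows, cols)), shape=(4, 4))

    frontier = np.zeros(4)
    frontier[0] = 1.0                      # start BFS at vertex 0
    visited = frontier > 0

    while frontier.any():
        # One BFS level: y = A^T x over the (or, and) semiring,
        # simulated here with (+, *) followed by a > 0 test.
        reached = A.T.dot(frontier) > 0
        nextf = reached & ~visited
        visited |= nextf
        frontier = nextf.astype(float)

    print(visited)                         # every vertex is reachable from 0, so all True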

Performance: Graph500 in KDT or Combinatorial BLAS
[Figure: Graph500 benchmark on 8B edges, C++ or KDT calling CombBLAS, on the NERSC “Hopper” machine (Cray XE6).]
[Buluç & Madduri]: a new hybrid of CombBLAS MPI + OpenMP gets 25 GTEPS on 2T edges (scale 37) on 43,200 cores of Hopper.

Performance: Betweenness Centrality
[Figure: approximate betweenness centrality on a 2M-edge graph, sampling at 3%.]
With a few hundred cores, even a complex graph analysis can be done in near-interactive time.

Productivity
Betweenness centrality
– Python version initially written to SciPy interfaces
– Porting to KDT took 11 hours for a working, scalable implementation
Markov clustering
– Written by an undergraduate in 6 hours

Agenda
– Use cases and audiences
– Technology
– Next steps

Next Steps
Core technology
– Evolve semantic graph support so it is fully usable
– Implement support for streaming graphs
Engineering
– Couple with a GUI / graph viz package
– Port to Windows Azure
– Accept more data formats
– Extend coverage of clustering, ranking, and matching algorithms

KDT Summary
– Open-source toolbox targeted at domain experts
– Scalable to 10B-edge graphs and thousands of cores
– Limited set of methods, no graph viz yet
– See kdt.sourceforge.net for details
If you
– have other use cases
– need specific data formats or methods
– have developed a method
please contact me at

Knowledge Discovery Toolbox enables rapid algorithm development and fast execution for large-scale complex graph analytics

Backup

Further Info
– Linked, by Albert-Laszlo Barabasi
– Graph Algorithms in the Language of Linear Algebra, by Jeremy Kepner and John Gilbert, SIAM

Cloud Benefits for Graph Analytics
[Figure: KDT, data filtering, and viz running on elastic compute and memory, alongside the needed big data and needed methods.]
For the domain expert
– Elasticity of compute resources
– Ready availability of needed data – what?
– Ready availability of new methods – which?
For the graph-algorithm researcher
– Quickly try your algorithm on big data
– Quickly make it visible to domain experts

“Transport of the mails, transport of the human voice, transport of flickering pictures -- in this century, as in others, our highest accomplishments still have the single aim of bringing men together.” – Antoine de Saint-Exupery

Undelivered Possibilities
– Graph viz
– More ranking/clustering/matching options
– Availability in Azure
– Initial stages on disk, later stages in memory
– Dynamic/streaming graphs

Use Case: Find Influential People in a Social Network
A promoter has a social-network group (“MyGroup”) and wants to identify influencers on which to focus marketing efforts so as to maximize the viral effect of the group. The promoter calls KDT with the group name and gets back the top N influencers. Useful for (e.g.) viral marketing and public health.

Comparison to Other Parallel Packages

Package | Graph-alg devs | Domain experts | Interface | Supported memory*
Pegasus | X | | Hadoop | Distributed on-disk
Pregel | X | | C++ | Distributed on-disk
PBGL | X | | C++ | Distributed in-memory
MTGL | X | | C++ | Shared
SNAP (GA Tech) | X | | C | Shared
SNAP (Stanford) | X | X | C++ / NodeXL | Shared
GraphLab | X | | C++ | Shared
CombBLAS | X | | C++ | Shared or distributed, in-memory
KDT | X | X | Python | Shared or distributed, in-memory

*“Shared” meaning either cache-coherent or Cray XMT-style

Example Implementation: bfsTree
[Figure: a sequence of slides stepping through bfsTree as repeated sparse matrix-vector products (A^T x), showing the frontier and parent updates at each BFS level.]

Many Graphs Don’t Decompose Simply onto Distributed Memory 4n exchanges n^2 FLOPS Good locality 4n exchanges n^2 FLOPS Good locality ? exchanges ? OPS Usually poor locality, hence frequent comms, hence often a poor match for MapReduce

Identification of Primitives
Sparse array-based primitives:
– Sparse matrix-matrix multiplication (SpGEMM)
– Sparse matrix-dense vector multiplication (SpMV)
– Element-wise operations (.*)
– Sparse matrix indexing
Matrices on various semirings: (×, +), (and, or), (+, min), …
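Why the semiring matters: the same sparse matrix-vector product pattern computes different things depending on the "add" and "multiply" operations. The self-contained sketch below (plain Python, not the CombBLAS API) uses (min, +), so each "SpMV" is one relaxation step of single-source shortest paths; it is my illustration of the concept only.

    INF = float('inf')

    # Weighted adjacency as an edge list per source vertex: u -> [(v, w), ...]
    adj = {0: [(1, 2.0), (2, 5.0)], 1: [(2, 1.0), (3, 4.0)], 2: [(3, 1.0)], 3: []}

    def spmv_min_plus(adj, dist):
        # y[v] = min over edges (u, v) of dist[u] + w(u, v): one SSSP relaxation
        y = dict(dist)
        for u, edges in adj.items():
            for v, w in edges:
                y[v] = min(y[v], dist[u] + w)
        return y

    dist = {v: INF for v in adj}
    dist[0] = 0.0
    for _ in range(len(adj) - 1):      # Bellman-Ford: |V|-1 rounds of (min, +) "SpMV"
        dist = spmv_min_plus(adj, dist)

    print(dist)                         # {0: 0.0, 1: 2.0, 2: 3.0, 3: 4.0}

Swapping in (or, and) instead of (min, +) turns the same loop structure into BFS reachability, which is the flexibility the semiring abstraction buys.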

Some Combinatorial BLAS functions

bfsTree Implementation in KDT, for DiGraphs (Kernel 2 of Graph500)

    def bfsTree(self, root, sym=False):
        if not sym:
            self.T()    # synonym for reverseEdges
        parents = dg.ParVec(self.nvert(), -1)    # -1 marks vertices not yet reached
        fringe = dg.SpParVec(self.nvert())       # sparse vector holding the current frontier
        parents[root] = root
        fringe[root] = root
        while fringe.nnn() > 0:
            fringe.spRange()                     # set each frontier element's value to its index
            self._spm.SpMV_SelMax_inplace(fringe._spv)    # expand the frontier with one SpMV
            pcb.EWiseMult_inplacefirst(fringe._spv, parents._dpv, True, -1)   # drop already-visited vertices
            parents[fringe] = fringe             # record a parent for each newly reached vertex
        if not sym:
            self.T()
        return parents

SpMV and EWiseMult are CombBLAS ops that do not yet have good graph abstractions – pathsHop is an attempt for one flavor of SpMV.
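A usage sketch for the method above (the sym flag and the -1 convention are taken from the Graph500 driver on a later slide; the find() idiom mirrors the validation code there):

    parents = G.bfsTree(3, sym=True)          # treat edges as undirected; tree rooted at vertex 3
    nReached = len((parents != -1).find())    # -1 marks vertices the search never reached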

pageRank Implementation in KDT (p. 1 of 2)

    def pageRank(self, epsilon=0.1, dampingFactor=0.85):
        # We don't want to modify the user's graph.
        G = self.copy()
        nvert = G.nvert()
        G._spm.removeSelfLoops()

        # Handle sink nodes (nodes with no outgoing edges) by
        # connecting them to all other nodes.
        degout = G.degree(gr.Out)
        nonSinkNodes = degout.findInds()
        nSinkNodes = nvert - len(nonSinkNodes)
        iInd = ParVec(nSinkNodes*(nvert))
        jInd = ParVec(nSinkNodes*(nvert))
        wInd = ParVec(nSinkNodes*(nvert), 1)
        sinkSuppInd = 0
        for ind in range(nvert):
            if degout[ind] == 0:
                # Connect to all nodes.
                for sInd in range(nvert):
                    iInd[sinkSuppInd] = sInd
                    jInd[sinkSuppInd] = ind
                    sinkSuppInd = sinkSuppInd + 1
        sinkMat = pcb.pySpParMat(nvert, nvert, iInd._dpv, jInd._dpv, wInd._dpv)
        sinkG = DiGraph()
        sinkG._spm = sinkMat

This portion looks more like graph operations.

pageRank Implementation in KDT (p. 2 of 2) (main loop)

        G.normalizeEdgeWeights()
        sinkG.normalizeEdgeWeights()

        # PageRank loop
        delta = 1
        dv1 = ParVec(nvert, 1./nvert)
        v1 = dv1.toSpParVec()
        prevV = SpParVec(nvert)
        dampingVec = SpParVec.ones(nvert) * ((1 - dampingFactor)/nvert)
        while delta > epsilon:
            prevV = v1.copy()
            v2 = G._spm.SpMV_PlusTimes(v1._spv) + \
                 sinkG._spm.SpMV_PlusTimes(v1._spv)
            v1._spv = v2
            v1 = v1*dampingFactor + dampingVec
            delta = (v1 - prevV)._spv.Reduce(pcb.plus(), pcb.abs())
        return v1

This portion looks much more like matrix algebra.
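A usage sketch for pageRank (the parameter names come from the signature above; topK is the vector method listed on the Graph API slide, so treat the call as illustrative):

    pr = G.pageRank(epsilon=0.01, dampingFactor=0.85)   # converged PageRank vector
    topVerts = pr.topK(10)                              # ten highest-ranked vertices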

Graph500 Implementation in KDT (p. 1 of 2)

    scale = 15
    nstarts = 640
    GRAPH500 = 1
    if GRAPH500 == 1:
        # Kernel 1: generate and construct the graph
        G = dg.DiGraph()
        K1elapsed = G.genGraph500Edges(scale)
        if nstarts > G.nvert():
            nstarts = G.nvert()
        deg3verts = (G.degree() > 2).findInds()
        deg3verts.randPerm()
        starts = deg3verts[dg.ParVec.range(nstarts)]
        G.toBool()

    # Kernel 2: BFS from each start vertex
    K2elapsed = 1e-12
    K2edges = 0
    for start in starts:
        start = int(start)
        if start == 0:    # HACK: avoid root==0 bugs for now
            continue
        before = time.time()
        parents = G.bfsTree(start, sym=True)
        K2elapsed += time.time() - before
        if not k2Validate(G, start, parents):
            print "Invalid BFS tree generated by bfsTree"
            print G, parents
            break
        [origI, origJ, ign] = G.toParVec()
        K2edges += len((parents[origI] != -1).find())

Graph500 Implementation in KDT (p. 2 of 2)

    def k2Validate(G, start, parents):
        ret = True
        bfsRet = G.isBfsTree(start, parents)
        if type(bfsRet) != tuple:
            if dg.master():
                print "isBfsTree detected failure of Graph500 test %d" % abs(bfsRet)
            return False
        (valid, levels) = bfsRet

        # Spec test #3:
        [origI, origJ, ign] = G.toParVec()
        li = levels[origI]
        lj = levels[origJ]
        if not ((abs(li-lj) <= 1) | ((li==-1) & (lj==-1))).all():
            if dg.master():
                print "At least one graph edge has endpoints whose levels differ by more than one and is in the BFS tree"
                print li, lj
            ret = False

        # Spec test #4:
        neither_in = (li == -1) & (lj == -1)
        both_in = (li > -1) & (lj > -1)
        out2root = (li == -1) & (origJ == start)
        if not (neither_in | both_in | out2root).all():
            if dg.master():
                print "The tree does not span the connected component exactly, root=%d" % start
            ret = False

        # Spec test #5:
        respects = abs(li-lj) <= 1
        if not (neither_in | respects).all():
            if dg.master():
                print "At least one vertex and its parent are not joined by an original edge"
            ret = False

        return ret

Notes on the Graph500 spec tests:
– #1 and #2: implemented in isBfsTree
– #3: every input edge has vertices whose levels differ by no more than 1. Note: we don't actually have the input edges, so the edges in the resulting graph are used as a proxy
– #4: the BFS tree spans a connected component's vertices (i.e., all edges either have both endpoints in the tree or neither, or the source is not in the tree and the destination is the root)
– #5: a vertex and its parent are joined by an edge of the original graph