[Figure: Graph analytics pipeline in three stages: Preprocessing, Compute, Post Proc. Labels: XML, Raw Data, ETL, Slice, Initial Graph, Subgraph, Compute, Repeat, PageRank, Analyze, Top Users.]
[Figure: Wikipedia PageRank pipeline on HDFS. Stages: Spark Preprocess, Compute (GraphX), Spark Post. Data: Raw Wikipedia XML, Hyperlinks, PageRank, Top 20 Pages.]
[Figure: A property graph and its table representation. Vertex Table columns: Id, Property (V), with properties (rxin, student), (jgonzal, postdoc), (franklin, professor), (istoica, professor). Edge Table columns: SrcId, DstId, Property (E), with properties Collaborator, Advisor, Colleague, PI.]
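The vertex-table and edge-table layout named in the figure labels above can be sketched in a few lines. This is a minimal plain-Python illustration, not GraphX code. The names, roles, and edge labels are taken from the figure; the numeric vertex ids and the endpoints of each edge did not survive extraction, so the values used here are assumptions made for illustration. The helper `triplets` is hypothetical.

```python
# Vertex table: id -> (name, role).
# Names and roles come from the figure; the numeric ids are assumed.
vertices = {
    3: ("rxin", "student"),
    7: ("jgonzal", "postdoc"),
    5: ("franklin", "professor"),
    2: ("istoica", "professor"),
}

# Edge table: (src_id, dst_id, property).
# Edge labels come from the figure; the endpoints are assumed.
edges = [
    (3, 7, "Collaborator"),
    (5, 3, "Advisor"),
    (2, 5, "Colleague"),
    (5, 7, "PI"),
]


def triplets(vertices, edges):
    """Join every edge with the properties of its source and destination vertex."""
    return [(vertices[src], prop, vertices[dst]) for src, dst, prop in edges]


for (src_name, _), prop, (dst_name, _) in triplets(vertices, edges):
    print(f"{src_name} --{prop}--> {dst_name}")
```

Keeping vertices and edges in separate tables means a vertex property is stored once and looked up by id when an edge needs it, which is the join that `triplets` performs.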
[Figure: Data-parallel and graph-parallel views. Labels: Table, Row, Result, Property Graph.]
[Figure: Wikipedia analytics pipeline starting from Raw Wikipedia XML. Labels: Hyperlinks, PageRank, Top 20 Pages (Title, PR); Text Table (Title, Body), Topic Model (LDA), Word Topics (Word, Topic); Editor Graph, Community Detection, User Community (User, Com.); Term-Doc Graph; Discussion Table (User, Disc.); Community Topic (Topic, Com.).]
[Figure: Distributed representation of a property graph with vertices A to F, split into two partitions (Part. 1, Part. 2) by a vertex-cut heuristic. Components: Vertex Table (RDD), Edge Table (RDD), Routing Table (RDD).]
[Figure: Edge Cut versus Vertex Cut partitioning.]