Mingxing Zhang, Youwei Zhuo (equal contribution),

GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition
Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian
Tsinghua University, University of Southern California, Stanford University

Outline
Motivation
  Graph applications
  Processing-In-Memory
  The drawbacks of the current solution
GraphP
Evaluation

Graph Applications
Social network analytics
Recommendation system
Bioinformatics
…
Underlying representation: social graph, Resource Description graph

Challenges
High bandwidth requirement
Small amount of computation per vertex
Data movement overhead
[Figure: comp, L1, L2, L3, mem]
In conventional computer systems, data goes through the cache hierarchy from memory to the computation units. This data movement overhead limits memory access performance.

PIM: Processing-In-Memory
Idea: Computation logic inside memory
Advantage: High memory bandwidth (avoids data movement overhead)
Example: Hybrid Memory Cubes (HMC)
  320GB/s intra-cube
  4x120GB/s inter-cube

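HMC: Hybrid Memory Cubes
Bandwidth (GB/s): intra-cube 320, inter-cube 120, inter-group 120
Bottleneck: Inter-cube communication
Four cubes are easy to connect: they are fully connected, and the bandwidth between each pair of cubes is 120 GB/s. Scaling the same way to 16 cubes is impossible because each cube has at most 4 links. Instead, 4 cubes form a group and the groups are connected in a dragonfly topology, with only one link connecting two groups. The bandwidth between groups is 120 GB/s, so the bandwidth between individual cubes in different groups is less than 120 GB/s.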

Outline
Motivation
  Graph applications
  Processing-In-Memory
  The drawbacks of the current solution
GraphP
Evaluation

Current Solution: Tesseract
First PIM-based graph processing architecture
Programming model: Vertex program
Partition: Based on vertex program
Ahn, J., Hong, S., Yoo, S., Mutlu, O., & Choi, K. A scalable processing-in-memory accelerator for parallel graph processing. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

PageRank in Vertex Program
for (v: vertices) {                      // iterate over all vertices
  for (w: edges.destination) {           // iterate over all destinations (neighbours) of the source
    update = 0.85 * v.rank / v.out_degree;
    put(w.id, function{ w.next_rank += update; });
  }
}
barrier();
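The vertex program above can be simulated in a single process. The following is a minimal Python sketch, not Tesseract's implementation: the `pending` queue standing in for `put`, the `barrier` function, the three-vertex graph, and the 0.15 base term are all assumptions made for illustration.

```python
# Single-process sketch of a push-style PageRank vertex program.
# Hypothetical graph: vertex id -> list of destination ids.
out_edges = {0: [1, 2], 1: [2], 2: [0]}
rank = {v: 1.0 for v in out_edges}
next_rank = {v: 0.15 for v in out_edges}   # assumed base term
pending = []                               # queued remote function calls


def put(dst, update):
    """Queue 'next_rank[dst] += update' for the owner of dst."""
    pending.append((dst, update))


def barrier():
    """Apply every queued update, then clear the queue."""
    for dst, update in pending:
        next_rank[dst] += update
    pending.clear()


def iterate():
    for v, dsts in out_edges.items():      # iterate over all vertices
        update = 0.85 * rank[v] / len(dsts)
        for w in dsts:                     # iterate over all destinations
            put(w, update)
    sent = len(pending)                    # one put per edge
    barrier()
    for v in out_edges:                    # publish ranks for the next iteration
        rank[v] = next_rank[v]
        next_rank[v] = 0.15
    return sent


messages = iterate()
```

Every edge produces one `put`, which is why the number of remote calls grows with the number of edges that leave a cube.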

Graph Partition
[Figure: example graph with vertices 1 to 5 split across hmc0 and hmc1; legend: vertex, intra edge, inter edge]
We will be using the same example graph throughout the talk.
put(w.id, function{ w.next_rank += update; });
Communication = # of cross-cube edges
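The relation "communication = # of cross-cube edges" can be checked by counting. This is a small sketch on a hypothetical 5-vertex graph and 2-cube assignment; the edge list and the `cube_of` mapping are assumptions, not the graph drawn on the slide.

```python
# Hypothetical graph (source, destination) and vertex-to-cube assignment.
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)]
cube_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}


def cross_cube_edges(edges, cube_of):
    """Return the edges whose source and destination live in different cubes."""
    return [(u, w) for u, w in edges if cube_of[u] != cube_of[w]]


inter = cross_cube_edges(edges, cube_of)
# Under the vertex program, each of these edges costs one put() per iteration.
tesseract_messages = len(inter)
```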

Drawback of Tesseract
Excessive data communication
Why?
[Diagram: Programming Model, Graph Partition, Data Communication; Tesseract: ?]

Outline
Motivation
GraphP
Evaluation

GraphP
Consider graph partition first.
Graph Partition: Source-Cut
Programming model: Two-phase vertex program
Reduces inter-cube communication

Source-Cut Partition
[Figure: the same example graph split across hmc0 and hmc1 under source-cut; legend: vertex, intra edge, inter edge, replica]
This is the same graph.
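One way to picture source-cut is as a rule for placing edges and creating replicas. The sketch below assumes that each edge is stored in the cube that owns its destination and that a source owned by another cube gets a replica there, which matches the two-phase program iterating over `edges.sources`. The graph, the `cube_of` mapping, and the function name `source_cut` are hypothetical.

```python
from collections import defaultdict

# Hypothetical graph (source, destination) and vertex-to-cube assignment.
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)]
cube_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}


def source_cut(edges, cube_of):
    """Place each edge in its destination's cube; replicate remote sources."""
    local_edges = defaultdict(list)   # cube -> edges stored in that cube
    replicas = defaultdict(set)       # cube -> remote sources replicated there
    for u, w in edges:
        home = cube_of[w]
        local_edges[home].append((u, w))
        if cube_of[u] != home:
            replicas[home].add(u)
    return local_edges, replicas


local_edges, replicas = source_cut(edges, cube_of)
```

In this example vertex 2 has three edges into cube 1 but needs only one replica there.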

Two-Phase Vertex Program
for (r: replicas) {
  r.next_rank = 0.85 * r.next_rank / r.out_degree; // apply updates from previous iterations
}
for (v: vertices) {
  for (u: edges.sources) {
    update += u.rank;
  }
}
for (r: replicas) {
  put(r.id, function { r.next_rank = update });
}
barrier();
Across the cube boundary: 1:1 replica communication
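The two phases can be sketched as one PageRank iteration in a single process. This is an illustrative approximation, not GraphP's code: it folds the first loop (applying the previous iteration's replica update) into the read of the replica value, and the graph, the `cube_of` mapping, the `replica_rank` table, and the 0.15 base term are assumptions.

```python
# Hypothetical graph (source, destination) and vertex-to-cube assignment.
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5), (5, 1)]
cube_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

out_degree = {v: 0 for v in cube_of}
for u, _ in edges:
    out_degree[u] += 1

# Source-cut: an edge lives in its destination's cube; a source owned by
# another cube is represented there by a replica.
replica_cubes = {v: set() for v in cube_of}   # master -> cubes holding a replica
for u, w in edges:
    if cube_of[u] != cube_of[w]:
        replica_cubes[u].add(cube_of[w])

rank = {v: 1.0 for v in cube_of}              # master copies
replica_rank = {(v, c): 1.0 for v, cs in replica_cubes.items() for c in cs}


def iterate():
    # Phase 1: purely local. Each destination pulls from sources in its own
    # cube, reading a replica when the source's master is remote.
    new_rank = {}
    for w in cube_of:
        acc = 0.15                            # assumed base term
        here = cube_of[w]
        for u, dst in edges:
            if dst != w:
                continue
            src = rank[u] if cube_of[u] == here else replica_rank[(u, here)]
            acc += 0.85 * src / out_degree[u]
        new_rank[w] = acc
    # Phase 2: each master sends its new rank once to every replica.
    messages = 0
    for v, cubes in replica_cubes.items():
        for c in cubes:
            replica_rank[(v, c)] = new_rank[v]
            messages += 1
    rank.update(new_rank)                     # barrier: publish new ranks
    return messages


messages = iterate()
```

The hypothetical graph has four cross-cube edges but only two replicas, so the second phase sends two messages.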

Benefits
Strictly less data communication
Enables architecture optimizations

Less Communication
[Figure: messages crossing the cube boundary for the example graph, Tesseract vs. GraphP]
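The difference can be made concrete by counting messages per iteration under each scheme. This sketch assumes one message per cross-cube edge for the vertex program and one message per (source vertex, remote cube) pair, i.e. per replica, for the two-phase program. The graph and both function names are hypothetical.

```python
def tesseract_messages(edges, cube_of):
    """One put() per edge whose endpoints are in different cubes."""
    return sum(1 for u, w in edges if cube_of[u] != cube_of[w])


def graphp_messages(edges, cube_of):
    """One update per (source vertex, remote cube) pair, i.e. per replica."""
    return len({(u, cube_of[w]) for u, w in edges if cube_of[u] != cube_of[w]})


# Hypothetical graph (source, destination) and vertex-to-cube assignment.
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)]
cube_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
t = tesseract_messages(edges, cube_of)
g = graphp_messages(edges, cube_of)
```

Because the replica count is a set of distinct pairs drawn from the cross-cube edges, it can never exceed the edge count, and it is smaller whenever a vertex has more than one edge into the same remote cube.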

Broadcast Optimization
for (r: replicas) {
  put(r.id, function { r.next_rank = update });   // broadcast
}
barrier();

Naïve Broadcast
15 point-to-point messages
To send to a remote group of 4 cubes, 4 identical messages are sent over the intergroup link.

Hierarchical communication
3 intergroup messages
Only 1 intergroup message per remote group
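The message counts for the two broadcast schemes follow from the 16-cube, 4-group layout. The sketch below assumes cubes are numbered 0 to 15 with four consecutive cubes per group, and that in the hierarchical scheme the cube receiving the intergroup message forwards it to the other cubes of its group. The function names are hypothetical.

```python
GROUP_SIZE = 4
NUM_CUBES = 16


def group_of(cube):
    """Assumed numbering: cubes 0-3 are group 0, 4-7 are group 1, and so on."""
    return cube // GROUP_SIZE


def naive_broadcast(src):
    """Point-to-point: one message from src to every other cube."""
    dsts = [c for c in range(NUM_CUBES) if c != src]
    intergroup = sum(1 for c in dsts if group_of(c) != group_of(src))
    return len(dsts), intergroup


def hierarchical_broadcast(src):
    """One message per remote group; the receiving cube forwards it locally."""
    remote_groups = {group_of(c) for c in range(NUM_CUBES)} - {group_of(src)}
    intergroup = len(remote_groups)
    # Local deliveries: the other cubes of src's group, plus the forwarding
    # inside each remote group (GROUP_SIZE - 1 forwards per group).
    intragroup = (GROUP_SIZE - 1) * (1 + len(remote_groups))
    return intergroup, intragroup


total_naive, inter_naive = naive_broadcast(0)
inter_hier, intra_hier = hierarchical_broadcast(0)
```

Under these assumptions the naive scheme pushes 12 of its 15 messages through intergroup links, while the hierarchical scheme needs only 3 and moves the rest onto intra-group links.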

Other Optimizations
Computation/communication overlap
Leveraging low-power state of SerDes
Please see the paper for more details

Outline
Motivation
GraphP
Evaluation

Evaluation Methodology
Simulation Infrastructure
  zSim with HMC support
  ORION for NOC energy modeling
Configurations
  Same as Tesseract
  16 HMCs
  Interconnection: Dragonfly and Mesh2D
  512 CPUs
  Single-issue in-order cores
  Frequency: 1GHz

Workloads
4 graph algorithms
  Breadth First Search
  Single Source Shortest Path
  Weakly Connected Component
  PageRank
5 real-world graphs
  Wiki-Vote (WV)
  ego-Twitter (TT)
  Soc-Slashdot0902 (SD)
  Amazon0302 (AZ)
  ljournal-2008 (LJ)

Performance
[Figure: performance relative to Tesseract; annotations: <1.1x, data partition, 1.7x, memory bandwidth]

Communication Amount

Energy consumption

Other results
Bandwidth utilization
Scalability
Replication overhead
Please see the paper for more details

Conclusions
We propose GraphP
  A new PIM-based graph processing framework
Key contributions
  Data partition as first-order design consideration
  Source-cut partition
  Two-phase vertex program
  Enable additional architecture optimizations
GraphP drastically reduces inter-cube communication and improves energy efficiency.

GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition
Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian
Tsinghua University, University of Southern California, Stanford University
Presented from USC. It is a joint work with Tsinghua University and Stanford University.

Workload Size & Capacity
128 GB (16 * 8GB)
~16 billion edges
~400 million edges (SNAP)
~7 billion edges (WebGraph)

Two-phase vertex program
Equivalent expressiveness to vertex programs
