Slide 1: An XMT/MTGL Case Study: PageRank
Jonathan Berry, Scalable Algorithms Department, Sandia National Laboratories
July 23, 2008
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Slide 2: Informatics Datasets Are Different
Informatics: the analysis of datasets arising from "information" sources such as the WWW (not physical simulation).
Motivating applications:
– Homeland security
– Computer security (DOE emphasis)
– Biological networks, etc.
Primary HPC implication: any partitioning is "bad."
"One of the interesting ramifications of the fact that the PageRank calculation converges rapidly is that the web is an expander-like graph." (Page, Brin, Motwani, Winograd 1999)
[Figure residue: web-graph illustrations from UCSD '08 and Broder, et al. '00]
Slide 3: We Are Developing the MultiThreaded Graph Library (MTGL)
– Enables multithreaded graph algorithms
– Based upon a community standard (the Boost Graph Library)
– Abstracts data structures and other application specifics
– Hides some shared-memory issues
– Preserves good multithreaded performance
[Figure residue: MTGL adapter diagram; S-T connectivity and SSSP scaling plots, solve time (sec) vs. MTA-2 processors]
Slide 4: Initial Algorithmic Impacts of MTGL on XMT Are Promising
S-T connectivity:
– Gathering evidence for a 2005 prediction
– This plot shows results for ≤ 2 billion edges
– Even with Threadstorm 2.0 limitations, XMT predictions look good
Connected components:
– Simple Shiloach-Vishkin is fast, but hot-spots
– The multilevel Kahan algorithm scales (but XMT data are incomplete)
[Figure residue: time (s) vs. # XMT processors for the MTGL Shiloach-Vishkin algorithm (32,000p Blue Gene/L vs. MTGL/MTA 10p); time (s) vs. # edges for MTGL Kahan's algorithm (MTGL/XMT 10p)]
Slide 5: The "PageRank Derby"
Ranking of data is a key operation:
– Which terabytes of some petabytes of data are the most interesting?
– Which gigabytes of those terabytes? Etc.
We have chosen PageRank as a candidate kernel ranking operation (though there are others).
We wish to understand computational tradeoffs for various architectures and various datasets:
– XMT, XT4, Niagara, Netezza, Hadoop, etc.
We simulate real data with "R-MAT" graphs (Faloutsos, et al.):
– No previous results for traditional HPC (distributed memory)
– Two of Sandia's top distributed-memory people have gotten some.
Slide 6: "R-MAT" (Recursive MATrix Decomposition)
Think of dropping a marble through a sequence of plastic trays with holes in them:
– Pick an initiator with k = pq holes
– The i'th level has k^i holes
– The marble goes through each hole with some probability (normalized to 1.0 over all holes)
– The bottom level has N cells (1×1) and is the adjacency matrix
The probabilities determine the nature of the graph:
– All equal generates Erdos-Renyi graphs
– Unbalanced probabilities can lead to inverse power-law graphs.
[Figure residue: the same quadrant probabilities (0.57, 0.19, …, 0.05) repeated at each recursion level]
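The marble-drop recursion above can be sketched in a few lines. This is a hedged illustration, not the generator used in the talk: the function name `rmat_edge`, the pre-drawn `randoms` array, and the 2×2 initiator with probabilities a, b, c, and d = 1 - a - b - c are all assumptions made for exposition. At each of `scale` levels the "marble" picks one quadrant, appending one bit to each endpoint.

```cpp
#include <utility>

// Hypothetical R-MAT edge sketch: at each level, choose one quadrant of the
// adjacency matrix with probabilities a (top-left), b (top-right),
// c (bottom-left), and d = 1 - a - b - c (bottom-right).
std::pair<int, int> rmat_edge(int scale, double a, double b, double c,
                              const double* randoms) {
    int row = 0, col = 0;
    for (int level = 0; level < scale; ++level) {
        double r = randoms[level];  // one pre-drawn uniform [0,1) per level
        row <<= 1;
        col <<= 1;
        if (r < a) {
            // top-left quadrant: both bits stay 0
        } else if (r < a + b) {
            col |= 1;               // top-right quadrant
        } else if (r < a + b + c) {
            row |= 1;               // bottom-left quadrant
        } else {
            row |= 1;               // bottom-right quadrant
            col |= 1;
        }
    }
    return std::make_pair(row, col);  // endpoints in [0, 2^scale)
}
```

With all four probabilities equal this walk lands uniformly (Erdos-Renyi); skewed probabilities such as 0.57/0.19/0.19/0.05 concentrate edges in one corner and yield the heavy-tailed degree distributions the slide mentions.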
Slide 7: PageRank's Kernel Operation
Vertices "vote" by contributing their current rank to their neighbors (in proportion).
For example, supposing that all current ranks are 1.0:
– u contributes 0.5 to x
– v contributes 0.33… to x
– w contributes 0.5 to x
This operation, done over all edges, dominates the running time of PageRank.
[Figure residue: diagram of vertices u, v, and w with edges into x]
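The kernel described above can be sketched as a "push" over out-edges. This is a minimal serial illustration, not the MTGL code: the CSR-style arrays (`offsets`, `adj`) and the function name are assumptions for exposition. Note the write to `next[adj[j]]`, which becomes the contended hot spot once many threads push into the same high-degree vertex.

```cpp
#include <vector>

// Each vertex divides its current rank equally among its out-neighbors
// and "pushes" the shares forward (illustrative CSR layout, not MTGL's).
void push_contributions(const std::vector<int>& offsets,       // row starts, size n+1
                        const std::vector<int>& adj,           // out-neighbor list
                        const std::vector<double>& rank,       // current ranks
                        std::vector<double>& next) {           // accumulated votes
    int n = static_cast<int>(offsets.size()) - 1;
    for (int u = 0; u < n; ++u) {
        int deg = offsets[u + 1] - offsets[u];
        if (deg == 0) continue;                 // dangling vertex: nothing to push
        double share = rank[u] / deg;           // equal share per out-neighbor
        for (int j = offsets[u]; j < offsets[u + 1]; ++j)
            next[adj[j]] += share;              // concurrent writers collide here
    }
}
```

On the slide's example (u, v, w all pointing to x, with out-degrees 2, 3, and 2), x accumulates 0.5 + 0.33… + 0.5.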
Slide 8: Attempt 1: Go Through the Adjacency Lists
Verdict: "bad" – per-vertex loops over adjacency lists inherit the graph's skewed degree distribution, so the work per outer iteration is badly unbalanced.
Slide 9: Attempt 2: Go Over the Edges
Slide 10: Attempt 3: Load Balance via Your Own Threads
Verdict: "kind of OK, but not very pleasing."
Slide 11: Attempt 4: Remove the Hot Spots, Auto-Parallelize
– Load balanced and no hot spot!
– Requires extra memory for the in-adjacencies
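The hot-spot-free formulation above can be sketched as a "pull": with in-adjacencies precomputed, each vertex sums its own incoming shares, so every iteration of the outer loop writes a distinct location and the compiler can auto-parallelize it. A hedged sketch, shown serially with illustrative CSR-style arrays (not the MTGL representation):

```cpp
#include <vector>

// "Pull" variant: vertex v gathers rank[u] / out_degree(u) from each
// in-neighbor u. Each iteration writes only next[v], so there are no
// conflicting writes (illustrative layout, not MTGL's).
void pull_contributions(const std::vector<int>& in_offsets,   // in-edge row starts
                        const std::vector<int>& in_adj,       // in-neighbor list
                        const std::vector<int>& out_degree,   // out-degrees
                        const std::vector<double>& rank,
                        std::vector<double>& next) {
    int n = static_cast<int>(in_offsets.size()) - 1;
    for (int v = 0; v < n; ++v) {               // safe to run fully in parallel
        double sum = 0.0;
        for (int j = in_offsets[v]; j < in_offsets[v + 1]; ++j) {
            int u = in_adj[j];
            sum += rank[u] / out_degree[u];     // u's equal share to v
        }
        next[v] = sum;                          // the only write: no hot spot
    }
}
```

The trade the slide names is visible here: the in-adjacency arrays duplicate the edge list, costing extra memory in exchange for removing the contended accumulation.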
Slide 12: "CANAL" Output
– Load balanced and no hot spot!
– Extra work to compute the parallel prefix for the merge
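The "parallel prefix for the merge" line refers to a prefix sum over per-vertex counts that produces the offset array letting threads split edge work evenly. A minimal serial sketch under stated assumptions (the function name and layout are illustrative, not MTGL's; on the MTA/XMT this recurrence is what the compiler parallelizes, at the extra cost the slide notes):

```cpp
#include <vector>

// Exclusive prefix sum: turns per-vertex degrees into offsets, so
// offset[i]..offset[i+1] delimits vertex i's slice of the merged edge work.
std::vector<int> exclusive_prefix(const std::vector<int>& degree) {
    std::vector<int> offset(degree.size() + 1, 0);
    for (std::size_t i = 0; i < degree.size(); ++i)
        offset[i + 1] = offset[i] + degree[i];
    return offset;
}
```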
Slide 13: "CANAL" Output – Serial Inner Loop
– Can still work well in cases with enough work, even with high-degree vertices
– Less total work than the previous (nodep) version
– The serial inner loop is a scalability risk
Slide 14: Current Performance Results
An environment variable throttles back the number of streams for 128-processor scaling.
Slide 15: The MTGL Is Having an Interdisciplinary Impact
Algorithms/architectures/visualization integration:
– Sandia architects profiled the MTGL to predict performance on the XMT
– The Titan visualization framework uses the MTGL
– Qthreads/MTGL → X-caliber driver application
Scalable facility location on the MTA-2:
– Based on expertise gained in the EPA sensor-placement WFO project
– Applications to community detection, sensor placement, …
[Figure residue: MTGL/Qthreads PageRank on Niagara, seconds vs. threads]
Slide 16: Impact of HPC Informatics Activities
LDRD:
– The Networks Grand Challenge LDRD is building on the MTGL's success
Industry:
– The 2005 WFO project helped justify the Cray XMT
Scholarly community:
– 3 algorithms-track papers at the 1st MTAAP (2007)
– Keynote talk at the 2nd MTAAP (2008)
– IEEE CiSE special issue on combinatorial computing
– DIMACS shortest-paths challenge
– Indiana University collaboration: Parallel Processing Letters, BGL refactor
Slide 17: Acknowledgements
Multithreading background:
– Simon Kahan (Google, formerly Cray)
– Petr Konecny (Google, formerly Cray)
Multithreading/distributed-memory comparisons:
– Kristyn Maschhoff (Cray)
– Bruce Hendrickson (1415)
– Douglas Gregor (Indiana U.)
– Andrew Lumsdaine (Indiana U.)
MTGL algorithm design and development:
– Vitus Leung (1415)
– Kamesh Madduri (Georgia Tech)
– William McLendon (1423)
– Cynthia Phillips (1412)
MTGL integration:
– Brian Wylie (1424)
– Kyle Wheeler (Notre Dame)
– Brian Barrett (1422)