Parallel Applications And Tools For Cloud Computing Environments Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010
Large Scale PageRank with Iterative MapReduce. Shuohuan, Yuduo, Parag, Hui
Outline: motivation for large-scale PageRank; optimization strategies; experiment results; visualization with PlotViz3
PageRank. Large-scale PageRank: large-graph processing has become popular, and efficient processing of large-scale graphs challenges current MapReduce runtimes. Motivation: identify common optimization strategies for large-scale PageRank. Current status: PageRank on Twister, Hadoop, and DryadLINQ with the ClueWeb data set (50 million pages); MPI PageRank.
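As a reminder of the computation being optimized, the following is a minimal single-machine sketch of PageRank written as a repeated map step and reduce step. It is an illustration only, not the Twister, Hadoop, or DryadLINQ implementation; the function names, the damping factor of 0.85, and the handling of dangling pages are assumptions that do not come from the slides.

```python
from collections import defaultdict


def pagerank_iteration(adjacency, ranks, damping=0.85):
    """Run one map/reduce-style PageRank iteration (illustrative sketch)."""
    n = len(ranks)
    contributions = defaultdict(float)
    dangling = 0.0

    # "Map": every page splits its current rank evenly among its out-links.
    for page, out_links in adjacency.items():
        if out_links:
            share = ranks[page] / len(out_links)
            for target in out_links:
                contributions[target] += share
        else:
            # Assumption: rank of pages without out-links is spread uniformly.
            dangling += ranks[page]

    # "Reduce": sum the contributions per page and apply the damping factor.
    return {
        page: (1.0 - damping) / n
        + damping * (contributions[page] + dangling / n)
        for page in ranks
    }


def pagerank(adjacency, iterations=20, damping=0.85):
    """Iterate from a uniform start vector for a fixed number of rounds."""
    pages = set(adjacency)
    for targets in adjacency.values():
        pages.update(targets)
    full = {page: adjacency.get(page, []) for page in pages}
    ranks = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        ranks = pagerank_iteration(full, ranks, damping)
    return ranks
```

The adjacency list never changes between iterations while the rank vector changes every round, which is what makes the graph a candidate for caching and partitioning.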
Optimization Strategies: (1) Cache partitions of the web graph in memory: Twister, Pregel, HaLoop, Surfer; static data (am files). (2) Partition the web graph: DryadLINQ, (Twister, Hadoop) PageRank; task granularity should fit the memory and network bandwidth of the cloud infrastructure. (3) Hierarchy messaging in the reduce stage: Hadoop, (Twister, DryadLINQ) PageRank; local merge.
Cache Static Data
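The idea of caching static data can be sketched as a long-lived worker that loads its web graph partition once and then receives only the changing rank vector in each iteration. This is a hypothetical illustration, not the API of Twister or any other runtime named here; `CachedPartitionWorker`, `load_partition`, and `load_count` are invented names.

```python
class CachedPartitionWorker:
    """Long-lived map worker that keeps its graph partition in memory."""

    def __init__(self, load_partition):
        # load_partition is a hypothetical callable that reads the static
        # adjacency data for this worker (for example from a local file).
        self._load_partition = load_partition
        self._adjacency = None
        self.load_count = 0  # how many times the static data was read

    def _partition(self):
        # Read the static data only on first use, then reuse the cached copy.
        if self._adjacency is None:
            self._adjacency = self._load_partition()
            self.load_count += 1
        return self._adjacency

    def map(self, ranks):
        """Compute partial rank contributions from the cached partition."""
        partial = {}
        for page, out_links in self._partition().items():
            if not out_links:
                continue
            share = ranks[page] / len(out_links)
            for target in out_links:
                partial[target] = partial.get(target, 0.0) + share
        return partial
```

With a worker like this, repeated calls to `map` across iterations leave `load_count` at 1, whereas a runtime that starts fresh tasks every iteration would reread the partition each time.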
Partition the Web Graph: scalability with a varying number of nodes on Madrid
Partition the Web Graph: scalability with varying input data sizes on Tempest
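One simple way to control task granularity is to cut the adjacency list into partitions that each stay under an edge budget, so that a single task fits in memory. The sketch below assumes a greedy, order-preserving split; the function name and the `max_edges_per_partition` parameter are hypothetical, and the slides do not say which partitioning scheme was actually used.

```python
def partition_by_edge_budget(adjacency, max_edges_per_partition):
    """Greedily split an adjacency list into partitions of bounded size.

    A page whose out-link list alone exceeds the budget still gets a
    partition of its own, because out-link lists are never split.
    """
    partitions = []
    current, current_edges = {}, 0
    for page, out_links in adjacency.items():
        size = len(out_links)
        # Start a new partition when adding this page would exceed the budget.
        if current and current_edges + size > max_edges_per_partition:
            partitions.append(current)
            current, current_edges = {}, 0
        current[page] = out_links
        current_edges += size
    if current:
        partitions.append(current)
    return partitions
```

A smaller budget yields more, lighter tasks (more scheduling and messaging overhead), while a larger budget yields fewer, heavier tasks (more memory per task).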
Hierarchy Messaging in Reduce Stage
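Hierarchy messaging with a local merge can be sketched as a two-stage reduction: the partial results of all map tasks on one node are merged locally first, and only one merged message per node is sent to the global reduce. The code below is an assumption-laden illustration rather than the runtime's actual mechanism; `merge_partials`, `hierarchical_reduce`, and the node names are hypothetical.

```python
def merge_partials(partials):
    """Sum per-page contributions across a list of partial results."""
    merged = {}
    for partial in partials:
        for page, value in partial.items():
            merged[page] = merged.get(page, 0.0) + value
    return merged


def hierarchical_reduce(partials_by_node):
    """Merge per node first (local merge), then merge across nodes."""
    # Stage 1: each node combines the outputs of its own map tasks.
    node_level = [merge_partials(p) for p in partials_by_node.values()]
    # Stage 2: the global reduce sees one message per node, not per task.
    return merge_partials(node_level)
```

Because addition is associative and commutative, the two-stage result should match a flat reduction up to floating-point rounding, while the number of messages crossing the network drops from one per map task to one per node.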
Visualization with PlotViz3: 1k vertices; the red vertex is wikipedia.org