Published by Πρίαμ Μπότσαρης. Modified over 6 years ago.
1
Parallel Applications And Tools For Cloud Computing Environments
Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010
2
Large Scale PageRank with Iterative MapReduce
Shuohuan, Yuduo, Parag, Hui
3
Outline
- Motivation for large-scale PageRank
- Optimization strategies
- Experiment results
- Visualization with PlotViz3
4
Large-Scale PageRank
Large-scale graph processing has become popular, and processing large graphs efficiently challenges current MapReduce runtimes.
Motivation: identify common optimization strategies for large-scale PageRank.
Current status: PageRank on Twister, Hadoop, and DryadLINQ using the ClueWeb data set (50 million pages); MPI PageRank.
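The slides show no code. As a minimal sketch of how PageRank maps onto iterative MapReduce, assuming a small in-memory adjacency list and the standard damped update with a damping factor of 0.85 (an assumed common default), each iteration can be written as a map step that spreads a page's rank over its out-links and a reduce step that sums what each page receives. The function names are hypothetical and do not come from the Twister, Hadoop, DryadLINQ, or MPI implementations.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor (common default, not from the slides)


def pagerank_map(page, rank, out_links):
    """Map: emit this page's rank divided evenly among its out-links."""
    if out_links:
        share = rank / len(out_links)
        for target in out_links:
            yield target, share


def pagerank_reduce(page, contributions, num_pages):
    """Reduce: combine incoming contributions with the teleport term."""
    return (1 - DAMPING) / num_pages + DAMPING * sum(contributions)


def pagerank(graph, iterations=20):
    """Run the map/reduce steps for a fixed number of iterations.

    graph: dict mapping every page to the list of pages it links to.
    Dangling pages (no out-links) simply lose their rank in this sketch.
    """
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for page, out_links in graph.items():
            for target, share in pagerank_map(page, ranks[page], out_links):
                grouped[target].append(share)
        ranks = {page: pagerank_reduce(page, grouped[page], n) for page in graph}
    return ranks
```

For example, `pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})` returns a rank for each of the three pages; because no page is dangling, the ranks sum to 1. A distributed runtime would execute the map and reduce steps as parallel tasks and would typically loop until the ranks converge rather than for a fixed count.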
5
Optimization Strategies
- Cache partitions of the web graph in memory as static data (am files): Twister, Pregel, HaLoop, Surfer
- Partition the web graph: DryadLINQ (Twister, Hadoop) PageRank; task granularity should fit the memory and network bandwidth of the cloud infrastructure
- Hierarchical messaging in the reduce stage with local merge: Hadoop (Twister, DryadLINQ) PageRank
6
Cache Static Data
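The following is a hypothetical sketch of the caching idea, not Twister's, Pregel's, or HaLoop's API: each long-running worker loads its partition of the web graph (the static data) once, and on every iteration only the small rank vector (the dynamic data) is passed in. A runtime without such caching would typically reload the graph from storage on every iteration. The class and function names, and the damping factor of 0.85, are illustrative assumptions.

```python
from collections import defaultdict


class CachedPartitionWorker:
    """Hypothetical long-running map worker that keeps its graph partition in memory.

    The adjacency lists (static data) are stored once at construction; each
    iteration only the current rank vector (dynamic data) is passed to map().
    """

    def __init__(self, partition):
        self.partition = partition  # dict: page -> list of out-links, cached

    def map(self, ranks):
        """Emit (target, contribution) pairs for the cached pages."""
        out = []
        for page, links in self.partition.items():
            for target in links:  # loop body never runs when links is empty
                out.append((target, ranks[page] / len(links)))
        return out


def run(partitions, pages, iterations=20, damping=0.85):
    """Iterate PageRank, reusing the cached partitions in every iteration."""
    workers = [CachedPartitionWorker(p) for p in partitions]  # graph loaded once
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        sums = defaultdict(float)
        for worker in workers:  # only the rank vector is "broadcast" here
            for target, share in worker.map(ranks):
                sums[target] += share
        ranks = {p: (1 - damping) / n + damping * sums[p] for p in pages}
    return ranks
```

The design point is that the cost of loading the graph is paid once per job instead of once per iteration, which matters because the graph is far larger than the rank vector.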
7
Partition the Web Graph: scalability with various numbers of nodes on Madrid
8
Partition the Web Graph: scalability with various input data sizes on Tempest
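As a sketch of partitioning with task granularity in mind, assuming a hypothetical per-task memory budget expressed as a maximum number of edges, one can derive the partition count from the total edge count and then hash-partition pages together with their out-links. This is not DryadLINQ's partitioning API. `zlib.crc32` is used instead of Python's built-in `hash` because string hashing is randomized per process by default, and the balance is only approximate because hashing evens out page counts rather than edge counts.

```python
import math
import zlib


def choose_num_partitions(graph, max_edges_per_task):
    """Pick enough partitions that each task holds roughly max_edges_per_task edges.

    max_edges_per_task is a hypothetical memory budget, not a value from the slides.
    """
    total_edges = sum(len(links) for links in graph.values())
    return max(1, math.ceil(total_edges / max_edges_per_task))


def partition_graph(graph, num_partitions):
    """Hash-partition pages, each with its out-links, into num_partitions dicts."""
    parts = [dict() for _ in range(num_partitions)]
    for page, links in graph.items():
        idx = zlib.crc32(page.encode("utf-8")) % num_partitions  # stable across runs
        parts[idx][page] = links
    return parts
```

Too few partitions can exceed a node's memory; too many add scheduling and messaging overhead, which is the trade-off the slide's granularity point refers to.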
9
Hierarchical Messaging in the Reduce Stage
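One plausible reading of hierarchical messaging with local merge, sketched under that assumption: each map task first sums the contributions it produces for the same target page (similar in spirit to a Hadoop combiner), so it sends at most one message per distinct target instead of one per edge, and the reduce stage then merges these partial sums. The function names and the damping factor of 0.85 are hypothetical.

```python
from collections import defaultdict


def map_with_local_merge(partition, ranks):
    """Map task that merges contributions to the same target before sending them.

    Without the merge, one message per edge leaves the task; with it, at most
    one message per distinct target page.
    """
    merged = defaultdict(float)
    for page, links in partition.items():
        for target in links:
            merged[target] += ranks[page] / len(links)
    return dict(merged)


def reduce_partials(partials, pages, damping=0.85):
    """Reduce stage: sum the pre-merged partial results from every map task."""
    n = len(pages)
    totals = defaultdict(float)
    for partial in partials:
        for target, value in partial.items():
            totals[target] += value
    return {p: (1 - damping) / n + damping * totals[p] for p in pages}
```

With the partition `{"b": ["a", "c"], "c": ["a"]}` and every rank at 1/3, the three edges collapse into two messages: one for `a` (1/6 + 1/3 = 0.5) and one for `c` (1/6).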
10
Visualization with PlotViz3: 1k vertices; the red vertex is wikipedia.org