Parallel Applications And Tools For Cloud Computing Environments


Parallel Applications And Tools For Cloud Computing Environments Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010

Large-Scale PageRank with Iterative MapReduce. Shuohuan, Yuduo, Parag, Hui

Outline: motivation for large-scale PageRank; optimization strategies; experimental results; visualization with PlotViz3

PageRank. Large-scale PageRank: large-graph processing has become popular, and efficient processing of large-scale graphs challenges current MapReduce runtimes. Motivation: common optimization strategies for large-scale PageRank. Current status: Twister, Hadoop, and DryadLINQ with the ClueWeb data set (50 million pages); MPI PageRank.

Optimization Strategies: (1) Cache partitions of the web graph in memory: Twister, Pregel, HaLoop, Surfer; static data (am files). (2) Partition the web graph: DryadLINQ, (Twister, Hadoop) PageRank; task granularity should fit the memory and network bandwidth of the cloud infrastructure. (3) Hierarchical messaging in the reduce stage: Hadoop, (Twister, DryadLINQ) PageRank; local merge.

Cache Static Data
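
The caching idea can be illustrated with a minimal single-process sketch. This is not the Twister implementation. It only shows the pattern of loading the static web-graph partitions once and reusing them in every iteration, while only the rank vector changes. The function names, the damping factor of 0.85, and the assumption that pages are numbered 0..n-1 are all illustrative and do not come from the slides. Dangling pages (no out-links) are not given special treatment here.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the slides


def load_partitions(adjacency, num_partitions):
    """Split the static adjacency list into partitions once, before iterating."""
    partitions = [dict() for _ in range(num_partitions)]
    for page, out_links in adjacency.items():
        partitions[page % num_partitions][page] = out_links
    return partitions


def map_partition(partition, ranks):
    """Map task: read a cached partition and emit partial rank contributions."""
    contributions = defaultdict(float)
    for page, out_links in partition.items():
        if out_links:
            share = ranks[page] / len(out_links)
            for target in out_links:
                contributions[target] += share
    return contributions


def reduce_ranks(all_contributions, num_pages):
    """Reduce task: sum the contributions per page and apply damping."""
    totals = defaultdict(float)
    for contributions in all_contributions:
        for page, value in contributions.items():
            totals[page] += value
    return {page: (1 - DAMPING) / num_pages + DAMPING * totals[page]
            for page in range(num_pages)}


def pagerank(adjacency, num_partitions=2, iterations=20):
    """Iterative PageRank; pages are assumed to be numbered 0..n-1."""
    num_pages = len(adjacency)
    cached = load_partitions(adjacency, num_partitions)  # loaded once, reused
    ranks = {page: 1.0 / num_pages for page in range(num_pages)}
    for _ in range(iterations):
        # Only the small rank vector changes between iterations.
        ranks = reduce_ranks((map_partition(p, ranks) for p in cached), num_pages)
    return ranks
```

In a runtime without caching, the equivalent of `load_partitions` would be repeated in every iteration, which is the cost this strategy aims to avoid.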

Partition the Web Graph: scalability with varying numbers of nodes on Madrid

Partition the Web Graph: scalability with varying input data sizes on Tempest
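
One way to picture "task granularity should fit the memory" is a greedy partitioner that caps the number of edges per partition. The sketch below is only an illustration under that assumption. The slides do not say how the web graph was actually partitioned, and the name `partition_by_edge_budget` and the edge-count budget are hypothetical.

```python
def partition_by_edge_budget(adjacency, max_edges):
    """Greedily group pages so each partition holds at most max_edges edges.

    A page whose out-link list alone exceeds the budget still gets its own
    partition, so no page is ever dropped.
    """
    partitions = []
    current, current_edges = {}, 0
    for page in sorted(adjacency):
        out_links = adjacency[page]
        # Start a new partition when this page would overflow the budget.
        if current and current_edges + len(out_links) > max_edges:
            partitions.append(current)
            current, current_edges = {}, 0
        current[page] = out_links
        current_edges += len(out_links)
    if current:
        partitions.append(current)
    return partitions
```

A smaller budget yields more, lighter tasks; a larger one yields fewer tasks that each need more memory, which is the trade-off the scalability experiments explore.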

Hierarchical Messaging in the Reduce Stage
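
A possible reading of the "local merge" step is that rank contributions produced by map tasks on the same node are summed locally before being sent to the reducer, so fewer records cross the network. The sketch below illustrates that reading only. The node names, the function names, and the record-count metric are assumptions and are not taken from the slides.

```python
from collections import Counter


def local_merge(task_outputs):
    """Sum per-page contributions from several outputs into one mapping."""
    merged = Counter()
    for output in task_outputs:
        merged.update(output)  # Counter.update adds values for matching keys
    return merged


def hierarchical_reduce(outputs_by_node):
    """Merge within each node first, then merge the per-node results.

    Returns the global totals and the number of records that left the nodes.
    """
    node_messages = [local_merge(outputs) for outputs in outputs_by_node.values()]
    totals = local_merge(node_messages)
    records_sent = sum(len(message) for message in node_messages)
    return dict(totals), records_sent


# Hypothetical map outputs: two nodes, two map tasks each.
outputs_by_node = {
    "node-a": [{1: 0.25, 2: 0.25}, {1: 0.5}],
    "node-b": [{2: 0.5, 3: 0.25}, {3: 0.25}],
}
totals, records_sent = hierarchical_reduce(outputs_by_node)
```

In this example the four map tasks emit six records in total, but only four records leave the nodes after the local merge.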

Visualization with PlotViz3: 1k vertices; the red vertex is wikipedia.org