Recent Advances in SmartGridSolve Oleg Girko School of Computer Science & Informatics University College Dublin.


GridRPC and collective mapping
- GridRPC limitations
  - Individual mapping
  - Client-server communication only
  - No communication parallelism
- Collective mapping
  - Improved balancing of computation load
  - Reduced volume of communication
  - Improved balancing of communication load
  - Increased parallelism of communication
  - Reduced client memory usage and paging
- Collective mapping requirements
  - DAG of task dependencies
  - Order of task execution
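The two requirements of collective mapping (a DAG of task dependencies and an order of task execution) can be illustrated with a minimal sketch. It is written in Python only for brevity and is not the SmartGridSolve implementation; the task names, data names, and the `build_dag` helper are hypothetical. A task is assumed to depend on whichever task produces one of its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical task list: name -> (input data names, output data names).
tasks = {
    "t1": ({"A"}, {"B"}),
    "t2": ({"A"}, {"C"}),
    "t3": ({"B", "C"}, {"D"}),
}

def build_dag(tasks):
    """Map each task to the set of tasks that produce its inputs."""
    producer = {}
    for name, (_, outputs) in tasks.items():
        for data in outputs:
            producer[data] = name
    return {
        name: {producer[d] for d in inputs if d in producer}
        for name, (inputs, _) in tasks.items()
    }

dag = build_dag(tasks)

# One valid execution order: every task comes after the tasks it depends on.
order = list(TopologicalSorter(dag).static_order())
```

Here `t1` and `t2` are independent of each other, so a mapper that sees the whole DAG could place them on different servers and let `t3` fetch `B` and `C` from those servers rather than through the client.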

SmartGridRPC: runtime discovery
- grpc_map() { … }
- Discovery phase and execution phase
- Constraints on the code
  - No control flow dependent on remote task result
  - grpc_local() { … } for side effects
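The two-phase idea can be sketched as a toy model in which the same block of code runs twice: once to record the calls, once to perform them. This is an illustration in Python, not the SmartGridSolve C API; `MapBlock`, `call`, `local`, and `run` are hypothetical names standing in for the `grpc_map` and `grpc_local` constructs named on the slide.

```python
class MapBlock:
    """Toy stand-in for a grpc_map-style block; not the SmartGridSolve API."""

    def __init__(self):
        self.phase = "discovery"
        self.recorded = []   # (task, inputs, outputs) seen during discovery
        self.executed = []   # tasks actually run during execution

    def call(self, task, inputs, outputs):
        if self.phase == "discovery":
            # Only record the call; nothing runs, so outputs hold no real data.
            self.recorded.append((task, tuple(inputs), tuple(outputs)))
        else:
            self.executed.append(task)

    def local(self, action):
        # Side effects run only in the execution phase, never during discovery.
        if self.phase == "execution":
            action()

def run(block_body):
    block = MapBlock()
    block_body(block)            # discovery pass: collect all calls
    block.phase = "execution"
    block_body(block)            # execution pass: perform the calls
    return block

printed = []

def body(b):
    b.call("mul", ["A", "B"], ["C"])
    b.call("add", ["C", "D"], ["E"])
    b.local(lambda: printed.append("done"))

result = run(body)
```

The sketch also shows where the constraints come from. Because no task produces real results during the discovery pass, a branch that tested the value of `C` would see meaningless data, and an unguarded side effect would happen twice.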

Runtime discovery: fault tolerance
- Error phase
- Re-mapping
- Re-execution from the beginning
- Additional constraints on code

ADL: algorithm definition language
- Separate algorithm definition
- Parameterised grpc_map() { … }
- Increased programming effort
  - Need to write a separate algorithm definition
  - Need to keep the ADL in sync with the algorithm
  - No check whether the ADL has diverged from the algorithm

Static code analysis
- Non-intrusive
- Algorithm definition extracted by code analysis
- No limitations of other approaches
  - No restrictions on code
  - No separate algorithm description
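As a rough analogy only, the following sketch extracts a list of remote calls from source text without executing it. The slides do not describe how the actual analysis works, and the real tool targets GridRPC applications rather than Python; the `grpc_call` name, the sample source, and the `extract_calls` helper are all assumptions made for illustration.

```python
import ast

# Hypothetical client code, held as text and never executed.
SOURCE = '''
grpc_call("mul", A, B, C)
if n > 0:
    grpc_call("add", C, D, E)
'''

def extract_calls(source, func_name="grpc_call"):
    """Return (task, argument names) for each matching call, in source order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            task = node.args[0].value
            args = [a.id for a in node.args[1:] if isinstance(a, ast.Name)]
            found.append((node.lineno, task, args))
    found.sort(key=lambda item: item[0])
    return [(task, args) for _, task, args in found]

calls = extract_calls(SOURCE)
```

Because the calls are read from the syntax tree, the client code needs no special block markers and no separately maintained description, which matches the "non-intrusive" point above.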

Fault tolerance for better performance
- Not restarting from scratch
- Keeping a log of GridRPC calls and task dependencies
- Restarting the failed task and all tasks it depends on
- Reducing likelihood of data loss
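The restart rule (the failed task plus all tasks it depends on) amounts to a walk over the logged dependencies. A minimal sketch, assuming the log is available as a mapping from each task to the tasks whose results it consumes; the log contents and the `tasks_to_restart` helper are hypothetical.

```python
def tasks_to_restart(deps, failed):
    """Return the failed task plus every task it depends on, transitively."""
    to_restart = set()
    stack = [failed]
    while stack:
        task = stack.pop()
        if task not in to_restart:
            to_restart.add(task)
            stack.extend(deps.get(task, ()))
    return to_restart

# Hypothetical dependency log: task -> tasks whose results it consumes.
log = {
    "t1": set(),
    "t2": set(),
    "t3": {"t1"},
    "t4": {"t2"},
    "t5": {"t3"},
}

restart = tasks_to_restart(log, "t5")
```

In this example a failure of `t5` leads to re-running `t1`, `t3`, and `t5`, while `t2` and `t4` are left alone, in contrast to re-execution from the beginning.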

Publications
- T. Brady, J. Dongarra, M. Guidolin, A. Lastovetsky, K. Seymour, “SmartGridRPC: The new RPC model for high performance Grid computing”, Concurrency and Computation: Practice and Experience, vol. 22, issue 18, pp , 2010
- M. Guidolin, “Performance of GridRPC-based programming systems for distributed scientific computing: issues and solutions”, School of Computer Science and Informatics, Dublin, Ireland, University College Dublin, pp. 168, 04/2010
- M. Guidolin, T. Brady, A. Lastovetsky, “How Algorithm Definition Language (ADL) Improves the Performance of SmartGridSolve Applications”, The 7th High-Performance Grid Computing Workshop, Atlanta, USA, Apr 19, 2010
- O. Girko, A. Lastovetsky, “Using Static Code Analysis to Improve Performance of GridRPC Applications”, 9th High-Performance Grid and Cloud Computing Workshop (HPGC 2012), Shanghai, China, IEEE Computer Society, May 21, 2012
- T. Brady, O. Girko, A. Lastovetsky, “Smart RPC In Grids And Clouds” in “Large Scale Network-Centric Computing Systems”, John Wiley & Sons, to be published in 2012

Questions?