Using Static Code Analysis to Improve Performance of GridRPC Applications Oleg Girko, Alexey Lastovetsky School of Computer Science & Informatics University College Dublin Dublin, Ireland

GridRPC and collective mapping
GridRPC limitations
– Individual mapping
– Client-server communication only
– No communication parallelism
Collective mapping
– Improved balancing of computation load
– Reduced volume of communication
– Improved balancing of communication load
– Increased parallelism of communication
– Reduced client memory usage and paging
Collective mapping requirements
– DAG of task dependencies
– Order of task execution

Runtime discovery (SmartGridSolve)
grpc_map() { … }
– Discovery phase and execution phase
Constraints on the code
– No control flow dependent on remote task result
– grpc_local() { … } for side effects

Runtime discovery: fault tolerance
– Error phase
– Re-mapping
– Re-execution from the beginning
– Additional constraints on code

ADL: algorithm definition language
– Separate algorithm definition
– Parameterised grpc_map() { … }
Increased programming effort
– Need to write a separate algorithm definition
– Need to keep ADL in sync with the algorithm
– No check whether ADL has diverged from the algorithm

Static code analysis
– Non-intrusive
– Algorithm definition extracted by code analysis
No limitations of other approaches
– No restrictions on code
– No separate algorithm description

Static code analysis: build process
– Source code is parsed by Clang to produce an AST
– Resulting ASTs from all files are analysed together
– Files containing blocks for collective mapping are preprocessed
– Resulting C code is compiled and linked
– Application performance model is built at runtime

Static code analysis: analysing AST
– Function calls are treated as inlined
– Control flow is approximated
  – Loops are unrolled
  – Branches are treated as parallel execution
– Non-scalar GridRPC call arguments are correlated and assigned IDs
– Extracted information:
  – Argument IDs => task dependency DAG
  – Task start and task wait events => order of task execution

Static code analysis: further development
Covering more cases
– Better detection of run-time parameters
– Support for argstack and valist
– Test case: HydroPad

Static code analysis: further development
Fault tolerance
– Not restarting from scratch
– Keeping a log of GridRPC calls and task dependencies
– Restarting the failed task and all tasks it depends on
– Reducing likelihood of data loss

Conclusion
– Collective mapping
– Existing approaches
  – Runtime discovery
  – ADL
– Static code analysis
– Further development

Questions?