Apache Airavata Architecture Overview
Shameera Rathnayaka, Graduate Assistant
Science Gateways Group, Indiana University
07/27/2015

What is Apache Airavata?
• An open-source software framework for executing and managing computational jobs and workflows.
• Supports local clusters, supercomputers, national grids, and academic and commercial clouds.

Architectural Goals
• Loosely coupled components
• Scalability
• Fault tolerance
• Experiment recovery
• Reliable job monitoring
• Fault handling
• Security
• Workflow enactment

Terminology
• Task – A single unit of execution.
• Job – A special task that submits a job to a compute resource.
• Process – A collection of tasks; there is one process per application.
• Experiment – What a user submits to Apache Airavata.
• Workflow – An experiment with more than one application.
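The terminology describes a containment hierarchy. An experiment holds processes, a process holds tasks, and a job-submission task produces a job. The Python sketch below models that hierarchy. It rests on stated assumptions: the class names, field names, and task-kind strings are hypothetical and are not Airavata's actual data models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """A job handed to a compute resource's scheduler (hypothetical fields)."""
    job_id: str
    compute_resource: str


@dataclass
class Task:
    """A single unit of execution; only job-submission tasks carry a Job."""
    task_id: str
    kind: str  # hypothetical labels, e.g. "INPUT_STAGING" or "JOB_SUBMISSION"
    job: Optional[Job] = None


@dataclass
class Process:
    """A collection of tasks; one process per application."""
    application: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Experiment:
    """The unit a user submits."""
    experiment_id: str
    processes: List[Process] = field(default_factory=list)

    def is_workflow(self) -> bool:
        # An experiment that runs more than one application is a workflow.
        return len({p.application for p in self.processes}) > 1


single = Experiment("exp-1", [
    Process("app-a", [Task("t1", "INPUT_STAGING"),
                      Task("t2", "JOB_SUBMISSION", Job("1234", "cluster-a"))]),
])
print(single.is_workflow())  # False
```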

Relationship of Data Models

Loosely Coupled Components
• Separation of concerns: each component has a specific job to do.
• AMQP-based messaging provides inter-component communication and gives gateways a transparent, white-box view of what is happening inside Airavata.
• Easy to evolve with new technologies.
  ◦ E.g., WS Messaging was replaced with the widely used RabbitMQ broker.
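Broker-based messaging keeps components loosely coupled: a publisher names only a routing key, never its consumers. The sketch below is a minimal in-memory stand-in for a broker that shows this effect. It is not RabbitMQ or an AMQP client. Routing here uses exact keys only, whereas real AMQP topic exchanges also support wildcards. The routing key and message fields are hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class InMemoryBroker:
    """Illustrative stand-in for a message broker; routes messages by exact routing key."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, routing_key: str, handler: Handler) -> None:
        self._subscribers.setdefault(routing_key, []).append(handler)

    def publish(self, routing_key: str, message: dict) -> None:
        # The publisher only names a routing key; it never references the consumers.
        for handler in list(self._subscribers.get(routing_key, [])):
            handler(routing_key, message)


broker = InMemoryBroker()
seen: List[str] = []
# Two independent consumers of the same (hypothetical) status topic.
broker.subscribe("experiment.status", lambda key, msg: seen.append("gateway:" + msg["state"]))
broker.subscribe("experiment.status", lambda key, msg: seen.append("registry:" + msg["state"]))
broker.publish("experiment.status", {"experiment_id": "exp-1", "state": "LAUNCHED"})
print(seen)  # ['gateway:LAUNCHED', 'registry:LAUNCHED']
```

A gateway can be added as another subscriber without any change to the publishing component. The same property let the messaging layer be swapped out.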

Airavata Component Architecture

Component-Based Architecture (CBA) Pattern
• Reusable, replaceable, and easy to develop.
• Airavata components:
  ◦ API Server – Hides all internal components from the user.
  ◦ Orchestrator – Makes decisions and selections.
  ◦ Worker – Executes a set of tasks.
  ◦ Registry – Data catalog.
  ◦ Workflow Engine – Workflow enactment.
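An API server that hides the other components can be read as a facade over replaceable parts. The sketch below assumes that reading. `ApiServer` depends only on small `Registry` and `Orchestrator` interfaces, so any implementation satisfying them can be swapped in. All class and method names are hypothetical; this is not Airavata's actual API.

```python
from typing import Dict, List, Protocol


class Registry(Protocol):
    def save_experiment(self, experiment_id: str, data: dict) -> None: ...
    def get_experiment(self, experiment_id: str) -> dict: ...


class Orchestrator(Protocol):
    def launch(self, experiment_id: str) -> None: ...


class InMemoryRegistry:
    """One replaceable Registry implementation (a database-backed one could be swapped in)."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def save_experiment(self, experiment_id: str, data: dict) -> None:
        self._store[experiment_id] = dict(data)

    def get_experiment(self, experiment_id: str) -> dict:
        return self._store[experiment_id]  # KeyError for unknown experiments


class RecordingOrchestrator:
    """Test double that records launches instead of scheduling work."""

    def __init__(self) -> None:
        self.launched: List[str] = []

    def launch(self, experiment_id: str) -> None:
        self.launched.append(experiment_id)


class ApiServer:
    """Single entry point: clients never call the registry or orchestrator directly."""

    def __init__(self, registry: Registry, orchestrator: Orchestrator) -> None:
        self._registry = registry
        self._orchestrator = orchestrator

    def create_experiment(self, experiment_id: str, data: dict) -> str:
        self._registry.save_experiment(experiment_id, data)
        return experiment_id

    def launch_experiment(self, experiment_id: str) -> None:
        self._registry.get_experiment(experiment_id)  # fail fast if it was never created
        self._orchestrator.launch(experiment_id)


orchestrator = RecordingOrchestrator()
api = ApiServer(InMemoryRegistry(), orchestrator)
api.launch_experiment(api.create_experiment("exp-1", {"application": "app-a"}))
print(orchestrator.launched)  # ['exp-1']
```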

Scalability
• Airavata worker capacity can be increased or decreased on demand to maintain performance during load spikes.
• Workers scale horizontally.
• Jobs are distributed among workers through the internal work queue.
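Scaling workers horizontally behind a shared work queue is the competing-consumers pattern. The Python sketch below illustrates it under two assumptions. Threads stand in for separately deployed worker instances, and `queue.Queue` stands in for Airavata's internal work queue. The function name `run_workers` is hypothetical.

```python
import queue
import threading
from typing import List, Tuple


def run_workers(jobs: List[str], num_workers: int) -> List[Tuple[str, str]]:
    """Distribute jobs among num_workers competing workers through one shared work queue."""
    work_queue: queue.Queue = queue.Queue()
    for job in jobs:
        work_queue.put(job)

    results: List[Tuple[str, str]] = []
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                job = work_queue.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker stops
            with lock:
                results.append((name, job))  # stand-in for executing the job's tasks

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_workers([f"job-{i}" for i in range(10)], num_workers=3)
print(len(results))  # 10
```

Scaling out here means passing a larger `num_workers`. Each job is still handled by exactly one worker, because `get_nowait` removes it from the queue.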

Fault Tolerance
• To support long-running jobs, the middleware must survive network glitches and restarts of its own services (e.g., during upgrades) with maximum fault tolerance.
• The Airavata worker component, which interacts with computational resources, is fully fault tolerant.
• Scheduled or unscheduled component downtime is possible.
• Airavata components themselves are unlikely to go down, but the VMs hosting them can.
• UltraScan deployment instances are up and running smoothly.
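One common way to make a worker tolerant of crashes is at-least-once delivery with explicit acknowledgement. A message leaves the queue only once the worker acknowledges it. If the worker fails first, the message is redelivered. Brokers such as RabbitMQ offer manual acknowledgements along these lines. Whether Airavata's worker uses exactly this mechanism is an assumption, and `AckQueue` and `consume` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Set, Tuple


class AckQueue:
    """At-least-once queue: a message is removed only after it is acknowledged."""

    def __init__(self, messages: List[str]) -> None:
        self._ready: Deque[str] = deque(messages)
        self._in_flight: Dict[int, str] = {}
        self._next_tag = 0

    def get(self) -> Optional[Tuple[int, str]]:
        if not self._ready:
            return None
        self._next_tag += 1
        message = self._ready.popleft()
        self._in_flight[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag: int) -> None:
        del self._in_flight[tag]  # work finished: forget the message

    def reject(self, tag: int) -> None:
        self._ready.append(self._in_flight.pop(tag))  # put it back for redelivery

    def pending(self) -> int:
        return len(self._ready) + len(self._in_flight)


def consume(q: AckQueue, handler: Callable[[str], None], max_deliveries: int = 100) -> List[str]:
    """Process messages; a failed handler call leaves the message on the queue."""
    processed: List[str] = []
    for _ in range(max_deliveries):  # bounded so a permanently failing message cannot spin forever
        delivery = q.get()
        if delivery is None:
            break
        tag, message = delivery
        try:
            handler(message)
        except ConnectionError:
            q.reject(tag)  # simulated worker failure: the message is not lost
            continue
        q.ack(tag)
        processed.append(message)
    return processed


failed_once: Set[str] = set()


def flaky(message: str) -> None:
    # Fails the first time it sees "job-a", then succeeds (simulates a worker restart).
    if message == "job-a" and message not in failed_once:
        failed_once.add(message)
        raise ConnectionError("lost connection to compute resource")


q = AckQueue(["job-a", "job-b"])
print(consume(q, flaky), q.pending())  # ['job-b', 'job-a'] 0
```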

Experiment Recovery
• Experiment recovery is handled internally by Airavata.
• Work-queue-based process submission.
• Status updates are recorded at checkpoints.
• Avoids duplicate job submission to the computational resource.
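Checkpointed status lets a recovering worker avoid submitting the same job twice. The sketch below writes a checkpoint before and after submission. If a restart finds the "SUBMITTING" checkpoint, the worker asks the resource whether the job already exists before resubmitting. The status names, `CheckpointStore`, `submit_once`, and the `job_exists` callback are all hypothetical; the slide does not describe Airavata's actual checkpoint states.

```python
from typing import Callable, Dict, List

# Hypothetical status names; the slide only says status is updated at checkpoints.
DONE_STATES = {"JOB_SUBMITTED", "COMPLETED"}


class CheckpointStore:
    """Stand-in for persistent per-process status (a dict here; a registry in practice)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def get_status(self, process_id: str) -> str:
        return self._status.get(process_id, "CREATED")

    def set_status(self, process_id: str, status: str) -> None:
        self._status[process_id] = status


def submit_once(process_id: str, store: CheckpointStore,
                submit_job: Callable[[str], None],
                job_exists: Callable[[str], bool]) -> str:
    """Submit a process's job at most once, even if this is re-run after a crash."""
    status = store.get_status(process_id)
    if status in DONE_STATES:
        return status  # already submitted before the restart: do nothing
    if status == "SUBMITTING" and job_exists(process_id):
        # Crashed after submitting but before checkpointing: adopt the existing job.
        store.set_status(process_id, "JOB_SUBMITTED")
        return "JOB_SUBMITTED"
    store.set_status(process_id, "SUBMITTING")  # checkpoint before touching the resource
    submit_job(process_id)
    store.set_status(process_id, "JOB_SUBMITTED")  # checkpoint after a successful submission
    return "JOB_SUBMITTED"


submitted: List[str] = []
store = CheckpointStore()


def exists(pid: str) -> bool:
    return pid in submitted


submit_once("p1", store, submitted.append, exists)
submit_once("p1", store, submitted.append, exists)  # e.g. re-run during recovery
print(submitted)  # ['p1']
```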

Reliable Job Monitoring
• Polling job status with the scheduler's monitoring commands does not always work:
  ◦ Some schedulers remove completed jobs aggressively.
  ◦ Polling opens too many SSH connections to the compute resource.
• What are the alternatives? UDP, daemons, and:
  ◦ Schedulers send job notifications.
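Notification-based monitoring flips the direction of the status flow: the scheduler reports changes, and the middleware stops asking. The sketch below updates job state from pushed notifications. It falls back to an explicit status query only for unfinished jobs that have been silent longer than a timeout, which keeps SSH connections to a minimum. How notifications are delivered is outside this sketch. The class name, state names, and timeout are hypothetical.

```python
import time
from typing import Callable, Dict, List

# Hypothetical state names.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


class NotificationJobMonitor:
    """Tracks job state from scheduler-pushed notifications; polls only jobs that went quiet."""

    def __init__(self, silence_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = silence_timeout
        self._clock = clock
        self._state: Dict[str, str] = {}
        self._last_heard: Dict[str, float] = {}

    def track(self, job_id: str) -> None:
        self._state[job_id] = "SUBMITTED"
        self._last_heard[job_id] = self._clock()

    def on_notification(self, job_id: str, state: str) -> None:
        # Pushed by the scheduler, so no SSH connection is opened to learn this.
        self._state[job_id] = state
        self._last_heard[job_id] = self._clock()

    def state(self, job_id: str) -> str:
        return self._state[job_id]

    def jobs_to_poll(self) -> List[str]:
        # Fallback: only unfinished jobs silent for >= silence_timeout need a status query.
        now = self._clock()
        return sorted(job for job, s in self._state.items()
                      if s not in TERMINAL_STATES and now - self._last_heard[job] >= self._timeout)


now = [0.0]  # fake clock so the example is deterministic
monitor = NotificationJobMonitor(silence_timeout=600, clock=lambda: now[0])
monitor.track("101")
monitor.track("102")
now[0] = 300.0
monitor.on_notification("101", "RUNNING")
now[0] = 700.0
print(monitor.jobs_to_poll())  # ['102']
```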

Fault Handling
• Retry job submission on SSH connection issues.
• Identify input and output data staging failures.
• Verify job status on the computational resource after a successful job submission.
• Identify failed jobs through notifications and retrieve their standard output and standard error.
• Show useful error messages to the user on exceptions.
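Retrying job submission on SSH connection problems is commonly done with exponential backoff. The sketch below shows a minimal version, assuming a connection failure surfaces as Python's built-in `ConnectionError`. Real SSH libraries raise their own exception types, so the `except` clause would need adjusting. `retry_on_connection_error` is a hypothetical helper, and `sleep` is injectable so the example runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_on_connection_error(action: Callable[[], T], attempts: int = 3,
                              base_delay: float = 1.0,
                              sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action, retrying with exponential backoff when it raises ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: let the caller report a useful error to the user
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... base_delay
    raise RuntimeError("unreachable")


calls: List[int] = []
delays: List[float] = []


def flaky_submit() -> str:
    # Fails twice with a connection error, then "submits" successfully.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("SSH connection reset")
    return "job-42"


print(retry_on_connection_error(flaky_submit, base_delay=0.5, sleep=delays.append), delays)
# job-42 [0.5, 1.0]
```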

Security
• Implemented with review and guidance from CTSC (Center for Trustworthy Scientific Cyberinfrastructure).
• Airavata API security with WSO2 Identity Server (IS).
• A credential store manages all machine credentials:
  ◦ SSH keys
  ◦ SSH usernames and passwords
• Airavata grants user permissions based on security roles:
  ◦ Super administrator
  ◦ Administrator
  ◦ User
[Figure: Common API for Clients, Apache Airavata]
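The three security roles can be pictured as nested permission sets. The sketch below assumes each role inherits everything the role beneath it may do. The slide names only the roles, so the permission strings and the inheritance structure are hypothetical, not Airavata's actual authorization model.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    SUPER_ADMIN = "super administrator"
    ADMIN = "administrator"
    USER = "user"


# Hypothetical permission names; each role includes the permissions of the role below it.
_USER_PERMS: FrozenSet[str] = frozenset({"create_experiment", "launch_experiment"})
_ADMIN_PERMS: FrozenSet[str] = _USER_PERMS | {"register_application"}
_SUPER_ADMIN_PERMS: FrozenSet[str] = _ADMIN_PERMS | {"manage_gateways"}

PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.USER: _USER_PERMS,
    Role.ADMIN: _ADMIN_PERMS,
    Role.SUPER_ADMIN: _SUPER_ADMIN_PERMS,
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.USER, "register_application"))  # False
```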

Workflow Enactment
• An experiment with more than one application is considered a workflow in Airavata.
• The Airavata workflow interpreter manages dependencies among applications and executes them.
• Applications are executed in parallel where possible.
• Currently under development, with new architectural changes.
[Figure: Compose Workflows, Launch Workflows]
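The interpreter must honor dependencies while running independent applications in parallel. One standard way to do this is a level-by-level topological sort (Kahn's algorithm), which groups applications into stages. Every application in a stage depends only on earlier stages, so a stage's members can run concurrently. The sketch below shows that technique. The slide does not say Airavata's interpreter uses this exact algorithm, and the application names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


def parallel_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group applications into stages that can each run in parallel.

    deps maps each application to the set of applications whose outputs it needs.
    """
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    dependents: DefaultDict[str, List[str]] = defaultdict(list)
    for app, ds in deps.items():
        for d in ds:
            indegree[app] += 1
            dependents[d].append(app)

    stage = sorted(n for n in nodes if indegree[n] == 0)  # apps with no unmet dependencies
    stages: List[List[str]] = []
    scheduled = 0
    while stage:
        stages.append(stage)
        scheduled += len(stage)
        ready: List[str] = []
        for n in stage:
            for m in dependents[n]:
                indegree[m] -= 1  # one of m's dependencies has now finished
                if indegree[m] == 0:
                    ready.append(m)
        stage = sorted(ready)
    if scheduled != len(nodes):
        raise ValueError("workflow has a cyclic dependency")
    return stages


deps = {
    "preprocess": set(),
    "simulate-a": {"preprocess"},
    "simulate-b": {"preprocess"},
    "analyze": {"simulate-a", "simulate-b"},
}
print(parallel_stages(deps))  # [['preprocess'], ['simulate-a', 'simulate-b'], ['analyze']]
```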

Example: Experiment Launch

Questions?