Workshop on the Future of Scientific Workflows
Break Out #2: Workflow System Design
Moderators: Chris Carothers (RPI), Doug Thain (ND)

Overall Questions
What is needed at the extreme scales?
How will extreme scales affect how we:
1. Design workflow systems (our focus!)
2. Use workflow systems
3. Validate the results of workflow systems
What are the key challenges?
– Are there challenges unique to IS & DA?
– Are there challenges common to both?

Our Charge/Questions
What are the different high-level factors and design decisions that need to be considered and made?
Topics addressed: different coupling models, the role of storage and buffering, supported data models and their definition, and adaptivity of the workflow.

Challenges Today
– A wide-area WMS submitting to machines designed for interactive logins runs into barriers (data movement too).
– Security is part of the issue: a solution designed today breaks when policy changes. Is this a technical issue or a policy issue?
– Technology: the WMS needs some facility for managing the time-limited nature of credentials, human-in-the-loop steps, exception handling, etc.
– Reasoning about robustness: cost of retry, diagnosing security failures, etc. Need for transparency across the levels of the system.
– Storage management: to handle intermediate storage, the WMS must be able to allocate storage with a time limit and deal with allocation failures. Ilkay: must double expected storage use in order to succeed on XSEDE. Can’t reserve or effectively allocate the memory/storage hierarchy (oversubscription).
– Example use cases: Analysis code pulling remote data over the network at runtime. Need to “park” data temporarily between stages of an application, perhaps on an external system. Applications are a mixture of supercomputer + database + low-end applications.
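
The storage-management point above (allocate with a time limit, then cope with allocation failure) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `LeasedStoragePool`, `AllocationError`, and `allocate_with_fallback` are hypothetical and do not correspond to any real WMS or facility API, and time is passed in explicitly as `now` so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


class AllocationError(Exception):
    """Raised when a pool cannot satisfy a lease request."""


@dataclass
class LeasedStoragePool:
    """Hypothetical storage pool that hands out time-limited leases."""
    capacity_gb: float
    # lease id -> (size in GB, expiry time in seconds)
    leases: dict = field(default_factory=dict)
    next_id: int = 0

    def reclaim_expired(self, now):
        """Drop every lease whose time limit has passed."""
        expired = [lid for lid, (_, expiry) in self.leases.items() if expiry <= now]
        for lid in expired:
            del self.leases[lid]
        return len(expired)

    def used_gb(self):
        return sum(size for size, _ in self.leases.values())

    def allocate(self, size_gb, duration_s, now):
        """Lease size_gb for duration_s seconds, or raise AllocationError."""
        self.reclaim_expired(now)
        if self.used_gb() + size_gb > self.capacity_gb:
            raise AllocationError(f"cannot lease {size_gb} GB")
        lease_id = self.next_id
        self.next_id += 1
        self.leases[lease_id] = (size_gb, now + duration_s)
        return lease_id


def allocate_with_fallback(pools, size_gb, duration_s, now):
    """Try each pool in order; return (pool_index, lease_id) or raise."""
    for index, pool in enumerate(pools):
        try:
            return index, pool.allocate(size_gb, duration_s, now)
        except AllocationError:
            continue  # allocation failure is expected; try the next pool
    raise AllocationError("no pool could satisfy the request")


# "Park" intermediate data locally if possible, else on an external system.
local = LeasedStoragePool(capacity_gb=100)
external = LeasedStoragePool(capacity_gb=1000)
first = allocate_with_fallback([local, external], 80, 3600, now=0)
second = allocate_with_fallback([local, external], 80, 3600, now=10)
third = allocate_with_fallback([local, external], 80, 3600, now=4000)
```

In this sketch the second request falls back to the external pool because the local one is full, and the third succeeds locally again because the first lease has expired by then. A real system would also need to handle credentials, data movement, and lease renewal, which are left out here.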

Future Architectures
Assumptions about the ecosystem:
– Machines: fat nodes, heterogeneous, storage on nodes, deep memory hierarchies, …
– Outside the machine: SDNs, clouds, reserved experimental facilities, …
Problems on today's machines are magnified on future systems.
– Metrics of success are moving away from FLOPS, which affects how the WMS does its job.
– Coordination of policies across facilities: allocation, security, API, etc.
– Need for a uniform representation of workflows.
– Supporting workflow composition.
– Multiplicity of WMS implementations makes it difficult to share solutions to these problems.
Planning, provisioning, and scheduling.
– Experience: one big meta-scheduler is not effective.
– Alternative: separate provisioning from scheduling.
– Each resource scheduler needs autonomy, but must also expose sufficient transparency and control.
– Problem: Can one component effectively mix provisioning, planning, and scheduling, or do we separate them?
Performance, predictability, and such.
– Want predictable performance of workflow operations.
– Reliable resource allocation or adaptability, so that delays do not cascade.
– Need benchmarks, models, mini-apps, and simulations to evaluate systems and implementations.
Reproducibility, portability, and integrity.
– The WMS is in a good place to track provenance and reproducibility.
– But it needs transparency from the other components to pull out relevant data.
– And the number of components and the pace make storing the provenance data itself a challenge to be managed.
– Frank: Is the end user willing to pay the price for that benefit?
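
One way to picture the "separate provisioning from scheduling" alternative is the toy sketch below. It is not taken from the breakout discussion: the `Provisioner` and `Scheduler` classes, the first-fit placement rule, and the node and task names are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cores: int
    free_cores: int


class Provisioner:
    """Acquires nodes from a facility; knows nothing about tasks."""

    def __init__(self, available):
        self.available = list(available)  # list of (name, cores) pairs

    def provision(self, cores_needed):
        """Acquire nodes until cores_needed is covered or none are left."""
        acquired, total = [], 0
        while self.available and total < cores_needed:
            name, cores = self.available.pop(0)
            acquired.append(Node(name, cores, cores))
            total += cores
        return acquired


class Scheduler:
    """Places tasks on already-provisioned nodes; never acquires more."""

    def __init__(self, nodes):
        self.nodes = nodes

    def schedule(self, tasks):
        """First-fit placement; returns (placement, pending task names)."""
        placement, pending = {}, []
        for task, cores in tasks:
            node = next((n for n in self.nodes if n.free_cores >= cores), None)
            if node is None:
                pending.append(task)  # exposed so a planner can react
            else:
                node.free_cores -= cores
                placement[task] = node.name
        return placement, pending


provisioner = Provisioner([("n1", 4), ("n2", 4), ("n3", 8)])
nodes = provisioner.provision(cores_needed=6)
scheduler = Scheduler(nodes)
placement, pending = scheduler.schedule([("sim", 4), ("analysis", 2), ("viz", 4)])
```

Here the scheduler keeps its autonomy over placement but reports what it could not place. That `pending` list is the kind of transparency a separate planning component would need in order to decide whether to ask the provisioner for more nodes.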

Data Management
What we expect to see.
– Global filesystem will be slow and unpredictable.
– Total I/O capacity is limited.
– Competition for intermediate storage between apps.
What does this mean?
– Need to provision storage and network bandwidth as a first-class concern, coupled with the allocation for FLOPS, I/O, etc.
– Need advanced mechanisms (e.g., data staging) within the machine for data sharing between running apps without using the global filesystem. This involves naming, rendezvous, garbage collection, …
– To deal with unexpected events at runtime, we either need to overprovision or have the ability to re-provision at runtime. Example: if the consumer of data is slow to start up, then we need to allocate more storage OR pause the producer.
– Dantong: the memory hierarchy will result in many different sharing mechanisms; need some visibility to start the right ones.
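
The slow-consumer example above (allocate more storage OR pause the producer) can be made concrete with a small discrete-time simulation. This is only a sketch under assumed numbers: the function `run_staging`, its parameters, and the two policy names `"pause"` and `"grow"` are hypothetical and not part of any workflow system.

```python
def run_staging(steps, produce_rate, consumer_start, consume_rate,
                policy, capacity, max_capacity):
    """Simulate a producer writing to a staging area that a late consumer drains.

    policy "pause": the producer skips a step when the staging area is full.
    policy "grow":  the staging area is re-provisioned, up to max_capacity.
    Returns (final capacity, steps the producer was paused, data left staged).
    """
    stored = 0
    paused_steps = 0
    for t in range(steps):
        # The consumer drains data only once it has started.
        if t >= consumer_start:
            stored -= min(stored, consume_rate)
        if stored + produce_rate > capacity:
            if policy == "grow" and stored + produce_rate <= max_capacity:
                capacity = stored + produce_rate  # re-provision at runtime
            else:
                paused_steps += 1  # back-pressure: the producer waits
                continue
        stored += produce_rate
    return capacity, paused_steps, stored


common = dict(steps=6, produce_rate=2, consumer_start=4, consume_rate=3,
              capacity=4, max_capacity=8)
paused = run_staging(policy="pause", **common)
grown = run_staging(policy="grow", **common)
```

With these assumed numbers, the `"pause"` policy keeps the allocation at 4 units but stalls the producer for two steps, while the `"grow"` policy never stalls the producer but doubles the allocation to 8 units. That is the trade-off between overprovisioning and re-provisioning described in the list.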