SLA Management in AssessGrid
Dominic Battré, TU Berlin

AssessGrid in a Nutshell
- Users require Service Level Agreements (SLAs)
- Providers are reluctant to sign SLAs

AssessGrid in a Nutshell
[Figure: failed vs. successful jobs on TeraGrid, Grid 3, DAS-2, …; statistics from 2005/2006]

AssessGrid in a Nutshell
User:
- Which provider is reliable?
- How reliable is a provider?
- Does a provider lie?
Provider:
- How reliable am I?
- Can I sign SLAs?
- Can I improve my reliability?

Agenda
- AssessGrid in a Nutshell
- Content of SLAs
- Demo
  - Job submission and provider selection
  - Fault tolerance
- Underlying technology
  - Negotiation Manager
  - Risk Assessment and Management
- Content of SLAs as WS-Agreement
- Future Challenges

Content of SLAs
[Figure: schedule of Jobs 1 to 7, plotted as nodes over time]
Each job is specified by its runtime, number of nodes, earliest start time, and latest finish time.
An SLA contains:
- Participating parties
- Job definition
  - Scheduling
  - Executable
  - File staging
  - Acceptable probability of failure
- Price and penalty
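
The four scheduling parameters of a job define a window in which the provider may place it. The following is a minimal sketch. It assumes Python `datetime` values, and its names (`JobRequest`, `latest_start`, `accepts_start`) are hypothetical. It shows how the latest admissible start time follows from those parameters:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRequest:
    """Scheduling part of a job definition (hypothetical field names)."""
    runtime: timedelta
    nodes: int
    earliest_start: datetime
    latest_finish: datetime

    def latest_start(self) -> datetime:
        # Starting any later would miss the latest finish time.
        return self.latest_finish - self.runtime

    def is_feasible(self) -> bool:
        # The window must be at least as long as the runtime.
        return self.latest_start() >= self.earliest_start

    def accepts_start(self, start: datetime) -> bool:
        return self.earliest_start <= start <= self.latest_start()


job = JobRequest(runtime=timedelta(hours=2), nodes=4,
                 earliest_start=datetime(2010, 5, 1, 8, 0),
                 latest_finish=datetime(2010, 5, 1, 12, 0))
print(job.latest_start())                              # 2010-05-01 10:00:00
print(job.accepts_start(datetime(2010, 5, 1, 11, 0)))  # False
```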

Job Submission and Provider Selection: Specify Job
Parties: end user, broker, providers. The end user specifies:
- Program, input, output
- Acceptable probability of failure (PoF)
- Penalty in case of failure
- Deadline

Job Submission and Provider Selection: Get Quotes

Job Submission and Provider Selection: Get Quotes
The broker forwards the request to providers based on:
- Matching of provider templates to the request
- Quotes created in the past
- Performance in the past
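
The slide does not say how the broker combines these criteria. The following sketch assumes two hypothetical rules:
- A provider matches when its templates cover every term in the request.
- Past performance is the share of signed SLAs the provider fulfilled.

The sketch does not model the use of quotes created in the past, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ProviderInfo:
    name: str
    template_terms: set[str]   # terms the provider's templates can express
    slas_signed: int = 0
    slas_fulfilled: int = 0

    def success_rate(self) -> float:
        # Providers without history get a neutral 0.5.
        if self.slas_signed == 0:
            return 0.5
        return self.slas_fulfilled / self.slas_signed


def select_recipients(request_terms: set[str], providers: list[ProviderInfo],
                      min_success: float = 0.5, limit: int = 3) -> list[str]:
    """Names of the providers a quote request is forwarded to."""
    matching = [p for p in providers if request_terms <= p.template_terms]
    reliable = [p for p in matching if p.success_rate() >= min_success]
    reliable.sort(key=lambda p: p.success_rate(), reverse=True)
    return [p.name for p in reliable[:limit]]


providers = [
    ProviderInfo("A", {"posix", "staging"}, 10, 9),
    ProviderInfo("B", {"posix"}, 10, 10),            # cannot do staging
    ProviderInfo("C", {"posix", "staging"}, 10, 3),
    ProviderInfo("D", {"posix", "staging"}),         # no history yet
]
print(select_recipients({"posix", "staging"}, providers))  # ['A', 'D']
```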

Job Submission and Provider Selection: Generate Quotes
Each provider:
- Calculates the probability of failure (PoF)
- Calculates the required number of spare nodes and extra time
- Calculates the price
- Checks available resources in its schedule
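
A simple model can illustrate how the number of spare nodes relates to the PoF. This sketch assumes that machines fail independently with a known probability during the run, so the number of failures is binomially distributed. That is an illustrative assumption, not AssessGrid's risk assessment model. Under it, a job with `nodes` nodes and `k` spares fails only if more than `k` machines fail. The provider picks the smallest `k` that meets the user's acceptable PoF.

```python
from math import comb
from typing import Optional


def pof_with_spares(nodes: int, spares: int, p_node: float) -> float:
    """P(more than `spares` of the nodes + spares machines fail).

    Assumes each machine fails independently with probability p_node
    during the run (binomial model).
    """
    total = nodes + spares
    at_most_spares_fail = sum(
        comb(total, f) * p_node**f * (1 - p_node)**(total - f)
        for f in range(spares + 1)
    )
    return 1.0 - at_most_spares_fail


def spares_needed(nodes: int, p_node: float, acceptable_pof: float,
                  max_spares: int = 64) -> Optional[int]:
    """Smallest number of spare nodes that meets the user's acceptable PoF."""
    for k in range(max_spares + 1):
        if pof_with_spares(nodes, k, p_node) <= acceptable_pof:
            return k
    return None  # target unreachable: do not offer a quote


print(spares_needed(10, 0.01, 0.01))  # 1
```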

Job Submission and Provider Selection: Quotes

Job Submission and Provider Selection: Enhance Quotes
The broker:
- Makes its own estimate of the PoF in case of unreliable providers
- Ranks the quotes according to the user's preferences
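
The broker may doubt a provider's promises. One way it could form its own PoF estimate is to compare the promised PoF with the provider's observed failure rate. The sketch below uses a Laplace-smoothed rate and a hypothetical minimum history length. Both the rule and the names are assumptions.

```python
def adjusted_pof(promised_pof: float, failures: int, jobs: int,
                 min_history: int = 20) -> float:
    """Broker's own PoF estimate for a provider (hypothetical rule).

    With enough history, use the Laplace-smoothed observed failure rate
    and never accept a promise that is better than that rate.
    """
    if jobs < min_history:
        return promised_pof  # too little data: take the provider's word
    observed = (failures + 1) / (jobs + 2)
    return max(promised_pof, observed)


print(adjusted_pof(0.01, failures=9, jobs=98))  # 0.1
```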

Job Submission and Provider Selection: Select Provider
Criteria: price, PoF, adjusted PoF; ranking by the Analytic Hierarchy Process (AHP)
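
AHP derives criteria weights from the user's pairwise comparisons. The sketch below computes the weights with the row geometric mean, a common approximation of the principal eigenvector used in classic AHP. Classic AHP would then compare the alternatives pairwise as well. Instead, this sketch scores the quotes with a simple min-max normalisation in which lower price, PoF and adjusted PoF are better. The comparison values and quotes are hypothetical.

```python
from math import prod


def ahp_weights(matrix: list[list[float]]) -> list[float]:
    """Criteria weights from a reciprocal pairwise-comparison matrix.

    Uses row geometric means, a common approximation of the principal
    eigenvector that classic AHP prescribes.
    """
    n = len(matrix)
    means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def rank_quotes(quotes: dict[str, list[float]], weights: list[float]) -> list[str]:
    """Rank providers; every criterion here is 'lower is better'."""
    names = list(quotes)
    scores = dict.fromkeys(names, 0.0)
    for c, w in enumerate(weights):
        values = [quotes[n][c] for n in names]
        lo, hi = min(values), max(values)
        for n in names:
            # Min-max normalisation: best value -> 1, worst -> 0.
            norm = 1.0 if hi == lo else (hi - quotes[n][c]) / (hi - lo)
            scores[n] += w * norm
    return sorted(names, key=scores.__getitem__, reverse=True)


# Criteria order: price, PoF, adjusted PoF. The user rates both PoF
# criteria three times as important as price (hypothetical judgements).
matrix = [[1, 1/3, 1/3],
          [3, 1,   1],
          [3, 1,   1]]
weights = ahp_weights(matrix)          # approx. [1/7, 3/7, 3/7]
quotes = {"A": [100, 0.05, 0.05], "B": [80, 0.10, 0.20], "C": [120, 0.02, 0.02]}
print(rank_quotes(quotes, weights))    # ['C', 'A', 'B']
```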

Job Submission and Provider Selection: Get Reputation

DS Analytical Hierarchy Process
Criteria: past performance, maintenance, security, customer support, infrastructure, experience
- Maintenance: staff 24/7, staff training per year, staff experience
- Infrastructure: redundant power, redundant storage, storage age, …
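
In a hierarchy like this, a leaf criterion's global weight is the product of the local weights along its path from the root. A provider's score is then the weighted sum of its leaf scores. The sketch below uses made-up local weights for part of the hierarchy, and the function names are hypothetical:

```python
def global_weights(tree: dict, parent_weight: float = 1.0) -> dict[str, float]:
    """Flatten an AHP hierarchy into leaf weights.

    `tree` maps a criterion to (local_weight, subtree); leaves have an
    empty subtree. Sibling local weights are assumed to sum to 1.
    """
    leaves: dict[str, float] = {}
    for name, (local, subtree) in tree.items():
        weight = parent_weight * local
        if subtree:
            leaves.update(global_weights(subtree, weight))
        else:
            leaves[name] = weight
    return leaves


def provider_score(leaf_scores: dict[str, float], weights: dict[str, float]) -> float:
    # Leaf scores are assumed to lie in [0, 1]; missing data counts as 0.
    return sum(w * leaf_scores.get(leaf, 0.0) for leaf, w in weights.items())


# Part of the slide's hierarchy with made-up local weights.
tree = {
    "Past Performance": (0.4, {}),
    "Maintenance": (0.3, {"Staff 24/7": (0.5, {}),
                          "Staff training/yr": (0.25, {}),
                          "Staff experience": (0.25, {})}),
    "Infrastructure": (0.3, {"Red. Power": (0.5, {}),
                             "Red. Storage": (0.3, {}),
                             "Storage Age": (0.2, {})}),
}
weights = global_weights(tree)
print(round(weights["Staff 24/7"], 3))  # 0.15
print(round(provider_score({"Past Performance": 1.0, "Staff 24/7": 1.0}, weights), 3))  # 0.55
```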

Job Submission and Provider Selection: Create Agreement

Demonstration of Fault Tolerance

Negotiation Manager
- Based on Globus Toolkit 4
- Apache 2 License
- 2 flavours:
  - Simple framework
  - AssessGrid implementation (OpenCCS, Risk Assessment, …)
- Features:
  - Template store
  - Access control, credential delegation
  - State management
  - Staging by GridFTP
  - Simple validation of CreationConstraints
  - Extensible
  - WS-Notification
  - Optional: quote mechanism
  - Optional: cheap cancellation extension

Template Store
- Optional component
- Templates stored persistently in an RDBMS
- Get, Insert, Delete via WS-RF
- Monitoring via WS-Notification
- Access policies:
  - Everybody can read
  - Admin(s) can modify
- Templates used in AssessGrid:
  - Regular job (POSIX and SPMD)
  - Out-sourced job with checkpoint data set
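
The following is a minimal sketch of such a store. It uses SQLite from the Python standard library in place of the RDBMS, and plain methods in place of the WS-RF operations. WS-Notification monitoring is omitted, and the class and table names are hypothetical.

```python
import sqlite3
from typing import Optional


class TemplateStore:
    """Persistent template store: everybody reads, only admins modify.

    Hypothetical sketch: in AssessGrid, Get/Insert/Delete are WS-RF
    operations; here they are plain methods over SQLite.
    """

    def __init__(self, path: str = ":memory:", admins: Optional[set[str]] = None):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS templates "
                        "(name TEXT PRIMARY KEY, xml TEXT NOT NULL)")
        self.admins = admins or set()

    def _require_admin(self, user: str) -> None:
        if user not in self.admins:
            raise PermissionError(f"{user} may not modify templates")

    def get(self, name: str) -> Optional[str]:
        row = self.db.execute("SELECT xml FROM templates WHERE name = ?",
                              (name,)).fetchone()
        return row[0] if row else None

    def insert(self, user: str, name: str, xml: str) -> None:
        self._require_admin(user)
        with self.db:  # commit on success, roll back on error
            self.db.execute("INSERT INTO templates VALUES (?, ?)", (name, xml))

    def delete(self, user: str, name: str) -> None:
        self._require_admin(user)
        with self.db:
            self.db.execute("DELETE FROM templates WHERE name = ?", (name,))


store = TemplateStore(admins={"admin"})
store.insert("admin", "regular-job", "<Template/>")
print(store.get("regular-job"))  # <Template/>
```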

Access Control
- Default: 3 user groups: Admins, Owners, Users
  - Admins have access to everything
  - The owner is legally responsible
  - Users have read access
  - Owner and users differ in case of SLA outsourcing
- Defaults can be overridden
- Option to delegate credentials
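
The default policy can be expressed as a role lookup plus a table of permitted actions that deployments may override. Only two rules come from the slide: admins may do anything, and users may read. The action names, the agreement layout and the owner's right to terminate are assumptions of this sketch.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    OWNER = "owner"   # legally responsible for the agreement
    USER = "user"     # differs from the owner when an SLA is outsourced


# Default, overridable policy. Admins bypass it entirely; the owner's
# right to terminate is an assumption made for this sketch.
DEFAULT_POLICY: dict[Role, set[str]] = {
    Role.OWNER: {"read", "terminate"},
    Role.USER: {"read"},
}


def roles_for(dn: str, agreement: dict, admins: set[str]) -> set[Role]:
    roles = set()
    if dn in admins:
        roles.add(Role.ADMIN)
    if dn == agreement["owner"]:
        roles.add(Role.OWNER)
    if dn in agreement["users"]:
        roles.add(Role.USER)
    return roles


def is_allowed(dn: str, action: str, agreement: dict, admins: set[str],
               policy: dict[Role, set[str]] = DEFAULT_POLICY) -> bool:
    roles = roles_for(dn, agreement, admins)
    if Role.ADMIN in roles:
        return True  # admins have access to everything
    return any(action in policy.get(role, set()) for role in roles)


# Outsourced SLA: a broker owns it, the end user may read it (hypothetical DNs).
agreement = {"owner": "/C=DE/O=Broker", "users": {"/C=DE/O=EndUser"}}
print(is_allowed("/C=DE/O=EndUser", "terminate", agreement, admins=set()))  # False
```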

State Management
Asynchronous, multi-threaded, persistent state management.
States: Start → Wait for stage-in → Do stage-in → (stage-in done) → Wait for execution → Wait for termination → Do stage-out → (stage-out done) → Cleanup
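
The state sequence of the Negotiation Manager can be sketched as a small state machine. In this sketch the lock stands for the multi-threaded access and the `persist` callback stands for the persistent storage. Both are assumptions, like the class names. Error transitions are left out.

```python
import threading
from enum import Enum, auto


class JobState(Enum):
    START = auto()
    WAIT_FOR_STAGE_IN = auto()
    DO_STAGE_IN = auto()
    WAIT_FOR_EXECUTION = auto()
    WAIT_FOR_TERMINATION = auto()
    DO_STAGE_OUT = auto()
    CLEANUP = auto()


# Happy path in slide order; error transitions are omitted.
ORDER = list(JobState)
NEXT = dict(zip(ORDER, ORDER[1:]))


class AgreementStateMachine:
    def __init__(self, persist=lambda state: None):
        self._lock = threading.Lock()   # transitions may come from several threads
        self._persist = persist         # e.g. write to a database before returning
        self.state = JobState.START
        self._persist(self.state)

    def advance(self) -> JobState:
        with self._lock:
            if self.state not in NEXT:
                raise RuntimeError(f"{self.state.name} is final")
            self.state = NEXT[self.state]
            self._persist(self.state)
            return self.state


log = []
sm = AgreementStateMachine(persist=log.append)
while sm.state is not JobState.CLEANUP:
    sm.advance()
print(len(log))  # 7 states persisted, START through CLEANUP
```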

File Staging
- Files specified by JSDL
- User delegates credentials
- User estimates the staging duration:
  - A shorter estimate allows earlier execution
  - A longer estimate leads to later execution
- Staging by GridFTP
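
A one-line scheduling rule shows why the user's estimate matters. The sketch assumes, hypothetically, that stage-in starts when the agreement is created. Under that assumption, the provider cannot start the job before the estimated stage-in time has elapsed:

```python
from datetime import datetime, timedelta


def earliest_execution(agreed_at: datetime, stage_in_estimate: timedelta,
                       earliest_start: datetime) -> datetime:
    """Earliest time the scheduler may reserve nodes for the job.

    Execution waits for the user's declared stage-in duration and for
    the agreed earliest start time, whichever comes later.
    """
    return max(agreed_at + stage_in_estimate, earliest_start)


t0 = datetime(2010, 5, 1, 8, 0)
print(earliest_execution(t0, timedelta(minutes=30), t0))  # 2010-05-01 08:30:00
print(earliest_execution(t0, timedelta(hours=2), t0))     # 2010-05-01 10:00:00
```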

CreationConstraints
- Difficult to support: namespaces (//wsag:…/assessgrid:…); prefixes are just strings
- Very difficult to support: structural information (xs:group, xs:all, xs:choice, xs:sequence)
- Possible but difficult to support: xs:restriction, xs:simple
  - Check for enumeration (xs:restriction of xs:string)
  - Check for valid dates (xs:restriction of xs:date)
  - Everything else is close to impossible: {min,max}{In,Ex}clusive
  - totalDigits, fractionDigits, length, …: probably useless
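
The two cases the slide calls feasible can be checked with Python's standard XML library. Because prefixes are just strings, this sketch resolves the `base` prefix through the fragment's own `xmlns` declarations. It rejects every other facet. The fragment layout and the function names are assumptions and do not reproduce the WS-Agreement schema.

```python
import io
import re
import xml.etree.ElementTree as ET
from datetime import date

XS = "http://www.w3.org/2001/XMLSchema"


def _prefix_map(xml_text: str) -> dict[str, str]:
    # Prefixes are just strings: resolve them via the xmlns declarations
    # (last declaration wins; scoping is ignored in this sketch).
    events = ET.iterparse(io.BytesIO(xml_text.encode()), events=("start-ns",))
    return {prefix: uri for _, (prefix, uri) in events}


def _is_xs_date(value: str) -> bool:
    # Only YYYY-MM-DD; time zones and negative years are not handled.
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if not m:
        return False
    try:
        date(*map(int, m.groups()))
        return True
    except ValueError:
        return False


def check_restriction(xml_text: str, value: str) -> bool:
    """Check a value against an xs:restriction fragment (two cases only)."""
    root = ET.fromstring(xml_text)
    tag = f"{{{XS}}}restriction"
    restriction = root if root.tag == tag else root.find(f".//{tag}")
    if restriction is None:
        raise ValueError("no xs:restriction found")
    prefix, _, local = restriction.get("base", "").rpartition(":")
    base = (_prefix_map(xml_text).get(prefix), local)
    if base == (XS, "string"):
        allowed = {e.get("value") for e in restriction.findall(f"{{{XS}}}enumeration")}
        return value in allowed
    if base == (XS, "date"):
        return _is_xs_date(value)
    raise NotImplementedError(f"unsupported base type {base}")


enum_constraint = """<xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:restriction base="xs:string">
    <xs:enumeration value="POSIX"/>
    <xs:enumeration value="SPMD"/>
  </xs:restriction>
</xs:simpleType>"""
print(check_restriction(enum_constraint, "SPMD"))  # True
```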

Optional Quote Mechanism
Between user and provider:
- Get template
- Fill template
- Optionally create a quote; the user may modify the offer and request new quotes
- Create agreement; the provider answers yes or no, and the answer is binding
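
The quote flow can be simulated with a toy provider. The following elements are all hypothetical:
- the method names
- the pricing rule (tight deadlines cost double)
- the user's reaction to a high quote (relaxing the deadline by one hour)

The point is that quotes are non-binding and can be repeated, while creating an agreement yields a binding yes or no.

```python
from typing import Optional


class ToyProvider:
    """Stand-in provider; method names and pricing are hypothetical."""

    def __init__(self, rate: float):
        self.rate = rate               # EUR per node-hour
        self.agreements: list[dict] = []

    def get_template(self) -> dict:
        return {"nodes": None, "runtime_h": None, "deadline_h": None, "price": None}

    def create_quote(self, offer: dict) -> dict:
        # Non-binding: the offer with a price filled in. Tight deadlines cost double.
        urgent = offer["deadline_h"] < 2 * offer["runtime_h"]
        price = self.rate * offer["nodes"] * offer["runtime_h"] * (2 if urgent else 1)
        return {**offer, "price": price}

    def create_agreement(self, offer: dict) -> bool:
        # Binding yes/no: accept only if the offered price covers the quote.
        accepted = offer["price"] >= self.create_quote(offer)["price"]
        if accepted:
            self.agreements.append(offer)
        return accepted


def negotiate(provider: ToyProvider, nodes: int, runtime_h: float,
              deadline_h: float, budget: float, max_rounds: int = 5) -> Optional[dict]:
    offer = {**provider.get_template(), "nodes": nodes,
             "runtime_h": runtime_h, "deadline_h": deadline_h}
    for _ in range(max_rounds):
        quote = provider.create_quote(offer)
        if quote["price"] <= budget:
            return quote if provider.create_agreement(quote) else None
        offer["deadline_h"] += 1       # modify: relax the deadline and re-quote
    return None


result = negotiate(ToyProvider(rate=1.0), nodes=4, runtime_h=2, deadline_h=3, budget=10)
print(result["deadline_h"], result["price"])  # 4 8.0
```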

Extensible
- Not: a black-box interface with a domain-specific implementation deployed behind it
- But: the domain-specific implementation is built on the NegMgr and deployed together with the NegMgr WSDL
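
One reading of this slide is that a domain-specific Negotiation Manager extends the generic framework and is deployed together with it, rather than sitting behind a black-box interface. The sketch below shows that pattern with an abstract base class and hooks. The class and method names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class NegotiationManager(ABC):
    """Generic framework part: runs the protocol, delegates domain decisions."""

    def __init__(self):
        self.agreements: dict[str, dict] = {}

    def create_agreement(self, offer: dict) -> Optional[str]:
        if not self.validate(offer) or not self.reserve(offer):
            return None
        agreement_id = f"agreement-{len(self.agreements) + 1}"
        self.agreements[agreement_id] = offer
        return agreement_id

    @abstractmethod
    def validate(self, offer: dict) -> bool: ...

    @abstractmethod
    def reserve(self, offer: dict) -> bool: ...


class ClusterNegotiationManager(NegotiationManager):
    """Domain-specific flavour, deployed as one service with the framework."""

    def __init__(self, free_nodes: int):
        super().__init__()
        self.free_nodes = free_nodes

    def validate(self, offer: dict) -> bool:
        return offer.get("nodes", 0) > 0

    def reserve(self, offer: dict) -> bool:
        if offer["nodes"] > self.free_nodes:
            return False
        self.free_nodes -= offer["nodes"]
        return True


mgr = ClusterNegotiationManager(free_nodes=8)
print(mgr.create_agreement({"nodes": 5}))  # agreement-1
print(mgr.create_agreement({"nodes": 5}))  # None
```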

Cancellation Policy
Motivation:
- Serious issues of the 3-way commit protocol (reservations)
Goal: a cheap cancellation policy. Examples:
- "Full refund if product bought online is returned online within 14 days" (German law)
- "Cancellation before first day of validity: 15 EUR, after that: not possible" (Deutsche Bahn)
- "Less than 24 hours before scheduled stay: 50% of first day for cancellation" (hotels)

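Cancellation Policy: Rules
Each rule defines the end of a period and the cancellation price within that period.
[Figure: timeline with the events createQuote, createAgreement and Earliest Start; period ends at offsets such as +5 min and -1 d; prices such as 1 EUR and 80%]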

Cancellation Policy: Combination
[Figure: cancellation price over time across createQuote, createAgreement (+5 min) and Earliest Start (-1 d), with levels of 0.50 EUR, -50% and full price]
Used in the broker for roll-back of unsuccessful workflow mappings.
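
A cancellation policy of this kind can be evaluated as an ordered list of periods. Each period ends at an offset from a protocol event and carries either a fixed fee or a share of the price. The example values are loosely modelled on the figure. However, the assignment of prices to periods is an assumption, and so are the names. A broker rolling back a failed workflow mapping would add up these fees over the agreements it cancels.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Rule:
    anchor: str                        # event the period end refers to
    offset: timedelta                  # period ends at events[anchor] + offset
    fixed_fee: Optional[float] = None  # EUR, or ...
    fraction: Optional[float] = None   # ... share of the agreed price

    def __post_init__(self):
        if (self.fixed_fee is None) == (self.fraction is None):
            raise ValueError("set exactly one of fixed_fee and fraction")


def cancellation_fee(rules: list[Rule], events: dict[str, datetime],
                     cancel_at: datetime, price: float) -> float:
    """The first period that has not yet ended determines the fee.

    After the last period ends, the full price is due.
    """
    for rule in sorted(rules, key=lambda r: events[r.anchor] + r.offset):
        if cancel_at < events[rule.anchor] + rule.offset:
            return rule.fixed_fee if rule.fixed_fee is not None else rule.fraction * price
    return price


events = {"createAgreement": datetime(2010, 5, 1, 10, 0),
          "earliestStart": datetime(2010, 5, 5, 8, 0)}
rules = [Rule("createAgreement", timedelta(minutes=5), fixed_fee=0.50),
         Rule("earliestStart", timedelta(days=-1), fraction=0.5)]
for when in (datetime(2010, 5, 1, 10, 3), datetime(2010, 5, 2), datetime(2010, 5, 4, 12)):
    print(cancellation_fee(rules, events, when, price=100.0))  # 0.5, then 50.0, then 100.0
```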

Context
The parties are identified by their distinguished names (DN), e.g. /C=DE/O=…

Terms, SDTs
- Conjunction of terms
  - Common structure of templates
  - WS-AG is too powerful/difficult to support fully
- One Service Description Term:
  - assessgrid:ServiceDescription (extension of the abstract ServiceTermType)
    - jsdl:POSIXExecutable / SPMD (executable, arguments, environment)
    - jsdl:Resources
    - jsdl:DataStaging *
    - assessgrid:PoF (upper bound)
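
Every template has the same single service description term, so one flat record can mirror it. In this sketch the field names are hypothetical, and `cpu_count` stands in for the whole `jsdl:Resources` element. The only rule encoded is that `assessgrid:PoF` is an upper bound:

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Python mirror of the single service description term (hypothetical names)."""
    executable: str                                          # POSIX or SPMD executable
    arguments: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)
    cpu_count: int = 1                                       # stands in for jsdl:Resources
    data_staging: list[tuple[str, str]] = field(default_factory=list)  # (source, target) pairs
    pof_upper_bound: float = 1.0                             # assessgrid:PoF


def quote_acceptable(sd: ServiceDescription, quoted_pof: float) -> bool:
    # assessgrid:PoF is an upper bound: any quote at or below it is fine.
    return quoted_pof <= sd.pof_upper_bound


sd = ServiceDescription("/opt/sim/run", arguments=["--steps", "1000"], pof_upper_bound=0.05)
print(quote_acceptable(sd, 0.01), quote_acceptable(sd, 0.10))  # True False
```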

Terms, GuaranteeTerms
- No hierarchy, but two meta guarantees:
  - ProviderFulfillsAllObligations, e.g. reward: 1000 EUR, penalty: 1000 EUR
  - ConsumerFulfillsAllObligations, e.g. reward: 0 EUR, penalty: 1000 EUR
- The first violation is responsible for the failure
  - If there is no hardware problem, it is the user's fault
- Other guarantees:
  - Execution time: any start time (best effort), exact start time, or earliest start time and latest finish time
  - Maximum stage-in/out time
  - No cancellation
[Figure: example violations, "No timely execution" and "No stage-out"]
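
The blame rule can be sketched as follows. The earliest violation decides, and it is attributed to the provider only if a hardware problem caused it. The amounts are the example values from the slide. The `Violation` record and its attribution flag are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    time: float              # seconds since the agreement was created
    guarantee: str           # e.g. "ExecutionTime", "MaxStageOutTime"
    hardware_problem: bool   # did the provider's hardware cause it?


# Example values of the two meta guarantees, in EUR.
META = {"provider": {"reward": 1000.0, "penalty": 1000.0},
        "consumer": {"reward": 0.0, "penalty": 1000.0}}


def settle(violations: list[Violation]) -> tuple[str, str, float]:
    """Return (party, 'reward' or 'penalty', amount)."""
    if not violations:
        return ("provider", "reward", META["provider"]["reward"])
    first = min(violations, key=lambda v: v.time)   # the first violation is responsible
    party = "provider" if first.hardware_problem else "consumer"  # else user fault
    return (party, "penalty", META[party]["penalty"])


late_stage_out = Violation(7200.0, "MaxStageOutTime", hardware_problem=False)
late_execution = Violation(3600.0, "ExecutionTime", hardware_problem=True)
print(settle([late_stage_out, late_execution]))  # ('provider', 'penalty', 1000.0)
```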

Stuff I Did Not Talk About
- Risk assessment
- Risk management
  - Checkpointing details, runtime extension, spare nodes, …
- Confidence and reputation service
- Workflows
  - Description in WS-Agreement
  - Mapping to individual SLAs
- Simulation tools

Future Challenges
- Failure detection and analysis
- (Re)negotiation
- Risk assessment
- Interoperability of WS-Agreement implementations by micro-specs, or even common template structures
- Automatic evaluation of CreationConstraints
- Posthumous resolution of disagreements
- Third-party blaming
- Persisting problems:
  - Dependencies of violated guarantees
  - Violation caused by a third party or an unknown cause
  - Failure/success of the entire SLA
