Managing Grid and Web Services and their exchanged messages. OGF19 Workshop on Reliability and Robustness, Friday Center, Chapel Hill, NC, January 31, 2007.

Managing Grid and Web Services and their exchanged messages. OGF19 Workshop on Reliability and Robustness, Friday Center, Chapel Hill, NC, January 31, 2007. Authors: Harshawardhan Gadgil (his Ph.D. topic), Geoffrey Fox, Shrideep Pallickara, Marlon Pierce. Community Grids Lab, Indiana University. Presented by Geoffrey Fox.

Management Problem I
Characteristics of today's (Grid) applications:
– Increasing complexity
– Components widely dispersed and disparate in nature and access
  • Span different administrative domains
  • Operate under differing network / security policies
  • Limited access to resources due to the presence of firewalls, NATs, etc. (a major focus in the prototype)
– Dynamic
  • Components (nodes, network, processes) may fail
Services must meet:
– General QoS and life-cycle features
– (User-defined) application-specific criteria
Need to "manage" services to provide these capabilities:
– Dynamic monitoring and recovery
– Static configuration and composition of systems from subsystems

Management Problem II
Management operations* include:
– Configuration and lifecycle operations (CREATE, DELETE)
– Handling RUNTIME events
– Monitoring status and performance
– Maintaining system state (according to user-defined criteria)
Protocols like WS-Management / WSDM define inter-service negotiation and how to transfer metadata.
We are designing/prototyping a system that will manage a general worldwide collection of services and their network links:
– Need to address fault tolerance, scalability, performance, interoperability, generality, and usability
We are starting with our messaging infrastructure because:
– we need it to be robust in the Grids we are using it in (sensor and material-science applications),
– we are using it in the management system itself,
– and it has critical network requirements.
* From WS-Distributed Management

Core Features of Management Architecture
Remote management:
– Allow management irrespective of the location of the resource (as long as that resource is reachable via some means)
Traverse firewalls and NATs:
– Firewalls complicate management by disabling access to some transports and to internal resources
– Utilize the tunneling capabilities and multi-protocol support of the messaging infrastructure
Extensible:
– Management capabilities evolve with time; we use a service-oriented architecture to provide extensibility and interoperability
Scalable:
– The management architecture should scale as the number of managees increases
Fault-tolerant:
– Management itself must be fault-tolerant; failure of transports or management components should not cause the management architecture to fail

Management Architecture built in terms of:
Hierarchical Bootstrap System – itself made robust by replication
– Managees in different domains can be managed with separate policies for each domain
– Periodically spawns a System Health Check that ensures components are up and running
Registry for metadata (distributed database) – made robust by standard database techniques, and by our system itself for its service interfaces
– Stores managee-specific information (user-defined configuration / policies, and the external state required to properly manage a managee)
– Generates a unique ID per instance of registered component

Architecture: Scalability via Hierarchical Distribution
Bootstrap nodes form a hierarchy of domains: a replicated ROOT with children such as US and EUROPE, and leaf domains such as FSU, CARDIFF, and CGL, addressed by paths like /ROOT/EUROPE/CARDIFF.
Active bootstrap nodes:
– Responsible for maintaining a working set of management components in the domain
– Always the leaf nodes in the hierarchy
Passive bootstrap nodes:
– Only ensure that all child bootstrap nodes are always up and running
– Spawn children if not present and ensure they stay up and running
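The active/passive distinction above can be sketched in a few lines. This is an illustrative model, not the authors' Java implementation; the class and method names (BootstrapNode, path, is_active) are hypothetical, and the placement of CGL under US is an assumption.

```python
# Sketch of the bootstrap hierarchy: interior ("passive") nodes only watch
# their children; leaf ("active") nodes maintain management components.
class BootstrapNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

    def is_active(self):
        # Active bootstrap nodes are always the leaves of the hierarchy.
        return not self.children

    def path(self):
        # Domain path of the form /ROOT/EUROPE/CARDIFF
        if self.parent is None:
            return "/" + self.name
        return self.parent.path() + "/" + self.name

root = BootstrapNode("ROOT")
us = BootstrapNode("US", root)            # assumed placement
europe = BootstrapNode("EUROPE", root)
cgl = BootstrapNode("CGL", us)            # assumed placement
cardiff = BootstrapNode("CARDIFF", europe)

print(cardiff.path())       # /ROOT/EUROPE/CARDIFF
print(cardiff.is_active())  # True: a leaf, so it manages components
print(europe.is_active())   # False: passive, only keeps children alive
```

A passive node's periodic task would then amount to walking `children` and respawning any that are down.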

Management Architecture built in terms of (continued):
Messaging nodes form a scalable messaging substrate:
– Message delivery between managers and managees
– Provides transport-protocol-independent messaging between distributed entities
– Can provide secure delivery of messages
Managers – active stateless agents that manage managees:
– Both general and managee-specific management threads perform the actual management
– Multi-threaded to improve scalability with many managees
Managees – what you are managing (the managee / service to manage); our system makes them robust:
– There is NO assumption that the managed system uses messaging nodes
– Each managee is wrapped by a Service Adapter which provides a Web Service interface
– It is assumed that ONLY modest state needs to be stored/restored externally; a managee could front-end, and restore itself from, a huge database

Architecture: Conceptual Idea (Internals)
– Components connect to a messaging node for sending and receiving messages
– The user writes the system configuration to the registry
– Manager processes periodically check for available managees to manage; they also read/write managee-specific external state from/to the registry
– Bootstrap nodes always ensure components are up and running, periodically spawning them if needed
– Management interactions use WS-Management
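The manager's periodic check can be illustrated with a toy reconciliation pass. This is a hedged sketch only: the registry is modeled as a plain dict, and all names (poll_once, the entry fields) are invented for illustration rather than taken from the actual system.

```python
# One pass of a manager process: claim any managee not currently managed,
# and read its externally stored state so it can be brought up to speed.
def poll_once(registry, manager_id):
    claimed = []
    for managee_id, entry in registry.items():
        if entry.get("manager") is None:        # available to manage
            entry["manager"] = manager_id        # claim it
            state = entry.get("external_state")  # read stored config/state
            claimed.append((managee_id, state))
    return claimed

registry = {
    "broker-1": {"manager": None, "external_state": {"topology": "ring"}},
    "broker-2": {"manager": "mgr-A", "external_state": {}},
}

claimed = poll_once(registry, "mgr-B")
# broker-1 is claimed by mgr-B; broker-2 already has a manager and is left alone
print(claimed)
```

In the real system this loop would also write updated managee state back to the registry and would run against the Web Service registry interface rather than a dict.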

Architecture: User Component
– "Managee characteristics" are determined by the user
– Events generated by the managees are handled by the manager
– Event processing is determined via WS-Policy constructs
  • E.g., wait for the user's decision on handling specific conditions
  • "Auto-instantiate" a failed service, but the service doing this is responsible for acting consistently even when the failed service is not actually failed but just unreachable
– Administrators can set up services (managees) by defining their characteristics
– Writing this information to the registry can be used to start up a set of services

Issues in the Distributed System: Consistency
Examples of inconsistent behavior:
– Two or more managers managing the same managee
– Old messages / requests arriving after new requests
– Multiple copies of a managee existing at the same time, or orphan managees, leading to inconsistent system state
Use a registry-generated, monotonically increasing unique Instance ID (IID) to distinguish between new and old instances of managers, managees, and messages:
– Requests from manager thread A are considered obsolete IF IID(A) < IID(B)
– The Service Adapter stores the last known MessageID (IID:seqNo), allowing it to differentiate between duplicate AND obsolete messages
– Periodic renewal with the registry:
  IF IID(manageeInstance_1) < IID(manageeInstance_2)
  THEN manageeInstance_1 is deemed OBSOLETE,
  SO EXECUTE the policy (e.g., instruct manageeInstance_1 to silently shut down)
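The IID checks above can be made concrete with a small sketch. The "IID:seqNo" MessageID format comes from the slide; the function names and exact comparison policy are illustrative assumptions, not the project's actual code.

```python
# Sketch of the Service Adapter's staleness/duplicate check using the
# registry-generated monotonically increasing Instance ID (IID).
def parse_message_id(message_id):
    """Split an 'IID:seqNo' MessageID into comparable integers."""
    iid, seq = message_id.split(":")
    return int(iid), int(seq)

def is_obsolete_or_duplicate(last_seen, incoming):
    """Drop messages from an older manager instance (smaller IID) or
    replays from the same instance (seqNo not strictly newer)."""
    last_iid, last_seq = parse_message_id(last_seen)
    in_iid, in_seq = parse_message_id(incoming)
    if in_iid < last_iid:
        return True    # obsolete: sent by an older manager instance
    if in_iid == last_iid and in_seq <= last_seq:
        return True    # duplicate or out-of-order replay
    return False

print(is_obsolete_or_duplicate("7:42", "6:99"))  # True: IID 6 < 7, obsolete
print(is_obsolete_or_duplicate("7:42", "7:42"))  # True: duplicate
print(is_obsolete_or_duplicate("7:42", "7:43"))  # False: fresh message
```

Because IIDs are issued monotonically by the registry, comparing them is enough to order manager instances without any clock synchronization between nodes.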

Issues in the Distributed System: Security
– Provide secure communication between communicating parties (e.g., Manager and Managee)
– Publish/Subscribe: provenance, lifetime, unique topics
– Secure discovery of endpoints
– Prevent unauthorized users from accessing the managers or managees
– Prevent malicious users from modifying messages (thus message interactions remain secure when passing through insecure intermediaries)
– Utilize NaradaBrokering's Topic Creation and Discovery* and Security# schemes
* NB Topic Creation and Discovery (Grid 2005)
# NB Security (Grid 2006)

Implemented WS-Specifications
– WS-Management (June 2005), in part (WS-Transfer [Sep 2004], WS-Enumeration [Sep 2004], and WS-Eventing); could use WSDM instead
– WS-Eventing (leveraged from the WS-Eventing capability implemented in OMII)
– WS-Addressing [Aug 2004] and SOAP v1.2 used (needed for WS-Management)
– Used XmlBeans for manipulating XML in a custom container
– Currently implemented using the JDK, but will switch to JDK 1.5
– Released in February 2007

Performance Evaluation Results
– Extreme case with many catastrophic failures
– Response time increases with an increasing number of concurrent requests
– Response time is MANAGEE-DEPENDENT, and the times shown are typical
– MAY involve one or more registry accesses, which increase overall response time
– Response time increases rapidly as the number of managees exceeds roughly 150 to 200

Performance Evaluation
How much infrastructure is required to manage N managees?
N = number of managees to manage
M = max. number of entities connected to a single messaging node
D = max. number of managees managed by a single manager process
R = min. number of registry service instances required to provide fault tolerance
Assume every leaf domain has 1 messaging node; hence we have N/M leaf domains. Further, the number of managers required per leaf domain is M/D.
Total components in the lowest level
  = (R registries + 1 bootstrap service + 1 messaging node + M/D managers) * (N/M leaf domains)
  = (2 + R + M/D) * (N/M)
Thus the percentage of additional infrastructure is
  = [(2 + R)/M + 1/D] * 100 %

Performance Evaluation
Research question: How much infrastructure is required to manage N managees?
Additional infrastructure = [(2 + R)/M + 1/D] * 100 %
A few cases:
– Typical values of D and M are 200 and 800; assuming R = 4:
  Additional infrastructure = [(2 + 4)/800 + 1/200] * 100 % ≈ 1.2 %
– Shared registry, i.e., one registry interface per domain, R = 1:
  Additional infrastructure = [(2 + 1)/800 + 1/200] * 100 % ≈ 0.87 %
– If NO messaging node is used (assume D = 200):
  Additional infrastructure = [(R registries + 1 bootstrap node + N/D managers)/N] * 100 %
  = [(1 + R)/N + 1/D] * 100 % ≈ 100/D % (for N >> R) ≈ 0.5 %
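The overhead formula is easy to check numerically. A minimal sketch, with variable names mirroring the slide (R, M, D); `overhead_percent` is an illustrative name, not project code.

```python
# Additional infrastructure as a percentage of the N managees being managed:
# [(2 + R)/M + 1/D] * 100, where the "2" counts the bootstrap service and
# the messaging node present in each leaf domain.
def overhead_percent(R, M, D):
    return ((2 + R) / M + 1 / D) * 100

print(round(overhead_percent(R=4, M=800, D=200), 2))  # 1.25  (slide: ~1.2 %)
print(round(overhead_percent(R=1, M=800, D=200), 3))  # 0.875 (slide: ~0.87 %)
```

Note that for large M the 1/D term dominates, which is why dropping the messaging node entirely (the third case above) still leaves roughly 100/D ≈ 0.5 % overhead.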

Performance Evaluation: XML Processing Overhead
– XML processing overhead is measured as the total marshalling and un-marshalling time required
– For broker-management interactions, the typical processing time (including validation against the schema) is ≈ 5 ms
– Broker-management operations are invoked only during initialization and recovery from failure
– Reading broker state using a GET operation incurs the 5 ms overhead and is invoked periodically (e.g., every 1 minute, depending on policy)
– Further, for most operations that change broker state, the actual operation processing time >> 5 ms, so the XML overhead of 5 ms is acceptable

Prototype: Managing Grid Messaging Middleware
We illustrate the architecture by managing our distributed messaging middleware, NaradaBrokering:
– This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies
– Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system, possibly using a different configuration and topology
– Can use (dynamically) optimized protocols (UDP vs. TCP vs. parallel TCP) and go through firewalls
Broker Service Adapter:
– Note that NB illustrates an electronic entity that did not start off with an administrative service interface
– So we add a wrapper over the basic NB BrokerNode object that provides a WS-Management front end
– Allows CREATION, CONFIGURATION, and MODIFICATION of brokers and broker topologies

Messaging (NaradaBrokering) Architecture

Typical use of Grid Messaging in NASA (CLADE 2006, June 19, 2006, Community Grids Lab, Bloomington IN): a Sensor Grid, a Datamining Grid, and a GIS Grid implemented and interconnected using NB (diagram).

NaradaBrokering Management Needs
The NaradaBrokering distributed messaging system consists of peers (brokers) that collectively form a scalable messaging substrate. Optimizations and configurations include:
– Where brokers should be placed and how they should be connected, e.g., RING, BUS, TREE, HYPERCUBE, etc.; each TOPOLOGY has a different balance of resource utilization, routing cost, and fault-tolerance characteristics
Static topologies, or topologies created using static rules, may be inefficient in some cases:
– E.g., in CAN and Chord a new incoming peer joins random nodes in the network; network distances are not taken into account, so some lookup queries may span the entire diameter of the network
– Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system, possibly using a different configuration and topology
– Can use (dynamically) optimized protocols (UDP vs. TCP vs. parallel TCP) and go through firewalls, but there is no good way to make these choices dynamically
These actions are collectively termed "managing the messaging middleware".

Prototype: Costs (individual managees are NaradaBrokering brokers)

Operation           Un-initialized (first time)   Initialized (later modifications)
                    Time (msec, average)          Time (msec, average)
Set Configuration   778 ± 5                       33 ± 3
Create Broker       610 ± 6                       57 ± 2
Create Link         160 ± 2                       27 ± 2
Delete Link         104 ± 2                       20 ± 1
Delete Broker       142 ± 11                      29 ± 2

Recovery: Typical Time
Recovery time = T(read state from registry) + T(bring managee up to speed)
              = T(read state) + T[SetConfig + Create Broker + Create Link(s)]

Topology: Ring
– N nodes, N links (1 outgoing link per node); 2 managee objects per node
– 10 + ( ) ≈ 1548 msec

Topology: Cluster
– N nodes, links per broker vary from 0 to 3; 1 to 4 managee objects per node
– Min: ≈ 1393 msec; Max: ≈ 1622 msec

Assuming a 5 ms read time from the registry per managee object.
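The recovery-time model can be sketched using the un-initialized per-operation costs from the "Prototype: Costs" table (778 ms Set Configuration, 610 ms Create Broker, 160 ms per Create Link) and the stated 5 ms registry read per managee object. The function name and the exact per-topology breakdown are assumptions for illustration.

```python
# Recovery = read external state + SetConfig + CreateBroker + CreateLink(s),
# with un-initialized (first-time) costs from the prototype measurements.
READ_MS = 5           # registry read per managee object
SET_CONFIG_MS = 778   # Set Configuration, un-initialized
CREATE_BROKER_MS = 610
CREATE_LINK_MS = 160

def recovery_time_ms(managee_objects, links):
    return (managee_objects * READ_MS
            + SET_CONFIG_MS
            + CREATE_BROKER_MS
            + links * CREATE_LINK_MS)

# Cluster minimum from the slide: 1 managee object, 0 links -> 1393 msec
print(recovery_time_ms(managee_objects=1, links=0))  # 1393
```

The cluster minimum of 1393 msec matches 5 + 778 + 610 exactly, which supports reading the model as read time plus first-time SetConfig and Create Broker costs.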

Prototype: Observed Recovery Cost per Managee

Operation                       Average (msec)
Spawn process                   2362 ± 18
Read state                      8 ± 1
Restore (1 broker + 1 link)     1421 ± 9
Restore (1 broker + 3 links)    1616 ± 82

– The time for Create Broker depends on the number and type of transports opened by the broker; e.g., an SSL transport requires negotiation of keys and takes more time than simply establishing a TCP connection
– If brokers connect to other brokers, the destination broker MUST be ready to accept connections, or else topology recovery takes more time

Management Console: Creating Nodes and Setting Properties

Management Console: Creating Links

Management Console: Policies

Management Console: Creating Topologies

Conclusion
We have presented a scalable, fault-tolerant management framework that:
– Adds acceptable cost in terms of the extra resources required (about 1 %)
– Provides a general framework for management of distributed entities
– Is compatible with existing Web Service specifications
We have applied our framework to manage managees that are loosely coupled and have modest external state (important for the scalability of the management process).
An outside effort is developing a Grid Builder which combines BPEL and this management system to manage the initial specification, composition, and execution of Grids of Grids (of Services).