Managing Dynamic Metadata and Context
Mehmet S. Aktas
Outline
- Introduction
- Problem Statement, Hypothesis, Design Goals
- Literature Survey
- Research Issues
- Milestones
- Contributions
- Summary
Context
- Definition: "Context is any information that can be used to characterize the situation of an entity, where an entity can be a person, place, or computational object." (Dey A. et al., 1999)
- Context is metadata associated with both services and their activities.
- Context can be independent of any interaction:
  - static context: less likely to change; e.g. the type or endpoint of a service
  - dynamic context: likely to change over time; e.g. the throughput of a service
- Context can be generated as a result of interactions:
  - information associated with an activity or session; e.g. a session-id or the URI of the coordinator of a session
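To make the distinction concrete, here is a minimal Python sketch of how the three kinds of context above might be modeled. The names `ContextKind`, `ContextItem`, and `interaction_independent` are hypothetical and chosen only for illustration; they are not part of the system described in this thesis.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional


class ContextKind(Enum):
    STATIC = "static"    # interaction-independent, rarely changes (e.g. service type, endpoint)
    DYNAMIC = "dynamic"  # interaction-independent, changes over time (e.g. throughput)
    SESSION = "session"  # generated by interactions (e.g. session-id, coordinator URI)


@dataclass
class ContextItem:
    key: str
    value: Any
    kind: ContextKind
    owner: str                        # service the item describes
    session_id: Optional[str] = None  # only meaningful for SESSION context

    def __post_init__(self) -> None:
        # Conversation-based context must be tied to a session or activity.
        if self.kind is ContextKind.SESSION and self.session_id is None:
            raise ValueError("session context requires a session_id")


def interaction_independent(items: List[ContextItem]) -> List[ContextItem]:
    """Return the items that describe services regardless of any interaction."""
    return [i for i in items if i.kind is not ContextKind.SESSION]


items = [
    ContextItem("endpoint", "http://example.org/wfs", ContextKind.STATIC, "WFS"),
    ContextItem("throughput", 120.5, ContextKind.DYNAMIC, "WFS"),
    ContextItem("coordinator-uri", "http://example.org/coord", ContextKind.SESSION, "WFS", session_id="s-1"),
]
```

Keeping all three kinds in one record type is what later allows a single query interface over both interaction-independent and conversation-based context.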
Gaggle of Services
- A Gaggle of Services is a set of actively collaborating managed services put together for a particular functionality, such as collaboration, visualization, or a sensor Grid. Such services:
  - collaborate toward a common goal; e.g. emergency preparedness and response
  - actively generate events as a result of their interactions
  - are a very small part of the whole Grid
Motivation
- Current Grid Information Services provide information that describes services independently of their interactions.
- We need management of all information associated with services for:
  - correlating activities of widely distributed services, e.g. in workflow-style, SOA-based applications
  - management of events, especially in multimedia collaboration
  - distributed session management, for instance audio, video, and audio/video meetings at the Chinese Olympics
Motivation II
More reasons for managing context:
- enabling uniform query capabilities over both dialog (conversation-based) and monolog (interaction-independent) context information, e.g. "Give me the list of services that satisfy the QoS requirements C = {a, b, c, ...} and participate in the sessions S = {x, y, z, ...}"
- enabling real-time replay/playback capabilities in collaboration-based sessions
- enabling session failure recovery
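A uniform query over both kinds of context could look roughly like the following sketch, which assumes that QoS values (interaction-independent dynamic context) and session membership (conversation-based context) are kept side by side for each service. `ServiceRecord`, `find_services`, and the attribute names `bandwidth` and `latency_ms` are hypothetical, and "participating in S" is read here as "participating in every session in S".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A QoS constraint is any predicate over a service's QoS attributes.
Constraint = Callable[[Dict[str, float]], bool]


@dataclass
class ServiceRecord:
    name: str
    qos: Dict[str, float]                            # interaction-independent (dynamic) context
    sessions: Set[str] = field(default_factory=set)  # conversation-based context


def find_services(records: List[ServiceRecord], qos: List[Constraint], sessions: Set[str]) -> List[str]:
    """Names of services that meet every QoS constraint and take part in every listed session."""
    return [
        r.name
        for r in records
        if all(c(r.qos) for c in qos) and sessions <= r.sessions
    ]


records = [
    ServiceRecord("wfs-a", {"bandwidth": 5.0, "latency_ms": 40.0}, {"x", "y"}),
    ServiceRecord("wfs-b", {"bandwidth": 1.0, "latency_ms": 20.0}, {"x", "y"}),
    ServiceRecord("wfs-c", {"bandwidth": 8.0, "latency_ms": 60.0}, {"x"}),
]
constraints: List[Constraint] = [
    lambda q: q.get("bandwidth", 0.0) > 2.0,
    lambda q: q.get("latency_ms", float("inf")) <= 100.0,
]
```

Here `find_services(records, constraints, {"x", "y"})` keeps only `wfs-a`: `wfs-b` fails the bandwidth constraint and `wfs-c` does not participate in session `y`.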
Application Use Domain
- Multimedia collaboration domain: Global MMCS
  - multiple A/V services talk to various collaboration clients and services
  - defines a general session collaboration protocol (XGSP), which enables different collaboration tools, e.g. AccessGrid and H.323, to talk to each other
  - needs a distributed session management system
- Characteristics of the domain:
  - widely distributed services
  - metadata of events (archival data)
  - mostly read-only
  - persistent, but its lifetime is bounded by the lifetime of the events
Application Use Domain II
- Workflow-style distributed application: Geographic Information System Grid
  - sensor Grid data services generate events when an event of a certain magnitude occurs
  - these events fire off various codes: filtering, analyzing raw data, generating images and maps
  - needs distributed context management to correlate workflow activities
- Characteristics of the domain:
  - any number of widely distributed services can be involved
  - conversation metadata
  - transient
  - multiple writers
What are examples of dynamically generated metadata in a real-life application?
[Figure: the WMS GUI, WFS, HPSearch, two Data Filters, PI Code, and the Context Information Service, connected by numbered interaction steps; session profile information, user profile, shared data for the HPSearch activity, and activity shared state pass through the Context Information Service.]
SOAP header for context (fragment):
  <soap:Header encodingStyle="WSCTX URL" mustUnderstand="true"> http.. <activity-list mustUnderstand="true" mustPropagate="true">
The context carried in the header includes:
1. session-associated dynamic metadata (session profile information related to the WMS)
2. user profile
3. activity-associated dynamic metadata (shared data and shared state for the HPSearch activity)
4. service-associated dynamically generated metadata (additional data HPSearch generates while executing the workflow)
Interaction steps:
- 3, 4: WMS starts a session and invokes HPSearch, passing the session id, to run the workflow script for PI Code.
- 5, 6, 7: HPSearch runs the workflow script, which generates an output file in GML (and PDF) format.
- 8: HPSearch writes the URI of the output file into the Context Information Service.
- 9: WMS polls the Context Information Service for this information.
- 10: WMS retrieves the output file generated by the workflow script and generates a map.
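Steps 3 to 10 can be simulated with an in-memory stand-in for the Context Information Service. This is only a sketch under stated assumptions: `ContextStore`, its `put`/`poll` methods, the key `"output-uri"`, and the output URL are hypothetical, the workflow itself is not executed, and a blocking wait on a condition variable stands in for repeated polling.

```python
import threading
import uuid
from typing import Any, Dict, Optional, Tuple


class ContextStore:
    """Toy in-memory stand-in for the Context Information Service (hypothetical API)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Any] = {}
        self._changed = threading.Condition()

    def put(self, session_id: str, key: str, value: Any) -> None:
        with self._changed:
            self._data[(session_id, key)] = value
            self._changed.notify_all()  # wake any client waiting on this session

    def poll(self, session_id: str, key: str, timeout: float) -> Optional[Any]:
        """Wait until (session_id, key) is set or the timeout expires; return the value or None."""
        with self._changed:
            self._changed.wait_for(lambda: (session_id, key) in self._data, timeout=timeout)
            return self._data.get((session_id, key))


def hpsearch_run_workflow(store: ContextStore, session_id: str) -> None:
    # Steps 5-7 (simulated): the workflow script produces a GML output file.
    output_uri = f"http://example.org/output/{session_id}.gml"  # hypothetical location
    # Step 8: publish the output file's URI into the shared, session-scoped context.
    store.put(session_id, "output-uri", output_uri)


def wms_request_map(store: ContextStore) -> Optional[str]:
    # Steps 3-4: start a session and hand its id to HPSearch.
    session_id = str(uuid.uuid4())
    worker = threading.Thread(target=hpsearch_run_workflow, args=(store, session_id))
    worker.start()
    # Step 9: wait for the URI that HPSearch writes into the context.
    uri = store.poll(session_id, "output-uri", timeout=5.0)
    worker.join()
    # Step 10 would fetch the file at `uri` and render a map; here we only return the URI.
    return uri
```

The key point the sketch illustrates is that WMS and HPSearch never call each other for the result: they correlate their activities only through the session id and the shared context.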
Problem Statement
- What is a novel process for building Information Services that maintain dynamic, session-related metadata of widely distributed services and provide a uniform interface to both interaction-independent and conversation-based context?
Hypothesis
A fault-tolerant, high-performance, scalable information system maintaining widely distributed, dynamically generated metadata for Gaggles of Services, which:
- provides a uniform interface to context information
- utilizes existing Grid Information Services for interaction-independent context to improve search capabilities
- enables coordination of widely distributed services in Gaggles, e.g. in workflow-style Grid applications
- enables distributed event management and various capabilities for A/V conferencing applications:
  - discovery of entities in a session
  - playback/replay capabilities
  - session failure recovery
Architectural Design Goals
Key goals of our design:
- scalability with respect to the number of widely distributed services
- performance: high responsiveness, reduced access latency
- fault tolerance: high availability of information, robustness to replica crashes
- flexibility: accommodate a broad range of application domains, both read-dominated and read/write-dominated
Literature Survey
- Mainstream Grid Information Services: MDS, R-GMA, UDDI (Grimoires)
- Specifications for stateful service interactions: WS-CAF, WSRF, WS-MetadataExchange
- Linda TupleSpaces coordination model
Mainstream Grid Information Services
- MDS4 (GT4)
  - functionality: monitoring and discovery
  - components: aggregator services, information sources
  - provided data: application-oriented, resource-oriented, stateful interaction data
  - distribution, organizational structure: decentralized, hierarchical
- R-GMA (European Data Grid)
  - functionality: performance monitoring, information
  - components: registry, producers
  - provided data: application-oriented, resource-oriented
  - distribution, organizational structure: decentralized, hierarchical, peer-to-peer
- Grimoires UDDI extension (myGrid)
  - functionality: registry and discovery of services and workflows
  - components: registry
  - provided data: application-oriented
  - distribution, organizational structure: centralized
Limitations in Grid Information Services
- Lack of support for session-related dynamic metadata
  - MDS4 adopts the WSRF approach, which does not scale when managing the activities of multiple services that share the same state
- Lack of support for advanced query capabilities, e.g. "Give me the list of WFS services participating in the 'fault displacement calculations' workflow session, where each service is connected by a network path with over 2 MB/sec of bandwidth and at most 100 msec of latency."
WS-CAF and WS-Context: Key Concepts
- WS Composite Application Framework (WS-CAF): WS-Context, WS-Coordination, WS-Transaction Management
- WS-Context defines context, a context service, and a mapping onto SOAP
  - context: shared data used to correlate service activities
    - context information depends on the type of the activity; e.g. for a transactional activity, the URI of the coordinator in a session
  - context service: maintains the context associated with an activity
    - participants of an activity register with the context service for the lifecycle of that activity
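The lifecycle described above (begin an activity, let participants register, share context, complete the activity) might be sketched as follows. The class and operation names and the `urn:ctx:` identifier scheme are illustrative assumptions and do not reproduce the WS-Context WSDL interface.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Activity:
    context_id: str
    contents: Dict[str, str] = field(default_factory=dict)  # shared data, e.g. coordinator URI
    participants: Set[str] = field(default_factory=set)


class ContextService:
    """Toy context service; operation names are illustrative, not the WS-Context WSDL."""

    def __init__(self) -> None:
        self._activities: Dict[str, Activity] = {}
        self._ids = itertools.count(1)

    def begin(self) -> str:
        context_id = f"urn:ctx:{next(self._ids)}"  # hypothetical identifier scheme
        self._activities[context_id] = Activity(context_id)
        return context_id

    def register(self, context_id: str, participant: str) -> None:
        # Participants register for the lifecycle of the activity.
        self._activities[context_id].participants.add(participant)

    def set_value(self, context_id: str, key: str, value: str) -> None:
        self._activities[context_id].contents[key] = value

    def get_value(self, context_id: str, key: str) -> str:
        return self._activities[context_id].contents[key]

    def complete(self, context_id: str) -> Set[str]:
        """End the activity: its context is discarded and the registered participants returned."""
        return self._activities.pop(context_id).participants
```

Because the context lives only as long as its activity, lookups after `complete` fail; this per-activity, single-service scope is the kind of simplicity the limitations slide later points to.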
Web Services Resource Framework: Key Concepts
- defines standard interfaces and behaviors for distributed system integration
  - a standard XML-based information model
  - standard interfaces for push- and pull-mode access to service data
  - enables every service to expose state data for query, update, and monitoring
- shared state
  - models resource state as private to a service
  - supports a resource-oriented approach for stateful interactions
  - requires the identity of the resource to be passed in the SOAP message
WS-MetadataExchange: Key Concepts
- Web service metadata is key to interactions:
  - WS-Policy: capabilities, requirements, and general characteristics of services
  - WSDL: describes the message operations and the network protocols supported by services
- WS-MetadataExchange
  - provides a mechanism for sharing information about the capabilities of individual Web services
  - allows querying a WS endpoint to retrieve the metadata needed to interact with it
  - defines request/response message pairs to retrieve WS metadata
Limitations in Specifications for Service Communication
- WSRF does not actually accomplish state management merely by enabling access and update rights, particularly in
  - heterogeneous service environments
  - workflow-style applications
- WSRF and WS-MetadataExchange model service metadata as private to a service, which does not scale when managing the activities of multiple services
- WS-MetadataExchange defines only how to access interaction-independent metadata
- WS-Context is promising, but it has limitations:
  - it is a simple framework for context management
  - it has limited query capability
  - it does not address distributed management of context metadata
TupleSpaces Paradigm
- a communication paradigm: space-based, asynchronous communication
- first described in the Linda project at Yale in 1982, pioneered by David Gelernter
- Linda is a coordination language that uses primitive operations on shared data in a shared space
  - data-centric coordination model
  - communication units are tuples: data structures consisting of one or more typed fields
  - a TupleSpace is an intermediary container for tuples
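The Linda primitives can be illustrated with a small thread-safe tuple space in Python. The method names `out`, `rd`, and `in_` follow Linda's operations (`in` is a Python keyword, hence the underscore); template fields that are Python types act as formal (wildcard) fields, all other fields must match exactly. The optional timeout is an addition for convenience, since classic Linda `rd`/`in` block indefinitely. This is a teaching sketch, not JavaSpaces or any production implementation.

```python
import threading
from typing import Any, List, Optional, Tuple

Template = Tuple[Any, ...]


class TupleSpace:
    """Minimal Linda-style tuple space with out (write), rd (read), and in_ (take)."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template: Template, tup: Tuple[Any, ...]) -> bool:
        # A field that is a type is a formal: it matches any value of that type.
        # Any other field is an actual: it must compare equal to the tuple's field.
        if len(template) != len(tup):
            return False
        return all(
            isinstance(value, pattern) if isinstance(pattern, type) else pattern == value
            for pattern, value in zip(template, tup)
        )

    def _index_of(self, template: Template) -> Optional[int]:
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return i
        return None

    def out(self, tup: Tuple[Any, ...]) -> None:
        """Add a tuple to the space and wake up any blocked readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def rd(self, template: Template, timeout: Optional[float] = None) -> Optional[Tuple[Any, ...]]:
        """Wait for a matching tuple and return it without removing it (None on timeout)."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._index_of(template) is not None, timeout):
                return None
            return self._tuples[self._index_of(template)]

    def in_(self, template: Template, timeout: Optional[float] = None) -> Optional[Tuple[Any, ...]]:
        """Wait for a matching tuple, remove it from the space, and return it (None on timeout)."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._index_of(template) is not None, timeout):
                return None
            return self._tuples.pop(self._index_of(template))
```

For example, after `out(("temperature", "sensor-1", 21.5))`, a reader can retrieve the tuple by content with the template `("temperature", str, float)` without knowing who wrote it or when, which is the temporal and spatial uncoupling the paradigm is known for.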
JavaSpaces [Sun Microsystems]
- JavaSpaces is object-oriented and strongly influenced by the Linda model
  - Java-based, platform independent
  - spaces are transactionally secure: mutually exclusive access to objects
  - spaces are persistent: temporal and spatial uncoupling
  - spaces are associative: content-based search
- Limitations
  - centralized
  - inefficient reading/writing
  - performance depends on a stack of different software layers
Research Issues
- Recap of key design goals: scalability, performance, fault tolerance
- Research issues related to replicating dynamic metadata
  - deployment (dynamic vs. static replication)
    - Where should replicas of given context metadata be placed?
    - What properties must a new location meet?
    - How do we know whether a replica location is stable?
    - How can we provide replication tailored to read/write properties?
Research Issues II
- consistency
  - What is the appropriate consistency model?
  - How do replicas exchange updates, and in which direction?
  - How can an ordering capability based on NTP (Network Time Protocol) provide consistency for replicated context metadata?
- performance: efficient metadata access
  - How do we choose the replica server that best serves a client request?
  - How do we avoid performance degradation due to repetitive queries?
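One possible reading of the NTP question is timestamp ordering: each update carries the writer's NTP-synchronized clock reading plus a replica identifier as a tie-breaker, and every replica keeps, per key, the update that is latest in that total order (last-writer-wins). The sketch below assumes this scheme purely for illustration; it is not the consistency model chosen in the thesis, and its guarantees are only as good as the clock synchronization bound.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Update:
    timestamp: float  # assumed: NTP-synchronized wall-clock time at the writer
    replica_id: str   # tie-breaker so that the order is total
    key: str
    value: str

    def order(self) -> Tuple[float, str]:
        return (self.timestamp, self.replica_id)


class Replica:
    def __init__(self) -> None:
        self.state: Dict[str, Update] = {}

    def apply(self, u: Update) -> None:
        # Keep the update that is latest in (timestamp, replica_id) order: last-writer-wins.
        current = self.state.get(u.key)
        if current is None or u.order() > current.order():
            self.state[u.key] = u

    def value(self, key: str) -> str:
        return self.state[key].value


def replay(updates: Iterable[Update]) -> Replica:
    """Build a replica by applying updates in the order they happen to arrive."""
    r = Replica()
    for u in updates:
        r.apply(u)
    return r
```

Because `apply` keeps the maximum under a total order, replicas that receive the same set of updates converge to the same state regardless of delivery order.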
Research Issues III
- scalability: load balancing strategies
  - How should load balancing be managed?
- other research issues
  - replay/playback: How can real-time replay/playback capabilities be enabled?
  - session recovery: How can session recovery be enabled?
  - uniform interface to context: How can a uniform interface to context be provided?
Milestones
- Implementation of the TupleSpaces paradigm
- Uniform Update and Query (search, discovery) Services
- Sequencer Service: ensures that an order is imposed on the actions/events that take place in a session
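In its simplest centralized form, a Sequencer Service like the one listed above could assign consecutive per-session sequence numbers under a lock and keep an ordered log for replay or recovery. `Sequencer`, `stamp`, and `history` are hypothetical names; a distributed deployment would need more than this single-process sketch.

```python
import threading
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class Sequencer:
    """Hypothetical sequencer: stamps each event of a session with the next sequence number."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next: Dict[str, int] = defaultdict(int)
        self._log: Dict[str, List[Tuple[int, Any]]] = defaultdict(list)

    def stamp(self, session_id: str, event: Any) -> int:
        # Assigning and logging under one lock keeps the log in sequence order.
        with self._lock:
            seq = self._next[session_id]
            self._next[session_id] = seq + 1
            self._log[session_id].append((seq, event))
            return seq

    def history(self, session_id: str, since: int = 0) -> List[Tuple[int, Any]]:
        """Events of a session in sequence order, e.g. for replay or session recovery."""
        with self._lock:
            return [entry for entry in self._log[session_id] if entry[0] >= since]
```

Numbering is independent per session, so a late joiner can ask for `history(session_id, since=n)` to catch up from a known point.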
Milestones II
- Storage (Replication) Service
  - decides the number and placement of replicas
  - enables autonomous behavior
  - supports robust behavior under replica crashes
- Access (Request Distribution) Service
  - distributes requests among object replicas
- Expeditor Service
  - generalized caching mechanism
  - reduces storage access caused by repetitive queries
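The Expeditor Service's caching idea can be sketched as a least-recently-used cache in front of a storage lookup, with explicit invalidation when the underlying metadata changes. LRU eviction and the `ExpeditorCache` interface are assumptions made for this example, not the thesis's design.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ExpeditorCache:
    """LRU cache in front of a (slower) storage lookup; invalidated on writes."""

    def __init__(self, fetch: Callable[[Hashable], Any], capacity: int = 128) -> None:
        self._fetch = fetch  # e.g. a query against the storage/replication service
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.misses = 0      # number of queries that had to go to storage

    def get(self, query: Hashable) -> Any:
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            return self._entries[query]
        self.misses += 1
        result = self._fetch(query)
        self._entries[query] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result

    def invalidate(self, query: Hashable) -> None:
        # Call when the underlying metadata changes so stale results are not served.
        self._entries.pop(query, None)
```

Repeating the same query only reaches storage once; the price is that writers must invalidate affected entries, which is why caching is easiest for the mostly read-only domains described earlier.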
Evaluation of Hypothesis
- Qualitative evaluation
  - Does the system deliver what it promises in terms of functionality? Example test domains: Geographic Information System Grid, Global MMCS
  - How does the system function in case of replica crashes?
- Quantitative evaluation
  - How well does the system deliver what it promises in terms of performance?
  - What performance costs and gains come with scalability and fault tolerance?
  - trade-offs between fault tolerance, scalability, and performance
    - What limitations do the trade-offs impose on the practical use of my system?
    - How many replicas are needed for a given availability?
    - What is the cost of fault tolerance?
    - What is the cost of scalability?
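For the question of how many replicas are needed for a given availability, a common first approximation assumes that replicas fail independently, that each is up with probability p, and that a read succeeds if at least one replica is up. Then A(n) = 1 - (1 - p)^n, and the smallest sufficient n is ceil(ln(1 - A) / ln(1 - p)). The sketch below finds n by direct search, which avoids floating-point rounding at exact boundaries; the independence assumption is a simplification, and correlated failures would require more replicas.

```python
def availability(p: float, n: int) -> float:
    """Probability that at least one of n independent replicas (each up with probability p) is up."""
    return 1 - (1 - p) ** n


def replicas_needed(p: float, target: float) -> int:
    """Smallest n such that availability(p, n) >= target."""
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    while availability(p, n) < target:
        n += 1
    return n
```

For example, with replicas that are each up 95% of the time, three replicas give 1 - 0.05^3 = 0.999875, enough for a 99.9% target, while two give only 0.9975.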
Contributions of this Thesis
- Identifies a novel approach for building Information Services that manage session-related context
- Identifies a novel approach for providing fault tolerance and scalability while maintaining high performance when managing dynamic metadata
- Identifies a dynamic replication mechanism for widely distributed, dynamic, and transient metadata
Summary
This thesis addresses the following problems:
- Lack of support in Grid Information Services for managing context (session-related dynamic metadata) to correlate activities in workflow-style applications
  - addressed by providing a novel approach for managing widely distributed, shared, session-related dynamic metadata
- Lack of support in Grid Information Services for distributed session management
  - addressed by providing a distributed event management system that enables session failure recovery and replay/playback capabilities
- Lack of search capabilities in Grid Information Services
  - addressed by providing a uniform search interface to both interaction-independent and conversation-based context, enabling service discovery through events