Rapid Prototyping and Deployment of Distributed Web / Grid Services in a Service Oriented Architecture using Scripting Thesis Proposal Harshawardhan Gadgil.

Presentation transcript:

Rapid Prototyping and Deployment of Distributed Web / Grid Services in a Service Oriented Architecture using Scripting Thesis Proposal Harshawardhan Gadgil

Outline
- Motivation
- Literature Survey
- Research Issues
- HPSearch Architecture
- Contributions and Milestones
- Applications
- Summary

Motivation
- Critical infrastructure systems connect disparate data sources, high-performance computing applications and visualization services for real-time data processing.
- Real-time data processing
  - Results are required in real time.
  - Data is available in streams.
  - Data requires pre-processing (e.g. filtering to remove unwanted parts).
- Scalability
  - Potentially large number of data sources (static or dynamic) or data processing elements (services)
  - Unpredictable behavior
- Fault tolerance is a key factor
  - E.g. incorporate new data sources or processing units on the fly

Motivation (contd.)
- System management
  - Increasing application complexity implies more metadata.
  - Proper management is required to ensure smooth functioning of the system.
  - Easy access is required to manage system characteristics.

Motivation: Streaming Data Processing
- Critical infrastructure systems (scientific applications)
  - Data comes from real-time streaming sources (e.g. sensors, satellite stations) or from static data sources (databases containing previously warehoused observations).
  - Data filtering / transformation is essential in most cases to convert data to the proper format for the processing application.
  - Real-time processing is required; it is crucial for critical infrastructure applications.
- Audio/video applications
  - Data comes from real-time sources (e.g. collaborative sessions) or from static data sources (stored A/V files).
  - Pre-processing is required to modify A/V characteristics: format (encoding), bit rate (quality), etc.
  - Real-time processing is crucial for collaborative environments.

Literature Survey
- Services (Web / Grid)
- Scripting languages
  - Benefits
  - Possible problems
- Handling data flow in applications
  - File-based vs. streaming
- Workflow systems
  - Enable gluing high-performance components
  - GUI-based building and programming flavor
- Component-based architectures
- Messaging systems (for high-throughput data transfer)
- System management

Service
- "Service is a logical manifestation of a logical / physical resource (DB, programs, devices, humans etc) and/or some application logic exposed to network" - Web Service Grids: An Evolutionary Approach (2004)
- Web Services
  - Simple mechanism for distributed computing
  - Language independent, firewall friendly
- Grid Services
  - Are essentially Web Services
  - Transient: can be created, destroyed, or die naturally
  - State: maintained between calls to the Web Service

Scripting Languages
- Benefits
  - Enable rapid prototyping (smaller code size and less development time)
  - Less effort to perform complex tasks
  - Interface with the OS (hosting environment)
  - Glue code to tie programs together
  - Usually portable
  - Primarily for plugging existing components together
- However, some disadvantages too
  - Weak typing
  - Less structure, difficult to maintain
- Some examples
  - Rhino: JavaScript for Java
  - Perl, VBScript, (P/J)ython
- Scripting vs. GUI builders
  - GUI builders: easy for a novice design engineer to get involved
  - Scripting: provides more flexibility through direct access

Scripting Environments
- Hosting services: OGSI:Lite & WSRF:Lite
  - Based on Perl
  - Rapidly deploy grid services
- Matlab / Jython from GEODISE
  - GEODISE: suite of CAD integrated with distributed grid-enabled computing, data, analysis and knowledge resources
  - Uses Matlab to provide programmatic access to GEODISE functions along with an existing suite of Matlab tools
  - Jython is used to provide a hosting environment using the Java CoG Kit

Data flow in applications
- Real-time processing is required.
- Typically, data transfer involves temporarily storing the data.
  - This data may be transferred using files (e.g. GridFTP).
  - Every component of the chain reads its data from an input file and writes the processed data to an output file.
- Time and space are critical in real-time applications, hence file-based transfer is undesirable for them.
- Tools exist to automate data transfer and invoke applications (e.g. GridAnt, Karajan).
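The contrast between file-based and streaming transfer can be sketched in a few lines. This is a minimal illustration in plain JavaScript, not HPSearch code: the stage names (`source`, `keepEstimateAndError`, `sink`) and the sample records are assumptions made up for the example. Generators hand each record to the next stage as soon as it is read, so nothing is stored between stages.

```javascript
// Records every step so the interleaving of the stages is visible.
const trace = [];

// Hypothetical data source: emits records one at a time.
function* source(records) {
  for (const r of records) {
    trace.push("read " + r.id);
    yield r;
  }
}

// Hypothetical filter stage: keeps only the estimate and error values.
function* keepEstimateAndError(stream) {
  for (const r of stream) {
    trace.push("filter " + r.id);
    yield { id: r.id, estimate: r.estimate, error: r.error };
  }
}

// Hypothetical sink: pulls records through the chain and collects them.
function sink(stream) {
  const out = [];
  for (const r of stream) {
    trace.push("write " + r.id);
    out.push(r);
  }
  return out;
}

// Made-up sample input.
const input = [
  { id: 1, estimate: 2.5, error: 0.1, noise: "x" },
  { id: 2, estimate: 2.7, error: 0.2, noise: "y" },
];

const output = sink(keepEstimateAndError(source(input)));
console.log(trace.join(", "));
// read 1, filter 1, write 1, read 2, filter 2, write 2
```

Record 1 reaches the sink before record 2 is even read. In a file-based chain every stage would finish its whole input file before the next stage could start.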

Workflow Architectures
- Triana: graphical PSE to compose scientific applications
  - Composed of one or more Triana engines
  - Distributed version: data transfer takes place using JXTA pipes
- Taverna
  - Can interact with arbitrary services
  - Plugins mediate / operate the service in each case
  - Uses the XScufl workflow language (derived from WSFL)
- Kepler
  - Java packages for designing and executing workflows
  - Has a graphical interface for composing complex workflows
  - Can wrap existing code written in different languages, e.g. a Perl script or a Matlab script

Component Architectures
- IU-Extreme
  - Connects components (Provides and Uses ports)
  - Jython-based scripting for application management tasks (create application, set properties, invoke application)
  - Data transfer between components by GridFTP, with Globus Reliable File Transfer for fault tolerance
- Many other systems
  - Focus mainly on invocation of services, as in a workflow

Messaging systems
- JXTA: P2P middleware, JMS for communication
- Pastry
  - Fault-tolerant P2P middleware
  - Based on distributed hash tables
  - No real-time routing possible
- IU: event-brokering system designed to run on a large network of cooperating brokers
  - Implements high-performance protocols (message transit time < 1 ms per broker)
  - Order-preserving, optimized message transport
  - Interface with reliable storage for persistent events
  - Fault-tolerant data transport
  - Support for different underlying transport implementations such as TCP, UDP, Multicast, SSL, RTP

System Management
- Increasing complexity of systems implies an increasing amount of metadata to be managed.
- Provide access to the system and management of system metadata (WS-Management)
  - E.g. performance metrics, logs, service metadata
- Require the ability to query system data and take actions affecting the characteristics of the system
  - E.g. Perl provides hooks to query system data

Research Issues
- Support for streaming data processing
  - Data transfer and processing in real time
  - Data transfer is carried out between the end-points (sender and recipient) without the flow engine mediating (Grid Services Flow Language)
- Design a run-time system that allows merging data sources, data filtering and processing applications, and visualization tools in a service-oriented architecture
  - Assume all components are available as Web (Grid) services
- Scalability: adding data sources or processing applications (services) should not degrade system performance
- Fault tolerance: services and data sources may be lost; allow the system to detect faults and to discover and incorporate new components

Research Issues (contd.)
- System management interface: allow access to the system and manipulation of its characteristics by querying system metadata
  - Create a virtual topology for application deployment
  - Query performance metrics to design policies that change routing substrate characteristics (e.g. add new brokers, or links between existing brokers, to aid efficient routing)
  - Discover services / brokers / topics of interest, to dynamically rewire components with data streams
  - Replay events; useful for recovery after failure

HPSearch
- Binds URIs to a scripting language
  - We use Mozilla Rhino (a JavaScript implementation), but the principles may be applied to any other scripting language.
- Every resource may be identified by a URI, and HPSearch allows us to manipulate the resource using that URI.
- E.g. read from a web address and write it to a local file:

    x = " ";
    y = "file:///u/hgadgil/data.txt";
    r = new Resource("Copier");
    r.port[0].subscribeFrom(x);   /* read from x */
    r.port[0].publishTo(y);       /* write to y */
    f = new Flow();
    f.addStartActivities(r);
    f.start("1");

- Support for the WS-Addressing construct is under investigation.

HPSearch (contd.)
- Currently provides bindings for the following:
  - file://
  - socket://ip:port
  - ftp://
  - topic://
  - jdbc:
- Host objects to do specific tasks
  - WSDL: invokes web services using SOAP
  - PerfMetrics: binds NaradaBrokering performance metrics; stores published metrics and allows querying
  - Resource: every data source / filter / sink is a resource
  - Flow: creates a data flow between resources
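One way to picture the URI bindings is a lookup from URI scheme to a handler. The sketch below is an assumption for illustration only: the handler names, the `handlerFor` function and the sample URIs are hypothetical and are not taken from the HPSearch code base. Only the five schemes come from the slide.

```javascript
// Hypothetical table mapping each supported scheme to a handler name.
const handlers = {
  file: "FileHandler",
  socket: "SocketHandler",
  ftp: "FtpHandler",
  topic: "TopicHandler",
  jdbc: "DatabaseHandler",
};

// Extracts the scheme (the text before the first ':') from a URI.
function schemeOf(uri) {
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(uri);
  if (!match) {
    throw new Error("no scheme in URI: " + uri);
  }
  return match[1].toLowerCase();
}

// Picks the handler responsible for a URI, or fails for unknown schemes.
function handlerFor(uri) {
  const scheme = schemeOf(uri);
  if (!Object.prototype.hasOwnProperty.call(handlers, scheme)) {
    throw new Error("unsupported scheme: " + scheme);
  }
  return handlers[scheme];
}

console.log(handlerFor("file:///u/hgadgil/data.txt")); // FileHandler
console.log(handlerFor("socket://127.0.0.1:5045"));    // SocketHandler
```

Keeping the dispatch in a single table means that supporting a new scheme is one added entry rather than a change to the script syntax.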

Architecture
Consists of:
- SHELL: front end to scripting
- TASK_SCHEDULER (FLOW_ENGINE): distributes tasks among cooperating engines for load balancing
- WSPROXY: an AXIS web service that wraps an actual service
  - The behavior of the service can be controlled by making simple WS calls to this proxy, so it can be controlled by any workflow engine.
  - WSProxy handles streaming data communication on behalf of the service. The service only sees input and output streams; these could be files, a remote data stream, a file transferred via HTTP / FTP, or results from a database query.
  - Can be deployed in standard Web Service containers (such as Tomcat)
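The slide does not say how the task scheduler balances load, so the following is only a sketch under an assumed policy: each task goes to the engine that currently holds the fewest tasks. The function name `distribute` and the engine and task names are hypothetical.

```javascript
// Assigns each task to the least-loaded engine; ties go to the engine
// listed first. This policy is an assumption, not the HPSearch scheduler.
function distribute(tasks, engines) {
  if (engines.length === 0) {
    throw new Error("at least one engine is required");
  }
  const assignment = new Map(engines.map((engine) => [engine, []]));
  for (const task of tasks) {
    let best = engines[0];
    for (const engine of engines) {
      if (assignment.get(engine).length < assignment.get(best).length) {
        best = engine;
      }
    }
    assignment.get(best).push(task);
  }
  return assignment;
}

// Two engines and four tasks, the configuration mentioned in the
// overhead measurements.
const plan = distribute(["t1", "t2", "t3", "t4"], ["engineA", "engineB"]);
console.log(JSON.stringify(Object.fromEntries(plan)));
// {"engineA":["t1","t3"],"engineB":["t2","t4"]}
```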

Architecture: WSProxy Interfaces
- Runnable
  - More control over execution (start, suspend, resume, stop, ...)
  - Basic idea: read a block of data, process it, write it out
  - Ideal for designing quick filtering applications that process data in streams
- Wrapped
  - Wraps an existing service (executables [*.exe], Matlab scripts, shell / Perl scripts, etc.)
  - Less control: can only start and stop
  - Ideal for wrapping existing programs / services to expose them as a pluggable component / web service
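The Runnable idea (read a block, process it, write it out, under start / suspend / resume / stop control) can be sketched as a small state machine. The class `RunnableFilter`, its method `offer` and the state names are assumptions for illustration; the actual WSProxy interface is a Java API that is not shown in the slides.

```javascript
// Hypothetical Runnable-style filter: processes blocks only while running.
class RunnableFilter {
  constructor(processBlock) {
    this.processBlock = processBlock;
    this.state = "created";
    this.output = [];
  }

  start() { this.transition(["created"], "running"); }
  suspend() { this.transition(["running"], "suspended"); }
  resume() { this.transition(["suspended"], "running"); }
  stop() { this.transition(["running", "suspended"], "stopped"); }

  // Rejects lifecycle calls that are not valid in the current state.
  transition(allowedFrom, next) {
    if (!allowedFrom.includes(this.state)) {
      throw new Error("cannot go from " + this.state + " to " + next);
    }
    this.state = next;
  }

  // Read a block, process it, write it out. Returns false when the
  // filter is not running and the block was therefore not consumed.
  offer(block) {
    if (this.state !== "running") {
      return false;
    }
    this.output.push(this.processBlock(block));
    return true;
  }
}

const upper = new RunnableFilter((block) => block.toUpperCase());
upper.start();
upper.offer("abc");   // processed
upper.suspend();
upper.offer("def");   // not consumed while suspended
upper.resume();
upper.offer("ghi");   // processed
upper.stop();
console.log(upper.output); // [ 'ABC', 'GHI' ]
```

A Wrapped service would expose only `start` and `stop`, which is the reduced control the slide describes.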

HPSearch Architecture Overview
[Figure: architecture diagram. Labeled components: HPSearch Kernel (three instances), Request Handler, JavaScript Shell, Task Scheduler, Flow Handler, Web Service EP, Other Objects, URIHandler, DBHandler, WSDLHandler, WSProxyHandler, Broker Network, DataBase, Web Service, Files, Sockets, Topics, and WSProxy Service (three instances).]

So what is the overhead?
- Partial results as of now
- Measured on a 1.6 GHz Pentium 4 machine with 256 MB RAM running Java 1.4.1_02, NB version 0.98 rc2, Rhino 1.5R3
- Shell init: 2085 mSec (average)
- RDAHMM script (26 lines, a small script): about 15 mSec per line (average) to execute
- Task distribution (2 engines, 4 tasks): mSec
- WSProxy init (depends on the number of streams to initialize): 700 – 2000 mSec (approximate value, measured using System.currentTimeMillis)
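As a rough reading of these figures, the reported per-line cost can be combined with the shell start-up time. The sketch below assumes the costs simply add up sequentially, which the slide does not state; the variable names are made up for the example.

```javascript
// Figures reported on the slide.
const shellInitMs = 2085;       // average shell initialization
const scriptLines = 26;         // length of the RDAHMM script
const perLineMs = 15;           // average execution time per line
const wsProxyInitMs = { min: 700, max: 2000 };

// Derived values, assuming the costs are purely additive.
const scriptMs = scriptLines * perLineMs;       // 26 * 15 = 390
const shellPlusScriptMs = shellInitMs + scriptMs; // 2085 + 390 = 2475
const withOneProxyMs = {
  min: shellPlusScriptMs + wsProxyInitMs.min,   // 3175
  max: shellPlusScriptMs + wsProxyInitMs.max,   // 4475
};

console.log(scriptMs, shellPlusScriptMs, withOneProxyMs.min, withOneProxyMs.max);
// 390 2475 3175 4475
```

Under that assumption the one-time initialization costs dominate, and interpreting the script itself accounts for only about 390 mSec.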

Contribution of this Thesis
- Stream and service management: program data flows
  - Incorporate static and dynamic data sources
  - WSProxy ensures that data flows directly between components (services) without the HPSearch engine mediating; useful for streaming large amounts of data without clogging the controller
- Scalable?
  - We use NB as our messaging substrate, which can handle a large number of clients.
  - All components (data sources, data processing and visualization applications) are clients.
  - HPSearch manages streams, and connects and steers components.
- Fault tolerant?
  - Failure of a data source or data filter (processing application) is possible.
  - HPSearch can use the discovery service to invoke new services (in lieu of failed services) and reconnect components via streams to continue the data flow.
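The recovery step (find a replacement for a failed service and reconnect the flow) might look like the sketch below. Everything here is hypothetical: the in-memory `registry` stands in for the discovery service, and `discover`, `repairFlow` and the service names are invented for the example.

```javascript
// Hypothetical stand-in for a discovery service: returns a live service
// of the requested type, skipping the one that failed.
function discover(registry, type, failedName) {
  const found = registry.find(
    (service) => service.type === type && service.alive && service.name !== failedName
  );
  return found || null;
}

// Rebuilds a flow by swapping each failed stage for a discovered replacement.
function repairFlow(flow, registry) {
  return flow.map((stage) => {
    if (stage.alive) {
      return stage;
    }
    const replacement = discover(registry, stage.type, stage.name);
    if (replacement === null) {
      throw new Error("no live replacement for " + stage.name);
    }
    return replacement;
  });
}

// Made-up services: filterA has failed, filterB can take over.
const registry = [
  { name: "gps", type: "source", alive: true },
  { name: "filterA", type: "filter", alive: false },
  { name: "filterB", type: "filter", alive: true },
  { name: "plot", type: "sink", alive: true },
];
const brokenFlow = [registry[0], registry[1], registry[3]];

const repaired = repairFlow(brokenFlow, registry);
console.log(repaired.map((stage) => stage.name).join(" -> "));
// gps -> filterB -> plot
```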

Contribution of this Thesis (contd.)
- System management: scripting admin tasks
  - Creating a (virtual broker) network topology
  - Querying performance metrics
  - Topic / broker discovery
- Rapid deployment of applications
  - Deploy network topology
  - Set application properties
  - Deploy application
- In short: provide alternative programmatic (scripting) access to remote services / resources

Milestones
- Implement a WS front end to the shell
  - Remotely submit a script for execution, possibly through a portal
- WSProxy / Handler: fault tolerance to handle situations when
  - the machine hosting the WSProxy dies
  - the broker used by the proxy dies
  - the HPSearch engine dies
- Design an application interface
  - Allow users to create applications using this interface
  - Set application properties, and allow modification of application properties at runtime using scripting
- NB admin objects
  - NaradaBroker, PerfMetrics, NBDiscovery, ReplayService

Milestones (contd.)
- Design a stream negotiation module to allow WSProxy to negotiate stream characteristics
  - Select the best possible transport and other QoS elements for data transfer between two services (for a particular stream)
- Applications to demonstrate the use
  - Audio / video mixer application
  - Multiple data sources and data filtering applications joined in a chain
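Since the negotiation module is still a milestone, the slides give no algorithm for it. One plausible and deliberately simple policy is sketched here: pick the first transport in the sender's preference order that the receiver also supports. The function name and the preference lists are assumptions; the transport names are those listed for the messaging substrate.

```javascript
// Returns the sender's most preferred transport that the receiver
// supports, or null when the two sides have nothing in common.
function negotiateTransport(senderPreferences, receiverSupported) {
  const supported = new Set(receiverSupported);
  const choice = senderPreferences.find((transport) => supported.has(transport));
  return choice === undefined ? null : choice;
}

// A sender that prefers RTP for A/V data, talking to a receiver that
// only accepts TCP and UDP (made-up capabilities).
console.log(negotiateTransport(["RTP", "UDP", "TCP"], ["TCP", "UDP"])); // UDP
console.log(negotiateTransport(["SSL"], ["TCP", "Multicast"]));         // null
```

A fuller module would also weigh other QoS elements, which this sketch leaves out.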

Applications: Streaming Data Filtering
[Figure: data-flow diagram. Labeled stages: Sensor Source, GPS Data, Data Filter (filters the input data to get only the estimate and error values), RDAHMM (analyzes the data), Matlab Plotting Script, Graph. Also labeled: HPSearch Kernel - TSE, (Distributed) Services.]

Applications: Creating a Virtual Broker Network for Deploying Applications

    b = new NaradaBroker("school.cs.indiana.edu");
    b.create("");
    /* OR b.create("file:///u/hgadgil/alternateConfig.conf"); */
    b.connectTo(" ", "5045", "t", "");
    b.requestNodeAddress(" bl-dhcp.indiana.edu:5045", "0");

    c = new NaradaBroker("trex.ucs.indiana.edu");
    c.create("");
    c.connectTo(" ", "5045", "t", "");
    c.requestNodeAddress("tcp:// bl-dhcp.indiana.edu:5045", "0");

[Figure: HPSearch Shell with brokers labeled school.cs.indiana.edu and trex.ucs.indiana.edu.]

Applications: Invoking Arbitrary Web Services

    approved = false;
    userID = " ";
    if (loanAmt < 10000) {
        approved = true;
    } else {
        wsRA = new WSDL(" ");
        risk = wsRA.invoke("assessRisk", userID, loanAmt);
        if (risk > 50)
            approved = false;
        else
            approved = true;
    }
    print("Loan Approved: " + approved);

[Figure: flowchart of the script, with a decision on loanAmt, the call risk = WS_riskAssessor(userID, loanAmt), a decision on risk > 50, the outcomes approved = true / approved = false, and a final "Print result" step.]
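The decision logic of the loan script can be tried without a real web service by passing the risk assessor in as a function. This is a sketch under that assumption: `approveLoan`, the stub assessors and the user ID are hypothetical, and only the thresholds (10000 and 50) come from the slide.

```javascript
// Same rule as the slide: small loans are approved outright; larger
// loans are approved only when the assessed risk is not above 50.
function approveLoan(userID, loanAmt, assessRisk) {
  if (loanAmt < 10000) {
    return true;
  }
  const risk = assessRisk(userID, loanAmt);
  return !(risk > 50);
}

// Hypothetical stand-ins for the remote assessRisk operation.
const lowRisk = () => 20;
const highRisk = () => 80;

console.log(approveLoan("user-1", 5000, highRisk));  // true, service not consulted
console.log(approveLoan("user-1", 25000, lowRisk));  // true
console.log(approveLoan("user-1", 25000, highRisk)); // false
```

In the HPSearch script the third argument would correspond to the call made through the WSDL host object.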

This thesis addresses
- Managing data streams (dynamic and static)
- Connecting data sources and data processing components (available as Web Services) to process data in real time for critical infrastructure applications
- Developing a general-purpose scripting architecture (like Perl) for a multitude of tasks
- The goal is to create an architecture that is
  - Pluggable / extensible
  - Manageable, programmable
- Similar to the UNIX pipe-filter architecture, but implemented on a distributed scale