SensorGrid: High Performance Web Service Architecture for Geographic Information Systems
Thesis Proposal, Galip Aydin
Outline
Introduction; Motivations; SensorGrid Architecture; Research Issues and Goals; Contributions
Geographic Information Systems
A geographic information system (GIS) is a system for creating and managing spatial data and associated attributes: a computer system capable of integrating, storing, editing, analyzing, and displaying geographically referenced information. It is a "smart map" tool that allows users to create interactive queries (user-created searches), analyze spatial information, and edit data. Maps are created by overlaying various geospatial features.
Traditional GIS approach
Mostly desktop applications that require expertise and substantial resources. Centralized client-server models for web-based GIS environments. Cross-vendor or cross-product interoperability is not possible without costly format conversions. Most applications consume archived data, but with advances in sensor technology, new applications that consume real-time data are appearing in abundance.
Traditional GIS approach (contd.)
Limitations: the distributed nature of geospatial data; proprietary data formats and service methodologies; lack of interoperable services.
Problems: assembling data from distributed sources; format conversions; the amount of resources needed for geoprocessing.
Open GIS Standards
Several standards bodies have started developing data standards and implementation specifications for geospatial and location-based services. The goal is to make geographic information and services neutral and available across any network, application, or platform. The two major organizations are the Open Geospatial Consortium (OGC) and ISO/TC 211.
OGC
Supports interoperable solutions that "geo-enable" the Web. Several specifications:
Geospatial data: Geography Markup Language (GML).
Sensors: metadata in SensorML; measurements in Observations & Measurements (a GML extension).
Services: Web Feature Service, Web Map Service, Web Coverage Service, etc.
Issues with Open Standards
Services are HTTP GET/POST based, with limited data transport capabilities (HTTP, FTP, files, etc.). They are not Web Services; tightly coupled, point-to-point communication results in centralized, synchronous applications.
High-end scientific and complex GIS applications require: asynchronous communication models to cope with large numbers of participants and long-running codes; transfer of large data sets between services; coupling of data sources with high performance tools; orchestration of multiple services to solve complex problems.
Motivation 1
Complex problems require GIS applications and services to collaborate.
Lack of service orchestration capabilities: without service-oriented practices, distributed applications are hard to manage, especially when a large number of participants are involved.
Coupling data sources to GIS applications: GIS applications use various types of distributed geospatial data sources, and we need a flexible computing environment to integrate them seamlessly.
Motivation 2
Data transport requirements: GIS applications require large amounts of data to be transported between sources and consumers; current approaches do not provide a scalable and flexible solution.
High performance: a must, not an option, for most scientific GIS applications. For instance, evaluating pre-seismic real-time messages may lead to early warnings.
Proliferation of sensors: sensors introduce new challenges to current GIS applications in terms of data collection, management and processing.
Motivating Examples
Pattern Informatics: earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators. Uses seismic archives.
Regularized Deterministic Annealing Hidden Markov Model (RDAHMM): time series analysis code by Dr. Robert Granat (JPL). Can be applied to GPS and seismic archives, as well as to real-time data.
Interdependent Energy Infrastructure Simulation System (IEISS): models infrastructure networks (e.g., electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
SOA for GIS
Utilize Web Services to realize a Service Oriented Architecture, and Open GIS standards for data formats and service interfaces to achieve interoperability.
We have built WS versions of: WFS, for access to geospatial data in various databases; WMS (A. Sayar), for visualization of feature data; extended UDDI and WS-Context (M. Aktas), supporting dynamic service metadata and a service registry.
Problems with the simple WS version: the basic WFS is request-response, not asynchronous. Performance: GI services are not designed to handle non-trivial data transfers. XML: the size of geospatial data grows considerably when encoded in XML.
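The XML size point can be illustrated with a small sketch. It encodes one hypothetical GPS observation twice: first as a GML-style XML fragment, then as a fixed-layout binary record packed with `struct`. It then compares the byte counts. The element names (`Observation`, `station`, `time`), the binary layout and the sample values are illustrative assumptions, not the schema used by the SensorGrid services.

```python
import struct
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # standard GML namespace URI
ET.register_namespace("gml", GML_NS)

def encode_gml(station: str, epoch: int, x: float, y: float, z: float) -> bytes:
    """Encode one observation as a GML-style XML fragment (hypothetical element names)."""
    feature = ET.Element("Observation")
    ET.SubElement(feature, "station").text = station
    ET.SubElement(feature, "time").text = str(epoch)
    point = ET.SubElement(feature, f"{{{GML_NS}}}Point")
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{x} {y} {z}"
    return ET.tostring(feature)

def encode_binary(station: str, epoch: int, x: float, y: float, z: float) -> bytes:
    """Pack the same observation as 4-byte id + int64 time + three float64 values (36 bytes)."""
    return struct.pack("<4sq3d", station.encode("ascii")[:4], epoch, x, y, z)

if __name__ == "__main__":
    # Hypothetical station id, timestamp and coordinates, for illustration only.
    sample = ("P001", 1136073600, -2458231.57, -4677353.86, 3544467.34)
    print(len(encode_gml(*sample)), "bytes as XML vs", len(encode_binary(*sample)), "bytes packed")
```

The binary record loses self-description, so it is only readable by a peer that knows the layout. This trade-off is what the negotiation work later in the proposal addresses.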
GIS Data Grids
Data is at the heart of every GIS. Easy and fast access to distributed geospatial data is crucial, especially in times of crisis or disaster.
Points to consider: high performance transport; real-time observations from distributed sensors; unified access to geospatial data stored in relational DBs, XML DBs and ESRI Shapefiles; leveraging the OGC Web Feature Service to provide standard access and query interfaces; developing a Web Service version of WFS and modifying/extending it for high performance; fast population of GML Feature Collections from data in the various DBs.
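"Fast population of GML Feature Collections from data in the various DBs" can be pictured as a row-to-feature mapping. The sketch below is minimal and uses an in-memory SQLite table. The table name `stations`, its columns and the `Station` feature type are hypothetical. The GML 2 style `gml:coordinates` encoding is one possible choice, not necessarily the one the WFS uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML)

def rows_to_feature_collection(conn: sqlite3.Connection) -> ET.Element:
    """Map each (name, lon, lat) row of the hypothetical `stations` table to a gml:featureMember."""
    collection = ET.Element(f"{{{GML}}}FeatureCollection")
    for name, lon, lat in conn.execute("SELECT name, lon, lat FROM stations ORDER BY name"):
        member = ET.SubElement(collection, f"{{{GML}}}featureMember")
        feature = ET.SubElement(member, "Station")  # hypothetical feature type
        ET.SubElement(feature, f"{{{GML}}}name").text = name
        point = ET.SubElement(feature, f"{{{GML}}}Point")
        ET.SubElement(point, f"{{{GML}}}coordinates").text = f"{lon},{lat}"
    return collection

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stations (name TEXT, lon REAL, lat REAL)")
    conn.executemany("INSERT INTO stations VALUES (?, ?, ?)",
                     [("A", -117.1, 32.7), ("B", -116.9, 33.0)])
    print(ET.tostring(rows_to_feature_collection(conn), encoding="unicode"))
```

A production service would stream members as rows arrive rather than build the whole tree in memory. The sketch only shows the mapping itself.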
GIS Data Services
WFS specification: transporting high-volume geospatial data encoded in GML is not trivial with HTTP methods or pure Web Services. We are researching the use of a publish/subscribe based messaging system for large data transport and fast response.
Issues: supporting multiple clients and creating topics on the fly; dynamic session metadata, i.e., keeping session state and metadata for each client and request (use of WS-Context); prioritizing client requests.
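The three issues listed above can be combined in one in-process sketch:

- per-request topics created on the fly;
- per-request session state;
- prioritized serving.

`TopicBroker` is a hypothetical stand-in, not NaradaBrokering's API. The `sessions` dictionary stands in for WS-Context, and the topic naming scheme `wfs/<client>/<n>` is an assumption.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

class TopicBroker:
    """Minimal in-process publish/subscribe broker (hypothetical, not NaradaBrokering)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> None:
        for callback in self._subscribers.get(topic, []):
            callback(payload)

class StreamingFeatureService:
    """Streams each request's results to its own topic; lower priority number is served first."""

    def __init__(self, broker: TopicBroker) -> None:
        self._broker = broker
        self._queue: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within one priority
        self.sessions: Dict[str, dict] = {}  # stands in for WS-Context session metadata

    def submit(self, client_id: str, query: str, priority: int,
               on_chunk: Callable[[bytes], None]) -> str:
        request_no = next(self._counter)
        topic = f"wfs/{client_id}/{request_no}"  # topic created on the fly for this request
        self._broker.subscribe(topic, on_chunk)
        self.sessions[topic] = {"client": client_id, "query": query, "state": "queued"}
        heapq.heappush(self._queue, (priority, request_no, topic))
        return topic

    def serve_all(self, fetch: Callable[[str], List[bytes]]) -> None:
        while self._queue:
            _, _, topic = heapq.heappop(self._queue)
            session = self.sessions[topic]
            session["state"] = "streaming"
            for chunk in fetch(session["query"]):  # results are pushed chunk by chunk
                self._broker.publish(topic, chunk)
            session["state"] = "done"
```

Streaming chunks to a topic lets a client start processing before the full GML document is ready. This avoids a single large HTTP response.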
Real-Time Sensors
Sensors are everywhere; they are being deployed as sensor networks for more accurate measurements. With the proliferation of sensors, data collection and processing paradigms are changing. Most scientific geo-applications are designed to work with archived data. Critical infrastructure systems and crisis management environments require fast and accurate access to real-time sources and a flexible, pluggable architecture for geoprocessing the data.
Use Case - GPS Sensors
GPS station networks are a good example of scientific sensors. GPS measurements are used for identifying seismic events, understanding long-term crustal movement, etc.
We have access to the SOPAC GPS networks: currently only socket-based RYO format access is available, and it is not utilized! We provide real-time streaming access in multiple formats (RYO, ASCII, GML) using NaradaBrokering topics. OHIO and a chain of filters. We are investigating the use of topic-based messaging systems for managing real-time data streams.
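The multi-format streaming idea can be sketched as a filter chain on topics. Each filter subscribes to one topic, transforms each message and republishes it on the next topic. A client then picks a format simply by choosing which topic to subscribe to. The `Broker` class, the topic names and the record fields are hypothetical. Real RYO decoding is replaced by a placeholder, since the RYO layout is not described here.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class Broker:
    """Tiny in-process topic broker standing in for NaradaBrokering (hypothetical API)."""

    def __init__(self) -> None:
        self.subs: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable) -> None:
        self.subs[topic].append(callback)

    def publish(self, topic: str, message) -> None:
        for callback in self.subs[topic]:
            callback(message)

def install_filter(broker: Broker, in_topic: str, out_topic: str, transform: Callable) -> None:
    """A filter: subscribe to in_topic, transform each message, republish on out_topic."""
    broker.subscribe(in_topic, lambda msg: broker.publish(out_topic, transform(msg)))

def decode_raw(record: dict) -> str:
    # Placeholder for RYO decoding: assumes an already-decoded dict with
    # hypothetical keys station, time, x, y, z, and emits one ASCII line.
    return f"{record['station']} {record['time']} {record['x']} {record['y']} {record['z']}"

def ascii_to_gml(line: str) -> str:
    """Turn one ASCII line into a GML-style fragment (namespace declaration omitted)."""
    station, time, x, y, z = line.split()
    return (f'<Observation station="{station}" time="{time}">'
            f"<gml:pos>{x} {y} {z}</gml:pos></Observation>")

if __name__ == "__main__":
    broker = Broker()
    install_filter(broker, "gps/raw", "gps/ascii", decode_raw)
    install_filter(broker, "gps/ascii", "gps/gml", ascii_to_gml)
    broker.subscribe("gps/gml", print)  # a GML consumer; others could subscribe to gps/ascii
    broker.publish("gps/raw", {"station": "P001", "time": 0, "x": 1.0, "y": 2.0, "z": 3.0})
```

Adding a new processing step only means installing another filter between two topics. Existing publishers and subscribers do not change.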
SensorGrid Architecture
Supports both archived and real-time geospatial data access. Supports alternate transport and representation schemes. Uses a topic-based messaging infrastructure for large-volume data transport. WS-Context for managing dynamic service metadata. UDDI-based FTHPIS as the service registry. Streaming WFS for serving archived data. Streaming SCS for serving sensor metadata and sensor measurements.
Framework for HP WS
Research on improving Web Service performance through better transport protocols and XML representation schemes. Virtualize representation and protocol by binding SOAP to message-oriented middleware. Handlers will negotiate the protocol and convert messages between different representations. WS-Context keeps session metadata related to the chosen methodology and its specific parameters.
Negotiation Protocol
Design a negotiation protocol for Web Services to negotiate:
Transport protocol: HTTP over TCP, parallel TCP, UDP, …
Efficient representation of XML: BXSA, bnux, BXML, MTOM, Fast Infoset, Millau, XOP, DFDL, Fast Web Services, …
Other aspects (security, etc.).
Develop strategies for determining the best available protocol and the best representation for a given communication. We will investigate using and extending WS-Policy to build the negotiation protocol. We will not develop a binary representation method, but will build a framework that supports multiple binary formats.
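One simple strategy for choosing the "best available" option works as follows:

1. Each end-point advertises an ordered preference list.
2. The initiator's first choice that the responder also supports wins.
3. If nothing matches, both sides fall back to conventional HTTP/XML.

This is a sketch of that assumption only. It is not WS-Policy syntax, and `Capabilities` and the option strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Conventional fallback assumed to be supported by every Web Service end-point.
DEFAULT_TRANSPORT = "HTTP"
DEFAULT_REPRESENTATION = "XML"

@dataclass
class Capabilities:
    """What one end-point advertises, most preferred option first (hypothetical structure)."""
    transports: List[str] = field(default_factory=list)
    representations: List[str] = field(default_factory=list)

def pick(preferred: List[str], offered: List[str], default: str) -> str:
    """Return the first of our preferences that the peer also offers, else the default."""
    for option in preferred:
        if option in offered:
            return option
    return default

def negotiate(initiator: Capabilities, responder: Capabilities) -> Tuple[str, str]:
    """Transport and representation are negotiated independently, as separate dimensions."""
    transport = pick(initiator.transports, responder.transports, DEFAULT_TRANSPORT)
    representation = pick(initiator.representations, responder.representations,
                          DEFAULT_REPRESENTATION)
    return transport, representation
```

For example, an initiator preferring UDP and Fast Infoset, talking to a peer that only offers parallel TCP and BXSA besides the defaults, settles on parallel TCP with BXSA. A peer offering nothing special gets HTTP with plain XML.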
Research Issues 1
Applying Web Service principles to GIS data services: we have built a WS version of WFS, but it is not suitable for large data sets or where quick response is required.
High performance: the system should support high performance data transport for GIS services.
Interoperability: the system should bridge the GIS and Web Service communities by adopting standards from both. Other GIS applications should be able to consume data without costly format conversions.
Security.
Research Issues 2
Scalability: the system should handle high-volume, high-rate data transport and processing. Plugging in new sensors, data sources or geoprocessing applications should not degrade the system's overall performance.
Flexibility and extensibility: setting architectural principles for real-time filters that process sensor data on the fly; the ability to add new filters without system failures.
Quality of Service: is the latency introduced by filter chains in processing real-time sensor data acceptable? Is the system fault tolerant?
Scaling Measurements: data volume by encoding format
Time      RYO        ASCII       GML
1 SOPAC network (SDCRTN, 9 stations):
1 sec     1.5 KB     4.03 KB     48.7 KB
1 hr      5.31 MB    14.18 MB    171.31 MB
1 day     127.44 MB  340.38 MB   4.01 GB
1 month   3.8 GB     9.97 GB     123.3 GB
1 yr      45.8 GB    119.67 GB   1.41 TB
Entire SOPAC network, 5 networks (47 stations):
1 yr      229 GB     598.35 GB   7.05 TB
Entire SCIGN network (250 stations):
1 yr      1.23 TB    16.18 TB    160 TB
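Several rows of the scaling table follow from simple linear extrapolation, for example 1 hr to 1 day (×24) and one network to five networks (×5). The per-second figures also give the encoding overhead relative to RYO. The sketch assumes a constant data rate and identical networks, and copies its inputs from the table. The function and constant names are illustrative.

```python
# Measured values copied from the scaling table (units as in the table).
HOURLY_RYO_MB = 5.31
YEARLY_RYO_GB_ONE_NETWORK = 45.8
PER_SECOND_KB = {"RYO": 1.5, "ASCII": 4.03, "GML": 48.7}

def scale(size: float, time_factor: float = 1.0, networks: int = 1) -> float:
    """Linear extrapolation: assumes a constant data rate and identical networks."""
    return size * time_factor * networks

def overhead(fmt: str, baseline: str = "RYO") -> float:
    """Size of one second of data in `fmt` relative to the baseline encoding."""
    return PER_SECOND_KB[fmt] / PER_SECOND_KB[baseline]

if __name__ == "__main__":
    print(f"1 day RYO: {scale(HOURLY_RYO_MB, 24):.2f} MB")
    print(f"5 networks, 1 yr RYO: {scale(YEARLY_RYO_GB_ONE_NETWORK, networks=5):.1f} GB")
    print(f"GML/RYO: {overhead('GML'):.1f}x, ASCII/RYO: {overhead('ASCII'):.1f}x")
```

The first two printed values reproduce the table's 127.44 MB and 229 GB entries. The GML overhead of roughly 32× is what motivates the alternative representations discussed in the negotiation work.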
Research Goals
Design a high performance Web Service architecture for distributed GIS services that supports archived and real-time geospatial data.
Build GIS Data Services for coupling scientific applications with various types of distributed geospatial databases.
Implement Web Service versions of the Web Feature Service for archived data, and of the Sensor Collection Service for real-time geospatial data and sensor metadata.
Utilize a publish-subscribe based messaging infrastructure to deploy distributed filters for processing real-time sensor data.
Develop a negotiation protocol for Web Services to support high performance data transport.
Contribution of This Thesis
Merges two important software worlds: GIS and Web Service architectures.
Allows unified access to data by developing Web Service and Open GIS standards based services to access and manage archived and real-time geospatial data.
Develops a novel way of deploying filter chains on a topic-based messaging system for processing real-time streaming sensor data.
Identifies a novel approach for negotiating various characteristics of communication between Web Services for high performance messaging.
Appendix
Sample GML Document
Sample GML visualization
RYO Message Format
High Performance XML I (G. Fox)
There are many approaches to efficient "binary" representations of XML Infosets: MTOM, XOP, attachments, Fast Web Services. DFDL is one approach to specifying a binary format.
Assume URI-S labels the scheme and URI-R labels the realization of the scheme for a particular message, i.e., URI-R defines the specific layout of information in each message. DFDL from GGF is quite interesting for this.
Assume we are interested in conversations where a stream of messages is exchanged between two services, or between a client and a service, i.e., two end-points.
Assume that we need to communicate fast between end-points that understand scheme URI-S, but must support the conventional representation if one end-point does not understand URI-S.
High Performance XML II (G. Fox)
The first handler, Ft = F1, handles the transport protocol; it negotiates with the other end-point to establish a transport conversation that uses either HTTP (default) or a different transport such as UDP, with WSRM implementing reliability. URI-T specifies the transport choice.
The second handler, Fr = F2, handles representation; it negotiates a representation conversation with scheme URI-S and realization URI-R. Negotiation identifies the parts of the SOAP header that are present in all messages in a stream; these are transmitted only ONCE.
Fr needs to negotiate with the service and with the other handlers, illustrated by F3 and F4 below, to decide what representation they will process.
(Figure: container handlers F1, F2, F3, F4.)
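Header entries that are identical in every message of a stream need to be sent only once. This idea can be sketched by modelling each message as a flat header dictionary plus a body. The sender splits out the shared entries, and the receiver merges them back into each message view. Treating SOAP header blocks as flat string pairs is a simplifying assumption; the WS-Addressing names below are only example keys.

```python
from typing import Dict, List, Tuple

Header = Dict[str, str]
Message = Tuple[Header, str]

def split_stream_headers(messages: List[Message]) -> Tuple[Header, List[Message]]:
    """Separate header entries identical in every message (sent once) from per-message ones."""
    if not messages:
        return {}, []
    first = messages[0][0]
    shared = {k: v for k, v in first.items() if all(h.get(k) == v for h, _ in messages)}
    reduced = [({k: v for k, v in h.items() if k not in shared}, body) for h, body in messages]
    return shared, reduced

def rebuild(shared: Header, reduced: List[Message]) -> List[Message]:
    """Receiver side: merge the once-transmitted entries back into each message."""
    return [({**shared, **h}, body) for h, body in reduced]

if __name__ == "__main__":
    stream = [({"wsa:To": "urn:svc", "wsa:Action": "urn:put", "wsa:MessageID": "1"}, "a"),
              ({"wsa:To": "urn:svc", "wsa:Action": "urn:put", "wsa:MessageID": "2"}, "b")]
    shared, reduced = split_stream_headers(stream)
    print(shared)   # sent once at the start of the conversation
    print(reduced)  # only the varying entries travel with each message
```

A real implementation would fix the shared part during negotiation rather than inspect the whole stream in advance. The sketch only shows the split and the lossless reconstruction.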
High Performance XML III (G. Fox)
Filters controlled by the conversation context convert messages between representations, using a permanent context (metadata) catalog to hold the conversation context.
There can be different message views for each end-point, or even for individual handlers and the service within one end-point.
The conversation context is a fast, dynamic metadata service that enables the conversions.
NaradaBrokering will implement Fr and Ft using its support for multiple transports, fast filters and message queuing.
(Figure: a transported message with header blocks H1, H2, H3, H4 and a body passes through container handlers Ft, Fr, F3, F4 to the service; the conversation context holds URI-S, URI-R and URI-T; labels include replicated message header, handler message view and service message view.)
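A conversation context catalog can be sketched as a small registry with two parts:

- per-conversation entries recording URI-S, URI-R and URI-T;
- converters registered between representations.

A handler or service asks for the message in its own view, and the catalog converts only when that view differs from the transported realization. The class names, the example URIs and the toy converter are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class ConversationContext:
    scheme: str       # URI-S
    realization: str  # URI-R: layout of the transported messages
    transport: str    # URI-T

class ContextCatalog:
    """Holds per-conversation context and converters between representations."""

    def __init__(self) -> None:
        self._contexts: Dict[str, ConversationContext] = {}
        self._converters: Dict[Tuple[str, str], Callable[[bytes], bytes]] = {}

    def register_conversation(self, conversation_id: str, ctx: ConversationContext) -> None:
        self._contexts[conversation_id] = ctx

    def register_converter(self, src: str, dst: str, fn: Callable[[bytes], bytes]) -> None:
        self._converters[(src, dst)] = fn

    def to_view(self, conversation_id: str, message: bytes, view: str) -> bytes:
        """Return the message in the representation a handler or service asks for."""
        ctx = self._contexts[conversation_id]
        if ctx.realization == view:
            return message  # no conversion needed for this view
        return self._converters[(ctx.realization, view)](message)

if __name__ == "__main__":
    catalog = ContextCatalog()
    catalog.register_conversation("conv-1", ConversationContext(
        "urn:example:scheme", "urn:example:layout-1", "urn:example:udp"))
    # Toy converter standing in for a real binary-to-XML conversion.
    catalog.register_converter("urn:example:layout-1", "xml", lambda m: b"<m>" + m + b"</m>")
    print(catalog.to_view("conv-1", b"x", "xml"))
```

Keeping the context outside the messages is what allows each message to omit the scheme description. The converter lookup is where different handler and service views diverge.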
RDAHMM: GPS Time Series Segmentation (M. Pierce)
Slide courtesy of Robert Granat, JPL.
Complex data with subtle signals is difficult for humans to analyze, leading to gaps in analysis. HMM segmentation provides an automatic way to focus attention on the most interesting parts of the time series.
GPS displacement (3D) over a period of two years, divided automatically by the HMM into 7 classes. Features: dip due to aquifer drainage (days ); Hector Mine earthquake (day 626); noisy period at the end of the time series.
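How an HMM assigns each time step to a class can be shown with standard Viterbi decoding on a one-dimensional series with Gaussian emissions. Note that RDAHMM learns its model parameters with regularized deterministic annealing EM. This sketch skips learning entirely and uses hand-picked parameters. It shows only the segmentation step, and is not Granat's code.

```python
import math
from typing import List, Sequence

def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def viterbi(obs: Sequence[float], means: List[float], stds: List[float],
            trans: List[List[float]], start: List[float]) -> List[int]:
    """Most likely state sequence for a Gaussian-emission HMM, computed in log space."""
    if not obs:
        return []
    n = len(means)
    score = [math.log(start[s]) + gaussian_logpdf(obs[0], means[s], stds[s]) for s in range(n)]
    back: List[List[int]] = []
    for x in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            # Best predecessor state for landing in state s at this step.
            prev = max(range(n), key=lambda p: score[p] + math.log(trans[p][s]))
            ptr.append(prev)
            new.append(score[prev] + math.log(trans[prev][s])
                       + gaussian_logpdf(x, means[s], stds[s]))
        back.append(ptr)
        score = new
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last step to the first
        state = ptr[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    # Toy series with a step change, loosely like an offset in a displacement record.
    series = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
    print(viterbi(series, means=[0.0, 5.0], stds=[1.0, 1.0],
                  trans=[[0.9, 0.1], [0.1, 0.9]], start=[0.5, 0.5]))
```

The "sticky" transition matrix (0.9 on the diagonal) discourages spurious class switches. This is what makes HMM segments line up with sustained changes rather than single noisy samples.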
Traditional NaradaBrokering Features (G. Fox)
Multiple protocol transport support: in the publish-subscribe paradigm, with different protocols on each link. Transport protocols supported include TCP, parallel TCP streams, UDP, multicast, SSL, HTTP and HTTPS. Communication through authenticating proxies/firewalls and NATs. Network QoS-based routing allows the highest performance transport.
Subscription formats: subscriptions can be strings, integers, XPath queries, regular expressions, SQL and tag=value pairs.
Reliable delivery: robust and exactly-once delivery in the presence of failures.
Ordered delivery: producer order and total order over a message type. Time-ordered delivery using Grid-wide NTP-based absolute time.
Recovery and replay: recovery from failures and disconnects. Replay of events/messages at any time. Buffering services.
Security: message-level, WS-Security compatible security.
Message payload options: compression and decompression of payloads. Fragmentation and coalescing of payloads.
Messaging-related compliance: Java Message Service (JMS) 1.0.2b compliant. Support for routing P2P JXTA interactions.
Grid feature support: NaradaBrokering-enhanced GridFTP. Bridge to Globus GT3.
Web Services supported: implementations of WS-ReliableMessaging, WS-Reliability and WS-Eventing.
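The "fragmentation and coalescing of payloads" option is what lets a broker move very large GML payloads. The general idea is sketched below:

- split a payload into tagged, fixed-size fragments;
- reassemble them on arrival, even out of order.

NaradaBrokering provides this natively. The `fragment` function and `Coalescer` class here are a generic illustration, not its API.

```python
import uuid
from typing import Dict, List, Optional, Tuple

Fragment = Tuple[str, int, int, bytes]  # (payload id, index, total fragments, chunk)

def fragment(payload: bytes, size: int) -> List[Fragment]:
    """Split a payload into fixed-size fragments tagged for reassembly."""
    if size <= 0:
        raise ValueError("fragment size must be positive")
    pid = uuid.uuid4().hex  # unique id so identical payloads sent twice do not collide
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)] or [b""]
    return [(pid, i, len(chunks), chunk) for i, chunk in enumerate(chunks)]

class Coalescer:
    """Reassembles fragments that may arrive in any order."""

    def __init__(self) -> None:
        self._parts: Dict[str, Dict[int, bytes]] = {}

    def add(self, frag: Fragment) -> Optional[bytes]:
        """Store one fragment; return the full payload once all fragments have arrived."""
        pid, index, total, chunk = frag
        parts = self._parts.setdefault(pid, {})
        parts[index] = chunk  # a duplicate arriving before completion just overwrites its slot
        if len(parts) == total:
            del self._parts[pid]
            return b"".join(parts[i] for i in range(total))
        return None

if __name__ == "__main__":
    frags = fragment(b"<gml:FeatureCollection>...</gml:FeatureCollection>", 16)
    receiver = Coalescer()
    for f in reversed(frags):  # simulate out-of-order arrival
        result = receiver.add(f)
    print(result)
```

This sketch does not provide exactly-once delivery. That guarantee comes from the broker's reliable delivery layer, listed separately above.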