Chapter 4 Realtime Widely Distributed Instrumentation Systems
Useful and robust operation of realtime distributed systems requires many capabilities: automated management of data streams and distributed components, dynamic scheduling, and resource reservation. These capabilities will be built on a supporting architecture, middleware, and low-level services such as realtime cataloging.
Distributed Realtime Applications
High-speed data streams result from online instruments and imaging systems. High-speed networks provide the potential to collect, organize, store, analyze, and distribute the large data objects that result from such data streams. Examples include health care imaging systems, which require both a high data rate and realtime cataloging, and high energy physics experiments, where high data rates and volumes have to be processed and archived in realtime.
Problem Characterization and Prototypes
Realtime management of distributed systems involves distributed data collection and management, and distributed data analysis and cataloging. Each of these requires a supporting infrastructure of middleware and of system and communication services. The required middleware services include automated cataloging (Chapter 5), automated monitoring and management of distributed components (Chapters 14 and 15), and policy-based access control to support scheduling and resource allocation (Chapter 19).
Nature of the Remote Operation
Distributed instruments can be remote in space, scale, or time. Remoteness in space is the typical circumstance of network-distributed scientific collaborations. Another common circumstance is that the control function is remote in scale, such that direct control is not possible.
Cardioangiography
A key aspect of realtime data handling is immediate, automated processing to organize and catalog the data. The data are generated in large volumes and with high throughput, and the people generating the data are geographically separated from the people cataloging or using it. A realtime digital library system (WALDO) collects data from the instrument and automatically processes, catalogs, and archives each data unit together with its derived data and metadata. WALDO uses an object-oriented approach for the capture, cataloging, and management of large data objects.
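The following minimal Python sketch makes the object-oriented cataloging idea concrete. It models a data unit that carries its payload, derived data, and metadata, together with a catalog that processes and indexes each unit as it arrives. The class names DataUnit and Catalog, the SHA-256 content key, and the in-memory dictionary standing in for the archive are hypothetical illustrations, not WALDO's actual interfaces.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class DataUnit:
    """One unit of instrument output plus derived data and metadata (hypothetical model)."""
    source: str
    payload: bytes
    derived: Dict[str, object] = field(default_factory=dict)
    metadata: Dict[str, object] = field(default_factory=dict)


class Catalog:
    """In-memory stand-in for a realtime catalog and archive."""

    def __init__(self) -> None:
        self.entries: Dict[str, DataUnit] = {}

    def ingest(self, unit: DataUnit) -> str:
        # Automatic processing: derive simple data from the raw payload.
        unit.derived["size_bytes"] = len(unit.payload)
        # Automatic metadata generation: provenance and ingest time.
        unit.metadata["source"] = unit.source
        unit.metadata["ingested_at"] = datetime.now(timezone.utc).isoformat()
        # A content hash serves as the catalog key for this unit.
        key = hashlib.sha256(unit.payload).hexdigest()
        self.entries[key] = unit
        return key

    def find(self, **criteria: object) -> List[DataUnit]:
        # Return units whose metadata matches every given key/value pair.
        return [u for u in self.entries.values()
                if all(u.metadata.get(k) == v for k, v in criteria.items())]


catalog = Catalog()
key = catalog.ingest(DataUnit("cath-lab-1", b"\x00\x01frame"))
matches = catalog.find(source="cath-lab-1")
```

Keeping derived data and metadata on the same object as the payload mirrors the idea that each data unit is archived together with everything generated about it.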
WALDO Software Architecture
The WALDO software architecture comprises a data collection system; a high-speed, network-based cache that provides intermediate storage for processing and for high-speed application access; processing mechanisms for various sorts of data; data management for automatic cataloging and metadata generation; data access interfaces, including application-oriented interfaces; flexible mechanisms for providing various search strategies; transparent security that provides strong access control; transparent storage management for data components; curator interfaces for managing both the metadata and the large data object collections; and user access interfaces.
Particle Accelerator
This example is a detector system at a high energy physics particle accelerator. Modern detectors such as STAR generate 20 to 40 MB/s. The data must be processed in two phases. In phase 1, the detector puts out a steady-state, high data rate stream. In phase 2, the data are analyzed using the DPSS (Distributed Parallel Storage System).
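The following Python calculation is a rough, back-of-the-envelope illustration of why phase 1 needs a high-capacity sink. It computes the daily data volume at the quoted detector rates. It assumes continuous operation at the full rate (a duty cycle of 1.0, which real runs will not reach) and decimal units (1 TB = 10^6 MB). The function name daily_volume_tb is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_volume_tb(rate_mb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Data volume per day in terabytes, assuming decimal units (1 TB = 10**6 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY * duty_cycle / 1_000_000


low = daily_volume_tb(20)   # lower end of the quoted detector rate
high = daily_volume_tb(40)  # upper end of the quoted detector rate
```

Under these assumptions the stream amounts to roughly 1.7 to 3.5 TB per day. A lower duty cycle scales the figure down proportionally.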
Electron Microscopy
This example concerns the remote control of an electron microscope based on image content. In situ electron microscopy experiments require dynamic interaction with the specimen under observation while it is excited with external stimuli. Remote control over a WAN that does not offer realtime data and command delivery guarantees is not practical for finely tuned adjustments.
Enabling Remote Control over a WAN
Human interaction can easily be performed over a WAN. Dynamic control operations, on the other hand, cannot, because the control operation and the monitored response to the control or stimuli have to be coupled by low-latency communication, which is not possible on a WAN. Dynamic remote control applications therefore usually involve automated control operations performed near the instrument, which eliminates the WAN realtime delivery requirement.
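The following minimal Python sketch illustrates this split. A proportional feedback loop runs next to the instrument, and only target setpoints travel over the WAN through a queue. The callbacks read_position and move_stage, the gain, and the simulated stage are hypothetical stand-ins for a real instrument interface. A real controller would also pace its iterations to the instrument's frame rate rather than running a fixed number of steps.

```python
import queue
from typing import Callable


def local_control_loop(setpoints: queue.Queue,
                       read_position: Callable[[], float],
                       move_stage: Callable[[float], None],
                       gain: float = 0.5, steps: int = 50,
                       tolerance: float = 1e-3) -> float:
    """Feedback loop that runs next to the instrument (hypothetical interface)."""
    target = setpoints.get()  # initial target sent by the remote operator
    for _ in range(steps):
        # Adopt a newer target if one has arrived over the WAN, without blocking.
        try:
            target = setpoints.get_nowait()
        except queue.Empty:
            pass
        error = target - read_position()
        if abs(error) >= tolerance:
            # Proportional correction applied locally, with low latency.
            move_stage(gain * error)
    return target


# Simulated stage standing in for the microscope hardware.
stage = {"pos": 0.0}


def read_position() -> float:
    return stage["pos"]


def move_stage(delta: float) -> None:
    stage["pos"] += delta


wan_setpoints: queue.Queue = queue.Queue()
wan_setpoints.put(10.0)  # the only message that crosses the WAN
final_target = local_control_loop(wan_setpoints, read_position, move_stage)
```

The WAN only has to deliver the occasional setpoint. The tight measure-and-correct cycle never leaves the instrument's side of the network.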
In this section we describe some of the architectural and middleware approaches to implementing high-performance distributed instruments: a model data-intensive architecture, agent-based management, and policy-based access control.
A Model Data-Intensive Architecture
The previous examples demonstrate the utility of a high-speed distributed cache. This cache-based approach provides a standard interface to large, distributed storage systems. Each data source deposits its data in the cache, and each data consumer takes data from the cache.
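The following minimal Python sketch shows the deposit/take interface. It assumes one FIFO of blocks per named stream and uses a condition variable so that consumers can wait for data. The DataCache class and its method names are hypothetical. A real network cache would usually retain blocks so that several consumers can read them, and would span many hosts rather than a single process.

```python
import threading
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Optional


class DataCache:
    """Toy single-process stand-in for a network data cache (hypothetical API)."""

    def __init__(self) -> None:
        self._streams: DefaultDict[str, Deque[bytes]] = defaultdict(deque)
        self._cond = threading.Condition()

    def deposit(self, stream: str, block: bytes) -> None:
        """Called by a data source to add a block to a named stream."""
        with self._cond:
            self._streams[stream].append(block)
            self._cond.notify_all()  # wake consumers waiting for data

    def take(self, stream: str, timeout: Optional[float] = None) -> Optional[bytes]:
        """Called by a consumer; waits up to `timeout` seconds for a block."""
        with self._cond:
            # wait_for re-checks the predicate after every wakeup.
            ready = self._cond.wait_for(lambda: len(self._streams[stream]) > 0,
                                        timeout=timeout)
            return self._streams[stream].popleft() if ready else None


cache = DataCache()
cache.deposit("detector", b"event-0001")
cache.deposit("detector", b"event-0002")
first = cache.take("detector")
missing = cache.take("imaging", timeout=0.01)
```

Sources and consumers see only deposit and take, never each other. This decoupling is what lets the cache act as a standard interface between instruments and analysis.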
Network-Based Cache
The DPSS provides a highly distributed storage system that is usually used as a cache of data. The DPSS is typically used to collect data from online instruments and supply that data to analysis applications. It provides high capacity and isolates applications from the tertiary storage system. It may be dynamically configured by aggregating workstations and disks from all over the network.
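Parallel storage systems of this kind gain throughput by spreading data blocks across many servers and reading them in parallel. The following Python sketch assumes a simple round-robin striping policy over an aggregated pool. It uses an idealized model in which each server streams its share concurrently, with no network contention. The host names, the 64 MB block size, and the 10 MB/s per-server rate are made-up illustration values, not DPSS parameters.

```python
from typing import Dict, List


def stripe(blocks: List[bytes], servers: List[str]) -> Dict[str, List[int]]:
    """Assign block indices to servers round-robin (illustrative placement policy)."""
    if not servers:
        raise ValueError("need at least one server")
    placement: Dict[str, List[int]] = {s: [] for s in servers}
    for i, _ in enumerate(blocks):
        placement[servers[i % len(servers)]].append(i)
    return placement


def parallel_read_time(placement: Dict[str, List[int]],
                       block_size_mb: float, per_server_mb_s: float) -> float:
    """Idealized time to read all blocks when every server streams concurrently."""
    busiest = max(len(idx) for idx in placement.values())
    return busiest * block_size_mb / per_server_mb_s


blocks = [b"x"] * 8
pool = ["ws1:disk0", "ws2:disk0", "ws3:disk0", "ws4:disk0"]  # hypothetical hosts
layout = stripe(blocks, pool)
t4 = parallel_read_time(layout, block_size_mb=64, per_server_mb_s=10)
t1 = parallel_read_time(stripe(blocks, pool[:1]), 64, 10)
```

In this idealized model, aggregating four servers cuts the read time for the same data from 51.2 s to 12.8 s. This shows why adding workstations and disks to the pool raises capacity and throughput.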
Agent-Based Management and Monitoring
In widely distributed systems, by the time we observe that something has gone wrong, it is generally too late to react. This is because the needed information is no longer accessible, or because it would take too long to ask and answer all of the required questions. Agents not only provide standard access to comprehensive monitoring; they can also perform tasks such as keeping a state history.
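The following minimal Python sketch shows an agent that keeps a bounded state history, so that the states leading up to a failure are still available after the fact. The MonitoringAgent class, the string states, and the probe callback are hypothetical. Timestamps are passed in explicitly to keep the example deterministic.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class MonitoringAgent:
    """Polls one component and keeps a bounded state history (hypothetical design)."""

    def __init__(self, name: str, probe: Callable[[], str], history_len: int = 100):
        self.name = name
        self.probe = probe  # returns the component's current state, e.g. "ok"
        # deque with maxlen silently drops the oldest entry when full.
        self.history: Deque[Tuple[float, str]] = deque(maxlen=history_len)

    def poll(self, now: Optional[float] = None) -> str:
        state = self.probe()
        self.history.append((time.time() if now is None else now, state))
        return state

    def states_since(self, t: float) -> List[Tuple[float, str]]:
        # Answer "what happened before the failure?" from locally kept history.
        return [(ts, s) for ts, s in self.history if ts >= t]


readings = iter(["ok", "ok", "slow", "down"])
agent = MonitoringAgent("dpss-server-1", lambda: next(readings), history_len=3)
for t in (1.0, 2.0, 3.0, 4.0):
    agent.poll(now=t)
```

Because the agent records continuously, the "slow" reading that preceded the "down" state is already on hand when the failure is noticed.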
Monitoring
One successful monitoring methodology involves recording every event with a precision timestamp. This monitoring is designed to facilitate performance tuning, the characterization of distributed algorithms, and the management of functioning systems. When developing high-speed, network-based distributed services, we often observe unexpectedly low network throughput or high latency, and the reason for this poor performance is not obvious. Precise and comprehensive monitoring is an invaluable tool for diagnosing such problems.
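The following Python sketch shows how precision-timestamped events can localize a performance problem. Given events tagged with a request id, it computes the mean time spent between consecutive stages. The event names and timestamps are invented for illustration. The sketch also assumes that the hosts' clocks are synchronized closely enough for cross-host time differences to be meaningful.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each event: (request id, event name, timestamp in seconds).
Event = Tuple[str, str, float]


def stage_latencies(events: List[Event], stages: List[str]) -> Dict[str, float]:
    """Mean time between consecutive named events, averaged over requests."""
    by_request: Dict[str, Dict[str, float]] = defaultdict(dict)
    for req, name, ts in events:
        by_request[req][name] = ts
    samples: Dict[str, List[float]] = defaultdict(list)
    for marks in by_request.values():
        for a, b in zip(stages, stages[1:]):
            if a in marks and b in marks:
                samples[f"{a}->{b}"].append(marks[b] - marks[a])
    return {k: sum(v) / len(v) for k, v in samples.items()}


log = [
    ("r1", "client_send", 0.000), ("r1", "server_recv", 0.040),
    ("r1", "disk_done", 0.045), ("r1", "client_recv", 0.090),
    ("r2", "client_send", 1.000), ("r2", "server_recv", 1.050),
    ("r2", "disk_done", 1.055), ("r2", "client_recv", 1.100),
]
lat = stage_latencies(log, ["client_send", "server_recv", "disk_done", "client_recv"])
```

In this made-up log, each network leg takes about 45 ms while the disk stage takes 5 ms. That result points the investigation at the network rather than the storage system.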
Agent-Based Management of Widely Distributed Systems
Agent-based management may be the key to keeping widely distributed systems running reliably. Initial experimentation with such agents in the DPSS indicates several potential advantages. First, structured access to current and historical information. Second, reliability: not only does the system keep track of all components within the system, but it also restarts any component that has crashed. Third, automatic reconfiguration: when new components are added, the agents do not have to be reconfigured. Fourth, information management. Fifth, user representation: agents can perform actions on behalf of a user, for example when the data the user needs is not loaded in the DPSS.
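The reliability and automatic-reconfiguration advantages can be sketched in Python as a supervisor agent that keeps a registry of components. It restarts any that have crashed and picks up newly registered components without being reconfigured. Component, SupervisorAgent, and the boolean crash flag are hypothetical simplifications. A real agent would detect crashes through heartbeats or process status and would restart components on remote hosts.

```python
from typing import Dict, List


class Component:
    """Hypothetical managed component with a simple crash flag."""

    def __init__(self, name: str):
        self.name = name
        self.running = True
        self.restarts = 0

    def restart(self) -> None:
        self.running = True
        self.restarts += 1


class SupervisorAgent:
    """Tracks components and restarts any that have crashed (illustrative only)."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}

    def register(self, comp: Component) -> None:
        # New components announce themselves; the agent needs no reconfiguration.
        self.components[comp.name] = comp

    def check_all(self) -> List[str]:
        restarted = []
        for comp in self.components.values():
            if not comp.running:
                comp.restart()
                restarted.append(comp.name)
        return restarted


agent = SupervisorAgent()
for n in ("server-a", "server-b"):
    agent.register(Component(n))
agent.components["server-b"].running = False  # simulate a crash
restarted = agent.check_all()
agent.register(Component("server-c"))  # added later, picked up automatically
```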
Policy-Based Access Control
The goal for access control in such distributed systems is to reflect the general principles that society has established for policy-based access control. A resource has multiple stakeholders, and each stakeholder imposes use conditions on the resource. All of the use conditions must be met simultaneously to satisfy the requirements for access. An approach that addresses these general goals can be based on authorization and attribute certificates: users are permitted access to a resource based on their attributes.
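The following minimal Python sketch implements the "all use conditions must hold" rule. Each stakeholder contributes a condition on a user attribute, and access is granted only if every condition is satisfied. The UseCondition type and the attribute names are hypothetical. The plain dictionary of user attributes stands in for attributes that a real system would take from verified attribute certificates; signature and certificate validation are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UseCondition:
    """A stakeholder's requirement: the user must hold this attribute value."""
    stakeholder: str
    attribute: str
    value: str


def access_permitted(user_attrs: Dict[str, str], conditions: List[UseCondition]) -> bool:
    # Every stakeholder's condition must hold simultaneously (logical AND).
    return all(user_attrs.get(c.attribute) == c.value for c in conditions)


def unmet(user_attrs: Dict[str, str], conditions: List[UseCondition]) -> List[str]:
    """Stakeholders whose conditions block access (useful for explaining a denial)."""
    return [c.stakeholder for c in conditions if user_attrs.get(c.attribute) != c.value]


policy = [
    UseCondition("instrument owner", "project", "in-situ-em"),
    UseCondition("facility", "safety_training", "completed"),
]
alice = {"project": "in-situ-em", "safety_training": "completed"}
bob = {"project": "in-situ-em"}
```

Because each stakeholder's condition is kept separate, a denial can be traced to the specific stakeholder whose requirement was not met.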