Scaling and Fault Tolerance for Distributed Messages in a Service and Streaming Architecture
Hasan Bulut
Advisor: Prof. Geoffrey Fox
Ph.D. Defense Exam, April 24, 2007
Motivation
- A collaboration environment contains multiple real-time stream sources and multiple sinks.
- Streams are generated at different locations with different timestamps.
- Streams can come from any videoconferencing system or any video source.
- We would like to replay them instantly or at any later time.
- We would also like to replay any collection of streams.
- In addition, we would like to annotate streams and attach other objects, such as text, to annotated streams.
Motivation
- Collaboration environments, such as WebEx, may use messaging middleware systems.
- The architecture should be generic so that we can archive and replay any type of streaming event or data, independent of the stream's data format (video, audio, text, images).
- We would like to instantly replay real-time streams with control over one or more streams: play, pause, forward and rewind.
- We would also like to demonstrate this generic framework in a working system: the eSports system.
GlobalMMCS Prototype System
Extending GlobalMMCS to Mobile Clients
eSports System Interface (Recording)
Research Issues I: Session and Stream Management
- Developing an XML-based control framework in which XML messages are also used for information exchange
- Managing session information
  - Dynamic sessions and static sessions
  - Describing collaboration session metadata in XML format
- Managing stream information
  - Streams come in different types and formats
- Investigating the impact of using a distributed context management service on the design and architecture of the streaming service
Research Issues II: Stream-Specific Issues
- Independence from event type and data format: streams can be video, audio, text, images, etc.
- Services required within messaging middleware to
  - archive and replay streams
  - achieve instant replay of streams
  - synchronize streams generated across a geographically large area
  - increase fault tolerance
- A scalable architecture to overcome performance issues
  - Jitter introduced by the archiving and replay service
  - Stream delay introduced for LAN and WAN clients
Services/Components Built Within This Dissertation
Column headings: Designed and Implemented | Implemented
- Jitter Reduction Service
  - Buffering Service
  - Time Differential Service
- Network Time Protocol
- Generic Streaming
  - Session Recorders
  - Session Players
- Distributed Repository Algorithm
- eSports System
- XML Based General Session Protocol (XGSP)
- GlobalMMCS Session Server
- XGSP Streaming Gateway
Services Built Within Messaging Middleware
- Time Services
  - NTP Time Service
  - High Resolution Timing Service
- Jitter Reduction Service
  - Buffering Service
  - Time Differential Service
- Replication Scheme (Repository Redundancy)
- Generic Archiving and Replay
  - Session Recorders
  - Session Players
Time Services
- Streams are generated at different locations and hence carry different timestamps.
- We need a global time ordering of events.
  - NTP can achieve 1-30 msec accuracy, which is sufficient for collaboration environments.
  - Entities should choose atomic time servers close to them.
- NTP (RFC 1305) Time Service
  - Used to synchronize timekeeping among a set of distributed time servers and clients.
  - Entities generating events in the system should use the Time Service to timestamp the events.
- High Resolution Timing Service
  - Implemented for Windows, Linux and Solaris
  - Gives 2-3 usec resolution
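The clock offset that an NTP-style time service estimates can be illustrated with the standard four-timestamp exchange. The sketch below is a minimal illustration and not the Time Service implementation from this work; the function names are hypothetical. It assumes symmetric network delay, and a positive offset means the local clock is behind the server.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one request/response.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    All values are in seconds. Assumes the delay is the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def corrected_timestamp(local_time, offset):
    """Timestamp an event on the shared timeline (hypothetical helper)."""
    return local_time + offset
```

An entity would apply `corrected_timestamp` to its local clock reading before stamping an event, so that events from different machines can be ordered on one timeline.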
Test Result
- The first offset value (in ms) shows how much the clock on that machine is ahead of real time.
- The change in offsets stays between -3 ms and 2 ms.
Jitter Reduction Services
- The jitter value of a stream indicates the quality of the stream.
  - Jitter appears as noise in audio and flicker in video.
  - The lower the jitter, the better the stream quality.
- Buffering Service
  - Time-orders events.
- Time Differential Service
  - Releases events while preserving the time spacing between them.
  - Minimizes jitter.
- Buffer duration is 200 msec or higher.
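The interplay of the two services can be sketched as a buffer that first orders events by timestamp and then computes release times that keep the original spacing. This is a simplified, batch-style illustration under assumed names (`JitterBuffer`, `release_schedule`); a real service would release events continuously rather than compute a schedule in one pass.

```python
import heapq


class JitterBuffer:
    """Hypothetical sketch: time-order events, then release with original spacing."""

    def __init__(self, buffer_ms=200):
        self.buffer_ms = buffer_ms  # buffering duration before the first release
        self._heap = []             # min-heap ordered by event timestamp
        self._seq = 0               # tie-breaker for events with equal timestamps

    def push(self, timestamp_ms, payload):
        """Buffering step: events may arrive out of order."""
        heapq.heappush(self._heap, (timestamp_ms, self._seq, payload))
        self._seq += 1

    def release_schedule(self, first_arrival_ms):
        """Time differential step: return (release_ms, timestamp_ms, payload)."""
        ordered = []
        while self._heap:
            timestamp_ms, _, payload = heapq.heappop(self._heap)
            ordered.append((timestamp_ms, payload))
        if not ordered:
            return []
        base_ts = ordered[0][0]
        start = first_arrival_ms + self.buffer_ms
        # Each event is released as far after the first one as it was generated.
        return [(start + (ts - base_ts), ts, payload) for ts, payload in ordered]
```

Because release times depend only on the generation timestamps, variation in arrival times that stays within the buffer duration does not show up at the receiver.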
Conceptualizing Jitter Reduction Service
Jitter Reduction Service Test Results
Repository Redundancy
- The NaradaBrokering reliable delivery mechanism uses a repository to store events, ensuring that events are received by subscribers.
- The repository redundancy scheme keeps reliable delivery working when a repository fails.
- This provides a distributed reliable repository with modest latency.
- Each repository functions autonomously and makes decisions independently.
- A repository gossips with the other repositories in the bundle, i.e. it exchanges information about the events stored in other repositories.
- A repository can recover a missing event from any other repository that holds it.
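The gossip and recovery behavior described above can be sketched as follows. This is an illustrative model only; the class and method names are assumptions and do not reflect the NaradaBrokering repository API or the actual distributed repository algorithm.

```python
class Repository:
    """Hypothetical model of one repository in a redundancy bundle."""

    def __init__(self, name):
        self.name = name
        self.events = {}          # event_id -> payload stored locally
        self.peer_inventory = {}  # peer name -> set of event ids learned via gossip

    def store(self, event_id, payload):
        self.events[event_id] = payload

    def gossip(self, peer):
        """Exchange information about which events each side holds."""
        self.peer_inventory[peer.name] = set(peer.events)
        peer.peer_inventory[self.name] = set(self.events)

    def recover(self, event_id, peers):
        """Copy a missing event from any peer known to hold it.

        Returns the name of the peer used, or None if no peer has the event.
        """
        for peer in peers:
            known = self.peer_inventory.get(peer.name, set())
            if event_id in known and event_id in peer.events:
                self.events[event_id] = peer.events[event_id]
                return peer.name
        return None
```

Each repository decides on its own when to gossip and when to recover, which matches the autonomous behavior described on the slide.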
Repository Redundancy Test Results
Test topologies: A, B, C, D, E, F. Legend: P1 = publisher, S1 = subscriber (measuring client).
Repository Redundancy Test Results
Configurations measured: 1 broker; 3 brokers; 1 broker, 1 repository; 3 brokers, 1 repository; 3 brokers, 2 repositories; 3 brokers, 3 repositories.
Generic Streaming Framework and Metadata Management
- Metadata Management
- Generic Archiving and Replay
  - Session Recorders
  - Session Players
- eSports System
Metadata Management
- We used “Distributed Context Management”, developed by Mehmet Aktas, which provides a distributed and fault-tolerant metadata repository for real-time environments.
- It provides useful features such as sharing context among entities with a single URI.
- Two levels of management: session level and intra-session level
  - Session-level management keeps track of sessions.
  - Intra-session-level management keeps track of the streams in a session.
Generic Archiving and Replay Framework
- A generic framework for recording and replaying any type of streaming event or data.
- Instant replay of streams: real-time (live) streams can be replayed, paused and rewound while they are being recorded.
- Stream linkage: multiple streams are linked together to construct a session.
- A collaboration session can be recorded and replayed within this framework. Example: eSports System.
Uniform Event Type For Generic Framework
- Received events are wrapped inside NaradaBrokering native events (NBEvent) together with some event-specific information.
- The received event is placed in the payload of the NBEvent.
- The NBEvent also contains timestamp information and the event type.
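The wrapping idea can be illustrated with a small stand-in type. `WrappedEvent` and `wrap` below are hypothetical names used only for this sketch; they are not the NBEvent class or any NaradaBrokering API, and the `info` field is an assumed placeholder for the event-specific information.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WrappedEvent:
    """Hypothetical stand-in for a uniform event envelope."""
    event_type: str    # e.g. "video", "audio", "text"
    timestamp_ms: int  # time at which the event was received or generated
    payload: bytes     # the original event, stored unmodified
    info: dict = field(default_factory=dict)  # event-specific information


def wrap(raw, event_type, now_ms=None, **info):
    """Place a received event in the payload of a uniform envelope."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return WrappedEvent(event_type, now_ms, raw, dict(info))
```

Since the archive only looks at the envelope fields, recorders and players need no knowledge of the format inside `payload`.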
Session Recorders
Figure label: Control message
- T corresponds to a topic that is dedicated to a stream.
- Connections between entities in the system use either UDP or TCP, depending on the requirements of the applications.
- A recorder is assigned to a topic and acts as a proxy that receives events of that type.
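The one-recorder-per-topic arrangement might look like the following sketch. The names `TopicRecorder` and `SessionRecorder` are assumptions made for illustration, and an in-memory list stands in for the repository; transport (UDP or TCP) is left out.

```python
class TopicRecorder:
    """Hypothetical recorder bound to a single topic (one stream)."""

    def __init__(self, topic):
        self.topic = topic
        self.archive = []  # (timestamp_ms, payload) in arrival order

    def on_event(self, topic, timestamp_ms, payload):
        """Archive the event if it belongs to this recorder's topic."""
        if topic != self.topic:
            return False
        self.archive.append((timestamp_ms, payload))
        return True


class SessionRecorder:
    """Hypothetical session-level recorder: one TopicRecorder per topic."""

    def __init__(self, topics):
        self.recorders = {t: TopicRecorder(t) for t in topics}

    def dispatch(self, topic, timestamp_ms, payload):
        """Route an incoming event to the recorder assigned to its topic."""
        recorder = self.recorders.get(topic)
        if recorder is None:
            return False
        return recorder.on_event(topic, timestamp_ms, payload)
```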
Session Players
Figure label: Control message
- The primary purpose of a session player is to simulate the clients of the original session.
- Supports instant replay of real-time live streams that are being recorded.
- Session players support replay, pause, rewind and fast-forward operations.
- When one of these operations is requested, it is applied to all of the topics (streams) in that session.
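One way to apply a control operation to every topic at once is to keep a single playback position for the whole session. The sketch below assumes this design; `SessionPlayer` and its methods are hypothetical names, and the streams are plain in-memory lists rather than archived events.

```python
import bisect


class SessionPlayer:
    """Hypothetical player with one shared playback position for all topics."""

    def __init__(self, streams):
        # streams: topic -> list of (timestamp_ms, payload), sorted by timestamp
        self.streams = streams
        self.position_ms = 0
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def seek(self, position_ms):
        self.position_ms = max(0, position_ms)

    def rewind(self, delta_ms):
        self.seek(self.position_ms - delta_ms)

    def fast_forward(self, delta_ms):
        self.seek(self.position_ms + delta_ms)

    def advance(self, delta_ms):
        """Return events with position <= timestamp < position + delta, per topic."""
        if self.paused:
            return {}
        start, end = self.position_ms, self.position_ms + delta_ms
        due = {}
        for topic, events in self.streams.items():
            stamps = [ts for ts, _ in events]
            lo = bisect.bisect_left(stamps, start)
            hi = bisect.bisect_left(stamps, end)
            due[topic] = events[lo:hi]
        self.position_ms = end
        return due
```

Because the position is shared, a rewind or pause never leaves one stream ahead of another, which keeps linked streams aligned during replay.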
eSports System – Capabilities provided to eSports System
- Archive and replay of NaradaBrokering native events
  - Any type of data can be archived in NB repositories and replayed while preserving the time spacing between events.
- Archive and replay of GlobalMMCS sessions
  - GlobalMMCS is an integrated collaboration environment that brings together streams from different communities, e.g. H.323 clients and AG clients.
- Instant replay
  - Real-time streams in videoconferencing sessions can be replayed with very low latency.
- Utilizing the Distributed Context Management service
  - eSports clients can share session information.
eSports System and Streaming Services
eSports System Interface (Recording)
eSports System Interface (Replay)
eSports System – Taking snapshots from Video Players
Performance Results (Test Setup)
- LAN Setup: gf4.ucs.indiana.edu
- WAN Setup (FSU): vlab2.scs.fsu.edu
- WAN Setup (UCSD): synseis.geongrid.org
Performance Tests (LAN Results)
Performance Tests (WAN – UCSD Results)
Contribution
- We proposed and implemented a Generic Streaming Framework in which
  - archiving and replay of streams are independent of event type and data format
  - instant replay is available for real-time streams from different collaboration sessions, e.g. H.323 and AG
- The following services were introduced into the messaging middleware to increase the quality of service of streams and the fault tolerance of the collaboration system:
  - Time Service: provides a global time ordering of events
  - Jitter Reduction Service: enables replay of streams in an event-based system
  - Replication Scheme: increases the fault tolerance and scalability of the system
Contribution
- Designed and implemented the eSports System, with annotation capability, to demonstrate the capabilities of the Generic Streaming Framework.
- Demonstrated that the Distributed Context Management Service can be used in collaboration environments to share session and stream information.
- Designed and implemented XGSP to integrate different collaboration communities.
- Extended GlobalMMCS to send real-time videoconferencing streams to cellular clients.