Notes About Streams
Dieter Gawlick, Oracle
03-06 October, 2005 (GGF15 in Boston)
Examples of Streams
- Network monitoring and traffic engineering
- Sensor networks, RFID tags
- Telecom call records
- Financial applications
- Web logs and click-streams
- Manufacturing processes
- Queues
- RSS
Streams are Everywhere
[Diagram: applications, files/databases, sensor data, actuators, feeds, and propagations, all connected by streams. Legend: a virtual stream (potentially based on a CQ); a materialized stream.]
Definition
- "… a body of water, confined within a bed and banks and having a detectable current" – What you get from Google
- "Continuous, unbounded, rapid, time-varying streams of data elements" – Jennifer Widom
- "A semi-ordered and ever-growing set of related data" – Anonymous
- … make up your own definition …
Specifications of Streams
- Data model
  - Data structure of elements (NVP, CSV, SQL99, XML, RDF, ..): the easy part
  - Relation between elements has to be added
    - Sequencing, other
  - Review presentation from Stanford Streams project and others
- Data access
  - Continuous Queries; e.g., Subscriptions, CQL
    - Single elements: well understood
    - Windows, joins, aggregates: (very) limited understanding
    - Any language has to deal with evolution of streams
      - Ad-hoc data access assumes a static/stable data source
  - Ad-hoc access
    - Current languages are very weak on temporal support
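To make the "windows, aggregates" item concrete, here is a minimal sketch of a continuous query that emits a time-windowed average each time a new element arrives. It is not CQL and is not taken from any product; the function name `sliding_average`, the `(timestamp, value)` element shape, and the assumption that timestamps arrive in non-decreasing order are all illustrative choices.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Element = Tuple[float, float]  # (timestamp, value); an assumed element shape


def sliding_average(elements: Iterable[Element],
                    window_seconds: float) -> Iterator[Element]:
    """Continuous query sketch: for every arriving element, yield
    (timestamp, mean of all values in the window (ts - window_seconds, ts]).

    Assumes timestamps are non-decreasing.
    """
    window: Deque[Element] = deque()
    total = 0.0
    for ts, value in elements:
        window.append((ts, value))
        total += value
        # Evict elements that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)
```

Unlike an ad-hoc query, the result is itself a stream: one output element per input element, computed over whatever part of the source is currently inside the window.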
Specifications of Streams
- Client model and tools
  - We need an additional client pattern
    - Use consumption of queues as guide
  - (Exactly once) consumption
    - E.g., make sure that 'new' data are processed exactly once even in the presence of application and system failures
  - State management, to protect against failures
    - E.g., prevent extended locking: commit after consumption of each new data element
- Management (logical properties)
  - Retention
  - Other
The Creation of Streams
- Creation of (initial) streams
  - Applications
    - Fixed
    - Based on parameters
    - Always, on demand
  - Temporal states
    - Continuous query must be able to identify events
- Creation of (derived) streams by evaluation of other streams:
  - Single element (classical subscription)
  - Windows: CQL-like languages
  - Joining of streams: CQL-like language with extensions
  - Joining stream with non-stream data: tricky (temporal reference)
- Many policies
  - See next slide
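One way to read the "temporal reference" difficulty when joining a stream with non-stream data: each stream element should be matched with the reference value that was valid at the element's own timestamp, not the value that happens to be current when the join runs. The sketch below assumes the non-stream data is kept as a version history per key; the name `as_of_join` and the data shapes are illustrative only.

```python
from bisect import bisect_right
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Event = Tuple[float, str, Any]   # (timestamp, key, payload); assumed shape
Version = Tuple[float, Any]      # (valid_from, value), sorted by valid_from


def as_of_join(events: Iterable[Event],
               history: Dict[str, List[Version]]
               ) -> Iterator[Tuple[float, str, Any, Any]]:
    """Derive a stream by joining each event with the reference value
    that was valid at the event's timestamp (None if no version applies)."""
    for ts, key, payload in events:
        versions = history.get(key, [])
        starts = [valid_from for valid_from, _ in versions]
        # Index of the first version that starts strictly after ts.
        i = bisect_right(starts, ts)
        value = versions[i - 1][1] if i > 0 else None
        yield ts, key, payload, value
```

This only works if the non-stream side retains its history, which is exactly the temporal support that the earlier slide describes as weak in current languages.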
Distribution and Consumption of Streams
- Messaging, RSS, …
- Propagation
- Consumption
  - Single client, group of clients
- Policies
  - Best effort, auditable, non-repudiable
  - Fair
    - According to community policies
    - According to constraints defined by publisher and consumer
  - Best effort
  - Non-transactional, transactional (exactly once)
Operational Characteristics
- Scalability/performance
  - Large number of (stream) elements, publishers, consumers (clients)
- Reliability/Recoverability
  - Streams may represent business interactions; e.g., they may be the most valuable business (data) asset
- Security
  - Should be aligned with database/file security; e.g., fine-grained, context-oriented
- Transactional
  - Elements of some streams need to be created, propagated, and consumed exactly once
- Fair
  - Everyone gets information at the same time
- … and more
Streams and Events
- The elements of streams are messages
- Messages report events
- Event: a change of state that is of interest
- Event and message specification
  - Newton Model: Events and messages are predefined; take what you get, select a subset
  - Heisenberg Model: Events and messages can be defined by subscriber/consumer
    - Requires temporal support
    - Requires a language to support event/message specification
- It is a significant challenge to identify important events and related messages
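The contrast between the two models can be sketched in a few lines. In the Newton-style function the publisher has already decided what the messages are and the subscriber only filters them; in the Heisenberg-style function the subscriber supplies its own definition of an event (a predicate over the previous and current state) and of the resulting message. All names and data shapes here are hypothetical illustrations, not an existing API.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

State = Dict[str, Any]
Message = Dict[str, Any]


def newton_subscribe(messages: Iterable[Message],
                     wanted_types: Set[str]) -> List[Message]:
    """Newton model: messages are predefined; the subscriber selects a subset."""
    return [m for m in messages if m["type"] in wanted_types]


def heisenberg_subscribe(states: Iterable[State],
                         is_event: Callable[[State, State], bool],
                         make_message: Callable[[State, State], Any]) -> List[Any]:
    """Heisenberg model: the subscriber defines which state changes are
    events and what message each event produces."""
    out: List[Any] = []
    prev = None
    for cur in states:
        # Comparing prev with cur is the (minimal) temporal support needed.
        if prev is not None and is_event(prev, cur):
            out.append(make_message(prev, cur))
        prev = cur
    return out
```

For example, a subscriber interested in temperature jumps of at least 5 degrees would pass `is_event=lambda p, c: c["temp"] - p["temp"] >= 5`, while another subscriber could define a completely different event over the same states.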