Overview of Microsoft StreamInsight
Torsten Grabs, Lead Program Manager, Microsoft StreamInsight
© 2003 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
The Need for an Event-Driven Platform
CEP Platform from Microsoft - Overview
Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.
Database applications compared with event-driven applications:
- Query paradigm: ad-hoc queries or requests (database) vs. continuous standing queries (event-driven)
- Latency: seconds, hours, days (database) vs. milliseconds or less (event-driven)
- Data rate: hundreds of events/sec (database) vs. tens of thousands of events/sec or more (event-driven)
- Query semantics: declarative relational analytics (database) vs. declarative relational and temporal analytics (event-driven)
[Figure labels: event, request, response, input stream, output stream]
Scenarios for Event-Driven Applications
[Figure: application scenarios plotted by latency (months, days, hours, minutes, seconds, 100 ms, less than 1 ms) against aggregate data rate (10 to ~1 million events/sec). Scenarios shown: relational database applications, data warehousing applications, operational analytics applications (e.g., logistics), web analytics applications, manufacturing applications, monitoring applications, and financial trading applications. Part of the chart is marked as CEP target scenarios.]
Example Scenarios
- Manufacturing: sensor on plant floor; react through device controllers; aggregated data; 10,000 events/sec
- Web Analytics: click-stream data; online customer behavior; page layout; 100,000 events/sec
- Financial Services: stock & news feeds; algorithmic trading; patterns over time; super-low latency; 100,000 events/sec
- Power, Utilities: energy consumption; outages; smart grids; 100,000 events/sec
[Figure labels, inputs: asset instrumentation for data acquisition, subscriptions to data feeds, data streams, stream data store & archive, asset specs & parameters (lookup)]
[Figure labels, event processing engine: threshold queries, event correlation from multiple sources, pattern queries]
[Figure labels, outputs: visual trend-line and KPI monitoring, batch & product management, automated anomaly detection, real-time customer segmentation, algorithmic trading, proactive condition-based maintenance]
StreamInsight Platform
SQL Server Connections
- StreamInsight application development: query logic
- StreamInsight application at runtime: event sources, input adapters, StreamInsight engine (standing queries), output adapters, event targets
- Event sources: devices, sensors; web servers; stock ticker, news feeds; event stores & databases
- Event targets: pagers & monitoring devices; KPI dashboards, SharePoint UI; trading stations; event stores & databases
Updates will be available at http://www.devconnections.com/updates/LasVegas_Fall09/SQL
What is Project “Austin”?
Cloud Services
- Connected:
  - Real time data collection from wide variety of connected devices (Sensors, Smart Meters, Servers, Tablets, Phones)
  - Standards compliant endpoints (REST, XML, JSON)
  - Securable data ingress with data enrichment and transformation (geo-tagging, etc.)
- Scalable:
  - Multi-tenant Azure service with flexible, elastic capacity for collection and analytics
  - Federated scale out collection and analytics
  - Distributed service monitoring and tracing
- Integrated:
  - Turn key connectivity for platform data sources and sinks (SQL Azure, Windows Azure Table Storage)
  - Integrated with Azure management portal and billing experiences
- Analytics:
  - Rich temporal (StreamInsight) and sequential (Reactive Framework) analytics models
  - Dynamic, flexible query and data source management experience
StreamInsight on Azure: “Austin”
[Figure labels: StreamInsight application development; StreamInsight application at runtime; prebuilt input adapters; prebuilt output adapters; Austin StreamInsight engine with standing queries (StreamInsight query, Reactive query); scalable data ingress adapter with RESTful endpoint and authentication; data egress adapters (including Azure Tables); built-in archive; management service; monitoring service]
Events
- Events expose different temporal characteristics:
  - Point in time events
  - Interval events with fixed duration
  - Interval events with initially unknown duration
- Rich payloads capture all properties of an event
[Figure: events a, b, c, d, e plotted by payload/value against time, with timestamps t1 through t5]
Event Types
- Events in Microsoft’s CEP platform use the .NET type system
- Events are structured and can have multiple fields
- Fields are typed using the .NET framework types
- CEP engine provisioned timestamp fields capture all the different temporal event characteristics
- Event sources populate timestamp fields
[Figure: example event layout with Timestamps/Metadata followed by payload fields: Long pumpID; String Type, Location; Double flow, pressure; …]
Event Streams & Adapters
- A stream is a possibly infinite sequence of events:
  - Insertions of new events
  - Changes to event durations
- Stream characteristics:
  - Event/data arrival patterns: steady rate with end-of-stream indication; intermittent, random, or in bursts
  - Out of order events: order of arrival of events does not match the order of their application timestamps
- Adapters:
  - Receive/get events from the data source
  - Enqueue events for processing in the engine
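To make the out-of-order definition above concrete, here is a minimal sketch in plain Python (not the StreamInsight adapter API). The function name `find_out_of_order` and the use of integers as application timestamps are assumptions made for illustration only.

```python
def find_out_of_order(arrivals):
    """Return the arrival positions of out-of-order events.

    `arrivals` holds application timestamps in the order the events arrived.
    An event counts as out of order when its application timestamp is older
    than that of some event that arrived before it.
    """
    late = []
    newest = None  # largest application timestamp seen so far
    for position, timestamp in enumerate(arrivals):
        if newest is not None and timestamp < newest:
            late.append(position)
        else:
            newest = timestamp
    return late


if __name__ == "__main__":
    # The events stamped 3 and 4 arrive after the event stamped 5.
    print(find_out_of_order([1, 2, 5, 3, 4, 6]))
```

An engine that accepts such streams has to either buffer events until it knows no older ones can still arrive, or revise results it has already produced; the sketch only detects the condition.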
Typical CEP Queries
Typical CEP queries require a combination of functionality:
- Complex type describes event properties
- Calculations introduce additional event properties
- Grouping by one or more event properties
- Aggregation for each event group over a pre-defined period of time, typically a window
- Multiple event groups monitored by the same query
- Correlate event streams
- Check for absence of activity with a data source
- Enrich events with reference data
- Collection of assets may change over time
We want to make writing and maintaining those queries easy or even effortless.
StreamInsight Query Features
Operators over streams:
- Calculations (PROJECT)
- Correlation of streams from different data sources (JOIN)
- Check for absence of activity with a data source (EXISTS)
- Selection of events from streams (FILTER)
- Stream partitioning (GROUP & APPLY)
- Aggregation (SUM, COUNT, …)
- Ranking and heavy hitters (TOP-K)
- Temporal operations: hopping window, sliding window
- Extensibility: to add new domain-specific operators
LINQ Query Examples

LINQ example (JOIN, PROJECT, FILTER):

    from e1 in MyStream1
    join e2 in MyStream2 on e1.ID equals e2.ID      // Join
    where e1.f2 == "foo"                            // Filter
    select new { e1.f1, e2.f4 };                    // Project

LINQ example (GROUP&APPLY, WINDOW):

    from e3 in MyStream3
    group e3 by e3.i into SubStream                 // Grouping
    from win in SubStream.HoppingWindow(
        FiveMinutes, ThreeSeconds)                  // Window
    select new { i = SubStream.Key,
                 a = win.Avg(e => e.f) };           // Project & Aggregate
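The GROUP&APPLY, WINDOW example computes a per-group average over overlapping windows. The following is a minimal sketch of that idea in plain Python, not the StreamInsight API. It assumes point events given as (timestamp, key, value) tuples, windows of the form [start, start + size) whose start times are non-negative multiples of the hop size, and a hypothetical function name `hopping_window_avg`; StreamInsight's own window alignment and output event shape may differ.

```python
from collections import defaultdict


def hopping_window_avg(events, size, hop):
    """Average `value` per (key, window_start) over hopping windows.

    events: iterable of (timestamp, key, value) point events.
    size:   window length; hop: distance between consecutive window starts.
    Only non-empty windows appear in the result.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (key, start) -> [total, count]
    for timestamp, key, value in events:
        # Smallest window start (a multiple of hop) with start > timestamp - size.
        start = max(((timestamp - size) // hop + 1) * hop, 0)
        # The event falls into every window that starts at or before it.
        while start <= timestamp:
            acc = sums[(key, start)]
            acc[0] += value
            acc[1] += 1
            start += hop
    return {k: total / count for k, (total, count) in sums.items()}


if __name__ == "__main__":
    events = [(1, "a", 10.0), (3, "a", 20.0), (3, "b", 5.0)]
    for window, avg in sorted(hopping_window_avg(events, size=4, hop=2).items()):
        print(window, avg)
```

Because the window size is larger than the hop size, each event contributes to several windows, which is what distinguishes a hopping window from a tumbling one.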
Extensibility SDK
- Built-in operators do not cover all functionality:
  - Need for domain-specific extensions
  - Integrate with functionality from existing libraries
- Support for extensions in the CEP platform:
  - User-defined operators, functions, aggregates
  - Code written in .NET, deployed as .NET assembly
  - Query operators and LINQ can refer to functionality of the assembly
- Temporal snapshot operator framework:
  - Interface to implement user-defined operators
  - Manages operator state and snapshot changes
  - Framework does the heavy lifting to deal with intricate temporal behavior such as out-of-order events
Resiliency
- Outages happen in computing (planned and unplanned downtime):
  - Power outages
  - “Patch Tuesday”
  - Human mistakes
- Systems need to be “resilient” to outages:
  - Minimize damage
  - Become operational again quickly
- The specific requirements depend on how mission-critical your application is
Resiliency: Timeliness
- Timeliness: recover from outages quickly.
- The goal is simple: as fast as possible.
- StreamInsight doesn’t store event data, but it does store query state. This state may be significant, and it may be slow to recreate.
Resiliency: Correctness
Three levels:
- Exact equivalence. The same stream of events, regardless of outage.
- Equivalent events. No missed events and no wrong events, but duplicates are allowed.
- Rough aggregation. Example: get the moving average price of a stock over the last day. Missing a few inputs will give inaccurate but close results. Still don’t want to lose a day’s worth of work.
What is Checkpointing?
- Checkpointing saves a query’s state to disk.
  - You control when the checkpoint is initiated.
  - StreamInsight takes care of saving out consistent state.
- After an outage, StreamInsight can restore this state.
  - This limits state loss during an outage, speeding recovery.
  - Level of correctness depends on additional work we are able to perform.
  - Recovery process is coordinated by StreamInsight.
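The save-and-restore idea can be illustrated with a toy standing query. This is a minimal sketch in plain Python under stated assumptions, not how StreamInsight implements checkpoints: the class `RunningAverage`, its methods, and the JSON file format are all hypothetical.

```python
import json
import os
import tempfile


class RunningAverage:
    """Toy standing query: running average of all values seen so far."""

    def __init__(self, total=0.0, count=0):
        self.total = total
        self.count = count

    def on_event(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count

    def checkpoint(self, path):
        # Write to a temporary file first, then rename, so that an outage in
        # the middle of a checkpoint cannot leave a half-written state file.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"total": self.total, "count": self.count}, f)
        os.replace(tmp_path, path)

    @classmethod
    def restore(cls, path):
        with open(path) as f:
            state = json.load(f)
        return cls(state["total"], state["count"])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "query.checkpoint")
    query = RunningAverage()
    query.on_event(10)
    query.on_event(20)
    query.checkpoint(path)
    query.on_event(99)                       # lost in the simulated outage
    recovered = RunningAverage.restore(path)
    print(recovered.count)                   # state as of the checkpoint
```

Restoring brings back the state as of the checkpoint only. Anything processed after the checkpoint is lost unless the input can be replayed, which is why the level of correctness depends on additional work.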
Checkpointing API

    public IAsyncResult server.BeginCheckpoint(
        Query query,
        AsyncCallback asyncCallback,
        object asyncState);

    public bool server.EndCheckpoint(
        IAsyncResult asyncResult);

    public void server.CancelCheckpoint(
When is Checkpointing Useful?
- Provides a mechanism to recover from an outage:
  - To recover from unexpected system failure.
  - To handle expected outages (e.g., Patch Tuesday).
  - For machine migration.
- Not a panacea:
  - Does not provide uninterrupted service.
  - Does not protect against broken query logic.
Using Checkpoints
We’ll walk through three progressively stricter checkpointing scenarios:
- State retention.
- Equivalent events.
- Exact equivalence.
Low Bar: State Retention Ideal output: Real output: H G F E D C B A … B A H’ G’ F’ …
Checkpointing
Enqueue markers into input streams to instruct operators to save their state.
[Figure: input streams carrying events c through j …]
Checkpointing c d e f g h i j … oops c d e f g h i j …
Recovery
Load saved operator state and then start consuming input.
[Figure: input streams carrying events g through n …]
Medium Bar: Equivalent Events Ideal output: Real output: H G F E D C B A … B A D C B …
Filling the Gaps
- StreamInsight needs help:
  - Missing state since last checkpoint.
  - Missed events during outage.
- Solution: replayable adapters.
- The dance:
  - StreamInsight picks a place in the input stream.
  - StreamInsight communicates this to the input adapter.
  - The input adapter replays from the chosen spot.
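The three steps of the dance can be sketched in plain Python. This is an illustration under stated assumptions, not the StreamInsight adapter interface: `ReplayableSource`, `process`, and the use of a list index as the place in the stream are all hypothetical.

```python
class ReplayableSource:
    """Hypothetical replayable input: a durable log readable from any offset."""

    def __init__(self, log):
        self.log = list(log)

    def read_from(self, offset):
        # Yield (offset, event) pairs starting at the requested position.
        for i in range(offset, len(self.log)):
            yield i, self.log[i]


def process(source, state, offset, stop=None):
    """Fold events into a running sum, starting at `offset`.

    Returns (state, next_offset). Saved together, these two values act as a
    checkpoint: the query state plus the place in the input stream.
    """
    for i, value in source.read_from(offset):
        if stop is not None and i >= stop:
            break
        state += value
        offset = i + 1
    return state, offset


if __name__ == "__main__":
    source = ReplayableSource([1, 2, 3, 4, 5])
    checkpoint = process(source, 0, 0, stop=3)   # state after three events
    # Simulated outage: everything after the checkpoint is lost.
    recovered = process(source, *checkpoint)     # replay from the saved spot
    print(recovered == process(source, 0, 0))    # same as an uninterrupted run
```

The key design point is that the saved position and the saved state belong together. Replaying from any other position would either skip events or count some twice.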
Checkpointing l k j i h g f e k j i h g f e d j i h g f e d c l k j i … l k j i h g f e k j i h g f e d j i h g f e d c …
Recovery l k j i h g f e … l k j i h g f e …
A Place in the Stream h g f e d c b a … Physical Stream
Communicating the State
- Input adapter factories can optionally implement one of:
  - IHighWaterMarkInputAdapterFactory
  - IHighWaterMarkTypedInputAdapterFactory
- In a recovery situation, StreamInsight will then call Create with a high-water mark.
- The factory is then responsible for properly cueing the input.
StreamInsight in Action
Internet of Things Demo
The Demo Status Data Sensor Data Alert Data Control Data StreamInsight “Austin” Alert Data Control Data Historical Data
StreamInsight Design Principles
- Scalability:
  - Aggregate data rate keeps increasing.
  - Minimal resource impact (co-located).
  - Local computation
  - Avoid flooding the network
- Programmability:
  - Extensibility: UserDefinedAggregates, UserDefinedFunctions, UserDefinedOperators.
  - Composability.
  - Developer experience (language, IDE, debugging, supportability)
- Adaptability:
  - Easy to integrate via adapters.
  - Portability (servers, edge devices)
StreamInsight Architecture
[Architecture diagram. Components shown: Host Process, Web Service, Engine, Compiler, Expression / Type Service, Runtime, Execution Operators, Stream Manager, Event Manager, Query Scheduler, Plan Manager, Synopsis, Command Dispatcher, Management Service, Metadata, Diagnostics / Tracing, Stream, OS, Adapters]
Management Service
Highlights:
- Manageability API for query management (i.e., create, start, stop, delete query) and supportability / monitoring of running queries
- Same manageability API for both embedded deployment and web service clients
Compiler & Expressions
Highlights:
- Standardized IL allows us to implement a variety of syntactic surfaces over the algebra (e.g., LINQ, CQL, etc.):
  - Allows for domain-specific front-end languages
  - Prepared for future extensions
- Compile-time type checking and type-safe code generation for minimal runtime impact.
- Support for UDFs, UDAggs, UDOs.
- JIT code generation for field references and expression evaluation, for low-latency processing of high event rates.
- Building on the CLR lets us leverage:
  - Code generator, JIT support
  - Type system
  - Tools and libraries (LINQ Expressions, IDE, etc.)
Events & Streams
Highlights:
- JIT code generation for field references and expression evaluation, because interpreting these references is sub-optimal for low-latency processing of high event rates.
- Leverage JIT code generation support in the CLR for LINQ expressions.
- Bind the query to different deployment environments based on the metadata.
- The event manager is implemented as a combination of managed and native code in order to minimize overhead and ensure predictable performance.
- Events are read-only and reference-counted by streams (to minimize data copying).
Query Scheduler
Highlights:
- A query is executed by scheduling the individual operators as they become active.
- Operator state transition is managed by the Scheduler.
- When an operator becomes active, a thread is scheduled for execution.
- Scheduling decision is based on priority of the query and other parameters.
- Data flow architecture: reduced coupling and pipeline parallelism.
- Operators are affinitized to a thread/core (multi-core environments) to decrease lock contention and increase caching benefits.
- Periodic checks and migration for load balancing.
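Priority-based scheduling of active operators can be sketched with a priority queue. This is a toy model in plain Python, not the StreamInsight scheduler: the `Scheduler` class, the operator names, and the convention that a lower number means a higher priority are assumptions, and thread affinity and load balancing are left out.

```python
import heapq
from itertools import count


class Scheduler:
    """Toy scheduler: runs active operators in order of query priority."""

    def __init__(self):
        self._ready = []
        self._arrival = count()  # tie-breaker: first activated, first run

    def activate(self, operator_name, query_priority):
        # Lower number means higher priority (an assumption of this sketch).
        entry = (query_priority, next(self._arrival), operator_name)
        heapq.heappush(self._ready, entry)

    def next_operator(self):
        """Return the next operator to run, or None if none is active."""
        if not self._ready:
            return None
        return heapq.heappop(self._ready)[2]


if __name__ == "__main__":
    scheduler = Scheduler()
    scheduler.activate("filter@query2", query_priority=5)
    scheduler.activate("join@query1", query_priority=1)
    scheduler.activate("project@query1", query_priority=1)
    while (op := scheduler.next_operator()) is not None:
        print(op)
```

Operators only enter the queue when they become active, so idle operators cost nothing, which matches the data flow style described on the slide.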
Execution Operators
Highlights:
- Efficient implementation of operators that perform incremental evaluation as each event is processed.
- Clean, formal semantics. Leverage relational semantics whenever possible.
- GroupAndApply operator:
  - Enables parallelism for scale-up (multi-core).
  - Groups are dynamically instantiated and torn down based upon the data.
  - Large numbers of groups can be simultaneously active (~50M active groups for MSN.com).
[Figure: GroupAndApply data flow with Group, Apply, and Union stages over sample groups A, B, C and X, Y, Z]
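Incremental evaluation with dynamically created and torn-down groups can be sketched as follows. This is a minimal illustration in plain Python, not StreamInsight's operator implementation: the `GroupAndApply` class here, its `insert` and `retract` methods, and the choice of a per-group average are assumptions.

```python
class GroupAndApply:
    """Per-key running average, updated incrementally per event.

    A group is instantiated when its first event arrives and torn down
    when its last event is retracted, so only active groups use memory.
    """

    def __init__(self):
        self.groups = {}  # key -> (total, count)

    def insert(self, key, value):
        total, n = self.groups.get(key, (0.0, 0))
        total, n = total + value, n + 1
        self.groups[key] = (total, n)
        return total / n

    def retract(self, key, value):
        total, n = self.groups[key]
        if n == 1:
            del self.groups[key]  # last event gone: tear the group down
            return None
        total, n = total - value, n - 1
        self.groups[key] = (total, n)
        return total / n


if __name__ == "__main__":
    op = GroupAndApply()
    print(op.insert("A", 2.0))
    print(op.insert("A", 4.0))
    print(op.insert("B", 10.0))
    print(op.retract("A", 2.0))
    print(op.retract("B", 10.0), sorted(op.groups))
```

Each event touches only its own group's state, so the cost per event does not grow with the number of groups, and independent groups could in principle be processed on different cores.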
The StreamInsight Team
- Founded in 2008 based on incubation between MSR and SQL teams
- Small team, by Microsoft standards
- Roles in Microsoft engineering teams:
  - Program Managers: customer scenarios, functional specs, APIs, project mgmt, evangelism
  - Developers: architecture, technical design, product code, unit tests
  - Testers: test breakout, test code, lab runs, release signoff
- Using agile development methods
StreamInsight Roadmap
Tracks: StreamInsight 2.1 (on prem); StreamInsight on Azure (Cloud)
- Development experience
- Major API overhaul
- StreamInsight service on Windows Azure
- Currently private CTP
- GA this summer
Process:
- Using Scrum to organize and manage schedules
- Work organized in sprints/milestones
- CTP (Community Technology Preview) after each milestone, similar to public beta
- TAP (Technology Adopter Program) as we get closer to the planned release
For More Information
- StreamInsight download location: http://go.microsoft.com/fwlink/?LinkId=160598
- StreamInsight blog: http://blogs.msdn.com/streaminsight/
- StreamInsight MSDN documentation: http://msdn.microsoft.com/en-us/library/ee362541(SQL.105).aspx
- StreamInsight MSDN portal: http://msdn.microsoft.com/en-us/ee476990.aspx