The Morphological Trials of StreamInsight: Packaging Matters


Speaker: Jonathan Goldstein, Microsoft Research

Clear Weather

Clear Weather: A Cool Research Idea
- It was summer 2006, and discussion of streaming was starting to pick up in our community:
  - Some work coming out of Stanford on a new language called CQL
  - Nile at Purdue just getting started
  - Work picking up on pub/sub and event-driven pattern matching
- While we liked many of the ideas in these projects, we felt that:
  - There was a strong need for something more analytics oriented than pub/sub or pattern matching.
  - SQL was the wrong starting point. Why worry about syntax from the outset?
  - Handling late-arriving data was too fundamental to ignore.

Clear Weather: A Cool Research Idea
- Came up with a new algebra that captured a very wide variety of streaming computations
- Came up with new query processing algorithms for our new operators
  - Could efficiently compute the queries expressible by our operators
  - These algorithms responded to late-arriving data by correcting earlier answers already issued behind the “clock”

Clear Weather: Algebra
- Basic idea of the algebra:
  - Windows are just sets of data that change over time.
  - Each unique snapshot corresponds to a result computed over the data in the snapshot.
  - All data going into and out of the operators carries two timestamps, which define the period of time during which the data exists in the window.
- Queries are easy to write and understand:
  InputStream.SlidingWindow(10 minutes).GroupBy(tuple -> tuple.Category, Count())
- Can easily express session windows:
  StartStream.InfiniteWindow().ClipWindow(EndStream, (left, right) -> left.ID == right.ID)
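The snapshot idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not StreamInsight's implementation: the names `Event`, `sliding_window`, and `snapshot_count` are hypothetical, timestamps are plain integers in minutes, and lifetimes are half-open intervals [start, end).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    start: int      # first minute at which the event is in the window (inclusive)
    end: int        # minute at which it leaves the window (exclusive)
    category: str


def sliding_window(events, width):
    """Hypothetical SlidingWindow: each event stays alive for `width` minutes."""
    return [Event(e.start, e.start + width, e.category) for e in events]


def snapshot_count(events):
    """Count live events per category between consecutive lifetime boundaries.

    Between two consecutive boundaries the set of live events (the snapshot)
    cannot change, so one result per category is emitted for that interval.
    Each result carries its own two timestamps, like the inputs.
    """
    boundaries = sorted({t for e in events for t in (e.start, e.end)})
    results = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        live = Counter(e.category for e in events if e.start <= lo < e.end)
        for category, count in sorted(live.items()):
            results.append((lo, hi, category, count))
    return results


# Three point events at minutes 0, 4 and 7, under a 10 minute sliding window.
points = [Event(0, 1, "a"), Event(4, 5, "a"), Event(7, 8, "b")]
results = snapshot_count(sliding_window(points, 10))
```

Here `results` starts with `(0, 4, "a", 1)` and ends with `(14, 17, "b", 1)`: every output row is itself an event with a start and an end time, which is what lets operators in such an algebra compose.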

Clear Weather: Speculation
- Whenever data arrived late:
  - Assume a tumbling count query with a window of 10 min
  - Assume an event arrives 1 hour late
- Since we already issued output for the window to which the event contributed:
  - Add an invalidation event, completely revoking the earlier output
  - Issue out-of-order output for the revoked window with an updated count
- Seemed like a good idea at the time
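The revoke-and-reissue behaviour described on this slide can be illustrated with a toy operator. It is a minimal sketch, not the product's algorithm: the class name `SpeculativeTumblingCount`, the `("insert" | "retract", window_start, count)` output tuples, and the use of the largest timestamp seen so far as the "clock" are all assumptions made for the example.

```python
class SpeculativeTumblingCount:
    """Toy tumbling count that corrects output when an event arrives late."""

    def __init__(self, width):
        self.width = width
        self.counts = {}    # window start -> events counted so far
        self.emitted = {}   # window start -> count most recently issued
        self.clock = 0      # largest event timestamp seen so far

    def on_event(self, timestamp):
        out = []
        window = timestamp - timestamp % self.width
        self.counts[window] = self.counts.get(window, 0) + 1

        if window in self.emitted:
            # Late arrival: output for this window was already issued, so
            # revoke it and reissue the window with the updated count.
            out.append(("retract", window, self.emitted[window]))
            out.append(("insert", window, self.counts[window]))
            self.emitted[window] = self.counts[window]

        # Advance the clock and issue every window that has now closed.
        self.clock = max(self.clock, timestamp)
        for w in sorted(self.counts):
            if w not in self.emitted and w + self.width <= self.clock:
                out.append(("insert", w, self.counts[w]))
                self.emitted[w] = self.counts[w]
        return out


# Events at minutes 1, 5 and 72, then one from minute 3 arriving over an hour late.
op = SpeculativeTumblingCount(width=10)
log = [op.on_event(t) for t in (1, 5, 72, 3)]
# log == [[], [], [("insert", 0, 2)], [("retract", 0, 2), ("insert", 0, 3)]]
```

Note that the corrected output for window 0 is issued after the clock has already reached minute 72, so every downstream consumer has to be prepared for out-of-order corrections.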

Clear Weather: Let’s Start a Business
- Found some enterprising folks in the SQL team who decided the technology was exciting and wanted to build a new business around it!
- We were thinking about power distribution networks, manufacturing plants, race cars, oil wells.
- Built what database people had always built: a server
- Formed a team of about a dozen excellent engineers to deliver the product
- I moved over to the product team to help things along

Clear Weather: We Shipped StreamInsight!
- Shipped on all versions of SQL 2008, with capabilities increasing as the price tag went up
- Had 5 releases over about 3 years
- From the beginning, we had a small but loyal group of what we considered high-value customers:
  - Power
  - Manufacturing
  - Building automation

Mostly Clear

Mostly Clear: Successes and Hiccups
- We really were able to express a very broad class of queries with our algebra! Much more so than other products!
- We didn’t fall into the SQL trap. We used LINQ, which allowed our customers to easily integrate their own logic into our queries
- Our customers loved us for exactly these reasons
- Failure: Our ideas about speculation weren’t ready. We ended up completely turning speculation off.

Mostly Clear: Aggressive speculation – Not such a good idea
- Consider a join query:
  LeftStream.JoinTo(RightStream, (left, right) -> left.ID == right.ID)
- In many cases, end times are established long after start times have already been established in the system (e.g. the event is split into 2)
- What happens when events that establish end times get delayed? Yuck!

Here come the clouds

Cloudy: Despite our (mostly) Success
- We felt pretty good about starting a budding new data processing business at Microsoft.
- But no one at Microsoft seemed to be paying attention!

Rain
- Everyone was frantically focused on the cloud, and ramping up services.
- Actually, they needed our technology:
  - Real-time application telemetry
  - Health
  - Auditing
  - Online advertising
  - Offline analysis
  - Online campaign analysis
  - Log analysis
- Logs, logs everywhere

Thunderstorms
- But StreamInsight was ill suited to the task(s)
- Servers are unwieldy
  - Installation with product keys etc.
  - Take over machines/VMs. Don’t play well with fabrics
- Performance
  - Some analysis tools, like column stores, had very high performance
  - StreamInsight was a traditional row-at-a-time query processor.
  - Even though our query language was highly expressive, we just couldn’t compete on performance for offline queries.

Killer Storms
- Managers within the SQL team recognized that StreamInsight wasn’t the answer they were looking for in the cloud.
- The team recognized that our technology was needed, but our product wasn’t quite right for these cloudy scenarios.
- Like most of the DB community, our managers had never really had to grapple with a major redefinition of the problem/product space, and were in a panic.
- Over about a 6-month period, StreamInsight lost its support and the team dispersed.

Salvaging the Wreckage

Salvaging the Wreckage
- With StreamInsight in shambles, what to do?
- Decided to move back to research and think about what worked and what didn’t:
  - Algebra: Highly successful
  - Speculation: Total flop because it wasn’t ready
  - LINQ instead of SQL: Highly successful
  - Performance: Inadequate for offline analytics
  - Server model: Totally wrong for cloud-oriented applications

It’s a New Day

It’s a New Day
- Enter Trill:
  - Same algebra, refined LINQ query language
  - A library, not a server
    - Easily integrated into any .NET app
    - Had both passive (single threaded) and active (parallel execution) modes
  - No speculation anywhere
  - Taught a streaming query processor how to do columnstore tricks
    - Comparable performance to state-of-the-art column stores
    - Orders of magnitude faster than any other streaming system
    - Much richer query semantics than columnstores
    - Made it all transparent to users using code generation

When the cloud comes and wrecks your house
Find a way to make pretty music