Streaming Analytics & CEP Two sides of the same coin?

Slides:



Advertisements
Similar presentations
MapReduce Online Tyson Condie UC Berkeley Slides by Kaixiang MO
Advertisements

Tracking a Soccer Game with Big Data Srinath Perera Director of Research, WSO2 Member, Apache Software
High-Performance Complex Event Processing over Streams Eugene Wu, Yanlei Diao, ShariqRizvi Presented by Ming Li and Mo Liu Presented by Ming Li and Mo.
Programming Models for IoT and Streaming Data IC2E Internet of Things Panel Judy Qiu Indiana University.
The NewSQL database you’ll never outgrow Taming the Big Data Fire Hose John Hugg Sr. Software Engineer, VoltDB.
Big Data Management and Analytics Introduction Spring 2015 Dr. Latifur Khan 1.
SQL Server 2008 R2 StreamInsight Complex Event Processing Event Stream Processing.
Agenda  Introduction  Background to CEP  Complex Event Processing  Stream Insight  Anatomy of a Stream Insight Project.
HOL9396: Oracle Event Processing 12c
Pulsar Realtime Analytics At Scale Tony Ng, Sharad Murthy June 11, 2015.
OEP BOF9272 SOA Event Delivery Network
CPS 216: Advanced Database Systems Shivnath Babu.
CPS 216: Advanced Database Systems Shivnath Babu Fall 2006.
John Plummer Technical Specialist Data Platform Microsoft Ltd StreamInsight Complex Event Processing (CEP) Platform.
Efficient Elastic Burst Detection in Data Streams Yunyue Zhu and Dennis Shasha Department of Computer Science Courant Institute of Mathematical Sciences.
MyActivity: A Cloud-Hosted Ontology-Based Framework for Human Activity Querying Amin BakhshandehAbkear Supervisor:
Big Data Analytics Large-Scale Data Management Big Data Analytics Data Science and Analytics How to manage very large amounts of data and extract value.
Streaming Queries over Streaming Data Sirish Chandrasekaran (UC Berkeley) Michael J. Franklin (UC Berkeley) Presented by Andy Williamson.
INNOV-10 Progress® Event Engine™ Technical Overview Prashant Thumma Principal Software Engineer.
A Data Stream Publish/Subscribe Architecture with Self-adapting Queries Alasdair J G Gray and Werner Nutt School of Mathematical and Computer Sciences,
Comprehensive Flexible Global Storage and Search Responsive Available Secure Manageable Federation Coordination Consolidation Transformation Synchronization.
Janet works on the Azure Stream Analytics team, focusing on the service and portal UX. She has been in the data space at Microsoft for 5 years, previously.
Understanding DBMSs. Data Management Data Query Application DataBase Management System (DBMS)
Microsoft Ignite /28/2017 6:07 PM
Activiti in an Event- driven architecture Robin Bramley Chief Scientific Officer, Ixxus.
IncApprox The marriage of incremental and approximate computing Pramod Bhatotia Dhanya Krishnan, Do Le Quoc, Christof Fetzer, Rodrigo Rodrigues* (TU Dresden.
Big thanks to everyone!.
AMI to SmartGrid “DATA”
The Morphological Trials of StreamInsight: Packaging Matters
In quest of the operational database for real-time environmental monitoring and early warning systems Bartosz Baliś, Marian Bubak, Daniel Harezlak, Piotr.
S. Sudarshan CS632 Course, Mar 2004 IIT Bombay
PROTECT | OPTIMIZE | TRANSFORM
Introduction to Spark Streaming for Real Time data analysis
The Future of Apache Flink®
Welcome! Power BI User Group (PUG)
SONATA: Scalable Streaming Analytics for Network Monitoring
Scaling Apache Flink® to very large State
Introduction | Model | Solution | Evaluation
The Impact of Replacement Granularity on Video Caching
HP BSM implementation summary
Streaming Analytics with Apache Flink 1.0
Stream Analytics with SQL on Apache Flink®
Solving DEBS Grand Challenge with WSO2 CEP
Akshun Gupta, Karthik Bala
The Design of an Acquisitional Query Processor For Sensor Networks
Apache Flink and Stateful Stream Processing
Introduction to Azure Streaming Analytics
QCon.ai, San Francisco April, 11th 2018
ETL Architecture for Real-Time BI
SONATA: Query-Driven Network Telemetry
COS 518: Advanced Computer Systems Lecture 11 Michael Freedman
Capital One Architecture Team and DataTorrent
Apama CEP for OpenEdge Developers
COS 518: Advanced Computer Systems Lecture 11 Daniel Suo
Introduction to Apache
Overview of big data tools
Backend System Requirements
Taming the Big Data Fire Hose
Architecture for Real-Time ETL
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing Zaharia, et al (2012)
The Dataflow Model.
Developing Advanced Applications with Windows Azure
with Raul Castro Fernandez* Matteo Migliavacca+ and Peter Pietzuch*
Performance And Scalability In Oracle9i And SQL Server 2000
Data science laboratory (DSLAB)
Streaming data processing using Spark
COS 518: Advanced Computer Systems Lecture 12 Michael Freedman
Introduction to Azure Streaming Analytics
Introduction to Azure Streaming Analytics
Azure Cosmos DB – FY20 Top Use Cases
Presentation transcript:

Streaming Analytics & CEP Two sides of the same coin? Fabian Hueske fhueske@apache.org @fhueske Till Rohrmann trohrmann@apache.org @stsffap

Streams are Everywhere Most data is continuously produced as stream Processing data as it arrives is becoming very popular Many diverse applications and use cases

Complex Event Processing Analyzing a stream of events and drawing conclusions Detect patterns and assemble new events Applications Network intrusion Process monitoring Algorithmic trading Demanding requirements on stream processor Low latency! Exactly-once semantics & event-time support

Batch Analytics The batch approach to data analytics

Streaming Analytics Online aggregation of streams No delay – Continuous results Stream analytics subsumes batch analytics Batch is a finite stream Demanding requirements on stream processor High throughput Exactly-once semantics Event-time & advanced window support

Apache Flink® Platform for scalable stream processing Meets requirements of CEP and stream analytics Low latency and high throughput Exactly-once semantics Event-time & advanced windowing Core DataStream API available for Java & Scala

This Talk is About Flink’s new APIs for CEP and Stream Analytics DSL to define CEP patterns and actions Stream SQL to define queries on streams Integration of CEP and Stream SQL Early stage  Work in progress

Tracking an Order Process Use Case Tracking an Order Process

Order Fulfillment Scenario

Order Events Process is reflected in a stream of order events Order(orderId, tStamp, “received”) Shipment(orderId, tStamp, “shipped”) Delivery(orderId, tStamp, “delivered”) orderId: Identifies the order tStamp: Time at which the event happened

Aggregating Massive Streams Stream Analytics Aggregating Massive Streams

Stream Analytics Traditional batch analytics Stream analytics Repeated queries on finite and changing data sets Queries join and aggregate large data sets Stream analytics “Standing” query produces continuous results from infinite input stream Query computes aggregates on high-volume streams How to compute aggregates on infinite streams?

Compute Aggregates on Streams Split infinite stream into finite “windows” Split usually by time Tumbling windows Fixed size & consecutive Sliding windows Fixed size & may overlap Event time mandatory for correct & consistent results!

Example: Count Orders by Hour

Example: Count Orders by Hour SELECT STREAM TUMBLE_START(tStamp, INTERVAL ‘1’ HOUR) AS hour, COUNT(*) AS cnt FROM events WHERE status = ‘received’ GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ HOUR)

Stream SQL Architecture Flink features SQL on static and streaming tables Parsing and optimization by Apache Calcite SQL queries are translated into native Flink programs

Pattern Matching on Streams Complex Event Processing Pattern Matching on Streams

Real-time Warnings

CEP to the Rescue Define processing and delivery intervals (SLAs) ProcessSucc(orderId, tStamp, duration) ProcessWarn(orderId, tStamp) DeliverySucc(orderId, tStamp, duration) DeliveryWarn(orderId, tStamp) orderId: Identifies the order tStamp: Time when the event happened duration: Duration of the processing/delivery

CEP Example

Processing: Order  Shipment val processingPattern = Pattern .begin[Event]("received").subtype(classOf[Order]) .followedBy("shipped").where(_.status == "shipped") .within(Time.hours(1)) val processingPatternStream = CEP.pattern( input.keyBy("orderId"), processingPattern) val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select { (pP, timestamp) => // Timeout handler ProcessWarn(pP("received").orderId, timestamp) } { fP => // Select function ProcessSucc( fP("received").orderId, fP("shipped").tStamp, fP("shipped").tStamp – fP("received").tStamp) }

… and both at the same time! Integrated Stream Analytics with CEP … and both at the same time!

Count Delayed Shipments

Compute Avg Processing Time

CEP + Stream SQL // complex event processing result val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = … val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption) val deliveryWarningTable: Table = delWarn.toTable(tableEnv) tableEnv.registerTable(”deliveryWarnings”, deliveryWarningTable) // calculate the delayed deliveries per day val delayedDeliveriesPerDay = tableEnv.sql( """SELECT STREAM | TUMBLE_START(tStamp, INTERVAL ‘1’ DAY) AS day, | COUNT(*) AS cnt |FROM deliveryWarnings |GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ DAY)""".stripMargin)

CEP-enriched Stream SQL SELECT TUMBLE_START(tStamp, INTERVAL '1' DAY) as day, AVG(duration) as avgDuration FROM ( // CEP pattern SELECT (b.tStamp - a.tStamp) as duration, b.tStamp as tStamp FROM inputs PATTERN a FOLLOW BY b PARTITION BY orderId ORDER BY tStamp WITHIN INTERVAL '1’ HOUR WHERE a.status = ‘received’ AND b.status = ‘shipped’ ) GROUP BY TUMBLE(tStamp, INTERVAL '1’ DAY)

Conclusion Apache Flink handles CEP and analytical workloads Apache Flink offers intuitive APIs New class of applications by CEP and Stream SQL integration 