High-Performance Complex Event Processing over Streams. Eugene Wu, Yanlei Diao, Shariq Rizvi. Presented by Ming Li and Mo Liu.

Presented by Ming Li and Mo Liu. The material in this talk is adapted from the slides of the paper's conference talk at SIGMOD 2006.

Outline: Background of Complex Event Processing; SASE Event Language; Query Evaluation; Sequence Scan and Construction; Optimization; Performance Measurement.

Preliminaries. Event: an event is an instantaneous, atomic (it happens completely or not at all) occurrence of interest at a point in time. Event stream: a stream is not homogeneous. It interleaves events of different types.

Complex Event Processing. Sensor technologies are gaining mainstream adoption. Emerging applications include retail management, food and drug distribution, healthcare, libraries, and postal services. These applications produce high volumes of events. The events must be filtered, correlated for complex pattern detection, and transformed to reach an appropriate semantic level. This calls for a new class of queries that translate data from the physical world into useful information.

Performance Requirements. There are two challenges: high-volume event streams, and extracting events from large windows. Both must be handled with low latency because the resulting actions are time-critical.

SASE Event Language. Language structure: EVENT gives the structure of an event pattern. The optional [WHERE] clause adds value-based predicates over the pattern. The optional [WITHIN] clause imposes a sliding window over the pattern.

A Retail Management Scenario

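SASE Event Language: Shoplifting Query
EVENT SEQ(SHELF-READING x, !(COUNTER-READING y), EXIT-READING z)
WHERE x.id = y.id ∧ x.id = z.id /* or equivalently, [id] */
WITHIN 12 hours
The query reports an item that is read on a shelf and later at an exit, with no counter (checkout) reading of the same item in between.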
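The following minimal Python sketch uses a naive nested-loop evaluation to make the query's meaning concrete. It is not SASE's evaluation strategy. The `Reading` record, the type tags `"SHELF"`, `"COUNTER"` and `"EXIT"`, and timestamps in hours are hypothetical choices. "Between" is taken to mean between x and z in a time-ordered stream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    kind: str      # hypothetical type tag: "SHELF", "COUNTER", or "EXIT"
    item_id: int   # tag id of the item
    time: float    # timestamp in hours (assumed unit)

WINDOW_HOURS = 12.0

def shoplifting_matches(stream):
    """Naive evaluation of
    SEQ(SHELF x, !(COUNTER y), EXIT z) WHERE [id] WITHIN 12 hours.
    `stream` must be sorted by time; returns (x, z) pairs."""
    matches = []
    for i, x in enumerate(stream):
        if x.kind != "SHELF":
            continue
        for j in range(i + 1, len(stream)):
            z = stream[j]
            if z.kind != "EXIT" or z.item_id != x.item_id:
                continue  # z must be an exit reading of the same item
            if z.time - x.time >= WINDOW_HOURS:
                continue  # window: z.time - x.time < 12 hours
            # Negation: no counter reading of the same item strictly between x and z.
            paid = any(y.kind == "COUNTER" and y.item_id == x.item_id
                       for y in stream[i + 1:j])
            if not paid:
                matches.append((x, z))
    return matches

stream = [
    Reading("SHELF", 1, 0.0),
    Reading("SHELF", 2, 0.5),
    Reading("COUNTER", 1, 1.0),
    Reading("EXIT", 1, 1.5),
    Reading("EXIT", 2, 2.0),
]
print([x.item_id for x, _ in shoplifting_matches(stream)])  # [2]
```

Item 1 passes the counter before leaving, so only item 2 is reported.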

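Formal Semantics. The semantics are defined by translating the language constructs into algebraic query expressions, using the following operators.

ANY operator:
$$\mathrm{ANY}(A_1, A_2, \ldots, A_n)(t) \equiv \exists\, i,\ 1 \le i \le n:\ A_i(t)$$

SEQ_ operator:
$$\mathrm{SEQ\_}(A_1, A_2, \ldots, A_n)(t) \equiv \exists\, t_1 < t_2 < \cdots < t_n = t:\ A_1(t_1) \wedge A_2(t_2) \wedge \cdots \wedge A_n(t_n)$$

SEQ_WITHOUT operator, where $S_1 = \mathrm{SEQ\_}(A_{11}, \ldots, A_{1m})$ and $S_2 = \mathrm{SEQ\_}(A_{21}, \ldots, A_{2n})$:
$$\mathrm{SEQ\_WITHOUT}(S_1, \{B\}, S_2)(t) \equiv \exists\, t_{11} < \cdots < t_{1m} < t_{21} < \cdots < t_{2n} = t:\ A_{11}(t_{11}) \wedge \cdots \wedge A_{1m}(t_{1m}) \wedge A_{21}(t_{21}) \wedge \cdots \wedge A_{2n}(t_{2n}) \wedge \big(\forall\, t_i \in (t_{1m}, t_{21}):\ \neg B(t_i)\big)$$

Selection operator:
$$\sigma\big(\mathrm{SEQ\_}(A_1, \ldots, A_n), P\big)(t) \equiv \exists\, t_1 < \cdots < t_n = t:\ A_1(t_1) \wedge \cdots \wedge A_n(t_n) \wedge P$$

WITHIN_ operator:
$$\mathrm{WITHIN\_}\big(\mathrm{SEQ\_}(A_1, \ldots, A_n), T\big)(t) \equiv \exists\, t - T < t_1 < \cdots < t_n = t:\ A_1(t_1) \wedge \cdots \wedge A_n(t_n)$$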
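The operators can be read as executable predicates. The sketch below is a brute-force transcription over a small finite stream. It assumes events are encoded as hypothetical `(type, timestamp)` pairs. Each function returns the witnessing event tuples, or a boolean for ANY, and none of it aims at efficiency.

```python
from itertools import combinations

# Events are (type, timestamp) pairs; this encoding is an illustrative assumption.

def any_op(types, events, t):
    """ANY(A1, ..., An)(t): some event at time t has one of the types Ai."""
    return any(ts == t and ty in types for ty, ts in events)

def seq_witnesses(types, events, t):
    """Event tuples satisfying SEQ_(A1, ..., An)(t):
    t1 < t2 < ... < tn = t and the i-th event has type Ai."""
    ordered = sorted(events, key=lambda e: e[1])
    found = []
    for combo in combinations(ordered, len(types)):
        times = [ts for _, ts in combo]
        increasing = all(a < b for a, b in zip(times, times[1:]))
        typed = all(ty == a for (ty, _), a in zip(combo, types))
        if increasing and typed and times[-1] == t:
            found.append(combo)
    return found

def seq_without(s1, b, s2, events, t):
    """SEQ_WITHOUT(S1, {B}, S2)(t): the concatenated sequence holds at t and
    no B event occurs strictly between t_1m and t_21."""
    result = []
    for w in seq_witnesses(s1 + s2, events, t):
        lo, hi = w[len(s1) - 1][1], w[len(s1)][1]
        if not any(ty == b and lo < ts < hi for ty, ts in events):
            result.append(w)
    return result

def select(witnesses, pred):
    """sigma(SEQ_(...), P)(t): keep witnesses satisfying predicate P."""
    return [w for w in witnesses if pred(w)]

def within(witnesses, T, t):
    """WITHIN_(SEQ_(...), T)(t): keep witnesses with t - T < t1."""
    return [w for w in witnesses if t - T < w[0][1]]

stream = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("B", 5), ("D", 6)]
print(seq_witnesses(["A", "B", "D"], stream, 6))
# [(('A', 1), ('B', 2), ('D', 6)), (('A', 1), ('B', 5), ('D', 6))]
print(seq_without(["A", "B"], "C", ["D"], stream, 6))
# [(('A', 1), ('B', 5), ('D', 6))]
```

The C at time 3 falls between B at 2 and D at 6, so SEQ_WITHOUT rejects the first witness and keeps only the one that uses the later B.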

A Basic Query Plan for EVENT SEQ(A a, B b, !(C c), D d) WHERE [attr1, attr2] ∧ a.attr4 < d.attr4 WITHIN W

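Example: EVENT SEQ(A, B, !C, D) WHERE [attr1] WITHIN 10 seconds. The event stream, in arrival order, is a(2) c(1) b(2) a(3) d(2) b(3) c(3) d(3) a(4), where the number in parentheses is the event's attr1 value. The plan's operators and their outputs:
SSC(A, B, D): a(2) b(2) d(2), a(2) b(2) d(3), a(2) b(3) d(3), a(3) b(3) d(3)
α[attr1]: a(2) b(2) d(2), a(3) b(3) d(3)
WD (D.time – A.time < 10 secs): a(2) b(2) d(2), a(3) b(3) d(3)
NG (!C, i.e., no C with B.time < C.time < D.time ∧ B.attr1 = C.attr1): a(2) b(2) d(2). The sequence a(3) b(3) d(3) is dropped because c(3) arrives between b(3) and d(3).
TF: each remaining sequence is transformed into a composite event.
Adapted from Luping Ding's Ph.D. Comprehensive Exam talk, September 2006.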
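The same pipeline can be replayed in a few lines of Python. This is a brute-force sketch, not the SASE implementation. The slide gives no timestamps, so the code assumes events arrive one second apart. Helper names such as `alpha_attr1` and `negate_c` are hypothetical. The code prints the output of each operator so it can be compared with the listing above.

```python
from itertools import combinations

# Events are (type, attr1, time). attr1 values come from the example stream;
# the timestamps (one second apart, in arrival order) are an assumption.
RAW = "a2 c1 b2 a3 d2 b3 c3 d3 a4".split()
STREAM = [(tok[0].upper(), int(tok[1:]), i + 1) for i, tok in enumerate(RAW)]
WINDOW = 10  # seconds

def ssc(stream, types):
    """SSC(A, B, D): every arrival-ordered combination matching the types."""
    return [c for c in combinations(stream, len(types))
            if all(e[0] == ty for e, ty in zip(c, types))]

def alpha_attr1(seqs):
    """alpha[attr1]: all events in a sequence share one attr1 value."""
    return [s for s in seqs if len({e[1] for e in s}) == 1]

def window(seqs, w):
    """WD: D.time - A.time < w."""
    return [s for s in seqs if s[-1][2] - s[0][2] < w]

def negate_c(seqs, stream):
    """NG: drop s if some C has B.time < C.time < D.time and C.attr1 == B.attr1."""
    def blocked(s):
        b, d = s[1], s[2]
        return any(e[0] == "C" and b[2] < e[2] < d[2] and e[1] == b[1]
                   for e in stream)
    return [s for s in seqs if not blocked(s)]

def transform(seqs):
    """TF: one composite event (here a dict) per surviving sequence."""
    return [{"A": s[0], "B": s[1], "D": s[2]} for s in seqs]

def show(seqs):
    return [" ".join(f"{e[0].lower()}({e[1]})" for e in s) for s in seqs]

after_ssc = ssc(STREAM, "ABD")
after_alpha = alpha_attr1(after_ssc)
after_wd = window(after_alpha, WINDOW)
after_ng = negate_c(after_wd, STREAM)
composite = transform(after_ng)
for name, seqs in [("SSC", after_ssc), ("alpha", after_alpha),
                   ("WD", after_wd), ("NG", after_ng)]:
    print(name, show(seqs))
```

Each printed stage reproduces the corresponding line of the example, ending with the single match a(2) b(2) d(2).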

Discussion. (1) Does SASE support not (a and b and c)? (2) What is the main difference between an event query and a relational SQL query?

Sequence Scan and Construction (SSC). Finite automata are a natural formalism for sequences. Processing has two phases: Sequence Scan (SS→) scans the input stream to detect matches, and Sequence Construction (SC←) searches backward, in a summary of the stream, to create event sequences.

Illustration of SSC: input stream a1 c2 b3 a4 d5 b6 d7 c8 d9. [Figure: the SSC automaton run over this stream]

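Illustration of SSC (Cont.): over the stream a1 c2 b3 a4 d5 b6 d7 c8 d9, the constructed sequences are a1 b3 d5 (when d5 arrives); a1 b3 d7, a1 b6 d7, a4 b6 d7 (when d7 arrives); and a1 b3 d9, a1 b6 d9, a4 b6 d9 (when d9 arrives).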
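Here is a deliberately naive sketch of the two phases. It assumes the illustrated query is SEQ(A, B, D) with no predicates or window, and that an event name such as `b3` encodes its type and timestamp. The "summary of the stream" is just a buffer of relevant events, and the automaton is reduced to checking whether an event's type occurs in the pattern. SC rescans the whole buffer backward for every final event. Under these assumptions the sketch constructs seven sequences, including a4 b6 d7. The function names are hypothetical.

```python
def construct(buffer, pattern, end):
    """SC: fix buffer[end] as the last element and search backward for
    earlier events filling the preceding pattern positions."""
    def back(pos, limit):
        # Fill pattern[pos] using events strictly before index `limit`.
        if pos < 0:
            return [()]
        out = []
        for i in range(limit - 1, -1, -1):
            if buffer[i][0] == pattern[pos]:
                out.extend(p + (buffer[i],) for p in back(pos - 1, i))
        return out
    return [p + (buffer[end],) for p in back(len(pattern) - 2, end)]

def ssc_naive(stream, pattern):
    """SS: scan forward, buffering events whose type occurs in the pattern;
    an event of the last type means a match may end here, so run SC."""
    buffer, results = [], []
    for ev in stream:
        if ev[0] not in pattern:
            continue  # irrelevant events (here 'c') never enter the summary
        buffer.append(ev)
        if ev[0] == pattern[-1]:
            results.extend(construct(buffer, pattern, len(buffer) - 1))
    return results

matches = ssc_naive("a1 c2 b3 a4 d5 b6 d7 c8 d9".split(), "abd")
print(len(matches))  # 7
```

The backward search touches every buffered event each time a `d` arrives, and this cost grows with the window. The optimizations that follow target exactly this cost.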

Illustration of SSC (Cont.): Should the automaton be this one? [Figure: an alternative automaton over the stream a1 c2 b3 a4 d5 b6 d7 c8 d9]

Optimization Issues. What are the key issues for optimization? Large sliding windows (e.g., "within the past 12 hours") and large intermediate result sizes, which may cause wasteful work. Intra-operator optimization to expedite SSC: the cost of sequence construction depends on the window size. Inter-operator optimizations to reduce intermediate results: how can predicates be evaluated early in SSC, and how can windows be evaluated early in SSC? Indexing relevant events in SSC both in temporal order and across value-based partitions.

Optimization on SSC: Sequence Index. When an event is pushed onto its stack, it records its RIP, the most recent instance in the previous stack at that moment. For example, the RIP of b6 is set to a4.
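The sequence index can be sketched as one stack per pattern position, where each pushed instance stores its RIP as an index into the previous stack. Construction then visits only instances at or below a RIP, which are exactly the ones that arrived earlier. This is a simplified sketch under several assumptions: the query is SEQ(A, B, D) without negation, each event type appears at most once in the pattern, and events are named like `b6`. The `Instance` class and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    event: str  # e.g. "b6": type letter followed by timestamp
    rip: int    # index of the most recent instance in the previous stack, -1 if none

def ssc_stacks(stream, pattern):
    """SS pushes each relevant event onto the stack for its pattern position and
    records its RIP; SC follows RIPs backward from each new final-position instance."""
    stacks = [[] for _ in pattern]
    results = []
    for ev in stream:
        pos = pattern.find(ev[0])
        if pos < 0:
            continue                      # type not in the pattern
        if pos > 0 and not stacks[pos - 1]:
            continue                      # nothing earlier for it to extend
        rip = len(stacks[pos - 1]) - 1 if pos > 0 else -1
        stacks[pos].append(Instance(ev, rip))
        if pos == len(pattern) - 1:
            results.extend(construct(stacks, pos, len(stacks[pos]) - 1))
    return results, stacks

def construct(stacks, pos, idx):
    """Every sequence ending at stacks[pos][idx]; only instances at or below
    the RIP in the previous stack arrived earlier, so only those are visited."""
    inst = stacks[pos][idx]
    if pos == 0:
        return [(inst.event,)]
    return [prefix + (inst.event,)
            for j in range(inst.rip, -1, -1)
            for prefix in construct(stacks, pos - 1, j)]

matches, stacks = ssc_stacks("a1 c2 b3 a4 d5 b6 d7 c8 d9".split(), "abd")
b6 = stacks[1][1]
print(b6.event, "RIP ->", stacks[0][b6.rip].event)  # b6 RIP -> a4
print(len(matches))  # 7
```

SS refuses to push an event when the previous stack is empty, so a `b` that arrives before any `a` never enters the index.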

Optimization on SSC: Sequence Index (Cont.)

Pushing Down Predicate Evaluations

More Optimizations. Evaluating additional equivalence tests in SSC: multi-attribute partitions (attr1, attr2 … ) have high memory overhead, so use single-attribute partitions plus cross filtering in SS→. Pushing the window operator down to SSC: windows in SS→ give coarse-grained filtering and pruning, and windows in SC← perform precise checking.
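One way to read "single-attribute partitions" is to keep a separate set of stacks per attr1 value. The equivalence test [attr1] is then enforced while scanning rather than by a later selection. The sketch below applies that idea to the earlier example stream a(2) c(1) b(2) a(3) d(2) b(3) c(3) d(3) a(4), again assuming one-second timestamps. Cross filtering on a second attribute is not shown, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Instance:
    event: tuple  # (type, attr1, time)
    rip: int      # most recent instance in the previous stack of the same partition

def partitioned_ssc(stream, pattern):
    """SEQ(pattern) WHERE [attr1]: one set of stacks per attr1 value, so the
    equivalence test is enforced during SS instead of in a later selection."""
    partitions = defaultdict(lambda: [[] for _ in pattern])
    results = []
    for ev in stream:
        pos = pattern.find(ev[0])
        if pos < 0:
            continue
        stacks = partitions[ev[1]]       # only events with the same attr1 value
        if pos > 0 and not stacks[pos - 1]:
            continue
        rip = len(stacks[pos - 1]) - 1 if pos > 0 else -1
        stacks[pos].append(Instance(ev, rip))
        if pos == len(pattern) - 1:
            results.extend(construct(stacks, pos, len(stacks[pos]) - 1))
    return results

def construct(stacks, pos, idx):
    inst = stacks[pos][idx]
    if pos == 0:
        return [(inst.event,)]
    return [p + (inst.event,)
            for j in range(inst.rip, -1, -1)
            for p in construct(stacks, pos - 1, j)]

raw = "a2 c1 b2 a3 d2 b3 c3 d3 a4".split()
stream = [(t[0].upper(), int(t[1:]), i + 1) for i, t in enumerate(raw)]
for seq in partitioned_ssc(stream, "ABD"):
    print(" ".join(f"{e[0].lower()}({e[1]})" for e in seq))
# a(2) b(2) d(2)
# a(3) b(3) d(3)
```

The two results are the sequences that survive α[attr1] in the basic plan. Here, however, a(2) b(2) d(3) and a(2) b(3) d(3) are never constructed.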

Discussion. Pushing the window operator down to SSC: how is it done? Can it really be counted as an "optimization technique"?

Discussion (Cont.): for the stream a1 c2 b3 a4 d5 b6 d7 c8 d9, the constructed sequences are a1 b3 d5, a1 b3 d7, a1 b6 d7, a4 b6 d7, a1 b3 d9, a1 b6 d9, and a4 b6 d9. Cases to consider: (1) if b3 – a1 > w; (2) if d9 – a1 > w; (3) if d7 – a1 > w.
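The sketch below gives one possible answer by extending a stack-based scan with a window W. SS applies a coarse filter: an event is not pushed if even the newest first-position instance is already W or more in the past. SC applies the window check while walking down the stacks. It assumes SEQ(A, B, D), timestamps encoded in event names, and the strict inequality of the WITHIN_ definition. W = 5 is an arbitrary illustrative value, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    event: str  # e.g. "b6": type letter followed by timestamp
    time: int
    rip: int    # most recent instance in the previous stack, -1 if none

def ssc_windowed(stream, pattern, w):
    stacks = [[] for _ in pattern]
    results = []
    for ev in stream:
        etype, t = ev[0], int(ev[1:])
        pos = pattern.find(etype)
        if pos < 0:
            continue
        if pos > 0:
            if not stacks[pos - 1]:
                continue
            # SS (coarse): if even the newest first-position instance is out
            # of the window, no match through this event can satisfy it.
            if t - stacks[0][-1].time >= w:
                continue
        rip = len(stacks[pos - 1]) - 1 if pos > 0 else -1
        stacks[pos].append(Instance(ev, t, rip))
        if pos == len(pattern) - 1:
            results.extend(construct(stacks, pos, len(stacks[pos]) - 1, t, w))
    return results

def construct(stacks, pos, idx, t_end, w):
    inst = stacks[pos][idx]
    if pos == 0:
        return [(inst.event,)]
    out = []
    for j in range(inst.rip, -1, -1):
        # SC: at the first stack this is the exact test t_end - t1 < w; at
        # intermediate stacks it is a safe prune. Lower entries are older,
        # so the loop stops at the first entry that falls outside the window.
        if t_end - stacks[pos - 1][j].time >= w:
            break
        out.extend(p + (inst.event,) for p in construct(stacks, pos - 1, j, t_end, w))
    return out

stream = "a1 c2 b3 a4 d5 b6 d7 c8 d9".split()
print(ssc_windowed(stream, "abd", 5))  # [('a1', 'b3', 'd5'), ('a4', 'b6', 'd7')]
```

In this sketch the SS check compares d9 with the newest `a` (a4). Since d9 − a4 ≥ 5, d9 is not pushed at all. A check like case (3), d7 − a1, happens in SC and prunes the a1 branches of d7's sequences. With a large window (e.g. 100), all seven sequences are produced.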

Performance Evaluation. Effectiveness of query processing in SASE: the sequence index offers an order-of-magnitude improvement with large windows and large query result sizes. The partitioned sequence index is highly effective; pushing one equivalence test down to SSC is a must. Etc.

Questions?

Good night & Good luck!