
MIDDLEWARE SYSTEMS RESEARCH GROUP msrg.org
Predictive Publish/Subscribe Matching
Joint work with Vinod Muthusamy & Haifeng Liu
University of Toronto, P-ToPSS project
Hans-Arno Jacobsen

Little Anecdote
Date: Mon, 14 Sep … 10:37:
From: …
To: …
Cc: … CNS Security Admin
Subject: DDoS attack originating from …

/var/log/secure* & LogWatch
  aaron/password from …
  abdullah/password from …
  abraham/password from …
  abram/password from …
  account/password from …
  account/password from …
  adam/password from …
  addison/password from …
  aditya/password from …
  admin/password from …: 18 Time(s)
  admin/password from …: 18 Time(s)
  administrator/password from …: 3 Time(s)
  administrator/password from …: 3 Time(s)
  jacobsen/password from …: 2 Time(s)

And So It Happened
Post-mortem forensics via events across different logs:
  … denied
  John  successful  timestamp …
  John  logoff  timestamp …
  John  successful  timestamp …
  John  password changed
Had set user john with password john!

Predictive Analytics?
Series of failed login attempts from the same IP
– System is under attack
Series of failed login attempts from the same IP, followed by a successful login from that IP, followed by an immediate logoff
– System compromised
After observing a partial match of the above pattern, could we predict, with a certain probability, that the system is going to be compromised soon?
– E.g., "failed logins from IP, successful login from IP" → Compromised?

Events, Subscriptions & Publish/Subscribe
Here, events are
– Login attempts, logoffs, system compromised
Here, subscriptions are
– Specific patterns of interest, e.g.:
  Series of login attempts from the same IP
  Series of login attempts from the same IP, followed by a logoff
The publish/subscribe system is the abstraction that matches subscriptions against the events observed
A match detects the event of interest, e.g., system compromised

Outline
– Predictive Toronto Publish/Subscribe System
– Event & subscription language model
– Matching with P-ToPSS
– Predicting with P-ToPSS
– Evaluation

P-ToPSS Is the Latest ToPSS Member
For many applications, raising an alert after a malicious activity has occurred is too late:
– Credit card fraud (fraud committed)
– Network intrusion (system compromised)
– Problem determination (problem occurred)
– Root-cause analysis (system crashed, poor user experience)
We need the capability to predict the probability that a given subscription will match in the future.
P-ToPSS computes the probability that a subscription will match based on the event history and on the partial matches observed so far.

P-ToPSS Model
[Figure: Match Engine with example outputs "cs2 is matched", "cs1 is fully matched", "cs1 will be matched with probability 0.5", "cs1 will be matched with probability 0.8", "cs4 will be matched with probability 0.75"]
Publish/subscribe matching problem
– Find all matches
Publish/subscribe prediction problem
– Find partial matches
– Determine the subscriptions whose matching probability exceeds a threshold

Event Model
An event: e = {(a1, v1), (a2, v2), …, (an, vn)}
Event stream: {e1, e2, …, ek, …}
Events are ordered (by system timestamps)

Subscription Language Model
Primitive subscriptions
– S = p1 ∧ p2 ∧ p3 ∧ …
– pi is a Boolean predicate
Composite subscriptions
– CS = R(S1, S2, S3, …, Sm)
R: operators
– Temporal operators:
  , : contiguous sequence (no event can be skipped)
  ; : non-contiguous sequence (events can be skipped)
– Boolean operators:
  ∧ : conjunction
  ∨ : disjunction
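A minimal Python sketch of primitive subscriptions and of the two sequence operators, to make the difference between "," and ";" concrete. It assumes events are plain dicts and that a primitive subscription is a list of (attribute, operator, value) predicates joined by ∧; the helper names (matches, contiguous_match, non_contiguous_match) and the operator table are hypothetical, and variable binding ($x) and time constraints are left out. It illustrates the operator semantics only, not the P-ToPSS matcher itself.

```python
import operator
from typing import Any, Callable

Event = dict[str, Any]                 # an event: attribute -> value
Predicate = tuple[str, str, Any]       # (attribute, operator, value)
Primitive = list[Predicate]            # conjunction p1 ∧ p2 ∧ ...

OPS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}


def matches(sub: Primitive, event: Event) -> bool:
    """A primitive subscription matches if every predicate holds on the event."""
    return all(attr in event and OPS[op](event[attr], value)
               for attr, op, value in sub)


def contiguous_match(seq: list[Primitive], stream: list[Event]) -> bool:
    """s1, s2, ..., sm: the matching events must be consecutive in the stream."""
    m = len(seq)
    return any(all(matches(seq[k], stream[i + k]) for k in range(m))
               for i in range(len(stream) - m + 1))


def non_contiguous_match(seq: list[Primitive], stream: list[Event]) -> bool:
    """s1; s2; ...; sm: unrelated events between the matching ones are skipped."""
    k = 0                                   # number of steps matched so far
    for event in stream:
        if k < len(seq) and matches(seq[k], event):
            k += 1
    return k == len(seq)


failed = [("login", "=", "denied")]
success = [("login", "=", "success")]
stream = [{"ip": "192.0.2.7", "login": "denied"},
          {"ip": "192.0.2.7", "action": "ls"},
          {"ip": "192.0.2.7", "login": "success"}]
print(contiguous_match([failed, success], stream))      # False: the "ls" event intervenes
print(non_contiguous_match([failed, success], stream))  # True: the "ls" event is skipped
```

For the non-contiguous case, greedily taking the earliest event that matches the next step is sufficient here because the steps share no variables or time constraints; once either is present, several partial matches have to be tracked at the same time.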

Example
s1: ip=$x ∧ login=denied
s2: ip=$x ∧ login=denied
s3: ip=$x ∧ login=success
s4: ip=$x ∧ login=success
s5: ip=$x ∧ action=passwd
s6: ip=$x ∧ action=logoff
cs_intrusion = s1 ; ( (s2 ; s3, t3 - t2 < d) ∨ (s4 , s5) ) ; s6
cs_intrusion is matched by {e0, e1}, e2, e3, e4
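The shared predicate ip=$x requires all steps of one match to agree on the IP address. One simple way to sketch this, assuming only non-contiguous steps with equality tests and no time constraints, is to split the stream by the bound value and match each sub-stream separately. The function bind_and_match, the fixed attribute name ip, and the sample addresses are illustrative assumptions; the example follows the branch s1 ; s2 ; s3 ; s6 of cs_intrusion with its time constraint omitted.

```python
from collections import defaultdict
from typing import Any

Event = dict[str, Any]


def bind_and_match(steps: list[dict[str, Any]], stream: list[Event],
                   var_attr: str = "ip") -> set[Any]:
    """Return every value of $x for which the non-contiguous sequence
    steps[0]; steps[1]; ... matches, where each step also carries the shared
    predicate var_attr=$x. Steps hold plain attribute=value tests."""
    # Partition the timestamp-ordered stream by the value bound to $x.
    by_value: dict[Any, list[Event]] = defaultdict(list)
    for event in stream:
        if var_attr in event:
            by_value[event[var_attr]].append(event)

    bound: set[Any] = set()
    for value, events in by_value.items():
        k = 0                                   # number of steps matched so far
        for event in events:
            if k < len(steps) and all(event.get(a) == v for a, v in steps[k].items()):
                k += 1
        if k == len(steps):
            bound.add(value)
    return bound


# s1 ; s2 ; s3 ; s6 (one branch of cs_intrusion, time constraint omitted)
steps = [{"login": "denied"}, {"login": "denied"},
         {"login": "success"}, {"action": "logoff"}]
stream = [
    {"ip": "192.0.2.1", "login": "denied"},
    {"ip": "192.0.2.2", "login": "success"},
    {"ip": "192.0.2.1", "login": "denied"},
    {"ip": "192.0.2.1", "login": "success"},
    {"ip": "192.0.2.2", "action": "logoff"},
    {"ip": "192.0.2.1", "action": "logoff"},
]
print(bind_and_match(steps, stream))   # {'192.0.2.1'}
```

Partitioning is only valid for non-contiguous steps; with the contiguous operator, events from other IPs would break a match in the global stream but not in the per-IP sub-stream.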

Problem Statement
Matching problem
– Given a set of composite subscriptions CS and an event stream {ei}, find all cs = R(s1, s2, …, sn) in CS for which there exists {ej1, ej2, …, ejn} ⊆ {ei} such that ej1 matches s1, …, ejn matches sn, subject to R, and all time constraints are satisfied.
Prediction problem
– Find all partially matched cs such that Pr_cs(full match | partial match) > θ_cs

Required Matching Tasks
Composite subscription: s1 ; ( (s2 ; s3, t3 - t1 < d) ∨ (s4 , s5) ) ; s6
– Primitive subscriptions, like si, matching single events (i.e., sets of attribute-value pairs)
– Sequences of primitive subscriptions matching consecutive and non-consecutive events in the input
– Boolean expressions, like term1 ∨ term2 above, matching higher-level patterns of events
– Computation of probabilities to predict full matches given partial matches

Matching Engine
[Figure: components Primitive Subscriptions Matcher, State Machine Engine, Boolean Expression Tree Matcher and Prediction Engine; data flows labeled event stream, primitive subscription matches, derived events, partial matches, full matches, and predictions (subscription, matching probability > θ_S). Example: s1 ; ( (s2 ; s3, t3 - t1 < d) ∨ (s4 , s5) ) ; s6 decomposed into term1 ∨ term2, with the sequence s2 ; s3 as a sub-pattern.]

Algorithms for Matching Tasks
Primitive Subscription Matcher
– BDD-based approach (our ICDCS'05 algorithm)
– Alternatively, our SIGMOD'01 algorithm or our new indEX (fastest Boolean expression index in the market)
Boolean Expression Tree Matcher (state-based)
– Extension of the Rete algorithm as an in-memory event processing network (Forgy, 1982)
– For extensions & implementation, see our PADRES code base at padres.msrg.org

Algorithms for Matching Tasks
State Machine Engine
– Based on evaluating finite state machines (FSMs)
– Combined with techniques to merge states to amortize the processing of similar subscriptions
– Combined with algorithms and data structures to track time conditions
Prediction Engine
– Based on training and evaluating a Markov model
  Trained on past events
  Evaluated over the event stream

State Machine Engine
– State machine creation
– State machine evaluation

Example: F, F, F, (t_N3 - t_N1 < d), S
We abstract for ease of presentation:
– F represents a primitive subscription that evaluates to true for a failed login
– S represents a primitive subscription that evaluates to true for a successful login
– The index in a time constraint refers to a position (state) in the subscription (FSM)
– t: time of the most recent transition into a state
– The explicit temporal operator is treated as another predicate, evaluated over the transition times tracked for all states
[Figure: FSM for the contiguous sequence operator with states N0, N1(F), N2(F,F), N3(F,F,F), N4(F,F,F,S); edge labels F, F, F, S and the constraint (t_N3 - t_N1 < d)]

Evaluating S = F, F, F, (t_N3 - t_N1 < d), S (contiguous sequence operator) on the event stream F at t1, F at t2, F at t3:
– At t1: current state N1(F), entered at t1
– At t2: N2(F,F) entered at t2; N1(F) entered at t2
– At t3: N3(F,F,F) entered at t3; N2(F,F) entered at t3; N1(F) entered at t3
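The evaluation trace can be reproduced with a small sketch that keeps one list of transition times per partial match. This is an assumption made for readability: the slides track the most recent transition time per state, whereas here each instance carries its own times, which makes t_N3 - t_N1 unambiguous. The names run_contiguous and within, the event encoding, and checking the time condition on the final transition are hypothetical.

```python
from typing import Callable

Event = dict[str, str]
Primitive = Callable[[Event], bool]


def F(e: Event) -> bool:        # primitive subscription: failed login
    return e.get("login") == "denied"


def S(e: Event) -> bool:        # primitive subscription: successful login
    return e.get("login") == "success"


def run_contiguous(steps: list[Primitive], stream: list[tuple[int, Event]],
                   constraint: Callable[[list[int]], bool]):
    """Evaluate the contiguous sequence steps[0], steps[1], ... over a stream.

    Each partial-match instance is its list of transition times
    (times[i] = time of the transition into state N(i+1)). N0 is always
    active, so every event may start a new instance. Returns the full
    matches and, per event, the (state, entry time) pairs of live instances.
    """
    active: list[list[int]] = []
    full, trace = [], []
    for t, event in stream:
        survivors = []
        for times in active + [[]]:
            if not steps[len(times)](event):
                continue                     # contiguous: a mismatch kills the instance
            new = times + [t]
            if len(new) < len(steps):
                survivors.append(new)
            elif constraint(new):            # time condition checked on the final transition
                full.append(new)
        active = survivors
        trace.append([(f"N{len(x)}", x[-1]) for x in active])
    return full, trace


def within(d: int) -> Callable[[list[int]], bool]:
    """(t_N3 - t_N1 < d): third failed login less than d after the first."""
    return lambda times: times[2] - times[0] < d


pattern = [F, F, F, S]
stream = [(1, {"login": "denied"}), (2, {"login": "denied"}),
          (3, {"login": "denied"}), (4, {"login": "success"})]
full, trace = run_contiguous(pattern, stream, within(5))
for (t, _), states in zip(stream, trace):
    print(t, states)
print(full)   # [[1, 2, 3, 4]]
```

The first three printed lines mirror the states listed for t1, t2 and t3; at t4 the S event completes the oldest instance and kills the others.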

Example: F; S1; F; S2, (t_S2 - t_S1 < T) (non-contiguous sequence operator)
[Figure: states N0, N1(F), N2(F;S), N3(F;S;F), N4(F;S;F;S) with primary, secondary and self (*) links]
– Events that do not contribute to matching a subscription are allowed to occur (the FSM remains in its current state; achieved via self links)
– Upon a match of the next primitive subscription:
  Time conditions are checked, if any
  Transition times are updated
– Transition times are only tracked for primary & secondary links
Link types:
– Primary link: first transition into a state
– Secondary link: continued matching of the primitive subscription that led to the transition into this state
– Self link: triggered for every event except those that trigger primary & secondary links
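A hedged sketch of the three link types for F; S1; F; S2, (t_S2 - t_S1 < T), assuming a single instance per subscription whose times[i] holds the most recent transition time into N(i+1). The function run_non_contiguous, the helper s2_within, and the sample stream are illustrative. In the example, the second successful login at time 4 refreshes t_S1 via the secondary link, which is what lets the final S at time 6 satisfy the constraint for T = 3.

```python
from typing import Callable, Optional

Event = dict[str, str]
Primitive = Callable[[Event], bool]


def run_non_contiguous(steps: list[Primitive], stream: list[tuple[int, Event]],
                       time_ok: Callable[[list[int], int], bool]) -> Optional[list[int]]:
    """Single-instance FSM for the non-contiguous sequence steps[0]; steps[1]; ...

    times[i] is the time of the most recent transition into state N(i+1).
    In state N_k:
      primary link   - steps[k] matches and time_ok(times, t): move to N(k+1)
      secondary link - steps[k-1] matches again: stay, refresh times[k-1]
      self link      - any other event: stay, record nothing
    Returns the transition times of the first full match, or None.
    """
    times: list[int] = []
    for t, event in stream:
        k = len(times)
        if steps[k](event) and time_ok(times, t):
            times.append(t)                      # primary link
            if len(times) == len(steps):
                return times                     # final state reached
        elif k > 0 and steps[k - 1](event):
            times[k - 1] = t                     # secondary link
        # otherwise: self link, the event is skipped
    return None


def F(e: Event) -> bool:
    return e.get("login") == "denied"


def S(e: Event) -> bool:
    return e.get("login") == "success"


def s2_within(T: int) -> Callable[[list[int], int], bool]:
    """(t_S2 - t_S1 < T), only checked on the transition into the final state."""
    return lambda times, t: len(times) != 3 or t - times[1] < T


stream = [(1, {"login": "denied"}), (2, {"login": "success"}),
          (3, {"action": "ls"}), (4, {"login": "success"}),
          (5, {"login": "denied"}), (6, {"login": "success"})]
print(run_non_contiguous([F, S, F, S], stream, s2_within(3)))   # [1, 4, 5, 6]
```

With T = 2 the same stream yields no match, since 6 - 4 is not below 2. In this sketch an S that fails the time condition falls through to the self link.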

Example: S1 ; S2 ; S3 with time constraints
– T1: (t_S2 - t_S1 < 3)
– T2: (t_S3 - t_S1 < 6)
– T3: (t_S3 - t_S2 > 3)
[Figure: states N0, N1(S1), N2(S1;S2), N3(S1;S2;S3); self links labeled not(S1); not(S2) and S2 ∧ not(T1); not(S3) and S3 ∧ (not(T2) ∨ not(T3))]
Event stream: S1 at t1, t2, t3; S2 at t4, t5, t6; S3 at t7, t8
– Time(S1): S1:t1, S1:t2, S1:t3
– Time(S2): S2:t4 → Tc(S1) = {t2, t3}; S2:t5 → Tc(S1) = {t3}
– Time(S3): S3:t8 → Tc(S2) = {t4}, Tc(S1) = {t3}
Tc(Si) lists the times of Si that remain consistent with the time constraints.
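The candidate sets Tc can be checked with a short sketch that keeps, for each S2 occurrence, the S1 times satisfying T1, and filters them again when S3 arrives. It assumes t_i = i, which is consistent with every set listed above but is not stated on the slide; the function name and the string labels are hypothetical.

```python
def match_s1_s2_s3(stream: list[tuple[int, str]]) -> list[tuple[int, set[int], set[int]]]:
    """S1 ; S2 ; S3 with T1: tS2 - tS1 < 3, T2: tS3 - tS1 < 6, T3: tS3 - tS2 > 3.

    Returns (time of S3, Tc(S2), Tc(S1)) for every S3 event that completes a match.
    """
    s1_times: list[int] = []                 # all S1 occurrences so far
    s2_cands: dict[int, set[int]] = {}       # S2 time -> Tc(S1) satisfying T1
    results = []
    for t, label in stream:
        if label == "S1":
            s1_times.append(t)
        elif label == "S2":
            tc1 = {t1 for t1 in s1_times if t - t1 < 3}                 # T1
            if tc1:
                s2_cands[t] = tc1
        elif label == "S3":
            tc2 = {t2 for t2, tc1 in s2_cands.items()
                   if t - t2 > 3 and any(t - t1 < 6 for t1 in tc1)}     # T3 and T2
            if tc2:
                tc1 = {t1 for t2 in tc2 for t1 in s2_cands[t2] if t - t1 < 6}
                results.append((t, tc2, tc1))
    return results


# Assumption: t_i = i (the slide gives no absolute times)
stream = [(1, "S1"), (2, "S1"), (3, "S1"), (4, "S2"), (5, "S2"),
          (6, "S2"), (7, "S3"), (8, "S3")]
print(match_s1_s2_s3(stream))   # [(8, {4}, {3})]
```

S2 at t6 gets no candidate (6 - 3 is not below 3), and S3 at t7 completes nothing because 7 - 4 is not above 3; only S3 at t8 yields a full match.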

Merging State Machines
Two states N1 and N2 are equivalent iff:
1. The number of incoming transitions of N1 and N2 is equal.
2. All incoming transitions arrive from equivalent states and are triggered by the same set of events.
Initial states are equivalent.
[Figure: separate FSMs for a;b,c (N0 → N1(a) → N2(a;b) → N3(a;b,c)), a;b,d (N0 → N1(a) → N2(a;b) → N3(a;b,d)) and a (N0 → N1(a)), merged into a single FSM M0 → M1(a) → M2(a;b) → M3(a;b,c) or M4(a;b,d)]
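For sequence subscriptions the equivalence rule amounts to sharing common prefixes: a state is identified by its source state and the event that triggers its incoming transition. The sketch below, with the hypothetical function merge_fsms, builds such a merged FSM for the three subscriptions in the figure; it also keys on the operator ("," or ";") as a conservative assumption and ignores self and secondary links.

```python
Step = tuple[str, str]   # (operator before the primitive: "" | "," | ";", primitive)


def merge_fsms(subs: dict[str, list[Step]]):
    """Merge the sequence FSMs of several subscriptions into one.

    Each state is reached by exactly one incoming transition, identified by
    (source state, operator, primitive). Two states with the same identity
    have equivalent sources and the same triggering events, so they are
    stored once (prefix sharing from the common initial state).
    """
    transitions: dict[tuple[int, str, str], int] = {}
    accepting: dict[int, list[str]] = {}
    num_states = 1                         # state 0 = shared initial state M0
    for name, steps in subs.items():
        state = 0
        for op, prim in steps:
            key = (state, op, prim)
            if key not in transitions:     # no equivalent state yet: create one
                transitions[key] = num_states
                num_states += 1
            state = transitions[key]
        accepting.setdefault(state, []).append(name)
    return num_states, transitions, accepting


subs = {
    "a;b,c": [("", "a"), (";", "b"), (",", "c")],
    "a;b,d": [("", "a"), (";", "b"), (",", "d")],
    "a":     [("", "a")],
}
n, transitions, accepting = merge_fsms(subs)
print(n)           # 5 states (M0..M4) instead of 4 + 4 + 2 = 10
print(accepting)   # {3: ['a;b,c'], 4: ['a;b,d'], 1: ['a']}
```

Each incoming event is then evaluated once per shared state rather than once per subscription, which is where the amortization comes from.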

Markov Model for Prediction
– FSMs record incremental matches of subscriptions
– The probability of transitioning to the next state for a given event depends only on the current state
– Hence, our FSMs are Markov processes
– Our prediction algorithm uses the properties of Markov processes to predict future matches based on the current state and the event history:
  Probability of reaching the final state in n events
  Probability of reaching the final state in the next 1, 2, 3, …, n events
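Once transition probabilities are known, the probability of reaching the final state within the next 1, 2, …, n events follows from repeatedly multiplying the current state distribution by the transition matrix. A minimal sketch, assuming the final state is absorbing; the function name prob_reach_final and the matrix values are illustrative, not taken from the experiments.

```python
def prob_reach_final(P: list[list[float]], current: int, final: int, n: int) -> list[float]:
    """Return [Pr(final state reached within k events) for k = 1..n].

    P[i][j] = Pr(X_{m+1} = j | X_m = i). The final state must be absorbing
    (P[final][final] == 1), so the probability mass sitting in it after k
    steps equals the probability of having reached it within k events.
    """
    size = len(P)
    dist = [0.0] * size
    dist[current] = 1.0                     # the current FSM state is known
    out = []
    for _ in range(n):
        dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
        out.append(dist[final])
    return out


# Hypothetical 3-state FSM: N0 -> N1 -> N2 (final); self-loops model skipped events
P = [[0.5, 0.5, 0.0],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
print(prob_reach_final(P, current=1, final=2, n=3))   # approx [0.4, 0.64, 0.784]
```

The probability of reaching the final state in exactly k events is the difference between consecutive entries of the returned list.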

Prediction & Training
– Compute the long-run transition probability of reaching a given state
– Based on the input (event history), we count the number of times each transition is taken
– Based on these counters, we compute the transition probabilities of the model
– The result is a complete Markov chain with a finite state space
– Transition probability from state i to j: p_ij = Pr(X_{n+1} = j | X_n = i), the conditional probability of transitioning to j given i, estimated as
  p_ij = (# times transition i → j is taken) / (# transitions taken out of state i)
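The counting step can be sketched as a maximum-likelihood estimate over replayed transitions. The function estimate_transitions, the (i, j) log format, and the choice to make never-left states absorbing are assumptions; the sample history is made up.

```python
from collections import Counter


def estimate_transitions(taken: list[tuple[int, int]], num_states: int) -> list[list[float]]:
    """Estimate p_ij = n_ij / sum_k n_ik from transition counts.

    taken lists every (i, j) transition the FSM took while replaying past
    events (self links included). States never left in the history are
    made absorbing (an assumption, so every row still sums to 1).
    """
    counts = Counter(taken)                          # n_ij
    out_of = Counter(i for i, _ in taken)            # sum_k n_ik
    P = [[0.0] * num_states for _ in range(num_states)]
    for (i, j), c in counts.items():
        P[i][j] = c / out_of[i]
    for i in range(num_states):
        if out_of[i] == 0:
            P[i][i] = 1.0
    return P


history = [(0, 0), (0, 1), (1, 1), (1, 1), (1, 2), (1, 1), (1, 2)]
P = estimate_transitions(history, 3)
print(P)   # [[0.5, 0.5, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
```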

Experiments
– Synthetic workload
– Real data set

Effect of Number of Subscriptions
[Figure: number of states and average matching time per event for Gaussian (more sharing) and uniform (less sharing) workloads]
– Merging reduces the number of states by up to 30% for the given data set
– The number of states increases linearly with the number of subscriptions
– More states are required for workloads with less state-sharing potential
– Matching time increases with the number of subscriptions
– More sharing requires more processing, as a given event may trigger more transitions

Effect of Number of Non-contiguous Operators
[Figure: average matching time per event]
– Matching time increases with the number of non-contiguous operators
– More and more subscription instances remain partially matched, waiting for events
– This calls for a garbage collection scheme

Experiments on Synthetic Workload
– Precision decreases as the look-ahead increases
– Precision increases as the prediction threshold increases, and stabilizes for large thresholds

Expert Model (Full) vs. Learned Model
[Figure: full model (about 1400 states) vs. learned model (5 states)]
– Precision is defined as true positives / all predictions
– Result: with increasing look-ahead, the learned model yields higher precision than the full model
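Precision as defined above can be computed per prediction threshold. This sketch assumes each prediction is a (probability, outcome) pair for one look-ahead window; the function name and the sample values are hypothetical and are not the experimental data.

```python
from typing import Optional


def precision(predictions: list[tuple[float, bool]], threshold: float) -> Optional[float]:
    """True positives / all predictions at a given prediction threshold.

    Each entry is (predicted probability of a full match within the look-ahead,
    whether the full match actually occurred). Returns None if nothing is predicted.
    """
    flagged = [occurred for prob, occurred in predictions if prob > threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


preds = [(0.9, True), (0.8, False), (0.6, True), (0.3, False)]
for theta in (0.5, 0.85, 0.95):
    print(theta, precision(preds, theta))
```

Raising the threshold keeps only the most confident predictions, which is one way to read the trend reported on the synthetic workload.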

Conclusions
– P-ToPSS is a new publish/subscribe model for event stream processing
– It predicts the probability that a subscription will match in the future
– It performs traditional publish/subscribe matching
– It supports state-based, temporal and Boolean operators over predicates (complex subscriptions)
– It relies on Markov chains for prediction
– In our experiments, the prediction performance of the learned model was better than that of the hand-crafted model
