MIDDLEWARE SYSTEMS RESEARCH GROUP
Denial of Service in Content-based Publish/Subscribe Systems
M.A.Sc. Candidate: Alex Wun
Thesis Supervisor: Hans-Arno Jacobsen
Department of Electrical and Computer Engineering
Department of Computer Science
University of Toronto
v0.4
Background: Context of Thesis Work
- PADRES middleware platform
  - Content-based Publish/Subscribe (CPS)
  - Originally inspired by distributed dashboard and job-scheduling requirements
  - Increasingly motivated by enterprise application integration
- Need to investigate different facets of security for CPS systems
  - Security is among the top concerns in many application scenarios
Contributions of Thesis Work
- DoS Characteristics
  - Attack Taxonomy
  - Attack Experiments
- DoS Resilience
  - Commonality Model
  - Matching Algorithm
- DoS Prevention
  - Policy Model
  - Policy Framework
Content-based Publish/Subscribe
[Figure: publishers (P) and subscribers (S) connected through a broker network]
- Publications are tuples, e.g. [(event,prescription), (patientID,123), (age,63), (drug,X) …]
- Subscriptions are Boolean functions, e.g. [(event=prescription), (age>50)] and [(event=prescription), (drug=Y)]
- Brokers store the filters (functions) and "match" each publication against them
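A naive version of the matching step can be sketched in Python. The sketch assumes that a subscription is a conjunction of (attribute, operator, value) predicates and that a publication is an attribute-value map. `matches` and `OPS` are hypothetical names, and the elided attributes of the example publication are left out. The loop evaluates every subscription independently and shares no work between them.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical encoding: a predicate is (attribute, operator, value) and a
# subscription is a conjunction (list) of predicates.
Predicate = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def matches(publication: Dict[str, Any], subscription: List[Predicate]) -> bool:
    """Return True if every predicate of the subscription holds for the publication."""
    for attr, op, value in subscription:
        if attr not in publication:
            return False  # a missing attribute cannot satisfy the predicate
        if not OPS[op](publication[attr], value):
            return False  # one false predicate decides the whole conjunction
    return True


# The publication from the slide, without its elided attributes.
publication = {"event": "prescription", "patientID": 123, "age": 63, "drug": "X"}
subscriptions = {
    "s1": [("event", "=", "prescription"), ("age", ">", 50)],
    "s2": [("event", "=", "prescription"), ("drug", "=", "Y")],
}
matched = [name for name, sub in subscriptions.items() if matches(publication, sub)]
print(matched)  # ['s1']
```

Both subscriptions re-evaluate (event=prescription) here. Avoiding that kind of repeated work is what the matching optimizations target.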
Matching Performance Optimizations
- Often based on exploiting similarities (overlap) between subscriptions
- Avoid unnecessary subscription and predicate evaluations
- Can we abstract these optimizations?
  - Formalize content-based matching plans (order of subscription and predicate evaluations)
  - Quantify the performance of existing optimizations
  - Discover potential future optimizations
Commonality Model
- For a subscription set, define:
  - Disjunctive commonality expression
  - Conjunctive commonality expression
- A set of commonality expressions is a subscription topology
- Examples: per-link matching, DNF subscriptions, shared predicates, clustering on subscription classes or attributes, "pruning" strategies (e.g., number of attributes)
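The sketch below uses one possible reading of the two commonality kinds. It assumes a conjunctive commonality is the set of predicates shared by every subscription in a group. It assumes a disjunctive commonality is an overlay link that is needed as soon as any of its subscriptions matches. These definitions and the helpers `holds`, `conjunctive_commonality`, and `link_matches` are illustrative assumptions, not the model's formal expressions. The sketch counts predicate evaluations to show where the pruning comes from.

```python
from typing import Dict, FrozenSet, List, Tuple

Predicate = Tuple[str, str, object]  # (attribute, operator, value)
Subscription = FrozenSet[Predicate]  # conjunction of predicates


def holds(pred: Predicate, pub: Dict[str, object]) -> bool:
    attr, op, value = pred
    if attr not in pub:
        return False
    if op == "=":
        return pub[attr] == value
    if op == ">":
        return pub[attr] > value
    if op == "<":
        return pub[attr] < value
    raise ValueError(f"unsupported operator: {op}")


def conjunctive_commonality(group: List[Subscription]) -> FrozenSet[Predicate]:
    # Assumed definition: predicates shared by every subscription in the group.
    return frozenset.intersection(*group)


def link_matches(link: List[Subscription], pub: Dict[str, object]) -> Tuple[bool, int]:
    """Decide an overlay link (assumed: a disjunction of subscriptions).

    Returns (decision, number of predicate evaluations)."""
    if not link:
        return False, 0
    evals = 0
    common = conjunctive_commonality(link)
    for pred in common:
        evals += 1
        if not holds(pred, pub):
            return False, evals  # a false shared predicate rules out every subscription
    for sub in link:
        ok = True
        for pred in sub - common:  # shared predicates are already known to be true
            evals += 1
            if not holds(pred, pub):
                ok = False
                break
        if ok:
            return True, evals  # one matching subscription decides the whole link
    return False, evals


s1 = frozenset({("event", "=", "prescription"), ("age", ">", 50)})
s2 = frozenset({("event", "=", "prescription"), ("drug", "=", "Y")})
link = [s1, s2]
print(link_matches(link, {"event": "appointment", "age": 63}))                # (False, 1)
print(link_matches(link, {"event": "prescription", "age": 63, "drug": "X"}))  # (True, 2)
```

The shared predicate (event=prescription) is evaluated once per link, not once per subscription.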
Example: Link-Group Topology
- Depth-first algorithm [Greiner2006] to determine a probabilistically optimal matching plan
Example: Link-Group Topology
[Figure panels: Low Selectivity, High Selectivity]
Example: Cluster Topology
- Clustering has dramatic scalability effects in CPS
- The observed trend depends on the proportion of commonalities, not on the number of predicates
[Figure panels: Simulation, Experimental (in PADRES)]
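The cluster-topology idea can be sketched by clustering subscriptions on their event class. The deck elsewhere refers to clustering "by event class". The grouping rule, the `None` bucket for subscriptions without a class, and the names `build_clusters` and `match_clustered` are assumptions for illustration. The sketch returns the number of subscriptions examined, which makes the pruning visible. It does not reproduce the measured trend.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Predicate = Tuple[str, str, object]  # (attribute, operator, value)


def holds(pred: Predicate, pub: Dict[str, object]) -> bool:
    attr, op, value = pred
    if attr not in pub:
        return False
    if op == "=":
        return pub[attr] == value
    if op == ">":
        return pub[attr] > value
    if op == "<":
        return pub[attr] < value
    raise ValueError(f"unsupported operator: {op}")


def build_clusters(
    subs: Dict[str, List[Predicate]]
) -> Dict[Optional[object], Dict[str, List[Predicate]]]:
    """Group subscriptions by the value of their (event = ...) predicate."""
    clusters: Dict[Optional[object], Dict[str, List[Predicate]]] = defaultdict(dict)
    for name, sub in subs.items():
        cls = next((v for a, op, v in sub if a == "event" and op == "="), None)
        clusters[cls][name] = sub  # None collects subscriptions with no event class
    return clusters


def match_clustered(pub: Dict[str, object], clusters) -> Tuple[List[str], int]:
    """Match only the publication's own cluster plus unclustered subscriptions.

    Returns the matching names and how many subscriptions were examined."""
    candidates = dict(clusters.get(pub.get("event"), {}))
    candidates.update(clusters.get(None, {}))
    matched = [n for n, sub in candidates.items() if all(holds(p, pub) for p in sub)]
    return matched, len(candidates)


subs = {
    "s1": [("event", "=", "prescription"), ("age", ">", 50)],
    "s2": [("event", "=", "prescription"), ("drug", "=", "Y")],
    "s3": [("event", "=", "lab_result"), ("value", ">", 10)],
}
clusters = build_clusters(subs)
print(match_clustered({"event": "prescription", "age": 63, "drug": "X"}, clusters))
# (['s1'], 2)
```

The subscription s3 belongs to another class and is never examined. With many classes, most subscriptions are skipped this way.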
Extended Implication Relationships
- Between subscriptions
- Between predicates
- Between commonalities
Simple Implication Expressions
- Mixed-operator lists are currently not supported
Matching Engine Architecture
[Figure components: overlay links (disj. comm.); shared predicate index (conj. comm.); subscription index; all-predicates index; predicate pool; subscription pool. Legend: map, sorted list (map), node elements]
Matching Engine Architecture
[Figure: node elements (subscription, predicate, overlay link, conj. comm., DNF subs), each with True / False / D.C. implication lists]
Subscription Insertion: Predicate Insertion
[Figure components: overlay links (disj. comm.); shared predicate index (conj. comm.); subscription index; all-predicates index; predicate pool; conj. comm.; subscription pool]
- Predicates with unknown priorities default to the head of the list
Subscription Insertion: Implication List Update
[Figure: inserting a predicate a > x_i]
- P's True → True list
- (a > x_i)'s False → False list
- P's False → False list
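Here is one way to read the implication list update for the single-operator case ("a > x"). For predicates on the same attribute, (a > big) being true implies (a > small) is true, and (a > small) being false implies (a > big) is false. The sketch restricts itself to ">" to mirror the "mixed-operator lists are not supported" limitation. The class `GtPredicate`, the function `insert`, and the choice to skip equal thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class GtPredicate:
    """Hypothetical node for a predicate (attr > threshold).

    true_list:  predicates that become True when this one is True.
    false_list: predicates that become False when this one is False.
    """
    attr: str
    threshold: float
    true_list: List["GtPredicate"] = field(default_factory=list, repr=False)
    false_list: List["GtPredicate"] = field(default_factory=list, repr=False)


def insert(new: GtPredicate, pool: List[GtPredicate]) -> None:
    """Add `new` to the pool, updating both directions of every implication."""
    for p in pool:
        if p.attr != new.attr:
            continue  # only same-attribute, same-operator predicates are related
        if p.threshold > new.threshold:
            # a > big implies a > small: p True => new True, new False => p False
            p.true_list.append(new)
            new.false_list.append(p)
        elif p.threshold < new.threshold:
            # new True => p True, p False => new False
            new.true_list.append(p)
            p.false_list.append(new)
        # equal thresholds would be one shared predicate, so nothing is linked
    pool.append(new)


pool: List[GtPredicate] = []
a50, a60, a40 = GtPredicate("age", 50), GtPredicate("age", 60), GtPredicate("age", 40)
for pred in (a50, a60, a40):
    insert(pred, pool)
print([p.threshold for p in a60.true_list])   # [50, 40]
print([p.threshold for p in a40.false_list])  # [50, 60]
```

Each insertion scans the pool linearly. Sorting the predicates by value, as the deck's predicate hash does, would let the related predicates be found as contiguous ranges.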
Performance Experiments
- Generated subscription workloads from ~50 to ~200,000 predicates
  - {5, 10, 15, 20} avg. predicates × {10, 100, 1000, 10000} subscriptions
- 4 different subscription topologies
  - Low/high clustering (5/200 classes)
  - Low/high sharing (subscription overlap)
- 100 randomly generated publications matched against each workload
[Figure panels: Low Sharing, High Sharing, High Cluster, Low Cluster]
[Figure panels: Low Sharing, High Sharing, High Cluster, Low Cluster]
Cross-cluster Attributes
[Figure panels: Low Sharing, High Sharing, High Cluster, Low Cluster]
[Figure panels: Low Sharing, High Sharing, High Cluster, Low Cluster]
Conclusions
- The model captures many existing and potential optimization techniques
- The implication list approach significantly reduces the number of predicate evaluations in all workloads
  - Superior for expensive predicates
- Implementation trade-off:
  - Control cascade overhead/usage
  - Cluster/index implication lists as well
  - Optimize iteration over marked nodes
  - Additional clustering/indexing beyond event class alone
Future work
- Additional conjunctive/disjunctive commonalities and implication relationships?
- Are implication relationships relevant to message distribution?
- Rule-based implementation of the implication/commonality algorithm?
Thank you. Questions?
*** Extra Slides ***
High clustering, high sharing
Low clustering, high sharing
Low clustering, low sharing
High clustering, low sharing
Publication Matching: Commonality Phase
[Figure components: overlay links (disj. comm.); shared predicate index (conj. comm.); subscription index; all-predicates index; predicate pool; subscription pool]
- Termination condition (TC): all overlay links have been decided
- Iterate and evaluate while TC is false
Publication Matching: Implication Cascade
[Figure: node elements with True / False / D.C. implication lists]
- If a node is not already determined: evaluate it, then cascade and mark (True / False / D.C.)
- "Advanced" implications are handled by a method call triggered by a state change (e.g., when a predicate becomes true, it calls countTruePredicate() on its subscriptions)
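Evaluate-then-cascade can be sketched as follows. A node is evaluated only if it is still undetermined. Its new state is then pushed along the matching implication list, and subscriptions are notified through a counting call modeled on countTruePredicate(). The classes, `set_state`, `count_true_predicate`, and `match` are hypothetical. The D.C. lists are not modeled. Fresh objects per publication stand in for result versioning.

```python
from typing import Callable, Dict, List, Optional


class Subscription:
    def __init__(self, name: str, n_predicates: int) -> None:
        self.name = name
        self.n_predicates = n_predicates
        self.true_count = 0
        self.state: Optional[bool] = None  # None means "not yet determined"

    def count_true_predicate(self) -> None:
        # Triggered by a predicate's state change; no re-evaluation of the subscription.
        if self.state is None:
            self.true_count += 1
            if self.true_count == self.n_predicates:
                self.state = True

    def mark_false(self) -> None:
        if self.state is None:
            self.state = False


class Predicate:
    def __init__(self, name: str, test: Callable[[Dict[str, object]], bool]) -> None:
        self.name = name
        self.test = test
        self.state: Optional[bool] = None
        self.true_list: List["Predicate"] = []   # implied True when this becomes True
        self.false_list: List["Predicate"] = []  # implied False when this becomes False
        self.subscriptions: List[Subscription] = []

    def set_state(self, value: bool) -> None:
        if self.state is not None:
            return  # already determined: each node cascades at most once
        self.state = value
        for sub in self.subscriptions:
            if value:
                sub.count_true_predicate()
            else:
                sub.mark_false()
        for implied in (self.true_list if value else self.false_list):
            implied.set_state(value)


def match(predicates: List[Predicate], pub: Dict[str, object]) -> int:
    """Evaluate predicates in list order with cascading; return #evaluations."""
    evaluations = 0
    for pred in predicates:
        if pred.state is None:  # only undetermined predicates are evaluated
            evaluations += 1
            pred.set_state(pred.test(pub))
    return evaluations


event = Predicate("event=prescription", lambda p: p.get("event") == "prescription")
age40 = Predicate("age>40", lambda p: p.get("age", 0) > 40)
age50 = Predicate("age>50", lambda p: p.get("age", 0) > 50)
age60 = Predicate("age>60", lambda p: p.get("age", 0) > 60)
age50.true_list = [age40]
age60.true_list = [age50, age40]
age40.false_list = [age50, age60]
age50.false_list = [age60]
s1, s2 = Subscription("s1", 2), Subscription("s2", 1)
event.subscriptions = [s1]
age50.subscriptions = [s1]
age60.subscriptions = [s2]

n = match([age50, age60, age40, event], {"event": "prescription", "age": 55})
print(n, s1.state, s2.state)  # 3 True False
```

Here age>40 is never evaluated, because age>50 becoming true marks it through the True list.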
Publication Matching: Subscription Phase
[Figure components: overlay links (disj. comm.); shared predicate index (conj. comm.); subscription index; all-predicates index; predicate pool; subscription pool]
- Iterate and evaluate while TC is false
  - + Cascade and mark
  - + Cascade and count
Publication Matching: Cleanup Phase
- There is no cleanup phase
- A counter (Vm) is incremented at the start of each publication matching phase
- All determined results are versioned (Vd)
- A determined result is stale if Vd < Vm
- Counter overflow: a 64-bit counter holds 2^64 ≈ 1.8x10^19 publications; at 10^3 pub/s that is ≈ 1.8x10^16 s; at ≈ 3.2x10^7 s/year, the counter needs resetting only every ≈ 0.6x10^9 years
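Versioning replaces cleanup: bumping one counter invalidates every earlier result at once. The class and method names below are hypothetical. Python integers do not overflow, so the wrap-around question only applies to a fixed-width counter. The estimate assumes an unsigned 64-bit counter at 10^3 publications per second. A signed 64-bit counter (such as a Java `long`) wraps at 2^63, which halves the figure.

```python
class VersionedResult:
    """A determined result tagged with the version (Vd) of the round that set it."""

    def __init__(self) -> None:
        self.value = None
        self.vd = 0  # never determined: older than any matching round


class Matcher:
    def __init__(self) -> None:
        self.vm = 0  # Vm: incremented at the start of each publication

    def start_publication(self) -> None:
        self.vm += 1  # invalidates all earlier results at once, no cleanup pass

    def determine(self, result: VersionedResult, value: bool) -> None:
        result.value = value
        result.vd = self.vm

    def is_stale(self, result: VersionedResult) -> bool:
        return result.vd < self.vm


# Overflow estimate for an unsigned 64-bit counter at 10^3 publications per second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_until_wrap = 2**64 / 1_000 / SECONDS_PER_YEAR
print(f"{years_until_wrap:.2e}")  # 5.85e+08
```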
Publication Matching: Sorted Lists
- Commonality/predicate lists are sorted by p + 1/N
  - p is the predicate selectivity
  - N is the number of subscriptions sharing the predicate
- Subscriptions are sorted by (1-p)n
  - p is the average predicate selectivity
  - n is the number of predicates
- The predicate hash is sorted by predicate value
- Commonality/predicate/subscription sorting is meant to be extensible with different priority equations
  - e.g., include predicate cost, length of implication lists, etc.
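Pluggable priority equations can be sketched as interchangeable sort keys. The sketch reads the predicate key as p + 1/N and assumes smaller values are evaluated first. The cost-aware key is a hypothetical extension in the spirit of "include predicate cost". It uses cost / (1 - p), the classic rank for ordering a single conjunction of independent predicates, and it ignores sharing. `PredicateStats`, `slide_priority`, `cost_aware_priority`, and `order` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PredicateStats:
    name: str
    p: float           # selectivity: estimated probability the predicate is true
    n_sharing: int     # N: number of subscriptions sharing the predicate
    cost: float = 1.0  # hypothetical per-evaluation cost


def slide_priority(s: PredicateStats) -> float:
    # Key from the slide, read as p + 1/N (smaller sorts first).
    return s.p + 1.0 / s.n_sharing


def cost_aware_priority(s: PredicateStats) -> float:
    # Hypothetical extension: cost / (1 - p); a predicate that is always true goes last.
    return s.cost / (1.0 - s.p) if s.p < 1.0 else float("inf")


def order(preds: List[PredicateStats], key: Callable[[PredicateStats], float]) -> List[str]:
    return [s.name for s in sorted(preds, key=key)]


preds = [
    PredicateStats("a", p=0.1, n_sharing=1),
    PredicateStats("b", p=0.5, n_sharing=10),
    PredicateStats("c", p=0.9, n_sharing=100),
]
print(order(preds, slide_priority))       # ['b', 'c', 'a']
print(order(preds, cost_aware_priority))  # ['a', 'b', 'c']
```

The two keys disagree on purpose. The first rewards widely shared predicates. The second favors cheap predicates that are likely to be false.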
[Figure panels: Low Sharing, High Sharing, High Cluster, Low Cluster]
[Figure panels: Low Sharing, High Sharing, High Cluster, Low Cluster]
Databases vs. Content-based Publish/Subscribe: Inverse Problems
- Databases store data: tables of rows (tuples), queried with Boolean functions; optimized with query plans
- Content-based publish/subscribe stores functions: subscriptions (Boolean functions), matched against publications (tuples); optimized with matching plans?
- Goal in both: scalable performance