ICDCS Beijing China
Routing of XML and XPath Queries in Data Dissemination Networks
Guoli Li, Shuang Hou, Hans-Arno Jacobsen
Middleware Systems Research Group, University of Toronto
Agenda
  - Motivation
  - Advertisement-based routing
  - Covering
  - Evaluation
  - Conclusions
Motivation
  - Data sources: publish XML data
  - Data users: register XPath queries
  - The data dissemination network: delivers matching results to a large and dynamically changing group of users
  [Figure: Content-based Data Dissemination (XML, Queries, Results)]
Publish/Subscribe
  - Publisher: Publication (XML), Advertisement (DTD)
  - Subscriber: Subscription (XPath)
  - Matching of XML documents and XPaths [ICDE’06]
  - Matching of advertisements and XPaths
  - Exploring relations among XPaths
Covering-based Routing
Language Model
  Advertisement: generated from DTDs
  - Non-recursive advertisement, e.g., A = /t1/t2/t3…/tn-1/tn
  - Recursive advertisement
      - Simple: A = A1(A2)+A3
      - Series: A = A1(A2)+A3(A4)+A5
      - Embedded: A = A1(A2(A3)+A4)+A5
      - …
  Example (DTD to advertisements):
      /personnel/person
      /personnel/person/name
      /personnel/person/name/family
      /personnel/person/name/given
      /personnel/person/
      /personnel/person/url
      /personnel/person/link
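
To make the "Simple" recursive form A = A1(A2)+A3 concrete, the sketch below enumerates the paths it stands for up to a bounded number of repetitions. The function name `expand_simple` and the decomposition A1 = /a/b, A2 = /c/*/b, A3 = /e are assumptions made for illustration; they are chosen so that two repetitions give the path /a/b/c/*/b/c/*/b/e that appears on the Overlapping Algorithms slide.

```python
def expand_simple(a1, a2, a3, max_repeats):
    """Enumerate the paths described by A = A1(A2)+A3.

    The '+' means A2 occurs one or more times, so the set is infinite;
    this hypothetical helper stops after max_repeats repetitions.
    """
    return [a1 + a2 * k + a3 for k in range(1, max_repeats + 1)]


# Assumed decomposition, for illustration only.
paths = expand_simple("/a/b", "/c/*/b", "/e", max_repeats=2)
for p in paths:
    print(p)
```

A real broker would not enumerate expansions like this; the sketch only shows what set of paths a recursive advertisement denotes.
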
Language Model
  Subscription: XPaths
  - Absolute, e.g., /c/d/*/e
  - Relative, e.g., c/d/*/e
  - Descendant operators, e.g., c//e/*/c
Advertisement-based Routing
  [Figure: a subscription (S) arriving at a broker; labels P(A) and P(S)]
  Advertisements at the broker:
      A1: /a/b/*/e
      A2: /b/e
      A3: /a/b/d
      A4: /a/b/e
      …
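
The routing decision on this slide can be sketched as follows: a broker forwards a subscription only toward neighbours from which an overlapping advertisement arrived. This is a minimal sketch, not the authors' implementation. The neighbour names (`B1`, `B2`, `B3`), the pairing of advertisements with neighbours, and the rule that a subscription longer than a non-recursive advertisement cannot overlap it are all assumptions. Only absolute XPEs without `//` are handled.

```python
def steps(xpe):
    """Split an absolute XPE such as '/a/*/c' into its location steps."""
    return [s for s in xpe.split("/") if s]


def overlaps(adv, sub):
    """Step-wise overlap test for a non-recursive advertisement.

    Assumption: a subscription with more steps than the advertisement
    cannot match any publication the advertisement describes.
    """
    a, s = steps(adv), steps(sub)
    if len(s) > len(a):
        return False
    # A wildcard on either side overlaps; otherwise tags must be equal.
    return all(x == "*" or y == "*" or x == y for x, y in zip(a, s))


def next_hops(adv_table, sub):
    """Return the neighbours a subscription should be forwarded to."""
    return sorted({hop for adv, hop in adv_table if overlaps(adv, sub)})


# Advertisements from the slide; the neighbour assignment is hypothetical.
adv_table = [
    ("/a/b/*/e", "B1"),
    ("/b/e", "B2"),
    ("/a/b/d", "B3"),
    ("/a/b/e", "B1"),
]
print(next_hops(adv_table, "/a/*/d"))  # ['B1', 'B3']
```

Without advertisements the same subscription would be flooded to every neighbour, which is the traffic the approach aims to avoid.
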
Overlapping Algorithms
  Basic case:
      A = /a/b/c/*/b/c/*/b/e
      S = /a/b/c/*/b/e
  Step-by-step overlap rules:
      Adv   Sub   Overlap
      *     *     Y
      *     t     Y
      t     *     Y
      t     t     Y
      t1    t2    N
  - Next table
  Other cases:
      e.g., S = /a/b//c/*/b//e
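
The "other cases" involve the descendant operator. The sketch below applies the step table to a subscription containing `//`, treating the advertisement as a flat, non-recursive path. It uses a plain left-to-right greedy scan, which is an assumption made for brevity and is not the Next-table algorithm named on the slide. All function names are hypothetical.

```python
def parse(xpe):
    """Split an absolute XPE into segments separated by '//'.

    '/a/b//c/*' -> [['a', 'b'], ['c', '*']]
    """
    return [[s for s in seg.split("/") if s] for seg in xpe.split("//")]


def compatible(adv_step, sub_step):
    # Step table: any wildcard overlaps; otherwise tags must be equal.
    return adv_step == "*" or sub_step == "*" or adv_step == sub_step


def fits(adv, seg, pos):
    """True if segment seg is step-wise compatible with adv at offset pos."""
    if pos + len(seg) > len(adv):
        return False
    return all(compatible(adv[pos + i], t) for i, t in enumerate(seg))


def overlaps(adv_xpe, sub_xpe):
    adv = [s for s in adv_xpe.split("/") if s]
    first, *rest = parse(sub_xpe)
    if not fits(adv, first, 0):  # the first segment is anchored at the root
        return False
    pos = len(first)
    for seg in rest:  # each later segment may start at any later offset
        while pos + len(seg) <= len(adv) and not fits(adv, seg, pos):
            pos += 1
        if pos + len(seg) > len(adv):
            return False
        pos += len(seg)
    return True


A = "/a/b/c/*/b/c/*/b/e"
print(overlaps(A, "/a/b//c/*/b//e"))  # True
print(overlaps(A, "/a//x/e"))         # False
```

Placing each segment at the earliest compatible offset is safe here because the segments occupy disjoint positions of the advertisement, so one concrete path can satisfy all of them at once.
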
Subscription Tree
  - Subscriptions are maintained in a hierarchical tree
  - A child may have more than one parent
  - Siblings may intersect
  - If a publication does not match a node, it does not match any of the descendants
  [Figure: example subscription tree rooted at ROOT, with XPE nodes and pointers between them]
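
The pruning property in the last bullet can be sketched as a tree walk that descends only below matching nodes. This is an illustrative sketch: the `Node` class, the example tree, and the matching rule (an absolute simple XPE matches a path when it matches a prefix of that path) are assumptions, not the data structure from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    xpe: str
    children: list = field(default_factory=list)


def matches(xpe, path):
    """Assumed rule: the XPE matches if it matches a prefix of the path."""
    x = [s for s in xpe.split("/") if s]
    p = [s for s in path.split("/") if s]
    return len(x) <= len(p) and all(a in ("*", b) for a, b in zip(x, p))


def match_tree(root, path):
    """Return the matching XPEs and the number of nodes actually tested."""
    matched, tested = [], 0
    stack = list(root.children)  # ROOT itself holds no subscription
    while stack:
        node = stack.pop()
        tested += 1
        if matches(node.xpe, path):
            matched.append(node.xpe)
            stack.extend(node.children)  # descend only below a match
    return sorted(matched), tested


# Hypothetical tree: every parent covers its children.
root = Node("ROOT", [
    Node("/a", [Node("/a/b", [Node("/a/b/d")]), Node("/a/c")]),
    Node("/b", [Node("/b/d")]),
])
print(match_tree(root, "/a/b/e"))  # (['/a', '/a/b'], 5)
```

Here `/b/d` is never tested because its parent `/b` did not match. With multi-parent nodes a real implementation would also need to avoid testing the same node twice.
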
Tree Maintenance
  - Insert
  - Delete
Covering Algorithms
  - Similar to the Adv-Sub overlapping algorithms
  - Absolute simple XPEs
  - Relative simple XPEs
  - XPEs with the // operator
  Step-by-step covering rules (does S1 cover S2?):
      S1    S2    Cover
      *     *     Y
      *     t     Y
      t     *     N
      t     t     Y
      t1    t2    N
  e.g., S1 = /*/a//e/c
        S2 = /a/a/*//c/e/c/d
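
For absolute simple XPEs, the covering table can be applied step by step. The sketch below does that; unlike the overlap test, the rule is asymmetric, since a tag in S1 does not cover a wildcard in S2. It assumes that an XPE matches any path it is a prefix of, so a shorter XPE can cover a longer one but not the reverse. Relative XPEs and `//` are not handled, and the function names are hypothetical.

```python
def steps(xpe):
    """Split an absolute simple XPE into its location steps."""
    return [s for s in xpe.split("/") if s]


def step_covers(t1, t2):
    # Covering table: '*' covers anything; a tag covers only the same tag.
    return t1 == "*" or t1 == t2


def covers(s1, s2):
    """True if every path matched by s2 is also matched by s1.

    Assumption: XPEs match path prefixes, so s1 must not be longer than s2.
    """
    a, b = steps(s1), steps(s2)
    return len(a) <= len(b) and all(step_covers(x, y) for x, y in zip(a, b))


print(covers("/a/*/c", "/a/b/c/d"))  # True
print(covers("/a/b", "/a/*"))        # False
```

When `covers(s1, s2)` holds, a broker that has already forwarded `s1` has no need to forward `s2`, which is how covering shrinks routing tables.
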
Merging Rules
  Rules
  - XPEs with one difference (e.g., element, operator)
      e.g., S1 = /a/*/c/d
            S2 = /a/*/c/e
            S  = /a/*/c/*
  - XPEs with different sub-XPEs
      e.g., S1 = … XPE1 …
            S2 = … XPE2 …
            S  = … // …
  Merge degree
  [Figure: P(S1), P(S2), P(S)]
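
The first merging rule can be sketched as follows for the element case: two XPEs of equal length that differ in exactly one step are replaced by one XPE with a wildcard at that step. The function name `merge_one_difference` is hypothetical, and differences in operators (`/` versus `//`) are not handled in this sketch.

```python
def merge_one_difference(s1, s2):
    """Merge two absolute simple XPEs that differ in exactly one element.

    Returns the merged XPE, or None if the rule does not apply.
    """
    a = [s for s in s1.split("/") if s]
    b = [s for s in s2.split("/") if s]
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    a[diff[0]] = "*"  # generalise the differing step to a wildcard
    return "/" + "/".join(a)


print(merge_one_difference("/a/*/c/d", "/a/*/c/e"))  # /a/*/c/*
```

The merged XPE matches more publications than S1 and S2 together, so merging trades smaller routing tables for possible false positives.
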
Evaluation Setup
  - Implemented in C++
  - Overlay with 127 content-based routers
  - Cluster (each node: 1.86 GHz, 4G) vs. PlanetLab
  - Workloads are generated from two DTDs: NITF and PSD
  - Metrics
      - Number of subscriptions per router
      - Network traffic
      - XPE processing time
      - Notification delay
Routing Table Size
Network Traffic
  Method                   Network Traffic   Delay (ms)
  No-Adv-No-Cov            654,
  No-Adv-With-Cov          572,
  With-Adv-No-Cov          398,
  With-Adv-With-Cov        326,
  With-Adv-With-CovPM      254,
  With-Adv-With-CovIPM     257,
Process Time
Notification Delay (PSD)
Notification Delay (NITF)
Related Work
  - Locating data sources in large distributed systems [Galanis et al. 2003]
      - DHT-based approach
      - Data summary
  - Query aggregation for scalable data dissemination [Chan et al. 2002]
      - Equivalence between the original query set and the aggregated set
  - ONYX [Diao et al. 2004]
      - Delivers part of the XML documents
      - Shares common prefixes among queries using an NFA
  - XTreeNet [Fenner et al. 2005]
      - Unifies the pub/sub model and the query/response model
      - Avoids repeated matching at each hop
Conclusions
  - Investigate advertisement-based routing for XML data dissemination networks
  - Propose a novel data structure to maintain covering and merging relationships among XPEs
  - Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
      - Reduce routing table size by up to 90%
      - Improve routing latency by roughly 85%
  - Future work
      - Extend to tree patterns
      - Share common prefixes among XPEs in overlapping and covering algorithms
Q & A
  Contact: Middleware Systems Research Group, University of Toronto
Process Time
  [Figure: x-axis Number of Subscriptions, y-axis Time (ms)]
Notification Delay (NITF)
Notification Delay (PSD)
  [Figure: x-axis Number of Hops, y-axis Notification Delay (ms)]
False Positives
Conclusions
  - Investigate advertisement-based routing for XML data dissemination networks
  - Present algorithms to determine the covering relations among arbitrary XPEs
  - Propose a novel data structure to maintain covering and merging relationships among XPEs
  - Explore rules to merge similar XPEs in order to further reduce the routing table size
  - Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
      - Reduce routing table size by up to 90%
      - Improve routing latency by roughly 85%