June 25th PDPTA
Incorporating an XML Matching Engine into Distributed Brokering Systems
Shrideep Pallickara, Geoffrey Fox and Marlon Pierce
spallick, Community Grid Computing Laboratory, Pervasive Technology Labs, Indiana University
Talk Outline
Motivation
NaradaBrokering Overview
Organization of XPath Profiles and XML Advertisements
Optimizations
Performance Measurements
Conclusions & Future Work
Motivation
Increasingly, interactions between entities are becoming network-centric.
As the scale of the system increases, the backbone messaging infrastructure gravitates towards distributed systems.
–Eliminates single points of failure, bottlenecks, etc.
Entities interacting through XML-encapsulated interactions will specify complex constraints.
–As volume increases, constraints will become more fine-grained.
This provides the underpinnings for routing Web Service invocations.
–The messaging infrastructure forms the substrate on which we build lightweight, location-independent services.
NaradaBrokering: Overview
Based on a network of cooperating broker nodes
–Cluster-based architecture allows the system to scale
Provides a scalable distributed event service
–Publish/Subscribe model. Also JMS compliant
–P2P interaction support: JXTA and Gnutella (started)
–Audio/Video apps
–Federation of Grid systems (just starting)
Engineering issues
–Support for multiple network protocols
–Tunneling through firewalls/proxies
NaradaBrokering: Organization
XPath
Query language that searches for, locates, and identifies parts of XML documents.
Uses a compact, non-XML syntax
–Uses path syntax to navigate the hierarchical structure of XML documents.
Operates on the abstract, logical structure of XML documents
Matching queries to XML documents
–We say an XPath query matches an XML event if that XML event satisfies the constraint specified in the query.
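A minimal sketch of what "an XPath query matches an XML event" can mean in practice. It assumes Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath; the system described in this talk uses the Xalan parser on the JVM. The event layout, the element and attribute names, and the helper matches() are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML event; element and attribute names are illustrative only.
EVENT = """
<event>
  <topic>sports</topic>
  <item category="cricket"><score>250</score></item>
</event>
"""


def matches(xpath_query: str, xml_event: str) -> bool:
    """Return True if the query selects at least one node in the event."""
    root = ET.fromstring(xml_event)
    # find() returns the first selected element, or None if nothing is selected.
    return root.find(xpath_query) is not None


print(matches("./item[@category='cricket']/score", EVENT))
print(matches("./item[@category='tennis']", EVENT))
```

Here a query "matches" when it selects at least one node; other definitions (for example, a boolean XPath expression evaluating to true) are equally plausible.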
XPath Profiles and XML Advertisements
XPath Profile
–Specification of an XPath constraint that XML events must satisfy prior to being routed to the client.
–Expresses interest in events conforming to a specific template.
–Matched against real-time XML events
XML Advertisements
–An advertisement could be a resource that is described in XML.
–Clients interested in locating resources can use an XPath query to locate them.
–Discovery
Matching times increase with
–An increase in the number of profiles/advertisements being maintained
–The complexity of the matching operation: XPath and SQL matching tend to be more expensive.
Organization of Profiles and Routing
Client profiles are stored hierarchically within the system.
–A broker maintains client profiles, a cluster-controller maintains broker profiles/advertisements, and so on.
When an event is received, it is matched against the stored profiles and destinations are computed.
–A cluster-controller computes broker destinations. A broker computes client destinations.
Every broker node, when supplied with a set of destinations, computes the best broker-hops to take to reach these destinations.
XPath Profile Matching Optimizations
XPath Profiles have the following format
–Destination is a 32-bit integer of form 000….001…00
The matching process returns a destination list.
–Starts with an empty list
–When there is a match, the destination is added by simply performing a bitwise OR operation.
–So if both brokers … and … are interested the destination list would be …
Once a destination is added to the computed list, XPath profiles registered to this destination are not considered for subsequent matching against the same XML event.
–The savings are enormous, especially when there are a large number of profiles.
Not all nodes are involved in the calculation process
–Matching costs are amortized over the entire broker network.
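The destination computation described above can be sketched as follows. This is an illustration under stated assumptions, not the NaradaBrokering implementation: it uses Python's xml.etree.ElementTree (limited XPath subset) rather than Xalan, and the profile representation as (query, destination bit) pairs, the function compute_destinations(), and the sample event and profiles are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical event and profiles; names and bit positions are illustrative.
EVENT = "<event><topic>sports</topic><item category='cricket'/></event>"
PROFILES = [
    ("./topic", 1 << 0),                      # destination 0
    ("./item", 1 << 0),                       # destination 0 again
    ("./item[@category='cricket']", 1 << 3),  # destination 3
    ("./missing", 1 << 5),                    # destination 5, never matches
]


def compute_destinations(xml_event, profiles):
    """Return (destination bitmask, number of XPath evaluations performed).

    profiles is an iterable of (xpath_query, destination_bit) pairs, where
    each destination_bit is an integer with exactly one bit set.
    """
    root = ET.fromstring(xml_event)
    destinations = 0  # start with an empty destination list
    evaluated = 0
    for query, dest_bit in profiles:
        if destinations & dest_bit:
            # Destination already in the list: skip its remaining profiles.
            continue
        evaluated += 1
        if root.find(query) is not None:
            destinations |= dest_bit  # add the destination with a bitwise OR
    return destinations, evaluated


dests, evaluated = compute_destinations(EVENT, PROFILES)
print(format(dests, "032b"), evaluated)
```

In this sample the second profile is never evaluated because destination 0 is already in the list after the first match, so only three of the four queries are run.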
XML Advertisements and Optimizations
Organization
–Advertisements have a destination associated with them too.
–The organizational scheme is similar to that for profiles.
–An XPath query issued by a client is matched against stored advertisements.
–Controllers at different levels return results.
Optimizations
–Eliminating repeated location of the same resource from the same unit. A cluster controller would already have returned all resources for its cluster, so there is no need to match advertisements (at the super-cluster controller) registered to that cluster.
–We could limit the default number of matching advertisements that are returned as a result of the query.
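Both advertisement optimizations can be sketched in a few lines. This is a hedged illustration only: the storage format as (destination bit, XML text) pairs, the function find_advertisements(), its parameters already_answered and limit, and the sample advertisements are assumptions made for the example, and xml.etree.ElementTree stands in for the Xalan-based matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical advertisements stored at a controller: (destination bit, XML).
ADS = [
    (1 << 0, "<resource><printer color='yes'/></resource>"),
    (1 << 1, "<resource><printer color='yes'/><location>lab</location></resource>"),
    (1 << 1, "<resource><scanner/></resource>"),
    (1 << 2, "<resource><printer color='yes'/></resource>"),
]


def find_advertisements(query, advertisements, already_answered=0, limit=10):
    """Return up to `limit` advertisements selected by the XPath query.

    already_answered is a bitmask of units whose own controllers have
    already returned their resources; their advertisements are skipped.
    """
    results = []
    for dest_bit, xml_text in advertisements:
        if already_answered & dest_bit:
            continue  # that unit has already reported its resources
        if ET.fromstring(xml_text).find(query) is not None:
            results.append(xml_text)
            if len(results) >= limit:
                break  # stop once the requested number of results is reached
    return results


QUERY = "./printer[@color='yes']"
print(len(find_advertisements(QUERY, ADS)))
print(len(find_advertisements(QUERY, ADS, already_answered=1 << 0, limit=1)))
```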
Restricting Scope of Matching
Ensure resources aren’t available beyond a realm
–Restrict propagation of advertisements/profiles, e.g. a profile/advertisement is not to be sent beyond its cluster.
ACLs could be included with advertisements
–Checked to ensure a service is not visible to queries with improper credentials.
Specifying depth of queries
–Ensures localized resources. For example, a client may be interested only in resources advertised by clients within its super-cluster.
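One way to picture these scoping rules is as two checks attached to each advertisement. The sketch below is purely illustrative: the level numbering, the Advertisement fields max_level and allowed, and the helpers should_propagate() and visible_to() are hypothetical names, not part of the system described in this talk.

```python
from dataclasses import dataclass

# Hypothetical level numbering: 0 = broker, 1 = cluster, 2 = super-cluster.
BROKER, CLUSTER, SUPER_CLUSTER = 0, 1, 2


@dataclass(frozen=True)
class Advertisement:
    xml_text: str
    max_level: int = SUPER_CLUSTER      # highest level it may propagate to
    allowed: frozenset = frozenset()    # empty set means no ACL is attached


def should_propagate(ad: Advertisement, target_level: int) -> bool:
    """Propagate only up to the level the advertiser permitted."""
    return target_level <= ad.max_level


def visible_to(ad: Advertisement, credential: str) -> bool:
    """An advertisement with an ACL is visible only to listed credentials."""
    return not ad.allowed or credential in ad.allowed


local_ad = Advertisement("<resource/>", max_level=CLUSTER,
                         allowed=frozenset({"alice"}))
print(should_propagate(local_ad, SUPER_CLUSTER), visible_to(local_ad, "bob"))
```

The same level comparison could be applied to a query's requested depth, so that a client asks only for resources advertised within its own super-cluster.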
Experimental results
Stand-alone process
Pentium-3, 1 GHz, 256 MB RAM, JVM JRE 1.4
XPath profiles are evenly distributed over 32 sub-unit destinations.
Xalan parser
What the numbers mean
With optimizations, profile matching times vary between milliseconds for 10,000 profiles.
–Our conjecture is that in most practical situations performance would be similarly enhanced.
For advertisements, the costs vary depending on the number of results requested.
–The scheme can clearly be used for resource discovery, since these queries do not have stringent real-time constraints.
Computing costs are incurred at controllers.
–Matching costs are thus amortized over the network.
Conclusions and Future Work
As far as we know, this is the first system to incorporate both distributed XPath profile and XML advertisement matching.
–Content is routed to valid destinations.
–Results demonstrate that the scheme is indeed feasible.
Future Work
–Equivalence of XPath queries.
–Effective organization of “related” advertisements is another entry point for reducing the costs associated with discovery: advertisements that have related schemas or whose DOMs have similar nodes.
–Investigate the use of native XML databases such as Xindice and eXist.
Related work
Publish/Subscribe systems
–Elvin, Siena, Gryphon
P2P systems
–JXTA, Gnutella
JMS systems
–Use TextMessage to package XML documents.