Adaptive Processing of Top-k Queries in XML. Amelie Marian, Sihem Amer-Yahia, Nick Koudas, Divesh Srivastava. Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)
XML wodehouse psmith london wodehouse psmith london 1234
XML
XML XPath axes: pc = parent-child; ad = ancestor-descendant
Scoring Function The traditional tf*idf function comes from information retrieval (IR). tf (term frequency) quantifies the relative importance of a keyword in an individual document. idf (inverse document frequency) quantifies the relative importance of an individual keyword in the collection of documents.
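As a rough illustration of the two quantities above, here is a minimal sketch of keyword-based tf*idf. It assumes one common variant (relative frequency for tf, log of collection size over the number of containing documents for idf); the slide does not say which variant is meant, and the function name `tf_idf` and the token-list representation of documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(keyword, document, collection):
    """Score one keyword for one document.

    Assumption: documents are lists of tokens, and `collection` is a
    list of such documents. This is one common tf*idf variant, not
    necessarily the one the slide has in mind.
    """
    # tf: relative frequency of the keyword within this document
    tf = Counter(document)[keyword] / len(document) if document else 0.0
    # idf: log(collection size / number of documents containing the keyword)
    containing = sum(1 for doc in collection if keyword in doc)
    idf = math.log(len(collection) / containing) if containing else 0.0
    return tf * idf


docs = [
    ["psmith", "london"],
    ["wodehouse", "psmith"],
    ["london", "london"],
]
# "wodehouse" occurs in fewer documents than "psmith", so with equal tf
# in docs[1] it receives the higher score there.
rare = tf_idf("wodehouse", docs[1], docs)
common = tf_idf("psmith", docs[1], docs)
```

A keyword that is rare in the collection but present in a document scores higher than an equally frequent keyword that appears in many documents.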
Scoring Function XML, unlike traditional IR: an answer to an XPath query need not be an entire document, but can be any node in a document. An XPath query consists of several predicates linking the returned node to other query nodes, instead of simply "keyword containment in the document" (as in IR).
Scoring Function XPath Component Predicates. For an XPath query Q: q0 is the query answer node; qi, 1 <= i <= l, are the other query nodes; p(q0, qi) is the XPath axis between query nodes q0 and qi, i >= 1; P_Q (the component predicates of Q) is the set of predicates {p(q0, qi)}, 1 <= i <= l.
Scoring Function XML idf
Scoring Function XML tf
Scoring Function XML tf*idf Score
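The formulas for these three slides did not survive in this text, so the following is only a sketch of how a tf*idf score over component predicates could be computed. It assumes that tf of a predicate p(q0, qi) at a node n counts the qi nodes related to n by the axis, that idf is log(1 + (number of q0 nodes) / (number of q0 nodes satisfying the predicate)), and that the score of n sums idf * tf over P_Q. The `Node` class, all function names, and the tags `library`, `book`, `title`, `info`, and `author` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node strictly below `node` (the ad axis)."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches(node, axis, tag):
    """Nodes n' with the given tag such that p(node, n') holds."""
    pool = node.children if axis == "pc" else descendants(node)
    return [n for n in pool if n.tag == tag]


def tf(node, axis, tag):
    # Assumed tf: number of qi nodes reachable from `node` via the axis.
    return len(matches(node, axis, tag))


def idf(root, q0, axis, tag):
    # Assumed idf: log(1 + |q0 nodes| / |q0 nodes satisfying p(q0, qi)|).
    candidates = [n for n in [root, *descendants(root)] if n.tag == q0]
    satisfying = [n for n in candidates if matches(n, axis, tag)]
    if not satisfying:
        return 0.0
    return math.log(1 + len(candidates) / len(satisfying))


def score(root, node, predicates):
    """Sum idf * tf over the component predicates (axis, tag) of the query."""
    return sum(
        idf(root, node.tag, axis, tag) * tf(node, axis, tag)
        for axis, tag in predicates
    )


# Hypothetical document: two books, only the first has an author descendant.
book1 = Node("book", [Node("title"), Node("info", [Node("author")])])
book2 = Node("book", [Node("title")])
root = Node("library", [book1, book2])

# Component predicates for q0 = book: pc(book, title) and ad(book, author).
P_Q = [("pc", "title"), ("ad", "author")]
ranked = sorted([book1, book2], key=lambda n: score(root, n, P_Q), reverse=True)
```

Under these assumptions `book1` ranks above `book2`, because it satisfies both predicates and the author predicate is the more selective one.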
Whirlpool Architecture
Servers and Server Queues; Top-k Set; Router and Router Queue
Server Predicates Generation
Whirlpool
Scheduling between components: single-threaded, multi-threaded
Experimental
Conclusion We presented Whirlpool, an adaptive evaluation strategy for computing exact and approximate top-k answers of XPath queries. We are investigating new directions, such as increasing the number of threads per server for maximal parallelism.