Grouping Robin Burke ECT 360
Outline: sibling difference grouping; uniquifying in XPath; Muenchian grouping (generated ids, keys); moded templates; lab.
Grouping Problem: we want to impose a new hierarchy on the document, e.g. “organize books by rating”. This requires more tree manipulation. Characteristic requirements: a list of unique elements (the ratings), and a way to find the elements for each value (all books with a given rating).
Two Techniques. First: easy to understand, inefficient. Second: more complex, more likely to be efficient.
Sibling difference: How do I get a list of unique elements? Idea: sort, which groups like elements together, then grab the elements that differ from their preceding sibling. These are the “start points” of each bucket.
Example. Document: book1(3), book2(2), book3(3), book5(1), book6(5). Sort: book6(5), book1(3), book3(3), book2(2), book5(1). Select elements different from their preceding sibling: book6(5), book1(3), book2(2), book5(1).
Implementation: needs the preceding-sibling axis, which has no abbreviated syntax. XPath expression: all nodes that are not equal to their preceding sibling. Example =
Example
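The slide's own example is not preserved, so the following is only a minimal sketch of the sibling difference idea in XSLT 1.0. It assumes an input shaped like `<book-list><book rating="3"><title>Book 1</title></book>...</book-list>`; the `title` child and the output layout are hypothetical. One caveat: the preceding-sibling axis follows document order, not the order produced by `xsl:sort`, so this sketch compares each book against all of its preceding siblings instead of only the immediately preceding one, and sorts the surviving "start points" afterwards.

```xml
<?xml version="1.0"?>
<!-- Sketch: sibling-difference grouping (XSLT 1.0).
     Assumed input: book-list/book elements with a rating attribute
     and a (hypothetical) title child. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/book-list">
    <!-- Unique ratings: keep a book only if no earlier sibling has the same rating -->
    <xsl:for-each select="book[not(@rating = preceding-sibling::book/@rating)]">
      <xsl:sort select="@rating" data-type="number" order="descending"/>
      <xsl:variable name="r" select="@rating"/>
      <xsl:value-of select="concat('Rating ', $r, ':&#10;')"/>
      <!-- Second scan of the whole list: every book with this rating -->
      <xsl:for-each select="../book[@rating = $r]">
        <xsl:value-of select="concat('  ', title, '&#10;')"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Both the predicate and the inner `for-each` rescan the sibling list for every book or rating, which is where the quadratic cost comes from.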
Problem: to build the unique list we must sort, then scan the whole list. To find the matching nodes we must scan the whole list again for each value. Not efficient: O(n^2).
Muenchian grouping. Idea: index the elements, then retrieve from the index. Can also be used to filter document contents.
Muenchian grouping. What we need: ids and keys.
IDs: an id is a unique identifier for an element. A DTD / schema can define attributes as ID attributes, but document writers must encode them and must ensure uniqueness.
generate-id: XSLT can generate an id for each node, guaranteed unique. Uses: within-page navigation; node comparison. /entries/entry[5] = /entries/entry[last()] checks string contents, not node identity. Example
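As an illustration of the difference between comparing string contents and comparing node identity, here is a small sketch. It assumes a document whose root is `entries` with at least five `entry` children; the output strings are made up for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch: string comparison vs. node identity (XSLT 1.0).
     Assumes /entries contains at least five entry elements. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/entries">
    <!-- String-value comparison: true whenever the two entries have the same text -->
    <xsl:if test="entry[5] = entry[last()]">same text&#10;</xsl:if>
    <!-- Identity comparison: true only if the fifth entry is also the last entry -->
    <xsl:if test="generate-id(entry[5]) = generate-id(entry[last()])">same node&#10;</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With six entries where the fifth and sixth happen to contain the same text, only the first test succeeds.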
Keys: IDs don't allow grouping on content. Keys generate an index into the document based on an XPath expression.
Example. Key element. Interpretation: create an index for book elements, indexed by the rating attribute; call it "books_by_rating". Key function: key('books_by_rating', 5). Interpretation: return a node set of all the nodes indexed in the "books_by_rating" index with value 5.
Example, cont'd
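The key element itself did not survive on the slide. A declaration matching the interpretation above (index `book` elements by their `rating` attribute, name the index `books_by_rating`) would plausibly look like the sketch below; the `title` child used for output is an assumption.

```xml
<?xml version="1.0"?>
<!-- Sketch: declaring a key and looking nodes up with key() (XSLT 1.0). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every book element by the value of its rating attribute -->
  <xsl:key name="books_by_rating" match="book" use="@rating"/>

  <xsl:template match="/">
    <!-- Retrieve all books indexed under the value 5 -->
    <xsl:for-each select="key('books_by_rating', 5)">
      <xsl:value-of select="concat(title, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Key values are compared as strings, so the number 5 is converted to "5" and matches `rating="5"` but not `rating="5.0"`.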
Muenchian grouping. We need (1) a list of unique values and (2) a way to retrieve nodes based on these values. The idea: create a key index. This solves step #2. Remaining question: how to get the unique values?
Unique values: the first node with a given value is key('books_by_rating', 5)[1]. We can't go through all values, so we must go through all nodes checking for these. In other words: go through all the books; test if the current node is the first one in its index list; if so, use it and its associated value; otherwise ignore it.
XPath rendering. Example: /book-list/book[generate-id() = How to read this: for all book nodes b, grab the first indexed book node b1 with the same rating; filter b = b1.
Alternate logic: a node set is a set, so if x = y then { x, y } = { x } = { y }. Example: /book-list/book[ count(. | = 1] How to read this: for all book nodes b, create a node set consisting of b and b1, the first indexed book node with the same rating; filter set size = 1.
Example id method count method
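The two expressions are cut off in the text above, so the sketch below shows one common way to write them in full, following the "how to read this" descriptions. It assumes the `books_by_rating` key and a hypothetical `title` child; it is not necessarily what the original slide showed.

```xml
<?xml version="1.0"?>
<!-- Sketch: Muenchian grouping with the id method and the count method (XSLT 1.0). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:key name="books_by_rating" match="book" use="@rating"/>

  <xsl:template match="/">
    <!-- id method: keep a book only if it is the first node in its key bucket -->
    <xsl:for-each select="/book-list/book[generate-id() =
                          generate-id(key('books_by_rating', @rating)[1])]">
      <xsl:sort select="@rating" data-type="number" order="descending"/>
      <xsl:value-of select="concat('Rating ', @rating, ':&#10;')"/>
      <!-- Retrieve the whole group from the index instead of rescanning the list -->
      <xsl:for-each select="key('books_by_rating', @rating)">
        <xsl:value-of select="concat('  ', title, '&#10;')"/>
      </xsl:for-each>
    </xsl:for-each>

    <!-- count method: the union of a node with itself has size 1 -->
    <xsl:text>Unique ratings (count method):</xsl:text>
    <xsl:for-each select="/book-list/book[count(. | key('books_by_rating', @rating)[1]) = 1]">
      <xsl:value-of select="concat(' ', @rating)"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both predicates select the same books; which one reads better is largely a matter of taste.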
Building a page index: index labels at the top of the page, one for each unique key; sections in the page, one for each unique key; links from labels to sections. label section name
Example
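A page index along these lines could be sketched as follows. The anchor naming scheme (`rating` followed by the value), the HTML layout, and the `title` child are all assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch: index labels at the top, linked to one section per unique rating (XSLT 1.0). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:key name="books_by_rating" match="book" use="@rating"/>

  <xsl:template match="/">
    <!-- One representative book per unique rating -->
    <xsl:variable name="firsts" select="/book-list/book[generate-id() =
        generate-id(key('books_by_rating', @rating)[1])]"/>
    <html><body>
      <!-- Index labels: one link per unique rating -->
      <p>
        <xsl:for-each select="$firsts">
          <xsl:sort select="@rating" data-type="number"/>
          <a href="#rating{@rating}"><xsl:value-of select="@rating"/></a>
          <xsl:text> </xsl:text>
        </xsl:for-each>
      </p>
      <!-- Sections: one named anchor per unique rating, followed by its books -->
      <xsl:for-each select="$firsts">
        <xsl:sort select="@rating" data-type="number"/>
        <h2><a name="rating{@rating}">Rating <xsl:value-of select="@rating"/></a></h2>
        <ul>
          <xsl:for-each select="key('books_by_rating', @rating)">
            <li><xsl:value-of select="title"/></li>
          </xsl:for-each>
        </ul>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```

The same node set is walked twice with `for-each`: once to emit the labels and once to emit the sections.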
More complex example. Index: the first letter of each book's first author. Needs the substring function: substring(author-list/author[1], 1, 1). Use variables for complex XPath expressions.
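A sketch of grouping by first letter, under the assumption that each `book` has an `author-list` with `author` children whose text starts with the letter of interest (no leading whitespace) and a hypothetical `title` child. The key name `books_by_initial` is invented for this example.

```xml
<?xml version="1.0"?>
<!-- Sketch: Muenchian grouping on a computed value, the first author's initial (XSLT 1.0). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:key name="books_by_initial" match="book"
           use="substring(author-list/author[1], 1, 1)"/>

  <xsl:template match="/">
    <xsl:for-each select="/book-list/book[generate-id() =
        generate-id(key('books_by_initial', substring(author-list/author[1], 1, 1))[1])]">
      <xsl:sort select="substring(author-list/author[1], 1, 1)"/>
      <!-- A variable keeps the long expression out of the rest of the loop body -->
      <xsl:variable name="initial" select="substring(author-list/author[1], 1, 1)"/>
      <xsl:value-of select="concat($initial, ':&#10;')"/>
      <xsl:for-each select="key('books_by_initial', $initial)">
        <xsl:value-of select="concat('  ', title, '&#10;')"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The variable cannot replace the expression inside the `for-each` predicate, because that expression has to be evaluated separately for each candidate book.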
Moded templates: so far we process the same nodes twice using for-each. Can we process the same nodes twice with apply-templates? Problem: the template match criterion is the same in both cases.
Moded templates, cont'd. Solution: an additional attribute, mode, specified in the apply-templates call and in the template definition. Only the template of the appropriate mode will be used.
Example
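One way to express a label-and-section page with moded templates is sketched below. The mode names `label` and `section`, the anchor naming scheme, and the `title` child are assumptions chosen for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch: processing the same book nodes twice with moded templates (XSLT 1.0). -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:key name="books_by_rating" match="book" use="@rating"/>

  <xsl:template match="/">
    <!-- One representative book per unique rating -->
    <xsl:variable name="firsts" select="/book-list/book[generate-id() =
        generate-id(key('books_by_rating', @rating)[1])]"/>
    <html><body>
      <!-- First pass: index labels -->
      <p><xsl:apply-templates select="$firsts" mode="label"/></p>
      <!-- Second pass over the same nodes: sections -->
      <xsl:apply-templates select="$firsts" mode="section"/>
    </body></html>
  </xsl:template>

  <!-- Same match pattern, different mode: emits the link -->
  <xsl:template match="book" mode="label">
    <a href="#rating{@rating}"><xsl:value-of select="@rating"/></a>
    <xsl:text> </xsl:text>
  </xsl:template>

  <!-- Same match pattern, different mode: emits the section -->
  <xsl:template match="book" mode="section">
    <h2><a name="rating{@rating}">Rating <xsl:value-of select="@rating"/></a></h2>
    <ul>
      <xsl:for-each select="key('books_by_rating', @rating)">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Both templates match `book`; the `mode` attribute on `apply-templates` decides which one fires.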
Lab