Indexing Semistructured Data
J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman
Stanford University, January
EECS /21/2000
Presented by Weiming Zhou
Outline
- Introduction
  - Data Model
  - Query Language
- Indexes in Lore
- Query plans using indexes
- Conclusions
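Data Model: Object Exchange Model (OEM)
OEM represents data as a directed, labeled graph. Each object has a unique object identifier (written &n on these slides). Complex objects have labeled edges to subobjects, and atomic objects hold values such as strings and integers.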
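A minimal sketch of an OEM graph in Python, assuming a simple dict encoding that is not Lore's storage format. Object ids follow the later slides where they are given (movies &2, &3, &4; titles &5, &9, &14; years &7, &12; actors &6, &13; names &17, &21); the root id 1 standing for DB, the title strings, and the value of &7 are hypothetical placeholders. `eval_path` follows a label path with set semantics, which is what a plain traversal without indexes does.

```python
# Hypothetical OEM database (placeholder titles; root oid 1 stands for DB).
# Complex objects: oid -> list of (label, child oid).
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6), ("Year", 7)],
    3: [("Title", 9), ("Year", 12)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}
# Atomic objects: oid -> value.
ATOMIC = {
    5: "title-5", 7: "1981", 9: "title-9", 12: "1956",
    14: "title-14", 17: "Harrison Ford", 21: "Harrison Ford",
}


def children(oid, label):
    """Children of oid reached through an outgoing edge with the given label."""
    return [child for lab, child in COMPLEX.get(oid, []) if lab == label]


def eval_path(root, labels):
    """Set of oids reachable from root by following the labels in order."""
    current = {root}
    for label in labels:
        current = {c for oid in current for c in children(oid, label)}
    return current


print(sorted(eval_path(1, ["Movie", "Title"])))          # [5, 9, 14]
print(sorted(eval_path(1, ["Movie", "Actor", "Name"])))  # [17, 21]
```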
The Lorel Query Language
- Example 1: select DB.Movie.Title where DB.Movie.Actor.Name = “Harrison Ford”
- Example 2: select T from DB.Movie M, M.Title T where exists A in M.Actor : exists N in A.Name : N = “Harrison Ford”
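As we read Lorel's semantics, the DB.Movie prefix shared by the select and where clauses of Example 1 denotes a single binding per movie, which is why Example 2 can state the same query with a range variable M and nested exists. The sketch evaluates Example 2 over a small hypothetical OEM graph (root oid 1 stands for DB; titles are placeholders) and contrasts it with an incorrect reading that binds the two DB.Movie prefixes independently.

```python
# Hypothetical OEM fragment; oids follow the slides' examples, root 1 is DB.
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6)],
    3: [("Title", 9)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}
ATOMIC = {5: "title-5", 9: "title-9", 14: "title-14",
          17: "Harrison Ford", 21: "Harrison Ford"}


def children(oid, label):
    return [c for lab, c in COMPLEX.get(oid, []) if lab == label]


def example2(target="Harrison Ford"):
    """select T from DB.Movie M, M.Title T
    where exists A in M.Actor : exists N in A.Name : N = target"""
    result = set()
    for m in children(1, "Movie"):
        if any(ATOMIC.get(n) == target
               for a in children(m, "Actor")
               for n in children(a, "Name")):
            result.update(children(m, "Title"))
    return result


def independent_reading(target="Harrison Ford"):
    """Wrong reading: the where-clause movie is unrelated to the select-clause movie."""
    any_match = any(ATOMIC.get(n) == target
                    for m in children(1, "Movie")
                    for a in children(m, "Actor")
                    for n in children(a, "Name"))
    if not any_match:
        return set()
    return {t for m in children(1, "Movie") for t in children(m, "Title")}


print(sorted(example2()))             # [5, 14]
print(sorted(independent_reading()))  # [5, 9, 14]
```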
Indexes in Lore
- Value Index
- Text Index
- Link Index
- Path Index
- Edge Index
Value Index
- Similar to attribute indexes in a relational DBMS.
- Example: suppose we create a Value Index for DB.Movie.Year. A lookup for DB.Movie.Year = “1956” returns &12.
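A minimal sketch of a Value Index built for one path, assuming a sorted list of (value, oid) pairs searched with Python's `bisect` as a stand-in for a B+-tree-style ordered index. The graph is a hypothetical fragment: the year of &12 comes from the slide's example, while the year stored in &7 is a placeholder. Values are compared as raw strings with no type coercion, which is a simplification of this sketch.

```python
import bisect

# Hypothetical fragment: root 1 (DB) -> Movie -> Year atomic objects.
COMPLEX = {1: [("Movie", 2), ("Movie", 3)], 2: [("Year", 7)], 3: [("Year", 12)]}
ATOMIC = {7: "1981", 12: "1956"}


def eval_path(root, labels):
    current = {root}
    for label in labels:
        current = {c for oid in current
                   for lab, c in COMPLEX.get(oid, []) if lab == label}
    return current


class ValueIndex:
    """Sorted (value, oid) pairs for the atomic objects reached by one path."""

    def __init__(self, root, path):
        self.entries = sorted((ATOMIC[o], o) for o in eval_path(root, path))
        self.keys = [value for value, _ in self.entries]

    def range(self, low, high):
        """Oids whose value v satisfies low <= v <= high."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return {oid for _, oid in self.entries[lo:hi]}

    def equal(self, value):
        return self.range(value, value)


year_index = ValueIndex(1, ["Movie", "Year"])
print(year_index.equal("1956"))  # {12}
```

Keeping the entries ordered is what lets the same structure answer range predicates such as `1950 <= Year <= 1960`, not only equality.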
Text Index
- An information-retrieval-style keyword search, restricted by incoming labels.
- Locates string values containing specific words; useful for strings containing a significant amount of text.
- Implementation: inverted lists that map a given word w and label l to a list of atomic values with incoming edge l that contain word w.
- Example: look up all objects with an atomic string value containing the word “Ford” and an incoming edge Name. Results: {, }.
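A minimal sketch of the inverted lists described above, keyed by (word, incoming label). The objects (oids 101 to 103) and their strings are hypothetical, and tokenizing with a lowercase `\w+` regex is an assumption of this sketch rather than Lore's tokenizer.

```python
import re
from collections import defaultdict

# Hypothetical atomic string objects: oid -> (incoming label, value).
ATOMIC_STRINGS = {
    101: ("Name", "Harrison Ford"),
    102: ("Name", "Jane Doe"),
    103: ("Title", "A Ford in the Road"),
}


def build_text_index(objects):
    """Inverted lists: (word, incoming label) -> list of oids containing word."""
    index = defaultdict(list)
    for oid, (label, text) in sorted(objects.items()):
        for word in set(re.findall(r"\w+", text.lower())):
            index[(word, label)].append(oid)
    return dict(index)


def lookup(index, word, label):
    return index.get((word.lower(), label), [])


text_index = build_text_index(ATOMIC_STRINGS)
print(lookup(text_index, "Ford", "Name"))   # [101]
print(lookup(text_index, "Ford", "Title"))  # [103]
```

The label restriction is what keeps the Title object 103 out of the Name lookup even though it contains the same word.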
Link Index
- Locates the parents of a given object, serving as back-pointers, since edges are stored only in the parent-to-child direction.
- Implementation: extendible hashing, with one Link Index for the entire database graph.
- Example: the Link Index lookup for object &17 returns parent object &6, and the lookup for object &21 returns object &13.
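A minimal sketch of a single Link Index for the whole graph, assuming an in-memory Python dict (a hash table) as a stand-in for the disk-based extendible hashing named on the slide. The graph reuses the slides' oids with a hypothetical root 1 for DB; each entry also records the label of the incoming edge so lookups can be restricted by label.

```python
from collections import defaultdict

# Hypothetical OEM graph; oids follow the slides' examples, root 1 is DB.
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6), ("Year", 7)],
    3: [("Title", 9), ("Year", 12)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}


def build_link_index(complex_objects):
    """Map every child oid to its (label, parent oid) pairs."""
    link = defaultdict(list)
    for parent, edges in complex_objects.items():
        for label, child in edges:
            link[child].append((label, parent))
    return dict(link)


LINK = build_link_index(COMPLEX)


def parents(oid, label=None):
    """Parents of oid, optionally only those reached via an edge with label."""
    return [p for lab, p in LINK.get(oid, []) if label is None or lab == label]


print(parents(17))  # [6]
print(parents(21))  # [13]
```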
Path Index
- Locates all objects reachable by a given labeled path.
- Provided by the DataGuide, a structural summary of the database.
- Example: for select DB.Movie.Title, the Path Index directly locates all objects reachable via DB.Movie.Title. Results: &5, &9, &14.
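A minimal sketch of a DataGuide used as a path index, assuming the strong-DataGuide construction in which each DataGuide node is identified by its target set, so every label path from the root appears once and points at the set of objects it reaches. The graph is a hypothetical encoding of the slides' oids with root 1 for DB; because target sets are memoized, the construction also terminates on cyclic graphs.

```python
from collections import defaultdict

# Hypothetical OEM graph; oids follow the slides' examples, root 1 is DB.
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6), ("Year", 7)],
    3: [("Title", 9), ("Year", 12)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}


def build_dataguide(root):
    """DataGuide nodes: frozenset of target oids -> {label: child target set}."""
    guide = {}
    pending = [frozenset([root])]
    while pending:
        targets = pending.pop()
        if targets in guide:
            continue  # target set already has a DataGuide node
        out = defaultdict(set)
        for oid in targets:
            for label, child in COMPLEX.get(oid, []):
                out[label].add(child)
        guide[targets] = {label: frozenset(s) for label, s in out.items()}
        pending.extend(guide[targets].values())
    return guide


def path_lookup(guide, root, labels):
    """Follow labels through the DataGuide and return the reached objects."""
    node = frozenset([root])
    for label in labels:
        node = guide[node].get(label)
        if node is None:
            return set()
    return set(node)


guide = build_dataguide(1)
print(sorted(path_lookup(guide, 1, ["Movie", "Title"])))  # [5, 9, 14]
```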
Edge Index
- Locates all parent-child pairs connected via a specified label.
- Example: looking up label “Year” in the Edge Index returns &2-&7 and &3-&12.
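A minimal sketch of an Edge Index as a map from label to (parent, child) pairs, assuming an in-memory dict over a hypothetical encoding of the slides' graph (root 1 for DB). Unlike the Link Index, which is keyed by the child, this index is keyed by the label alone.

```python
from collections import defaultdict

# Hypothetical OEM graph; oids follow the slides' examples, root 1 is DB.
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6), ("Year", 7)],
    3: [("Title", 9), ("Year", 12)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}


def build_edge_index(complex_objects):
    """label -> list of (parent oid, child oid) pairs joined by that label."""
    index = defaultdict(list)
    for parent, edges in complex_objects.items():
        for label, child in edges:
            index[label].append((parent, child))
    return dict(index)


EDGE_INDEX = build_edge_index(COMPLEX)
print(sorted(EDGE_INDEX["Year"]))  # [(2, 7), (3, 12)]
```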
Query Plans Using Indexes
- Top-down
- Bottom-up
- Hybrid
Example: select T from DB.Movie M, M.Title T where exists A in M.Actor : exists N in A.Name : N = “Harrison Ford”
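Top-Down Query Plan
- Exhaustive top-down traversal: starting from DB, visit every DB.Movie object and search its Actor.Name subobjects for “Harrison Ford”; no index is needed.
- Matches: &17 (under movie &2) and &21 (under movie &4).
- Result DB.Movie.Title: &5, &14.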
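A minimal sketch of the top-down plan, assuming a hypothetical dict encoding of the slides' graph (root 1 for DB, placeholder titles). It uses no index and counts the objects it visits on the way down, which is the cost a top-down plan pays on every movie, whether or not that movie matches. This sketch checks every name under each movie rather than stopping at the first match.

```python
# Hypothetical OEM graph; oids follow the slides' examples, root 1 is DB.
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6), ("Year", 7)],
    3: [("Title", 9), ("Year", 12)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}
ATOMIC = {5: "title-5", 9: "title-9", 14: "title-14",
          17: "Harrison Ford", 21: "Harrison Ford"}


def children(oid, label):
    return [c for lab, c in COMPLEX.get(oid, []) if lab == label]


def top_down(root, target):
    """Return (titles, number of Movie/Actor/Name objects visited)."""
    visited = 0
    result = set()
    for movie in children(root, "Movie"):
        actors = children(movie, "Actor")
        names = [n for a in actors for n in children(a, "Name")]
        visited += 1 + len(actors) + len(names)
        if any(ATOMIC.get(n) == target for n in names):
            result.update(children(movie, "Title"))
    return result, visited


titles, visited = top_down(1, "Harrison Ford")
print(sorted(titles), visited)  # [5, 14] 7
```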
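Bottom-Up Query Plan
- Look up the Value Index for DB.Movie.Actor.Name = “Harrison Ford”: &17, &21.
- Use the Link Index to walk up to the parent movies: &17 up to &2 (via &6), &21 up to &4 (via &13).
- Fetch DB.Movie.Title for those movies: &5, &14.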
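A minimal sketch of the bottom-up plan over a hypothetical encoding of the slides' graph (root 1 for DB). It assumes the Value Index is keyed by (incoming label, value) rather than by a full path, so the plan confirms the rest of DB.Movie.Actor.Name by walking up with the Link Index, one label per step, before fetching titles.

```python
from collections import defaultdict

# Hypothetical OEM graph; oids follow the slides' examples, root 1 is DB.
COMPLEX = {
    1: [("Movie", 2), ("Movie", 3), ("Movie", 4)],
    2: [("Title", 5), ("Actor", 6)],
    3: [("Title", 9)],
    4: [("Title", 14), ("Actor", 13)],
    6: [("Name", 17)],
    13: [("Name", 21)],
}
ATOMIC = {5: "title-5", 9: "title-9", 14: "title-14",
          17: "Harrison Ford", 21: "Harrison Ford"}

LINK = defaultdict(list)    # Link Index: child oid -> [(label, parent oid)]
VINDEX = defaultdict(set)   # Value Index (assumed): (label, value) -> oids
for parent, edges in COMPLEX.items():
    for label, child in edges:
        LINK[child].append((label, parent))
        if child in ATOMIC:
            VINDEX[(label, ATOMIC[child])].add(child)


def up(oids, label):
    """One Link Index step: parents reached through an incoming edge with label."""
    return {p for o in oids for lab, p in LINK.get(o, []) if lab == label}


def bottom_up(root, value):
    names = VINDEX.get(("Name", value), set())
    actors = up(names, "Name")
    movies = up(actors, "Actor")
    # Keep only movies hanging off the root via a Movie edge (DB.Movie).
    movies = {m for m in movies if root in up({m}, "Movie")}
    return {c for m in movies for lab, c in COMPLEX[m] if lab == "Title"}


print(sorted(bottom_up(1, "Harrison Ford")))  # [5, 14]
```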
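Hybrid Query Plan
Example: select X from A.B X where exists Y in X.C : Y = 5
- Bottom-up: look up the Value Index for A.B.C = 5, then use the Link Index to reach the candidate X objects.
- Top-down: traverse A.B.
- Intersect the two sets of X objects.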
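A minimal sketch of the hybrid plan on a hypothetical graph invented for the slide's generic query: root 1 stands for A, and object 7 has a C child equal to 5 but hangs off A through a D edge. The sketch assumes the Value Index is keyed by (incoming label, value), so the bottom-up half alone would wrongly admit 7, while the top-down half alone would admit 3; intersecting the two halves leaves only the correct answer.

```python
from collections import defaultdict

# Hypothetical graph for: select X from A.B X where exists Y in X.C : Y = 5
COMPLEX = {
    1: [("B", 2), ("B", 3), ("D", 7)],  # root 1 stands for A
    2: [("C", 4), ("C", 5)],
    3: [("C", 6)],
    7: [("C", 8)],
}
ATOMIC = {4: 5, 5: 7, 6: 9, 8: 5}

LINK = defaultdict(list)    # child oid -> [(label, parent oid)]
VINDEX = defaultdict(set)   # (label, value) -> atomic oids
for parent, edges in COMPLEX.items():
    for label, child in edges:
        LINK[child].append((label, parent))
        if child in ATOMIC:
            VINDEX[(label, ATOMIC[child])].add(child)


def hybrid(root):
    """Return (answer, bottom-up candidates, top-down candidates)."""
    # Bottom-up half: Y objects with value 5, then their parents via C.
    ys = VINDEX.get(("C", 5), set())
    from_bottom = {p for y in ys for lab, p in LINK[y] if lab == "C"}
    # Top-down half: X objects reachable via A.B.
    from_top = {c for lab, c in COMPLEX.get(root, []) if lab == "B"}
    return from_bottom & from_top, from_bottom, from_top


answer, from_bottom, from_top = hybrid(1)
print(answer, sorted(from_bottom), sorted(from_top))  # {2} [2, 7] [2, 3]
```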
Conclusions
- The paper presents Lore’s indexing structures: Value Index, Text Index, Link Index, Path Index, and Edge Index.
- It describes query plans that use these indexes.
- Preliminary performance results show at least an order of magnitude improvement when indexes are used for query processing.