
1 Greg Janée chit-chat with CS database folks 10/26/01
Gazetteer database
4.5 million items, each having:
– 1+ names
  – fair to good discriminator
– 1 geospatial footprint
  – 99.9% points, 0.1% boxes & polygons
  – anticipate getting more (overlapping) polygons
– 1+ types
  – excellent to horrible discriminator
No obvious way to partition them

2 Gazetteer database
Informix Dynamic Server 2000
Indexes
– names: Verity (external; blade interface)
– footprint: MapInfo (R-tree; blade interface)
– types: B-tree

3 Gazetteer database
Queries
– dynamic
– general case: arbitrary boolean combinations
– in practice: [name] AND [footprint] AND [([type] OR [type] OR...)]
Desired behavior
– see some results immediately  queries
– after seeing some results, ability to kill query

4 Gazetteer database
Observed behavior:
– Either: query answered quickly (< 2 minutes)
  – optimizer picks “right” index
– Or: query takes forever (> 30 minutes)
  – optimizer picks “wrong” index
Complications:
– no way to kill or interrupt database thread, through JDBC or otherwise

5 Challenge #1
Solve ADEPT’s query problem
– Multiple, different data types
  – spatial, text, traditional linear types
– Discriminability of any given index greatly depends on both data & query
– Dynamic queries

6 Load balancing
Why
– distribute those queries-from-hell
– increase reliability
How
– multiple independent, identical databases
– middleware directs query to “best” database
– database executes query in its entirety

7 Load balancing
What the middleware knows about a database:
– current & maximum connection count
– current queries
– amount of time each query has been processing (QPT)
Idea:
– score each DB inversely proportional to Σ QPT²
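
The scoring idea on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the score is taken to be the reciprocal of the sum of squared QPTs over a database's running queries, the `+ 1` term (so that an idle database gets a finite score) is an addition not found in the slide, and skipping databases that are at their connection limit is likewise an assumption. `DatabaseStatus`, `score`, and `pick_database` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseStatus:
    """What the middleware knows about one database (hypothetical structure)."""
    name: str
    current_connections: int
    max_connections: int
    # QPT (seconds spent so far) of each currently running query
    query_times: list = field(default_factory=list)


def score(db):
    """Score inversely proportional to the sum of squared QPTs.

    The +1 is an assumption: it keeps the score finite for an idle database.
    """
    return 1.0 / (1.0 + sum(t * t for t in db.query_times))


def pick_database(databases):
    """Return the best-scoring database that still has a free connection."""
    candidates = [db for db in databases
                  if db.current_connections < db.max_connections]
    return max(candidates, key=score, default=None)
```

Squaring the times means one long-running query counts far more against a database than several short ones, which fits the goal of steering new work away from a server stuck on a query-from-hell.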

8 Challenge #2
Define metric
– overall query processing time
– response time
Do better
– incorporate connection counts into formula
– analyze queries
– keep history

9 Query translation
Gazetteer is an instance of a much more general problem
To wit:
– how to describe the automatic translation of dynamic queries written in an abstract query language to SQL
– in an easy, powerful, flexible way
– making as few assumptions as possible about the underlying schema
– and producing “reasonable” SQL
  – not so bad as to preclude database’s optimizer from working

10 Query translation
Collection (  database) contains items having IDs
– query should return IDs; duplicates OK
– response time more important than overall QPT
Search bucket
– abstract, typed thing against which constraints may be placed;  standard buckets
Each collection supports 1+ buckets in idiosyncratic ways
Query language
– arbitrary boolean combinations of bucket constraints

11 Query translation
Standard buckets
– subject-related text
  – title
  – assigned term
– originator
– geographic location
– coverage date
– object type, feature type,...
– format
– identifier

12 Query translation
Buckets grouped by types
– spatial
  – e.g., overlaps a given polygon
– temporal
  – e.g., contains a given date range
– hierarchical
  – e.g., is any kind of geographic work
– textual, qualified textual
  – e.g., contains the phrase “luis obispo”
– numeric
  – e.g., > 5.7 meters

13 Query translation
Example: translating a spatial bucket
– MapInfo datablade
  – ST_Contains(table.column, “HG_Box(coords)”)
– Geodetic datablade
  – Inside(“GeoBox(coords)”, table.column)
– four bounding coordinate columns
  – table.northcolumn >=... and table.southcolumn <=... and...
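
As a rough illustration of the point above, one abstract spatial constraint can be routed to different SQL fragments depending on the backing index. The function and dictionary names here are hypothetical, `coords` is passed through as an opaque string because the slide does not give its format, and plain double quotes stand in for the slide's typographic ones. The four-column variant is left out because the slide elides its comparison operands.

```python
def spatial_to_mapinfo(table, column, coords):
    """SQL fragment in the MapInfo datablade form shown on the slide."""
    return 'ST_Contains(%s.%s, "HG_Box(%s)")' % (table, column, coords)


def spatial_to_geodetic(table, column, coords):
    """SQL fragment in the Geodetic datablade form shown on the slide."""
    return 'Inside("GeoBox(%s)", %s.%s)' % (coords, table, column)


# Hypothetical registry mapping a backend label to its translator.
SPATIAL_TRANSLATORS = {
    "mapinfo": spatial_to_mapinfo,
    "geodetic": spatial_to_geodetic,
}


def translate_spatial(backend, table, column, coords):
    """Translate one abstract spatial constraint for the given backend."""
    return SPATIAL_TRANSLATORS[backend](table, column, coords)


if __name__ == "__main__":
    for backend in SPATIAL_TRANSLATORS:
        print(translate_spatial(backend, "j_holding", "footprint", "coords"))
```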

14 Query translation
Existing Python-based scripting system
– easy to configure & extend
– comes with library of standard translation techniques
– "geographic-location" : Bucket(
      "spatial",
      standardSpatialOperators,
      spatialToInformixMapInfo,
      ["j_holding", "footprint"])
How to extend to boolean queries?

15 Query translation
Too-easy solution:
– given constraints C1, C2 that translate into SQL constraints (T1, S1), (T2, S2)
– then constraint C1 op C2, where op is AND, OR, or AND NOT, becomes
  – select id from T1 where S1 op id in (select id from T2 where S2)
But: Informix appears to execute subqueries in their entirety before considering the outer query
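
The too-easy rule can be written down in a few lines. This is a minimal sketch, not the existing scripting system: a translated constraint is assumed to be a `(table, where_clause)` pair, `combine` and `to_sql` are hypothetical names, and the parentheses around the left-hand clause are an addition so that nested combinations keep their intended precedence.

```python
VALID_OPS = ("AND", "OR", "AND NOT")


def combine(left, op, right):
    """Combine two translated constraints (T1, S1) and (T2, S2).

    Returns a new (table, where_clause) pair, so results can be combined again.
    """
    if op not in VALID_OPS:
        raise ValueError("unsupported operator: %r" % op)
    t1, s1 = left
    t2, s2 = right
    where = "(%s) %s id in (select id from %s where %s)" % (
        s1, op.lower(), t2, s2)
    return (t1, where)


def to_sql(constraint):
    """Render a (table, where_clause) pair as a complete query."""
    table, where = constraint
    return "select id from %s where %s" % (table, where)


if __name__ == "__main__":
    c1 = ("T1", "S1")
    c2 = ("T2", "S2")
    c3 = ("T3", "S3")
    print(to_sql(combine(combine(c1, "AND", c2), "AND NOT", c3)))
```

Every operator adds one more `id in (select ...)` subquery, which is exactly the shape the slide reports Informix handling poorly.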

16 Query translation
The problems with JOINs
– handling ANDs
  – tables may have {1, 1?, 1+, 0+} rows per item
– handling ORs
  – may require UNION
  – but UNION is not nestable

17 Query translation
The problem with disjunctive normal form:
– may be inefficient
  – select id from table where S1 and (S2 or S3)
– versus
  – select id from table where S1 and S2 union select id from table where S1 and S3
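
To make the cost concrete, here is a small sketch of a disjunctive-normal-form expansion. It assumes a hypothetical expression encoding (a string for a single SQL constraint, or a tuple `("and" | "or", left, right)`) and a single table, and it ignores AND NOT for brevity.

```python
def to_dnf(expr):
    """Expand an expression into a list of conjunctions (lists of constraints)."""
    if isinstance(expr, str):
        return [[expr]]
    op, left, right = expr
    left_terms, right_terms = to_dnf(left), to_dnf(right)
    if op == "or":
        # A disjunction simply concatenates the alternatives.
        return left_terms + right_terms
    if op == "and":
        # A conjunction distributes over every pair of alternatives.
        return [a + b for a in left_terms for b in right_terms]
    raise ValueError("unsupported operator: %r" % op)


def dnf_to_sql(expr, table="table"):
    """Render the DNF as one select per conjunction, joined by union."""
    selects = ["select id from %s where %s" % (table, " and ".join(conj))
               for conj in to_dnf(expr)]
    return " union ".join(selects)


if __name__ == "__main__":
    print(dnf_to_sql(("and", "S1", ("or", "S2", "S3"))))
```

Because AND distributes over every pair of alternatives, a conjunction of three two-way ORs already expands into eight separate selects, each repeating part of the work.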

18 Challenge #3
Design a translation description system
– easy to configure & extend
– makes as few assumptions as possible about the underlying schema
– should produce reasonable SQL by default
– supports customization of translation process
– supports pattern-based overrides

