Published by Brice Hensley. Modified over 9 years ago.
1
Greg Janée chit-chat with CS database folks 10/26/01
Gazetteer database
4.5 million items, each having:
– 1+ names
    fair to good discriminator
– 1 geospatial footprint
    99.9% points, 0.1% boxes & polygons
    anticipate getting more (overlapping) polygons
– 1+ types
    excellent to horrible discriminator
No obvious way to partition them
2
Gazetteer database
Informix Dynamic Server 2000
Indexes
– names: Verity (external; blade interface)
– footprint: MapInfo (R-tree; blade interface)
– types: B-tree
3
Gazetteer database
Queries
– dynamic
– general case: arbitrary boolean combinations
– in practice: [name] AND [footprint] AND [([type] OR [type] OR...)]
Desired behavior
– see some results immediately
– after seeing some results, ability to kill query
4
Gazetteer database
Observed behavior:
– Either: query answered quickly (< 2 minutes)
    optimizer picks “right” index
– Or: query takes forever (> 30 minutes)
    optimizer picks “wrong” index
Complications:
– no way to kill or interrupt database thread, through JDBC or otherwise
5
Challenge #1
Solve ADEPT’s query problem
– Multiple, different data types
    spatial, text, traditional linear types
– Discriminability of any given index greatly depends on both data & query
– Dynamic queries
6
Load balancing
Why
– distribute those queries-from-hell
– increase reliability
How
– multiple independent, identical databases
– middleware directs query to “best” database
– database executes query in its entirety
7
Load balancing
What the middleware knows about a database:
– current & maximum connection count
– current queries
– amount of time each query has been processing (QPT)
Idea:
– score each DB inversely proportional to QPT^2
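The scoring idea on this slide could be sketched roughly as follows. This is only an illustration, not the middleware's actual code: it assumes the score is the reciprocal of the summed squares of the QPT values of the queries currently running, adds 1 to the denominator so an idle database gets a finite score, and skips databases that have no free connection. The class and function names (`DatabaseStatus`, `score`, `pick_database`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseStatus:
    """What the middleware knows about one database (hypothetical names)."""
    name: str
    current_connections: int
    max_connections: int
    # Seconds each currently running query has been processing (QPT).
    query_times: list = field(default_factory=list)


def score(db):
    """Higher is better. Assumption: penalty is the sum of squared QPTs."""
    penalty = sum(t * t for t in db.query_times)
    # The +1 is an added assumption so that an idle database scores 1.0
    # instead of dividing by zero.
    return 1.0 / (1.0 + penalty)


def pick_database(databases):
    """Return the best-scoring database that still has a free connection."""
    candidates = [d for d in databases
                  if d.current_connections < d.max_connections]
    if not candidates:
        return None
    return max(candidates, key=score)


dbs = [
    DatabaseStatus("db-a", 3, 10, [5.0, 1800.0]),       # stuck on a long query
    DatabaseStatus("db-b", 4, 10, [20.0, 30.0, 10.0]),  # several short queries
    DatabaseStatus("db-c", 10, 10, []),                 # idle, but no free slot
]
best = pick_database(dbs)
print(best.name)
```

Squaring the QPT means a single long-running query weighs far more than several short ones, so a database stuck on one query-from-hell is avoided even if it has few connections open.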
8
Challenge #2
Define metric
– overall query processing time
– response time
Do better
– incorporate connection counts into formula
– analyze queries
– keep history
9
Query translation
Gazetteer is an instance of a much more general problem
To wit:
– how to describe the automatic translation of dynamic queries written in an abstract query language to SQL
– in an easy, powerful, flexible way
– making as few assumptions as possible about the underlying schema
– and producing “reasonable” SQL
    not so bad as to preclude database’s optimizer from working
10
Query translation
Collection (database) contains items having IDs
– query should return IDs; duplicates OK
– response time more important than overall QPT
Search bucket
– abstract, typed thing against which constraints may be placed; standard buckets
Each collection supports 1+ buckets in idiosyncratic ways
Query language
– arbitrary boolean combinations of bucket constraints
11
Query translation
Standard buckets
– subject-related text
    title
    assigned term
– originator
– geographic location
– coverage date
– object type, feature type,...
– format
– identifier
12
Query translation
Buckets grouped by types
– spatial
    e.g., overlaps a given polygon
– temporal
    e.g., contains a given date range
– hierarchical
    e.g., is any kind of geographic work
– textual, qualified textual
    e.g., contains the phrase “luis obispo”
– numeric
    e.g., > 5.7 meters
13
Query translation
Example: translating a spatial bucket
– MapInfo datablade
    ST_Contains(table.column, "HG_Box(coords)")
– Geodetic datablade
    Inside("GeoBox(coords)", table.column)
– four bounding coordinate columns
    table.northcolumn >=... and table.southcolumn <=... and...
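The three translations above might be generated by small functions like the ones below. This is a sketch under stated assumptions, not the project's translation library: the function names are hypothetical, `coords` is passed through as an opaque string because the slide does not give its format, and the four-column variant assumes the elided comparisons complete a test that the stored box encloses the query box (with hypothetical `eastcolumn` and `westcolumn` names, and no handling of boxes that cross the date line).

```python
def spatial_to_mapinfo(table, column, coords):
    """MapInfo datablade form, following the template on the slide."""
    return f'ST_Contains({table}.{column}, "HG_Box({coords})")'


def spatial_to_geodetic(table, column, coords):
    """Geodetic datablade form, following the template on the slide."""
    return f'Inside("GeoBox({coords})", {table}.{column})'


def spatial_to_bounding_columns(table, box,
                                north="northcolumn", south="southcolumn",
                                east="eastcolumn", west="westcolumn"):
    """Four-column form.

    Assumption: the stored box must enclose the query box. `box` is
    (north, south, east, west) in degrees; the east/west column names
    are invented for this sketch.
    """
    n, s, e, w = (float(v) for v in box)
    return (
        f"{table}.{north} >= {n} and {table}.{south} <= {s} and "
        f"{table}.{east} >= {e} and {table}.{west} <= {w}"
    )


# The same abstract constraint, rendered for three different schemas.
print(spatial_to_mapinfo("j_holding", "footprint", "coords"))
print(spatial_to_geodetic("j_holding", "footprint", "coords"))
print(spatial_to_bounding_columns("j_holding", (35.0, 34.0, -119.0, -120.0)))
```

Real code would bind the coordinate values as query parameters rather than interpolating them into the SQL text.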
14
Query translation
Existing Python-based scripting system
– easy to configure & extend
– comes with library of standard translation techniques
– "geographic-location" : Bucket(
      "spatial",
      standardSpatialOperators,
      spatialToInformixMapInfo,
      ["j_holding", "footprint"])
How to extend to boolean queries?
15
Query translation
Too-easy solution:
– given constraints C1, C2 that translate into SQL constraints (T1, S1), (T2, S2),
  the constraint C1 op C2, where op is AND, OR, or AND NOT, becomes
    select id from T1 where S1 op id in (select id from T2 where S2)
But:
– Informix appears to execute subqueries in their entirety before considering the outer query
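A minimal sketch of the too-easy solution, applied recursively to a tree of constraints, could look like this. The `Leaf` and `Combo` classes, the table names, and the SQL conditions are all hypothetical. The sketch also assumes the left-hand table has a row for every item; otherwise an OR would miss IDs that appear only in the right-hand table.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A single translated constraint: a table T and an SQL condition S."""
    table: str
    condition: str


@dataclass
class Combo:
    """Two constraints joined by AND, OR, or AND NOT."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, Combo]
ALLOWED_OPS = ("AND", "OR", "AND NOT")


def translate(node):
    """Return (table, where_clause) for a constraint tree."""
    if isinstance(node, Leaf):
        return node.table, node.condition
    if node.op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {node.op}")
    t1, s1 = translate(node.left)
    t2, s2 = translate(node.right)
    # The right-hand side always becomes an "id in (subquery)" test.
    return t1, f"({s1}) {node.op} id in (select id from {t2} where {s2})"


def to_sql(node):
    table, where = translate(node)
    return f"select id from {table} where {where}"


query = Combo("AND",
              Leaf("names", "name like 'santa%'"),
              Leaf("types", "type = 'river'"))
print(to_sql(query))
```

Every nested operator adds another subquery, which is exactly the shape that performs badly if subqueries are executed in their entirety before the outer query is considered.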
16
Query translation
The problems with JOINs
– handling ANDs
    tables may have {1, 1?, 1+, 0+} rows per item
– handling ORs
    may require UNION
    but UNION is not nestable
17
Query translation
The problem with disjunctive normal form:
– may be inefficient
    select id from table where S1 and (S2 or S3)
– versus
    select id from table where S1 and S2 union select id from table where S1 and S3
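The rewrite into disjunctive normal form can be sketched as follows, to show where the inefficiency comes from. This is an illustration only: expressions are nested tuples of the hypothetical form `("and", a, b)` or `("or", a, b)` with condition strings as leaves, negation is not handled, and all conditions are assumed to apply to a single table.

```python
def dnf(expr):
    """Return a list of conjunctions, each a list of condition strings."""
    if isinstance(expr, str):
        return [[expr]]
    op, left, right = expr
    if op == "or":
        # OR: simply concatenate the alternatives of both sides.
        return dnf(left) + dnf(right)
    if op == "and":
        # AND: distribute over every pair of alternatives.
        return [x + y for x in dnf(left) for y in dnf(right)]
    raise ValueError(f"unsupported operator: {op}")


def to_union_sql(table, expr):
    """Emit one SELECT per conjunction, joined by UNION."""
    selects = [f"select id from {table} where {' and '.join(conj)}"
               for conj in dnf(expr)]
    return " union ".join(selects)


expr = ("and", "S1", ("or", "S2", "S3"))
print(to_union_sql("table", expr))

# Distribution multiplies terms: two 2-way ORs under an AND give 4 SELECTs.
wide = ("and", ("or", "A", "B"), ("or", "C", "D"))
print(len(dnf(wide)))
```

In the first example S1 appears in both SELECTs, so it is evaluated twice, and the number of SELECTs grows multiplicatively with each OR nested under an AND.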
18
Challenge #3
Design a translation description system
– easy to configure & extend
– makes as few assumptions as possible about the underlying schema
– should produce reasonable SQL by default
– supports customization of translation process
– supports pattern-based overrides