Exploiting Type and Space in a Main Memory Query Engine
Thomas Schwarz, Matthias Grossmann, Daniela Nicklas, Bernhard Mitschang
Universität Stuttgart, Institute of Parallel and Distributed Systems
University of Stuttgart Center of Excellence
Outline: Motivation and Scenarios; Index Structures; Related Work; Experiments; Conclusion
City-Guide Scenario
Query: How do I get to the closest hotel?
(Map figure showing a hotel, a youth hostel, and a museum.)
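Typical Data and Type Hierarchy
Typical data: a table with the columns Name, ID, and Type (example row: Youth Hostel, 8).
Typical type hierarchy (type IDs in parentheses): Root (0) has the subtypes Building (1) and Road (5); Building has the subtypes Hotel (2), Museum (3), and Restaurant (4); Road has the subtypes Local Road (6) and Main Road (7).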
Type Hierarchy of TIGER/Line Data Sets (258 types; type ID 0 is the root type)
CFCC   Type ID   Description
A      1         Road
A1     4         Primary Highway With Limited Access
A11    5         Primary..., unseparated
A15    9         Primary..., separated
A19    13        Prim..., bridge
A4     34        Local, Neighborhood, and Rural Road
B      63        Railroad
B1     66        Railroad Main Line
D      97        Landmark
D4     122       Educational or religious Institution
D5     128       Transportation Terminal
D51    130       Airport or airfield
D52    131       Train station
D53    132       Bus terminal
H      213       Hydrography
Typical Queries
Typical queries ask for:
- gas stations next to the planned route
- the nearest base stations for wireless Internet access
- sights, landmarks, or buildings in a given area
- all roads, or only major roads, in a given area
Characteristics: the queries are disjunctive, restrict the type of the queried objects, and restrict their location.
Goal: exploit these characteristics for a speedup, either by leveraging a dedicated index structure or by combining both primary access paths (type and space).
System Architecture
Components: Data Providers, a Discovery Service, Integration Middleware, and Mobile Device Applications; each of them can host the main memory query engine.
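Selectivity Factor
The selectivity factor is the fraction of the universe that a query selects. Note the inverse terminology: a very selective query has a low selectivity factor.
- Small query area: very selective, i.e., a low selectivity factor (20%)
- Large query area: not very selective, i.e., a high selectivity factor (70%)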
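A minimal sketch of the definition illustrated above, assuming the selectivity factor is the fraction of objects that satisfy a query predicate (for evenly distributed objects this is close to the ratio of query area to universe area). The function names `selectivity_factor` and `in_window` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def selectivity_factor(objects: Sequence[Point],
                       predicate: Callable[[Point], bool]) -> float:
    """Fraction of objects satisfying the predicate.

    A small value means a very selective query (few objects qualify);
    a value close to 1.0 means the query is hardly selective.
    """
    if not objects:
        return 0.0
    hits = sum(1 for p in objects if predicate(p))
    return hits / len(objects)


def in_window(x1: float, y1: float, x2: float, y2: float) -> Callable[[Point], bool]:
    """Spatial predicate: the point lies inside the axis-aligned query window."""
    return lambda p: x1 <= p[0] <= x2 and y1 <= p[1] <= y2


# Ten objects on a line: a narrow window selects 2 of them, a wide one selects 7.
universe = [(float(i), 0.0) for i in range(10)]
print(selectivity_factor(universe, in_window(0, -1, 1, 1)))  # 0.2
print(selectivity_factor(universe, in_window(0, -1, 6, 1)))  # 0.7
```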
Usage Scenarios
- Data Provider: all hotels of a brand in the entire country; few updates
- Discovery Service: metadata on diverse data providers; few updates
- Mobile Device: data for a single application, around the user; many updates
- Integration Middleware: diverse data relevant within the range of the base station; many updates
(Figure: the four scenarios placed by spatial selectivity factor and type selectivity factor, each axis ranging from 1% over 10% to 100%.)
Summary of the Requirements
- Simple query capabilities suffice
- Combine type and space
- Cope with different workloads
- Fast response times
Separate Indexes
- Type index: an array of separate object lists, one per type (Root 0 through Main Road 7)
- Spatial index: a quadtree
- A cost-based optimizer chooses which index produces the candidates; the remaining predicate (type or spatial) is then evaluated on the candidates to obtain the final result.
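The plan choice between the two separate indexes can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the class `SeparateIndexes` and its methods are hypothetical, the cost model simply compares the expected number of candidates from each access path, the spatial selectivity factor is passed in as an estimate, and a linear scan stands in for the quadtree.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Obj:
    oid: int
    type_id: int
    x: float
    y: float


def in_rect(o: Obj, r: Rect) -> bool:
    x1, y1, x2, y2 = r
    return x1 <= o.x <= x2 and y1 <= o.y <= y2


class SeparateIndexes:
    """Hypothetical sketch: one object list per type ID plus a spatial index.
    A linear scan stands in for the quadtree."""

    def __init__(self, objects: List[Obj]) -> None:
        self.objects = list(objects)
        self.by_type: Dict[int, List[Obj]] = {}
        for o in self.objects:
            self.by_type.setdefault(o.type_id, []).append(o)

    def choose_plan(self, types: Set[int], spatial_factor: float) -> str:
        # Assumed cost model: expected number of candidates per access path.
        type_candidates = sum(len(self.by_type.get(t, [])) for t in types)
        spatial_candidates = spatial_factor * len(self.objects)
        return "type" if type_candidates <= spatial_candidates else "spatial"

    def query(self, types: Set[int], r: Rect, spatial_factor: float) -> List[Obj]:
        if self.choose_plan(types, spatial_factor) == "type":
            candidates = [o for t in types for o in self.by_type.get(t, [])]
            return [o for o in candidates if in_rect(o, r)]  # spatial predicate
        candidates = [o for o in self.objects if in_rect(o, r)]  # quadtree stand-in
        return [o for o in candidates if o.type_id in types]  # type predicate


# Three hotels (type 2) and seven local roads (type 6) along the x axis.
objs = [Obj(i, 2 if i < 3 else 6, float(i), 0.0) for i in range(10)]
idx = SeparateIndexes(objs)
print(idx.choose_plan({2}, 0.5), [o.oid for o in idx.query({2}, (0, -1, 1, 1), 0.5)])
print(idx.choose_plan({6}, 0.2), [o.oid for o in idx.query({6}, (0, -1, 4, 1), 0.2)])
```

Both plans return the same answer for a given query; the optimizer only decides which predicate filters first, which is why a rare type favors the type lists and a small query window favors the spatial index.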
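Real 3D Index
The type becomes a third index dimension next to the spatial dimensions. A query for "Building + all subtypes" in a region is a single box that spans the region and the type-dimension range of Building and its subtypes.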
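A small sketch of how a combined type and space query becomes a single box in the 3D index. It assumes the example hierarchy's type IDs are used directly as coordinates in the type dimension (Building 1 with its subtypes 2 to 4, Road 5 with its subtypes 6 and 7); `Box3D`, `TYPE_RANGE`, and `make_query` are hypothetical names.

```python
from typing import Dict, NamedTuple, Tuple


class Box3D(NamedTuple):
    """Axis-aligned query box over (x, y, type)."""
    x1: float
    y1: float
    t1: float
    x2: float
    y2: float
    t2: float

    def contains(self, x: float, y: float, t: float) -> bool:
        return (self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2
                and self.t1 <= t <= self.t2)


# Hypothetical type-dimension ranges: each entry covers a type plus all its subtypes.
TYPE_RANGE: Dict[str, Tuple[float, float]] = {
    "Building": (1.0, 4.0),  # Building, Hotel, Museum, Restaurant
    "Hotel": (2.0, 2.0),
    "Road": (5.0, 7.0),      # Road, Local Road, Main Road
}


def make_query(type_name: str, window: Tuple[float, float, float, float]) -> Box3D:
    """Turn 'objects of type_name (incl. subtypes) inside window' into one 3D box."""
    t1, t2 = TYPE_RANGE[type_name]
    x1, y1, x2, y2 = window
    return Box3D(x1, y1, t1, x2, y2, t2)


q = make_query("Building", (0.0, 0.0, 10.0, 10.0))
print(q.contains(3.0, 4.0, 2.0))  # a hotel inside the window: True
print(q.contains(3.0, 4.0, 6.0))  # a local road inside the window: False
```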
Traversing the Index
(Figure sequence: the query in the plane spanned by the spatial dimension and the type dimension, shown step by step as the index is traversed.)
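The traversal can be sketched as a generic hierarchical range search that prunes every subtree whose bounding box misses the query box. The slides do not specify the tree's node layout or split strategy, so `Node`, `range_search`, and the two-leaf example tree below are assumptions; the example happens to group objects by type at the top level.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, Interval, Interval]  # extents in (x, y, type)
Point = Tuple[float, float, float]         # (x, y, mapped type)


def intersects(a: Box, b: Box) -> bool:
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def contains(b: Box, p: Point) -> bool:
    return all(lo <= v <= hi for (lo, hi), v in zip(b, p))


@dataclass
class Node:
    bbox: Box
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)  # used in leaves only


def range_search(node: Node, q: Box, out: List[Point]) -> int:
    """Append the points inside q to out; return the number of nodes entered."""
    if not intersects(node.bbox, q):
        return 0  # the whole subtree is pruned
    entered = 1
    for p in node.points:
        if contains(q, p):
            out.append(p)
    for child in node.children:
        entered += range_search(child, q, out)
    return entered


leaf_buildings = Node(bbox=((0, 10), (0, 10), (1, 4)),
                      points=[(2.0, 2.0, 2.0), (8.0, 8.0, 3.0)])
leaf_roads = Node(bbox=((0, 10), (0, 10), (5, 7)), points=[(2.0, 2.0, 6.0)])
root = Node(bbox=((0, 10), (0, 10), (0, 7)), children=[leaf_buildings, leaf_roads])

# Buildings and their subtypes (type range 1 to 4) in the window [0, 5] x [0, 5].
query: Box = ((0, 5), (0, 5), (1, 4))
found: List[Point] = []
entered = range_search(root, query, found)
print(found, entered)  # [(2.0, 2.0, 2.0)] 2
```

The road leaf is never entered because its type extent (5 to 7) does not overlap the query's type range, which is the effect the animation illustrates.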
Type Hierarchy Linearization
Treat type information like a spatial dimension: the hierarchy (Root 0; Building 1 with Hotel 2, Museum 3, and Restaurant 4; Road 5 with Local Road 6 and Main Road 7) is laid out along the type dimension so that "Building + all subtypes" forms one contiguous range.
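The type IDs of the example hierarchy (Root 0; Building 1 with Hotel 2, Museum 3, Restaurant 4; Road 5 with Local Road 6, Main Road 7) are consistent with a preorder numbering, which places every type directly before its subtypes. A minimal sketch of such a linearization, assuming preorder is the intended order; `linearize` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def linearize(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Number the types in preorder; return type -> (own position, last position in subtree).

    A subtree is numbered contiguously, so 'type T and all subtypes'
    becomes the single interval [own position, last position].
    """
    ranges: Dict[str, Tuple[int, int]] = {}
    next_pos = 0

    def visit(t: str) -> None:
        nonlocal next_pos
        pos = next_pos
        next_pos += 1
        for c in children.get(t, []):
            visit(c)
        ranges[t] = (pos, next_pos - 1)

    visit(root)
    return ranges


hierarchy = {
    "Root": ["Building", "Road"],
    "Building": ["Hotel", "Museum", "Restaurant"],
    "Road": ["Local Road", "Main Road"],
}
r = linearize(hierarchy, "Root")
print(r["Building"])  # (1, 4)
print(r["Road"])      # (5, 7)
```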
Effects of the Spacing in the Type Dimension
- Wide spacing between mapped values: objects are primarily grouped by their type.
- Narrow spacing between mapped values: objects are primarily grouped by their position.
(Figure: objects and inner nodes of the index tree in the plane spanned by the spatial dimension and the type dimension.)
The spacing affects the clustering of objects, so the best type mapping range has to be determined.
Type Mapping Variant: Equal Spread (ES)
The simplest variant: each type ID receives a mapped value within the type mapping range, with the same gap between all mapped values; a type and all its subtypes occupy a contiguous range.
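A sketch of Equal Spread, assuming the types are already linearized in preorder (positions 0 to n-1) and that the gap is the type mapping range divided by the number of types; the exact gap formula is an assumption. The type mapping range is given in spatial units, as in the experiments, which compare ranges from 15 km to 60000 km. `equal_spread` and `type_interval` are hypothetical names.

```python
from typing import List, Tuple


def equal_spread(num_types: int, mapping_range: float) -> List[float]:
    """Equal Spread: the type at preorder position i maps to i * gap,
    with the same gap between all neighbouring mapped values (assumed formula)."""
    gap = mapping_range / num_types
    return [i * gap for i in range(num_types)]


def type_interval(mapped: List[float], first: int, last: int) -> Tuple[float, float]:
    """Type-dimension extent of a type and its subtypes (preorder positions first..last)."""
    return mapped[first], mapped[last]


mapped = equal_spread(8, 80.0)
print(mapped)                       # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
print(type_interval(mapped, 1, 4))  # Building + subtypes: (10.0, 40.0)
```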
Type Mapping Variant: Type Hierarchy (TH)
The same gap lies between a type and each of its direct subtypes, which clusters objects with the same supertype; a type and all its subtypes still occupy a contiguous range of the type mapping range.
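One possible reading of the Type Hierarchy variant, sketched under assumptions: each type owns an interval of the type dimension that is cut into equal parts, one for the type itself and one for each direct subtype, recursively. Siblings deep in the hierarchy therefore sit close together, which clusters objects with the same supertype. `type_hierarchy_mapping` is a hypothetical name, and the example uses the small city-guide hierarchy with Building and Road under Root.

```python
from typing import Dict, List


def type_hierarchy_mapping(children: Dict[str, List[str]], root: str,
                           mapping_range: float) -> Dict[str, float]:
    """Type Hierarchy variant (one possible reading): a type's interval is cut
    into equal parts, one for the type itself and one per direct subtype,
    so a type and its direct subtypes are separated by the same gap."""
    mapped: Dict[str, float] = {}

    def assign(t: str, lo: float, hi: float) -> None:
        mapped[t] = lo
        kids = children.get(t, [])
        gap = (hi - lo) / (len(kids) + 1)
        for i, child in enumerate(kids):
            # Child i receives the (i + 1)-th part of its parent's interval.
            assign(child, lo + (i + 1) * gap, lo + (i + 2) * gap)

    assign(root, 0.0, mapping_range)
    return mapped


hierarchy = {
    "Root": ["Building", "Road"],
    "Building": ["Hotel", "Museum", "Restaurant"],
    "Road": ["Local Road", "Main Road"],
}
m = type_hierarchy_mapping(hierarchy, "Root", 36.0)
print(m["Building"], m["Hotel"], m["Museum"], m["Restaurant"])  # 12.0 15.0 18.0 21.0
print(m["Road"], m["Local Road"], m["Main Road"])               # 24.0 28.0 32.0
```

Under this reading, Root, Building, and Road are 12 units apart, while the subtypes of Building are only 3 units apart, so hotels, museums, and restaurants end up close to each other in the type dimension.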
Type Mapping Variant: Object Distribution (OD)
The size of a type's gap corresponds to the number of instances of that type: infrequent objects are clustered by location, frequent objects by type. This variant requires additional histogram information.
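A sketch of Object Distribution, assuming the types are listed in preorder and that the gap following each type's mapped value is proportional to that type's instance count from the histogram. The function name `object_distribution` and the histogram values in the example are made up for illustration.

```python
from typing import List


def object_distribution(counts: List[int], mapping_range: float) -> List[float]:
    """Object Distribution (assumed form): types in preorder; the gap after each
    type's mapped value is proportional to its number of instances."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    mapped: List[float] = []
    offset = 0.0
    for c in counts:
        mapped.append(offset)
        offset += mapping_range * c / total
    return mapped


# Made-up histogram in preorder: Root, Building, Hotel, Museum,
# Restaurant, Road, Local Road, Main Road.
counts = [0, 0, 10, 5, 5, 0, 60, 20]
print(object_distribution(counts, 100.0))
# [0.0, 0.0, 0.0, 10.0, 15.0, 20.0, 20.0, 80.0]
```

With these counts, Local Road receives 60% of the range and its objects stand apart in the type dimension, while Museum and Restaurant objects share a narrow band and are therefore grouped by location.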
Related Work
- Spatial indexes: we use them, but do not build a new one.
- Object-oriented databases: use only point access methods.
- Object-relational databases: either a separate table for each type, so that a query must access many tables to cover all subtypes, or a single global table with point access methods.
Experimental Setup
- Data sets from 9 counties in California (TIGER/Line 2003)
- Universe: width 15 to 100 km, height 26 to 115 km
- 12k to 203k objects
- 258 types
Total Average Query and Update Response Time
(Figure: weighted query response time plotted over type selectivity and spatial selectivity; the usage scenarios define the integration borders.)
Comparing the Type Mapping Ranges
(Chart: relative response time, 100% to 160%, of ES, TH, and OD in the Data Provider, Discovery Service, Integration Middleware, and Mobile Device scenarios, for the type mapping ranges A = 15 km, B = 150 km, C = 1500 km, D = 15000 km, and E = 60000 km.)
An almost-best type mapping range is sufficient.
Comparing the Approaches
(Chart: relative response time, 100% to 700%, of the indexing approaches SEP, R3D.1:1, R3D.ES, R3D.TH, and R3D.OD in the Data Provider, Discovery Service, Integration Middleware, and Mobile Device scenarios; some bars are labeled 481%, 518%, 690%, and 446%.)
- Type mapping does matter!
- Object Distribution (OD) is the best variant.
- Type mapping has more impact with low type selectivity.
Resource Consumption
(Charts: insertion time per object in nanoseconds and memory per object in bytes for the indexing approaches SEP, R3D.OD.B, R3D.OD.C, R3D.OD.D, and R3D.OD.E, over data set sizes of 12k, 17k, 23k, 46k, 54k, 72k, 151k, 175k, and 203k objects.)
- Scales well with larger data sets
- Speed costs resources
Conclusion
- Location-conscious main memory query engine
  - exploits characteristics of typical queries
  - deployable to many components
- Real 3D index: best performance
  - type mapping range: larger than expected
  - type mapping variant: Object Distribution (OD)
- Separate indexes: best resource consumption
Outlook
- Virtualize query processing: dynamically distribute query capabilities according to load
- Integrate other dimensions: valid time, measurement time