Exploiting Type and Space in a Main Memory Query Engine Thomas Schwarz Matthias Grossmann, Daniela Nicklas, Bernhard Mitschang Universität Stuttgart, Institute.

Presentation transcript:

Exploiting Type and Space in a Main Memory Query Engine Thomas Schwarz Matthias Grossmann, Daniela Nicklas, Bernhard Mitschang Universität Stuttgart, Institute of Parallel and Distributed Systems

University of Stuttgart Center of Excellence
Outline
  Motivation and Scenarios
  Index Structures
  Related Work
  Experiments
  Conclusion

City-Guide Scenario
  Query: How do I get to the closest hotel?
  Map labels: Hotel, Youth Hostel, Museum

Typical Data and Type Hierarchy
  Typical data and a typical type hierarchy; each type is shown with its name and ID:
  Type name      Type ID
  Root           0
  Building       1
  Hotel          2
  Museum         3
  Restaurant     4
  Road           5
  Local Road     6
  Main Road      7
  Youth Hostel   8

Type Hierarchy of TIGER/Line Data Sets (258 types)
  CFCC    Type ID   Description
  (none)  0         the root type
  A       1         Road
  A1      4         Primary Highway With Limited Access
  A11     5         Primary..., unseparated
  A15     9         Primary..., separated
  A19     13        Prim.., bridge
  A4      34        Local, Neighborhood, and Rural Road
  B       63        Railroad
  B1      66        Railroad Main Line
  D       97        Landmark
  D4      122       Educational or religious Institution
  D5      128       Transportation Terminal
  D51     130       Airport or airfield
  D52     131       Train station
  D53     132       Bus terminal
  H       213       Hydrography
  Partly lost entry: B , in tunnel

Typical Queries
  Typical queries ask for
    Gas stations next to the planned route
    Nearest base stations for wireless internet
    Sights / landmarks / buildings in a given area
    All roads / only major roads in a given area
  Disjunctive queries
  Restrict type of queried objects
  Restrict location of queried objects
  Exploit these characteristics for speedup
    Leverage a dedicated index structure
    Combine both primary access paths

System Architecture
  Diagram components: Application, Mobile Device, Integration Middleware, Discovery Service, Data Provider
  Diagram label: main memory query engine

Selectivity Factor
  First panel (universe and query area): very selective, but low selectivity factor: 20%
  Second panel (universe and query area): not very selective, but high selectivity factor: 70%
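
The selectivity factor on this slide can be read as the fraction of the universe that a query selects. The following is a minimal sketch, assuming axis-aligned rectangles and assuming that the factor is measured by area (the slide does not say whether area or object count is meant). The `Rect` type, the function name, and the coordinates are illustrative and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # An empty or inverted rectangle has area 0.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(max(self.x_min, other.x_min), max(self.y_min, other.y_min),
                    min(self.x_max, other.x_max), min(self.y_max, other.y_max))


def spatial_selectivity_factor(universe: Rect, query: Rect) -> float:
    """Fraction of the universe covered by the query area (assumption: by area)."""
    return universe.intersect(query).area() / universe.area()


universe = Rect(0, 0, 100, 100)
small_query = Rect(0, 0, 50, 40)    # very selective query, low factor
large_query = Rect(0, 0, 100, 70)   # not very selective query, high factor

print(spatial_selectivity_factor(universe, small_query))
print(spatial_selectivity_factor(universe, large_query))
```

With these made-up rectangles the two calls reproduce the 20% and 70% cases of the slide. A type selectivity factor could be defined analogously as the fraction of objects whose type matches the query.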

Usage Scenarios
  Data Provider: all hotels of a brand in the entire country; few updates
  Discovery Service: metadata on diverse data providers; few updates
  Mobile Device: data for single application, around the user; many updates
  Integration Middleware: diverse data relevant in the range of the base station; many updates
  Chart axes: spatial selectivity factor and type selectivity factor, each with ticks at 1%, 10%, 100%

Summary of the Requirements
  Simple query capabilities suffice
  Combine Type and Space
  Cope with different workloads
  Fast response times

Separate Indexes
  Type index: array over the type IDs with separate lists
    0 (Root), 1 (Building), 2 (Hotel), 3 (Museum), 4 (Restaurant), 5 (Road), 6 (Local road), 7 (Main road)
  Spatial index (Quadtree)
  Diagram flow: cost-based optimizer chooses; candidates; type predicate / spatial predicate; final result
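
The separate-indexes approach can be sketched as follows. This is only an illustration under several assumptions: a uniform grid stands in for the quadtree named on the slide, the "cost" is simply the number of candidates each index would deliver, and the class name, method names, and sample objects are hypothetical. The index with the cheaper candidate set is chosen, and the candidates are then checked against both predicates.

```python
from collections import defaultdict


class SeparateIndexes:
    """Type lists plus a uniform grid (a stand-in for the quadtree on the slide)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.by_type = defaultdict(list)   # type ID -> list of objects
        self.by_cell = defaultdict(list)   # grid cell -> list of objects

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, type_id, x, y):
        obj = (name, type_id, x, y)
        self.by_type[type_id].append(obj)
        self.by_cell[self._cell(x, y)].append(obj)

    def query(self, type_ids, box):
        """type_ids: set of wanted type IDs; box: (x0, y0, x1, y1), inclusive."""
        x0, y0, x1, y1 = box
        cx0, cy0 = self._cell(x0, y0)
        cx1, cy1 = self._cell(x1, y1)
        cells = [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]

        # Assumed cost model: number of candidates each index would return.
        type_cost = sum(len(self.by_type.get(t, ())) for t in type_ids)
        space_cost = sum(len(self.by_cell.get(c, ())) for c in cells)

        if type_cost <= space_cost:
            plan = "type"
            candidates = [o for t in type_ids for o in self.by_type.get(t, ())]
        else:
            plan = "space"
            candidates = [o for c in cells for o in self.by_cell.get(c, ())]

        # Check both predicates on the candidates to obtain the final result.
        result = [o for o in candidates
                  if o[1] in type_ids and x0 <= o[2] <= x1 and y0 <= o[3] <= y1]
        return plan, result


idx = SeparateIndexes(cell_size=10)
idx.insert("Hotel A", 2, 5, 5)
idx.insert("Museum B", 3, 6, 7)
idx.insert("Main St", 7, 5, 6)
idx.insert("Hotel C", 2, 95, 95)
idx.insert("Hotel D", 2, 50, 50)

# Buildings and all subtypes (IDs 1 to 4) in a small area: the grid is cheaper.
print(idx.query({1, 2, 3, 4}, (0, 0, 9, 9)))
# Museums in the whole universe: the type list is cheaper.
print(idx.query({3}, (0, 0, 99, 99)))
```

A real optimizer would estimate the costs from statistics instead of counting candidates exactly; counting keeps the sketch short.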

Real 3D Index
  Diagram labels: type dimension; Query; Building + all subtypes

Traversing the Index
  Diagram labels: Spatial dimension, Type dimension, Query
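
To illustrate how a query box can prune an index that treats the type as a third dimension, here is a minimal sketch. It assumes a simple k-d tree over (x, y, type value) as a stand-in; the presentation does not state which tree structure the Real 3D Index uses, and the function names and sample points are hypothetical.

```python
def build(points, depth=0, leaf_size=2):
    """Build a k-d tree over points given as (x, y, type_value, name)."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % 3                      # cycle through x, y, type
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": points[mid][axis],
        "left": build(points[:mid], depth + 1, leaf_size),    # values <= split
        "right": build(points[mid:], depth + 1, leaf_size),   # values >= split
    }


def search(node, lo, hi, out):
    """Collect points p with lo[i] <= p[i] <= hi[i] for all three dimensions."""
    if "leaf" in node:
        out.extend(p for p in node["leaf"]
                   if all(lo[i] <= p[i] <= hi[i] for i in range(3)))
        return out
    axis, split = node["axis"], node["split"]
    if lo[axis] <= split:                 # query box overlaps the left subtree
        search(node["left"], lo, hi, out)
    if hi[axis] >= split:                 # query box overlaps the right subtree
        search(node["right"], lo, hi, out)
    return out


points = [
    (5, 5, 2, "Hotel A"), (6, 7, 3, "Museum B"), (5, 6, 7, "Main St"),
    (95, 95, 2, "Hotel C"), (50, 50, 2, "Hotel D"), (8, 2, 4, "Cafe E"),
]
tree = build(points)

# Buildings and all subtypes (type values 1 to 4) with x and y between 0 and 10.
hits = search(tree, lo=(0, 0, 1), hi=(10, 10, 4), out=[])
print(sorted(p[3] for p in hits))
```

The type predicate and the spatial predicate are evaluated in a single traversal, which is the point of combining both access paths in one index.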

Type Hierarchy Linearization
  Treat type information like a spatial dimension
  Type IDs along the type dimension: Root 0, Building 1, Hotel 2, Museum 3, Restaurant 4, Road 5, Local Road 6, Main Road 7
  Marked range on the type dimension: Building + all subtypes
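
The linearization can be sketched as a depth-first (preorder) numbering, which makes every type and all of its subtypes a contiguous ID interval. The parent and child relations below are an assumption inferred from the type names and the ID order on the slide; the function and variable names are illustrative.

```python
# Assumed hierarchy: parent -> direct subtypes, in the order shown on the slide.
HIERARCHY = {
    "Root": ["Building", "Road"],
    "Building": ["Hotel", "Museum", "Restaurant"],
    "Road": ["Local Road", "Main Road"],
}


def linearize(hierarchy, root):
    """Assign preorder IDs and compute the ID interval covering each subtree."""
    ids, ranges = {}, {}
    counter = 0

    def visit(name):
        nonlocal counter
        ids[name] = counter
        counter += 1
        for child in hierarchy.get(name, []):
            visit(child)
        # All IDs handed out since entering this node belong to its subtree.
        ranges[name] = (ids[name], counter - 1)

    visit(root)
    return ids, ranges


ids, ranges = linearize(HIERARCHY, "Root")
print(ids)
print(ranges["Building"])   # interval for "Building + all subtypes"
```

A query for a type including its subtypes then becomes a one-dimensional range query on the type dimension, just like a range on a spatial axis.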

Effects of the Spacing in the Type Dimension
  Wide spacing between mapped values: objects are primarily grouped by their type
  Narrow spacing between mapped values: objects are primarily grouped by their position
  Diagram labels: type dimension, spatial dimension, inner node of index tree, object
  Affects clustering of objects
  Determine best type mapping range

Type Mapping Variant: Equal Spread (ES)
  Same gap between all mapped values
  The simplest variant
  Diagram labels: Type ID, mapped value, type mapping range, range containing all subtypes
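
A minimal sketch of the Equal Spread mapping, assuming that type IDs run from 0 to `num_types - 1` in linearized order and that the type mapping range spans from the first to the last mapped value. The function names and the numbers in the example are hypothetical.

```python
def equal_spread(num_types, mapping_range):
    """Map type IDs 0..num_types-1 to equally spaced values in [0, mapping_range]."""
    if num_types < 2:
        return [0.0] * num_types
    gap = mapping_range / (num_types - 1)   # same gap between all mapped values
    return [type_id * gap for type_id in range(num_types)]


def subtype_range(mapped, first_id, last_id):
    """Interval on the type dimension for a type and all of its subtypes."""
    return mapped[first_id], mapped[last_id]


mapped = equal_spread(num_types=8, mapping_range=14.0)
print(mapped)
print(subtype_range(mapped, 1, 4))   # e.g. Building (ID 1) up to Restaurant (ID 4)
```

Scaling `mapping_range` up or down changes the spacing between the types relative to the spatial extent, which is the parameter the experiments vary.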

Type Mapping Variant: Type Hierarchy (TH)
  Same gap between a type and its direct subtypes
  Cluster objects with same supertype
  Diagram labels: Type ID, mapped value, type mapping range, range containing all subtypes

Type Mapping Variant: Object Distribution (OD)
  Size of gap corresponds to the number of instances of a type
  Cluster infrequent objects by location, cluster frequent objects by type
  Requires additional histogram information
  Diagram labels: Type ID, mapped value, type mapping range, range containing all subtypes
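
One possible reading of this variant is sketched below: each type gets a slot on the type dimension whose width is proportional to its number of instances, so frequent types end up far apart and infrequent types close together. Whether the gap is placed before or after a type, and how the presentation normalizes it, is not stated; this placement, the function name, and the histogram values are assumptions.

```python
def object_distribution(counts, mapping_range):
    """Map type IDs to values whose gaps are proportional to instance counts.

    counts[i] is the number of instances of type ID i (the histogram);
    the gap following type i is mapping_range * counts[i] / total.
    """
    total = sum(counts)
    mapped, position = [], 0.0
    for count in counts:
        mapped.append(position)
        position += mapping_range * count / total
    return mapped


# Made-up histogram for type IDs 0..7 (100 instances in total).
histogram = [1, 1, 8, 5, 5, 1, 59, 20]
mapped_od = object_distribution(histogram, mapping_range=100.0)
print(mapped_od)
```

In this example the frequent type with ID 6 is followed by a wide gap, while the rare types with IDs 0, 1, and 5 sit close to their neighbors, so an index clusters them mainly by location.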

Related Work
  Spatial Indexes
    We use them, but don't build one
  Object-oriented Databases
    Use only point access methods
  Object-relational Databases
    Separate table for each type
      Query many tables for all subtypes
    Single global table
      Use point access methods

Experimental Setup
  Data sets from 9 counties in California (TIGER/Line 2003)
  Universe
    Width: 15 to 100 km
    Height: 26 to 115 km
  12k to 203k objects
  258 types

Total Average Query and Update Response Time
  Weighted query response time
  Usage scenarios -> integration borders
  Chart axes: type selectivity, spatial selectivity

Comparing the Type Mapping Ranges
  Chart: relative response time (100% to 160%) of the variants ES, TH, and OD for Data Provider, Discovery Service, Integration Middleware, and Mobile Device
  Type mapping range: A ≙ 15 km, B ≙ 150 km, C ≙ 1500 km, D ≙ 15000 km, E ≙ 60000 km
  Almost best type mapping range is sufficient

Comparing the Approaches
  Chart: relative response time per indexing approach (SEP, R3D.1:1, R3D.ES, R3D.TH, R3D.OD) for Data Provider, Discovery Service, Integration Middleware, and Mobile Device; axis from 100% to 140%, continued from 150% to 700%; labeled values: 481%, 518%, 690%, 446%
  Type mapping does matter!
  Object Density is the best variant
  More impact with low type selectivity

Resource Consumption
  Charts: insertion time per object (nanoseconds) and memory per object (bytes) by data set size (12k, 17k, 23k, 46k, 54k, 72k, 151k, 175k, 203k) for the indexing approaches SEP, R3D.OD.B, R3D.OD.C, R3D.OD.D, R3D.OD.E
  Scales well with larger data sets
  Speed costs resources

Conclusion
  Location-conscious main memory query engine
    Exploits characteristics of typical queries
    Deployable to many components
  Real 3D Index
    Best performance
    Type mapping range: Larger than expected
    Type mapping variant: Object Density
  Separate Indexes
    Best resource consumption

Outlook
  Virtualize query processing
    Dynamically distribute query capabilities according to load
  Integrate other dimensions
    Valid time
    Measurement time