ORDB Implementation Discussion

From RDB to ORDB
Issues to address when adding OO extensions to a DBMS

Layout of Data
- Deal with large data types (ADTs/blobs): special-purpose file space for such data, with special access methods
- Large fields in one tuple:
  - A single tuple may not even fit on one disk page
  - Must break it into sub-tuples and link them via disk pointers
- Flexible layout:
  - Constructed types may have flexible-sized sets, e.g., one attribute can be a set of strings
  - Need to provide metadata inside each type concerning the layout of fields within the tuple
  - Insertion/deletion will cause problems when a contiguous layout of ‘tuples’ is assumed

Layout of Data
- More layout design choices (clustering on disk):
  - Lay out complex objects nested and clustered on disk (if nested and not pointer-based)
  - Where to store objects that are referenced (shared) by possibly several different structures
  - Many design options for objects that are in a type hierarchy with inheritance
  - Constructed types such as arrays require novel methods, like array chunking into (4x4) subarrays for non-contiguous access
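The array chunking idea can be illustrated with a small sketch. It assumes a 2-D array stored as 4x4 chunks laid out in row-major chunk order; the function names and the chunk numbering are illustrative assumptions, not taken from the slides.

```python
CHUNK = 4  # chunk edge length, matching the (4x4) subarrays mentioned above


def chunk_address(row, col, n_cols):
    """Map a 2-D array cell to (chunk number, offset inside that chunk)."""
    chunks_per_row = (n_cols + CHUNK - 1) // CHUNK  # ceiling division
    chunk_no = (row // CHUNK) * chunks_per_row + (col // CHUNK)
    offset = (row % CHUNK) * CHUNK + (col % CHUNK)
    return chunk_no, offset


def chunks_touched(r0, r1, c0, c1, n_cols):
    """Chunks to read for the sub-rectangle rows r0..r1, cols c0..c1 (inclusive)."""
    return sorted({chunk_address(r, c, n_cols)[0]
                   for r in range(r0, r1 + 1)
                   for c in range(c0, c1 + 1)})
```

With this layout, reading one full column of an 8x8 array touches only two chunks, whereas a plain row-major layout would scatter the same cells across every row.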

Why (Object) Identifier?
- Distinguish objects regardless of content and location
- Evolution of object over time
- Sharing of objects without copying
- Continuity of identity (persistence)
- Versions of a single object

Objects/OIDs/Keys
- Relational keys (RDB): human-meaningful name (mixes data value with identity)
- Variable name (PL): gives a name to objects in a program (mixes addressability with identity)
- Object identifier (ODB): system-assigned, globally unique name (location- and data-independent)

OIDs
- System generated
- Globally unique
- Logical identifier (not physical representation; flexibility in relocation)
- Remains valid for lifetime of object (persistent)

OID Support
- OID generation: uniqueness across time and system
- Object handling:
  - Operations to test equality/identity
  - Operations to manipulate OIDs for object merging and copying
  - Avoid dangling references

OID Implementation
- By address (physical): 32 bits; direct fast access like a pointer
- By structured address: e.g., page and slot number; carries both some physical and some logical information
- By surrogates: purely logical OID; use some algorithm to assure uniqueness
- By typed surrogates: contains both type id and object id; can determine the type of an object without fetching it
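A minimal sketch of typed surrogates follows. The 16/48-bit split, the in-process counter, and the function names are assumptions made for illustration; a real system would also need to make the counter persistent and unique across machines.

```python
import itertools

TYPE_BITS = 16    # assumed: high 16 bits hold the type id
OBJECT_BITS = 48  # assumed: low 48 bits hold a serial number

_serial = itertools.count(1)  # values are never reused, so OIDs stay unique over time


def new_oid(type_id):
    """Create a typed surrogate: type id and object id packed into one integer."""
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type id out of range")
    return (type_id << OBJECT_BITS) | next(_serial)


def type_of(oid):
    """Read the type from the OID alone, without fetching the object."""
    return oid >> OBJECT_BITS


def serial_of(oid):
    """Extract the purely logical object number."""
    return oid & ((1 << OBJECT_BITS) - 1)
```

Because the OID carries no page or slot number, the object can be relocated on disk without invalidating references to it.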

ADTs
- Type representation: size/storage
- Type access: import/export
- Type manipulation: special methods to serve as filter predicates and join predicates
- Special-purpose index structures: efficiency

ADTs
- Mechanism to add index support along with ADT:
  - External storage of the index file outside the DBMS
    - Provide an “access method interface” à la: Open(), close(), search(x), retrieve-next()
    - Plus, statistics on the external index
  - Or, a generic ‘template’ index structure
    - Generalized Search Tree (GiST): user-extensible
    - Concurrency/recovery provided
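The access method interface might look roughly like the sketch below. The class names, the sorted-list implementation, and the contents of `statistics()` are assumptions for illustration only; the slides name just the four calls and the need for statistics.

```python
import bisect
from abc import ABC, abstractmethod


class AccessMethod(ABC):
    """The four calls an external index is assumed to expose to the DBMS."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def search(self, x): ...

    @abstractmethod
    def retrieve_next(self): ...


class SortedListIndex(AccessMethod):
    """Toy external index: a sorted list of (key, record id) pairs."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._is_open = False
        self._key = None
        self._pos = None

    def open(self):
        self._is_open = True

    def close(self):
        self._is_open = False
        self._pos = None

    def search(self, x):
        """Position the scan on the first entry whose key equals x."""
        if not self._is_open:
            raise RuntimeError("index is not open")
        self._key = x
        # (x,) sorts before every (x, rid), so this finds the first match.
        self._pos = bisect.bisect_left(self._entries, (x,))

    def retrieve_next(self):
        """Return the next matching record id, or None when exhausted."""
        if self._pos is None or self._pos >= len(self._entries):
            return None
        key, rid = self._entries[self._pos]
        if key != self._key:
            return None
        self._pos += 1
        return rid

    def statistics(self):
        """Figures an optimizer could use for cost estimation."""
        return {"entries": len(self._entries),
                "distinct_keys": len({k for k, _ in self._entries})}
```

The optimizer only ever talks to `AccessMethod`, so a new ADT-specific index can be plugged in without changing the query engine.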

Query Processing
- Query parsing:
  - Type checking for methods
  - Subtyping/overriding
- Query rewriting:
  - May translate path expressions into join operators
  - Deal with collection hierarchies (UNION?)
  - Indices or extraction out of collection hierarchy

Query Optimization Core
- New algebra operators must be designed, such as nest, unnest, array-ops, values/objects, etc.
- Query optimizer must integrate them into the optimization process:
  - New rewrite rules
  - New costing
  - New heuristics
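As an illustration of two of the new operators, here is a minimal sketch of nest and unnest over rows represented as Python dicts. The row representation and function signatures are assumptions, not part of the slides.

```python
from collections import defaultdict


def unnest(rows, attr):
    """Flatten a set-valued attribute: one output row per set element."""
    for row in rows:
        for element in row[attr]:
            flat = dict(row)
            flat[attr] = element
            yield flat


def nest(rows, attr):
    """Group rows on all other attributes and collect `attr` into a set."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups[key].add(row[attr])
    return [{**dict(key), attr: values} for key, values in groups.items()]
```

Note that unnest drops rows whose set is empty, so nest(unnest(R)) equals R only when no set is empty. An optimizer's rewrite rules for these operators have to take that into account.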

Query Optimization Revisited
- Existing algebra operators revisited: SELECT
  - WHERE clause expressions can be expensive
  - So SELECT pushdown may be a bad heuristic

Selection Condition Rewriting
- EXAMPLE:
  - (tuple.attribute < 50): only CPU time (on the fly)
  - (tuple.location OVERLAPS lake-object): possibly complex, CPU-heavy computations; may involve both IO and CPU costs
- State of the art: consider reduction factor only
- Now, we must consider both factors:
  - Cost factor: dramatic variations
  - Reduction factor: unrelated to cost factor

Operator Ordering (figure showing two operators, op1 and op2)

Ordering of SELECT Operators
- Cost factor: now could be dramatic variations
- Reduction factor: orthogonal to cost factor
- We want maximal reduction and minimal cost:
  - Rank(operator) = (reduction) * (1/cost)
  - Order operators by ‘rank’, applying high-rank operators first
  - High rank (good): low in cost, and large reduction
  - Low rank (bad): high in cost, and small reduction
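The rank formula can be tried out with a short sketch. It assumes that "reduction" is the fraction of tuples a predicate filters out, that predicates are independent, and that high-rank predicates run first; the costs and reductions below are made-up illustration values.

```python
def rank(pred):
    """Rank = reduction * (1 / cost), following the formula above."""
    return pred["reduction"] / pred["cost"]


def order_by_rank(preds):
    """Put cheap, highly selective (high-rank) predicates first."""
    return sorted(preds, key=rank, reverse=True)


def expected_cost(order):
    """Expected per-tuple cost: a predicate only runs on tuples
    that survived all earlier predicates."""
    total, surviving = 0.0, 1.0
    for p in order:
        total += surviving * p["cost"]
        surviving *= 1.0 - p["reduction"]
    return total


# Hypothetical predicates: cost in arbitrary units per tuple.
preds = [
    {"name": "location OVERLAPS lake", "cost": 100.0, "reduction": 0.9},
    {"name": "attribute < 50", "cost": 1.0, "reduction": 0.5},
    {"name": "similar(image)", "cost": 20.0, "reduction": 0.1},
]
```

Here the cheap comparison runs first even though the OVERLAPS predicate filters more tuples, which is exactly why considering the reduction factor alone is no longer enough.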

Access Structures/Indices (on what?)
- Indices that are ADT-specific
- Indices on navigation paths
- Indices on methods, not just on columns
- Indices over collection hierarchies (trade-offs)
- Indices for new WHERE clause expressions: not just =, <, >, but also “overlaps”, “similar”

Registering New Index (to Optimizer)
- What WHERE conditions it supports
- Estimated cost for “matching tuple” (IO/CPU)
  - Given by index designer (user?)
  - Monitor statistics; even construct test plans
- Estimation of reduction factors/join factors
  - Register auxiliary function to estimate factor
  - Provide simple defaults

Methods
- Use ADT/methods in query specification
- Achieves: flexibility, extensibility

Methods
- Extensibility: dynamic linking of methods defined outside the DB
- Flexibility: overriding methods for type hierarchy
- Semantics:
  - Use of “methods” with implied semantics?
  - Incorporation of methods into query processing may cause side effects?
  - Performance of methods may be unpredictable?
  - Termination may not be guaranteed?

Methods
- “Untrusted” methods:
  - May corrupt the server
  - May modify DB content (side effects)
- Handling of “untrusted” methods:
  - Restrict the language
  - Interpret vs. compile
  - Separate address space from the DB server

Query Optimization with Methods
- Estimation of “costs” of method predicates: see earlier discussion
- Optimization of method execution:
  - Methods may be very expensive to execute
  - Idea: similar to handling correlated nested subqueries
  - Recognize repetition and rewrite the physical plan
  - Provide some level of precomputation and reuse

Strategies for Method Execution
1. If called on same input, cache that one result
2. If on full column, presort column first (groupby)
3. Or, in general, use full precomputation:
   - Precompute results for all domain values (parameters)
   - Put in hash table: fct(val)
   - During query processing, look up in hash table: val → fct(val)
   - Or, possibly even perform a join with this table
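The strategies above can be sketched as follows. The squaring function stands in for an expensive user-defined method, and all names are hypothetical; the call counter exists only to show how many real invocations each strategy needs.

```python
def make_counting_method():
    """Return a stand-in 'expensive' method plus a counter of real calls."""
    calls = {"n": 0}

    def fct(val):
        calls["n"] += 1
        return val * val  # placeholder for costly work

    return fct, calls


def apply_sorted(fct, column):
    """Strategies 1 and 2: presort so equal inputs are adjacent,
    then keep only the most recent result as a one-entry cache."""
    out = []
    have_last, last_val, last_res = False, None, None
    for val in sorted(column):
        if not have_last or val != last_val:
            last_val, last_res, have_last = val, fct(val), True
        out.append((val, last_res))
    return out


def precompute(fct, domain):
    """Strategy 3: build the val -> fct(val) hash table up front."""
    return {val: fct(val) for val in domain}
```

For a column of six values with three distinct inputs, `apply_sorted` invokes the method three times. With `precompute`, query processing becomes a dictionary lookup per tuple, at the price of evaluating the method for every domain value, including those that never occur.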

Query Processing
- User-defined methods
- User-defined aggregate functions:
  - E.g., “second largest” or “brightest picture”
  - Distributive aggregates: incremental computation

Query Processing: Distributive Aggregates
- For incremental computation of distributive aggregates, provide:
  - Initialize(): set up state space
  - Iterate(): per tuple, update the state
  - Terminate(): compute final result based on state, and clean up state
- For example, “second largest”:
  - Initialize(): 2 fields
  - Iterate(): per tuple, compare numbers
  - Terminate(): remove 2 fields
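A minimal sketch of the “second largest” aggregate under the Initialize/Iterate/Terminate protocol is shown below. The class name, the driver function, and the treatment of duplicates (two equal maxima count as largest and second largest) are assumptions.

```python
class SecondLargest:
    """User-defined aggregate keeping two fields of state."""

    def initialize(self):
        # Set up the state space: the 2 fields.
        self.largest = None
        self.second = None

    def iterate(self, value):
        # Per tuple: compare against the two fields and update them.
        if self.largest is None or value > self.largest:
            self.second, self.largest = self.largest, value
        elif self.second is None or value > self.second:
            self.second = value

    def terminate(self):
        # Compute the final result, then clean up the 2 fields.
        result = self.second
        del self.largest, self.second
        return result


def run_aggregate(agg, values):
    """How a query executor might drive any such aggregate."""
    agg.initialize()
    for v in values:
        agg.iterate(v)
    return agg.terminate()
```

The executor never needs to see all values at once: one pass with constant state suffices, which is what makes the computation incremental.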

Following Disk Pointers?
- Complex object structures with object pointers may exist (~ disk pointers)
- Navigate complex objects by following pointers
- Long-running transactions, as in CAD design, may work with a complex object for a longer duration
- Question: what to do about “pointers” between subobjects or related objects?

Following Disk Pointers: Options
- Swizzle = replace OID references by in-memory pointers
- Unswizzle = convert back to disk pointers when flushing to disk
- Issues:
  - In-memory table of OIDs and their state
  - Indicate in each object the pointer type via a bit
- Different policies for swizzling:
  - never
  - on access
  - attached to object brought in
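The "swizzle on access" policy can be sketched as below. The disk is simulated by a dict from OID to the list of referenced OIDs, and a Python type check stands in for the per-pointer bit; both are simplifying assumptions, as are all class and method names.

```python
class Obj:
    def __init__(self, oid, refs):
        self.oid = oid
        self.refs = list(refs)  # each entry: an OID (int) or a swizzled Obj


class ObjectCache:
    def __init__(self, disk):
        self.disk = disk  # simulated disk: OID -> list of referenced OIDs
        self.table = {}   # in-memory table: OID -> resident object

    def load(self, oid):
        """Bring an object into memory; its references stay as OIDs."""
        if oid not in self.table:
            self.table[oid] = Obj(oid, self.disk[oid])
        return self.table[oid]

    def follow(self, obj, i):
        """Swizzle on access: replace the OID by a direct reference
        the first time the pointer is followed."""
        ref = obj.refs[i]
        if not isinstance(ref, Obj):  # plays the role of the pointer-type bit
            ref = self.load(ref)
            obj.refs[i] = ref
        return ref

    def flush(self, obj):
        """Unswizzle: write OIDs, never memory pointers, back to disk."""
        self.disk[obj.oid] = [r.oid if isinstance(r, Obj) else r
                              for r in obj.refs]
```

After the first `follow`, later traversals of the same pointer skip the table lookup, which is the payoff for long-running transactions that navigate the same complex object repeatedly.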

Persistence?
- We may want both persistent and transient data
- Why?
  - Programming language variables
  - Handle intermediate data
  - May want to apply queries to transient data

Properties for Persistence?
- Orthogonal to types: data of any type can be persistent
- Transparent to programmer: programmer can treat persistent and non-persistent objects the same way
- Independent from mass storage: no explicit read and write to persistent database

Models of Persistence
- Persistence by type
- Persistence by call
- Persistence by reachability

Model of Persistence: by type
- Parallel type systems: persistence by type, e.g., int and dbint
- Programmer is responsible for making objects persistent
- Programmer must make the decision at object creation time
- Allow for user control by “casting” types

Model of Persistence: by call
- Persistence by explicit call
- Explicit create/delete to persistent space
- E.g., objects must be placed into “persistent containers” such as relations in order to be kept around
- E.g., Insert object into Collection MyBooks;
- Could be rather dynamic control without casting
- Relatively simple to implement by DBMS

Model of Persistence: by reachability
- Use global (or named) variables that refer to objects and structures
- Objects referenced by other objects that are reachable by the application are also persistent, by transitivity
- No explicit deletes; rather, garbage collection is needed to reclaim objects once they are no longer referenced
- Garbage collection techniques:
  - mark & sweep: mark all objects reachable from persistent roots, then delete the others
  - scavenging: copy all reachable objects from one space to the other; may suffer in a disk-based environment due to IO overhead and destruction of clustering
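Mark & sweep over a persistent store can be sketched as follows. The store is modeled as a dict from OID to the list of OIDs the object references, and the root names are hypothetical.

```python
def mark(store, roots):
    """Return the set of OIDs reachable from the persistent roots."""
    reachable, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid in reachable or oid not in store:
            continue  # already visited, or a dangling reference
        reachable.add(oid)
        stack.extend(store[oid])
    return reachable


def sweep(store, reachable):
    """Delete every object that was not marked."""
    for oid in list(store):
        if oid not in reachable:
            del store[oid]


def collect(store, roots):
    sweep(store, mark(store, roots))
```

Objects that reference each other in a cycle stay alive as long as one of them is reachable from a root, and unreachable objects are removed even if other unreachable objects still point to them.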

Tradeoffs
Compares persistence by type, by call, and by reference on:
- Persistent/transient orthogonal to type
- At creation time/any time
- Can objects dynamically switch (flex)
- Transparent to use; DB independent
- Explicit control by user
- DBMS impl cost

Summary
- A lot of work to get to OO support: from physical database design/layout issues up to logical query optimizer extensions
- ORDB: reuses the existing implementation base and incrementally adds new features on top (but the relation remains the first-class citizen)