bdbms: A Database System for Scientific Data Management

Mohamed Y. Eltabakh, Mourad Ouzzani, Walid G. Aref, Ahmed Elmagarmid, Yasin Silva, Umer Arshad, David Salt, Ivan Baxter
Purdue University, Department of Computer Science, Cyber Center, Department of Horticulture and Landscape Architecture

Annotation Management

- Annotations at multiple granularities (tuple, column, cell)
- Annotating data and operations
- Provenance (lineage) is handled as a special type of annotation

Annotations
- Table level: "Data copied from Database D1"
- Tuple level: "Attach articles about this entry"
- Column level: "This column is computed using a prediction tool"
- Cell level: "Experimentally verified"

Provenance (lineage)
[Figure: provenance example with labels S1, S2, P1, S3 and operations copy, local insert, update, overwrite]
- Q1: Where do these values come from?
- Q2: What is the source of this value at time T?

Adding Annotations at Various Granularities

    ADD ANNOTATION [AS VIEW]
    TO <…>
    VALUE <…>
    [ON UPDATE PROPAGATE]
    [ON AGGREGATION PROPAGATE]
    ON <…>

Storage Optimization Techniques

    CREATE ANNOTATION TABLE <…> ON <…>

- Compression: annotation tables store annotations in a compressed form
- Indexing: spatial index structures are built on annotations for efficient retrieval
- Categorization: annotation tables allow categorization of annotations

Archiving/Restoring Annotations

    ARCHIVE ANNOTATION
    FROM <…>
    WHERE <…>
    ON <…>

- Archived annotations are not propagated along with query results

Propagating/Filtering Annotations

    SELECT [DISTINCT] Ci [PROMOTE (Cj, Ck, …)], …
    FROM Relation_name [ANNOTATION (S1, S2, …)], …
    [WHERE <…>]
    [GROUP BY <…> [HAVING <…>]]

- ANNOTATION: qualifier that specifies which annotations are propagated
- PROMOTE: carries the annotations from un-projected attributes

ADD ANNOTATION Query Processing

- Execute the SELECT statement
- Identify the output rows and columns
- Map the rows and columns to an ordered domain

Which mapping is more efficient?
- Storage_Order Mapping
- Correlated_Columns Mapping
- Correlated_Rows Mapping

Map the target table cells to be annotated to rectangles.

Snapshot versus View Annotations

- Snapshot annotations: the command is evaluated once, and the annotation is attached to the current query results
- View annotations: the command is evaluated on the current database snapshot and continuously applied to new tuples
  - Eager approach: apply the annotation command at insertion time
  - Lazy approach: apply the annotation command at query time

Archiving Annotations

Query processing of the SELECT statement:
- Identify the cells on which annotations are archived
- Map the cells to rectangles

Representation of archived annotations:
- A single annotation rectangle may be divided into smaller ones
- How to divide an annotation rectangle?

Non-traditional and Novel Access Methods

- Efficient indexing structures
- New operators to support complex search operations
- Efficient query processing

Indexing compressed sequences:
- Data compression techniques
- Biological sequences are very large
- Compressed sequences
- New index structures for compressed sequences

Indexing Compressed Sequences (SBC-Tree)

Compression techniques gain significant importance:
- Significant storage reduction
- Reduced buffer requirements
- Reduced number of I/Os
- Enhanced overall system performance

Spatial Data Indexing (SP-GiST Framework)

Implementing non-traditional indexes involves significant overhead:
- Functionalities (insertion, deletion, searching)
- Storage management
- Integration
- Recovery and concurrency control

Extensible indexing frameworks:
- Software engineering solution
- One-time core development
- Repeated low-cost instantiation of a variety of index structures
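
The ADD ANNOTATION query-processing steps (identify the output cells, map rows and columns to an ordered domain, map the cells to rectangles) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bdbms implementation: the function name `cells_to_rectangles`, the use of integer row and column positions as the ordered domain, and the greedy merging strategy are all hypothetical.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]            # (row position, column position) in the ordered domain
Rect = Tuple[int, int, int, int]  # (row_lo, row_hi, col_lo, col_hi), inclusive bounds


def cells_to_rectangles(cells: Iterable[Cell]) -> List[Rect]:
    """Cover a set of annotated cells with axis-aligned rectangles.

    Hypothetical greedy strategy: split each row into runs of adjacent
    columns, then merge identical runs that appear in consecutive rows.
    """
    by_row = {}
    for r, c in set(cells):
        by_row.setdefault(r, []).append(c)

    # Step 1: per row, collapse adjacent columns into (col_lo, col_hi) runs.
    runs = {}
    for r, cols in by_row.items():
        cols.sort()
        row_runs = []
        start = prev = cols[0]
        for c in cols[1:]:
            if c != prev + 1:
                row_runs.append((start, prev))
                start = c
            prev = c
        row_runs.append((start, prev))
        runs[r] = row_runs

    # Step 2: grow a rectangle downward while the next row repeats the same run.
    open_rects = {}  # (col_lo, col_hi) -> (row_lo, row_hi) of a rectangle still growing
    done = []
    for r in sorted(runs):
        current = set(runs[r])
        for run in list(open_rects):
            row_lo, row_hi = open_rects[run]
            if run in current and row_hi == r - 1:
                open_rects[run] = (row_lo, r)   # extend by one row
                current.discard(run)
            else:
                done.append((row_lo, row_hi, run[0], run[1]))
                del open_rects[run]
        for run in current:
            open_rects[run] = (r, r)            # start a new rectangle
    for run, (row_lo, row_hi) in open_rects.items():
        done.append((row_lo, row_hi, run[0], run[1]))
    return sorted(done)
```

The sketch also hints at why the choice of mapping matters. If the annotated tuples land on scattered row positions, for example `[(0, 0), (2, 0), (4, 0)]`, the cover needs three rectangles; if the mapping places the same tuples on adjacent positions, `[(0, 0), (1, 0), (2, 0)]`, a single rectangle suffices.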
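
The distinction between snapshot and view annotations, and between the eager and lazy ways of maintaining a view annotation, can be illustrated with a small in-memory model. Everything in it is an assumption made for the example: the `AnnotatedTable` class, its method names, and the Python predicate that stands in for the SELECT statement are hypothetical and do not reflect the bdbms design.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]  # stands in for the SELECT statement of the command


class AnnotatedTable:
    """Toy table supporting snapshot and view annotations (hypothetical API)."""

    def __init__(self, eager: bool = True) -> None:
        self.eager = eager
        self.rows: List[Row] = []
        self.attached: List[Set[str]] = []           # annotations stored per row
        self.views: List[Tuple[Predicate, str]] = [] # remembered view annotations

    def add_annotation(self, pred: Predicate, text: str, as_view: bool = False) -> None:
        # Both kinds are evaluated against the current database snapshot.
        for i, row in enumerate(self.rows):
            if pred(row):
                self.attached[i].add(text)
        if as_view:
            # A view annotation is kept so that it can cover future tuples.
            self.views.append((pred, text))

    def insert(self, row: Row) -> None:
        self.rows.append(row)
        self.attached.append(set())
        if self.eager:
            # Eager approach: apply view annotations at insertion time.
            for pred, text in self.views:
                if pred(row):
                    self.attached[-1].add(text)

    def annotations(self, i: int) -> Set[str]:
        result = set(self.attached[i])
        if not self.eager:
            # Lazy approach: apply view annotations at query time.
            result |= {text for pred, text in self.views if pred(self.rows[i])}
        return result
```

In this model, a tuple inserted after both commands were issued receives only the view annotation, whichever approach is used; the eager variant pays the cost on every insert, while the lazy variant pays it on every read.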