CRIUS: User-Friendly Database Design
Li (Eric) Qian, Kristen LeFevre, H. V. Jagadish
University of Michigan, Ann Arbor

Presentation transcript:


Outline
- Motivation
- Interface
- Algebra
- Guidance Feature
- Storage
- Evaluation

Motivation
- Non-technical people are directly exposed to data.
- It is hard to design a schema in advance.
- Users start with a simple structure and grow it as needed.
- We call this process organic schema evolution.

Motivation Cont’d
- While users have the freedom of organically growing their schema, the data is now subject to denormalization.
- Consequently, users have to explicitly deal with duplicated data entries, which may produce errors that violate integrity constraints.
- Therefore, an organic database system must:
    - Make it easy for the end user to make schema changes
    - Guarantee efficient and safe data entry
    - Implement these features at low cost

Challenges
- Schema Update Specification
- Data Migration
- Data Entry
- Schema Evolution Performance


Spreadsheet?
- Flat spreadsheets vs. hierarchical semantics
Flat spreadsheet:
    Name   City       Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
    Keith  Ann Arbor  202 Main
Hierarchical semantics, table Person:
    ID  Name   City
    1   Mary   Chicago
    2   Keith  Ann Arbor
Hierarchical semantics, table Address:
    ID  Address
    1   2364 Bishop
    2   101 Plymouth
    2   202 Main

How to support hierarchical semantics?
- We permit nesting!
    Name   City       [Address]
                      Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
                      202 Main

Span Table
- Span Table: a next-generation spreadsheet that nests data in a single representation.
- [Figure: a span table with its schema and data regions labeled. Callouts: "Specify an evolution by dragging StateName inside Address." and "Specify an evolution by dragging Person upward."]


Data Migration in Schema Evolution
- Data needs to be migrated from the old schema to the new one.
- This may involve copying or merging data.
- Without support, users need to edit cell by cell.
Old schema:
    Name   City       Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
    Keith  Ann Arbor  202 Main
New schema:
    Name   City       [Address]
                      Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
                      202 Main

Introducing Operators!
- Schema restructuring operators:
    - IMPORT, EXPORT, FLOAT, SINK
- Extended spreadsheet operators:
    - Schema modification: Adding/Dropping Columns
    - Data manipulation: Inserting/Deleting/Updating Tuples
- Collectively, we call this set of operators Span Table Algebra.

Span Table Algebra: Schema Restructuring Operators
    Operator    Description
    Import(A)   Move A inward into a descendant relation.
    Export(A)   Move A outward into an ancestor relation.
    Sink(A)     Push A to create a new leaf relation.
    Float(A)    Lift A to create a new intermediate level.
Example, Sink(Address). Before:
    Name   City       Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
    Keith  Ann Arbor  202 Main
After:
    Name   City       [Address]
                      Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
                      202 Main
Example, Import(City) and Export(City). Import(City) turns the table above into the one below; Export(City) reverses it:
    Name   [Address]
           City       Address
    Mary   Chicago    2364 Bishop
    Keith  Ann Arbor  101 Plymouth
           Ann Arbor  202 Main
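
The restructuring examples above can be sketched in Python. This is a minimal illustration, not the CRIUS implementation: a span table is modeled as a list of dicts, a nested relation as a list stored under a hypothetical key ("Addresses"), and the function names and the behavior of export_attr on conflicting values are assumptions.

```python
def sink(rows, attr, child):
    """Sink(attr): group rows on the remaining attributes and push attr
    into a new nested (leaf) relation stored under the key `child`."""
    grouped = {}
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k != attr)
        grouped.setdefault(key, []).append({attr: row[attr]})
    return [{**dict(key), child: members} for key, members in grouped.items()]


def import_attr(rows, attr, child):
    """Import(attr): move attr inward, copying it into every nested tuple."""
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k not in (attr, child)}
        parent[child] = [{attr: row[attr], **sub} for sub in row[child]]
        out.append(parent)
    return out


def export_attr(rows, attr, child):
    """Export(attr): move attr outward into the parent relation.
    Assumption: attr must have a single value per parent row."""
    out = []
    for row in rows:
        values = {sub[attr] for sub in row[child]}
        if len(values) != 1:
            raise ValueError(f"{attr} is not single-valued within {child}")
        parent = {k: v for k, v in row.items() if k != child}
        parent[attr] = values.pop()
        parent[child] = [{k: v for k, v in sub.items() if k != attr}
                         for sub in row[child]]
        out.append(parent)
    return out


flat = [
    {"Name": "Mary", "City": "Chicago", "Address": "2364 Bishop"},
    {"Name": "Keith", "City": "Ann Arbor", "Address": "101 Plymouth"},
    {"Name": "Keith", "City": "Ann Arbor", "Address": "202 Main"},
]
nested = sink(flat, "Address", "Addresses")
inner = import_attr(nested, "City", "Addresses")
# Export undoes Import on this data.
assert export_attr(inner, "City", "Addresses") == nested
```

In this sketch each operator migrates the data together with the schema change, so the user never edits cell by cell.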

Span Table Algebra: Expressive Power Analysis
- Import, Export, etc. can be expressed in terms of Nest and Unnest.
- Nest and Unnest can be expressed as a sequence of Span Table operators.
- Detailed proofs are in the paper's appendix.


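Inevitable Denormalization
- Traditional design uses data integrity constraints.
- We cannot do this since we have no pre-defined constraints.
- Denormalization, with FD: A → B
Instance consistent with the FD (b0 is stored twice):
    A   B   C
    a0  b0  c0
    a0  b0  c1
Instance that violates the FD:
    A   B   C
    a0  b0  c0
    a0  b1  c1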

Guide User Data Entry
- We maintain a set of “soft” functional dependencies (FDs) to guide user data entry:
    - Inductive completion
    - Error prevention
Current table, with soft FD: Name → Grade
    ID  Name   Course   Grade
    1   Peter  Math     A
    2   Peter  Physics  A
    3   Leo    Math     B
Inductive completion: the user enters Leo in the Name cell of a new row, and the Grade cell is completed with B.
Error prevention: the user instead enters Grade C for Leo in the new row, which conflicts with row 3. The options are:
    (1) rollback
    (2) also update relevant entries to preserve data integrity (row 3 becomes Leo, Math, C)
    (3) force the entry and update the soft FDs (row 3 keeps Grade B, and the FD becomes Name, Course → Grade)
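
The two guidance behaviors can be sketched as follows. This is a minimal illustration under stated assumptions, not the CRIUS implementation: the function names suggest and conflicts are hypothetical, and a soft FD is passed in as a list of left-hand attributes plus one right-hand attribute.

```python
def suggest(rows, lhs, rhs, partial):
    """Inductive completion: return the rhs value implied by the soft FD
    lhs -> rhs for a partially filled row, or None if nothing matches."""
    key = tuple(partial[a] for a in lhs)
    for row in rows:
        if tuple(row[a] for a in lhs) == key:
            return row[rhs]
    return None


def conflicts(rows, lhs, rhs, new_row):
    """Error prevention: return the existing rows that would violate
    lhs -> rhs if new_row were inserted as is."""
    key = tuple(new_row[a] for a in lhs)
    return [row for row in rows
            if tuple(row[a] for a in lhs) == key and row[rhs] != new_row[rhs]]


rows = [
    {"ID": 1, "Name": "Peter", "Course": "Math", "Grade": "A"},
    {"ID": 2, "Name": "Peter", "Course": "Physics", "Grade": "A"},
    {"ID": 3, "Name": "Leo", "Course": "Math", "Grade": "B"},
]

# Soft FD Name -> Grade suggests B once the user types "Leo".
print(suggest(rows, ["Name"], "Grade", {"Name": "Leo"}))  # B

# Entering Leo with Grade C conflicts with row 3.
clash = conflicts(rows, ["Name"], "Grade", {"Name": "Leo", "Grade": "C"})
print([row["ID"] for row in clash])  # [3]
```

A non-empty conflict list is the point at which the user would be offered the three options listed above.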

How to Manage FDs?
- Frequent data entry → frequent FD re-induction
- Past solutions are too expensive to be applied.
- Incremental FD Induction (IFDI):
    - Induce initial FDs and build the important data structures.
    - Maintain these structures and incrementally re-induce FDs.
    - We optimize the way these structures are updated so that the algorithm can respond in real time.


Vertical Partitioning
- Span tables are vertically partitioned and stored in relational databases.
- Connecting a span table to the underlying storage:
    - Upward mapping
    - Downward mapping
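
The slide does not spell out the storage layout or the two mappings, so the sketch below is only one plausible reading of generic vertical partitioning, not the CRIUS design. It assumes that each attribute of a flat relation is stored in its own two-column table keyed by a row id (the downward direction) and that rows are rebuilt by joining those tables (the upward direction). The function names, the table naming scheme, and the rid column are all hypothetical. It uses Python's standard sqlite3 module.

```python
import sqlite3


def store_vertically(conn, relation, rows):
    """Assumed downward mapping: one (rid, value) table per attribute."""
    attrs = list(rows[0].keys())
    for attr in attrs:
        table = f"{relation}_{attr}"
        conn.execute(
            f'CREATE TABLE "{table}" (rid INTEGER PRIMARY KEY, value TEXT)')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES (?, ?)',
            [(rid, row[attr]) for rid, row in enumerate(rows, start=1)])
    return attrs


def load_rows(conn, relation, attrs):
    """Assumed upward mapping: join the per-attribute tables back on rid."""
    first = f'"{relation}_{attrs[0]}"'
    select = ", ".join(f'"{relation}_{a}".value' for a in attrs)
    joins = " ".join(f'JOIN "{relation}_{a}" USING (rid)' for a in attrs[1:])
    cur = conn.execute(
        f"SELECT {select} FROM {first} {joins} ORDER BY {first}.rid")
    return [dict(zip(attrs, values)) for values in cur.fetchall()]


conn = sqlite3.connect(":memory:")
people = [
    {"Name": "Mary", "City": "Chicago"},
    {"Name": "Keith", "City": "Ann Arbor"},
]
attrs = store_vertically(conn, "Person", people)
# Storing and reloading returns the original rows.
assert load_rows(conn, "Person", attrs) == people
```

Under this layout, adding or dropping a column touches a single narrow table rather than rewriting whole rows, which is one common motivation for vertical partitioning.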


Evaluation
- Our experiments are designed to answer four questions:
    - Span Table usability
    - Guidance feature usability
    - IFDI efficiency
    - Storage performance

Evaluation: User Study on Schema Operations
- Tasks:
    - Schema Design: Create the schema for an address book.
    - Schema Update: Move an attribute from one relation to another in a gene database.
- Measure:
    - Time to complete each task.
- Compared against SSMS (MS SQL Server Management Studio 2008).
- All users failed in this task using SSMS since they were unable to migrate the data manually. In contrast, all of them were able to complete the task within seconds with CRIUS.
- [Figure panels: Schema Design; Schema Update]

Evaluation: User Study on Integrity-Based Guidance
- The three tasks:
    - Insert a new contact and his address into the address book.
    - Update the cell phone number of one contact.
    - Update the address of one contact to the address of another contact.
- Measure:
    - Time to complete each task, and
    - Overall count of keystrokes and mouse clicks.
- Compare with and without the guidance feature on.

Conclusion
- The design and implementation of CRIUS
    - Span table algebra
    - Integrity-based guidance based on IFDI
    - Storage
    - Evaluation

Questions?

IFDI: Inducing Initial FDs
    ID  Name   Course   Grade
    1   Peter  Math     A
    2   Peter  Physics  A
    3   Leo    Math     B
    4   Leo    Physics  B
    5   Jack   Math     A
Attribute partitions (N = Name, C = Course, G = Grade):
    P_N   = {(1,2), (3,4), (5)}
    P_C   = {(1,3,5), (2,4)}
    P_G   = {(1,2,5), (3,4)}
    P_NC  = {(1), (2), (3), (4), (5)}
    P_NG  = {(1,2), (3,4), (5)}
    P_CG  = {(1,5), (2), (3), (4)}
    P_NCG = {(1), (2), (3), (4), (5)}
Rules:
    X → Y iff P_X = P_{X∪Y}
    P_{X∪Y} = P_X · P_Y
Attribute lattice (each node shown with its partition, singleton groups omitted):
    NCG: {}
    NG:  {(1,2), (3,4)}     NC: {}                  CG: {(1,5)}
    N:   {(1,2), (3,4)}     C:  {(1,3,5), (2,4)}    G:  {(1,2,5), (3,4)}
Induced FDs:
- N → G since P_N = P_NG
- NC → G since P_NC = P_NCG (dominated by the above)
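
The partition test above can be checked with a short Python sketch. It is a naive enumeration written for illustration, not the IFDI algorithm itself: partitions are recomputed from scratch for every candidate, and the function names are hypothetical.

```python
from itertools import combinations


def partition(rows, attrs):
    """P_X: group tuple IDs by their values on the attributes in attrs."""
    groups = {}
    for tid, row in rows.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(tid)
    return {frozenset(group) for group in groups.values()}


def holds(rows, lhs, rhs):
    """X -> Y holds iff P_X equals P_{X u Y}."""
    return partition(rows, lhs) == partition(rows, lhs + [rhs])


def minimal_fds(rows, attrs):
    """Enumerate left-hand sides from small to large and skip any candidate
    dominated by an FD that was already found."""
    found = []
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            for rhs in attrs:
                if rhs in lhs:
                    continue
                if any(r == rhs and set(l) <= set(lhs) for l, r in found):
                    continue  # dominated, e.g. NC -> G once N -> G is known
                if holds(rows, list(lhs), rhs):
                    found.append((lhs, rhs))
    return found


rows = {
    1: {"N": "Peter", "C": "Math", "G": "A"},
    2: {"N": "Peter", "C": "Physics", "G": "A"},
    3: {"N": "Leo", "C": "Math", "G": "B"},
    4: {"N": "Leo", "C": "Physics", "G": "B"},
    5: {"N": "Jack", "C": "Math", "G": "A"},
}
print(minimal_fds(rows, ["N", "C", "G"]))  # [(('N',), 'G')]
```

The only minimal FD on this table is N → G, matching the slide; NC → G also holds but is skipped as dominated.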

IFDI: Maintaining FDs on Value Update
The Grade of tuple 2 is updated from A to B:
    ID  Name   Course   Grade
    1   Peter  Math     A
    2   Peter  Physics  A → B
    3   Leo    Math     B
    4   Leo    Physics  B
    5   Jack   Math     A
Attribute partitions (before → after):
    P_N   = {(1,2), (3,4), (5)}
    P_C   = {(1,3,5), (2,4)}
    P_G   = {(1,2,5), (3,4)}                 →  P_G   = {(1,5), (2,3,4)}
    P_NC  = {(1), (2), (3), (4), (5)}
    P_NG  = {(1,2), (3,4), (5)}              →  P_NG  = {(1), (2), (3,4), (5)}
    P_CG  = {(1,5), (2), (3), (4)}           →  P_CG  = {(1,5), (2,4), (3)}
    P_NCG = {(1), (2), (3), (4), (5)}        →  P_NCG = {(1), (2), (3), (4), (5)}
Rule: X → Y iff P_X = P_{X∪Y}
Attribute lattice after the update (singleton groups omitted):
    NCG: {}
    NG:  {(3,4)}            NC: {}                  CG: {(1,5), (2,4)}
    N:   {(1,2), (3,4)}     C:  {(1,3,5), (2,4)}    G:  {(1,5), (2,3,4)}
Resulting FDs:
- N → G no longer holds since P_N ≠ P_NG
- NC → G since P_NC = P_NCG
Only half of the lattice nodes (those containing G) are visited!

IFDI: Maintaining FDs on Value Update Cont’d
- How do we efficiently update attribute partitions? For example, P_CG = {(1,5), (2), (3), (4)} → P'_CG = {(1,5), (2,4), (3)} when tuple 2 is updated.
- Before the update: P_CG = P_C · P_G, with P_C = {(1,3,5), (2,4)} and P_G = {(1,2,5), (3,4)}.
- Naively re-computing the product: P'_CG = P_C · P'_G, with P'_G = {(1,5), (2,3,4)}.
- Incrementally updating the product: P'_CG = Update(P_CG, P_C, P'_G, tid), with tid = 2:
    1) Remove the tuple from the old group: {(1,5), (2), (3), (4)} → {(1,5), (3), (4)}
    2) Add the tuple to the new group: {(1,5), (3), (4)} → {(1,5), (2,4), (3)}
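
The two-step update can be sketched in Python and checked against the naive product. This is an illustrative reading of the Update(P_CG, P_C, P'_G, tid) signature on the slide, not the optimized IFDI routine: it scans the groups linearly, and the way the new group is located (intersecting the tuple's groups in P_X and P'_Y) is an assumption.

```python
def product(p, q):
    """Naive partition product: all non-empty pairwise intersections."""
    return {a & b for a in p for b in q if a & b}


def update(p_xy, p_x, p_y_new, tid):
    """Incrementally derive P'_XY after tuple tid changed its Y value."""
    # 1) Remove the tuple from its old group, dropping the group if empty.
    remaining = {group - {tid} for group in p_xy} - {frozenset()}
    # 2) Add the tuple to its new group: the tuples that share its group
    #    in both P_X and P'_Y (this intersection always contains tid).
    group_x = next(g for g in p_x if tid in g)
    group_y = next(g for g in p_y_new if tid in g)
    new_group = group_x & group_y
    # The group that tid joins (if any) is replaced by the merged group.
    return {g for g in remaining if not g <= new_group} | {new_group}


def parts(*groups):
    """Helper to write a partition as a set of frozensets."""
    return {frozenset(g) for g in groups}


p_c = parts({1, 3, 5}, {2, 4})
p_g_new = parts({1, 5}, {2, 3, 4})
p_cg = parts({1, 5}, {2}, {3}, {4})

p_cg_new = update(p_cg, p_c, p_g_new, tid=2)
assert p_cg_new == parts({1, 5}, {2, 4}, {3})
assert p_cg_new == product(p_c, p_g_new)  # agrees with naive recomputation
```

The incremental version touches only the groups involving the updated tuple, whereas the naive product intersects every pair of groups.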

Evaluation: User Study on Schema Operations Cont’d
- Task:
    - Move an attribute across relations in a gene database (the same as before).
- Measure:
    - Time to complete the task.
- Compare CRIUS with a strawman system with only nested relational operators.

Evaluation: Performance of IFDI
- Task:
    - Re-generate the minimal FDs on value update.
- Measure:
    - The time to complete the task.
- Compare IFDI with the naive algorithm.
- [Figure panels: a five-column table with varying row size; a ten-thousand-row table with varying column size]

Evaluation: Performance of Vertical Storage
- Tasks:
    - Execute a schema update.
    - Load data from the relational back-end and construct a span table.
- Measure:
    - Time to complete each task.
- Compare CRIUS with the naive storage.
- [Figure panels, time in ms against DB size in MB: time to move an attribute with varying DB size; time to display data with varying DB size]