SQLSaturday Mountain View, March 15, 2014

Presentation transcript:

THE LAST NULL IN THE COFFIN
A Relational Solution to Missing Data
SQLSaturday Mountain View, March 15, 2014
Fabian Pascal
www.DBDebunk.com

RM: 2VL
Real World
Propositions: true (facts) or false
R-tables
No missing data*
Inferences provably logically correct with respect to the real world
* Missing data -> non-R tables
Copyright (c) 2012 Fabian Pascal All Rights Reserved

CWA
Sample #102 has 49.2% SiO2
Sample_ID   SiO2%
----------+---------
102         49.2
Perfect knowledge
All present rows: true propositions
All absent rows: false propositions
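
The closed world assumption on this slide can be illustrated with a minimal sketch, which is not part of the original deck. It models a relation as a Python set of rows; the names `sample_sio2` and `holds` are hypothetical, and the second percentage (51.0) is invented purely to show an absent row.

```python
# Hypothetical relation: each row asserts "sample <id> has <pct>% SiO2".
SiO2Row = tuple[int, float]

sample_sio2: set[SiO2Row] = {(102, 49.2)}


def holds(relation: set[SiO2Row], row: SiO2Row) -> bool:
    """Under the CWA a proposition is true if and only if its row is present."""
    return row in relation


# Row present: the proposition is true.
present = holds(sample_sio2, (102, 49.2))
# Row absent: the proposition is false (51.0 is an invented value).
absent = holds(sample_sio2, (102, 51.0))
```

With perfect knowledge there is no third outcome: every candidate row is either in the relation or not.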

IMPERFECT KNOWLEDGE
“[2VL/CWA] may indeed be the real world of warehouse management and parts ordering, but it is most certainly not the real world of observational science, where data items are quite routinely imprecise, incomplete, or missing. In this real world, the correct result of a query is not in general either 'true' or 'false' but can be 'unknown' ...”
--S. Henley

INTERPRETATION
Sample #102 is reported to contain 49.2% SiO2
Sample_ID   SiO2%
----------+---------
102         49.2

REPRESENTATION
Sample #102 is reported to contain 49.2% SiO2
Sample_ID   SiO2%
----------+--------------
102         Reported 49.2

PERFECTLY VALID
“… there is no obvious reason to exclude semi-numeric data (such as "below 0.1% detection limit"), or non-numeric data (such as "sample contaminated, submitting for re-analysis") or (heaven forbid!) "missing" - even the word "null" if this is not a red rag to a bull. Any such data values (or non-values) might perfectly validly be transcribed from a laboratory report, where conventionally a "-" character is used to signify that an analysis value is missing (or alternatively some such code as "n/a" for "not analysed") … there is no a priori reason to discriminate against putting such codes into the database.”
--S. Henley

VALUES AND NON-VALUES
Values
  Domain-specific
  Special values
  Default values
Non-values
  Marks (absence of values)

BEEN THERE, DONE THAT
4VL (Codd)
  A-mark: unknown
  I-mark: inapplicable
Default values (Date)
Renounced!
Binary relations/6NF (Darwen)

SQL NULL
No sound nVL n>2 (McGoveran)
  Consistent
  Sufficient
NULL behavior
  Ad-hoc/arbitrary
  Insidious (representation)
  Unintuitive
  Complex
Misused as 4VL
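
As an illustration of the NULL behavior this slide criticizes, here is a minimal sketch using Python's standard `sqlite3` module. It is not from the original deck: the table `sample`, its columns, and the rows for samples 101 and 103 are assumptions made up for the example. Only sample 102 with 49.2% comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, sio2_pct REAL)"
)
# Samples 101 and 103 are invented; 103 has a missing (NULL) percentage.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(101, 47.5), (102, 49.2), (103, None)],
)

# NULL = NULL evaluates to NULL (unknown), not to true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A condition that looks like a tautology still drops the NULL row.
tautology_ids = [
    row[0]
    for row in conn.execute(
        "SELECT sample_id FROM sample "
        "WHERE sio2_pct > 48 OR sio2_pct <= 48 "
        "ORDER BY sample_id"
    )
]

# Aggregates skip NULLs silently, so the two counts disagree.
count_all, count_col, avg_pct = conn.execute(
    "SELECT COUNT(*), COUNT(sio2_pct), AVG(sio2_pct) FROM sample"
).fetchone()
```

Sample 103 vanishes from a query whose condition every number satisfies, and the average is computed over two rows although the table holds three. Neither result tells the user that data is missing.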

TRUE?
“[2VL/CWA] may indeed be the real world of warehouse management and parts ordering, but it is most certainly not the real world of observational science, where data items are quite routinely imprecise, incomplete, or missing. In this real world, the correct result of a query is not in general either 'true' or 'false', but can be 'unknown' …”
--S. Henley

“MISSINGNESS”
“The fact remains that working within the CWA and 2VL, although Date, Darwen, and Pascal have each proposed methods by which the 'null' representation of missing data can be avoided, none have suggested any way in which the 'missingness' of data can properly be manipulated. The basic reason for this is that when the required correct answer is "unknown", this simply cannot be produced by a two-valued logic which knows only "true" or "false".”
--S. Henley

CONFUSION OVER REALMS
The real world obeys 2VL regardless of what our knowledge of it is!
“Bundling” imperfect knowledge with the real world inhibits the ability to realize that 2VL/CWA is the solution, not the problem;
Overcome this confusion and a relational solution presents itself.

THE LAST NULL IN THE COFFIN
2VL/CWA solution
  Guarantees data integrity and provably logically correct query results with respect to real world;
  Avoids the problems of 3VL/NULL;
  Requires no changes to the relational model;
  Is mostly transparent to users;
  Puts burden on the DBMS, where it belongs;
  Less likely to confuse users & DBMS designers;
  Keeps users better apprised of the existence and implications of missing data;
  Encourages/rewards minimizing missing data.

HINTS
Assert only the known!
“Missingness”: whose attribute?
  Known
  Known unknown
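
One possible reading of "assert only the known" is sketched below. It is not the design presented in the talk, whose KNOWN and KNOWN UNKNOWN slides did not survive in this transcript. The sketch assumes hypothetical tables `sample`, `sample_sio2_known`, and `sample_sio2_unknown`, and an invented sample 103. No column admits NULL: a reported percentage is one asserted fact, and "the percentage for this sample is unknown" is another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY);

    -- Known: the SiO2 percentage has been reported.
    CREATE TABLE sample_sio2_known (
        sample_id INTEGER PRIMARY KEY REFERENCES sample (sample_id),
        sio2_pct  REAL NOT NULL
    );

    -- Known unknown: we assert the fact that the percentage is missing.
    CREATE TABLE sample_sio2_unknown (
        sample_id INTEGER PRIMARY KEY REFERENCES sample (sample_id)
    );

    INSERT INTO sample VALUES (102), (103);
    INSERT INTO sample_sio2_known VALUES (102, 49.2);
    INSERT INTO sample_sio2_unknown VALUES (103);  -- invented sample
    """
)


def ids(sql: str) -> list[int]:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Every comparison is over known values, so each row is either in or out.
above_48 = ids("SELECT sample_id FROM sample_sio2_known WHERE sio2_pct > 48")
not_above_48 = ids(
    "SELECT sample_id FROM sample_sio2_known WHERE NOT (sio2_pct > 48)"
)
# Missing data is queried explicitly rather than dropped silently.
unknown_ids = ids("SELECT sample_id FROM sample_sio2_unknown")
```

Every query here is evaluated in two-valued logic, and the user has to ask about sample 103 explicitly instead of having it disappear from a result.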

KNOWN

KNOWN UNKNOWN

IMPLEMENTATION APPROACH
http://bookboon.com/en/go-faster-ebook

DATA FUNDAMENTALS
Education--distinct from tool-specific training--useful for any and all DBMS products used;
Dispelling myths and misconceptions about
Explain the practical implications of
  Data fundamentals
  Concepts, principles and methods
  Little, no, or incorrect coverage in the industry
For data professionals and users who prefer
  To think for themselves
  Understanding to "cookbooks"
  Soundness to marketing fads and fashion

SEMINAR & PAPER SERIES
PRACTICAL DATABASE FOUNDATIONS
0. Truly Relational: What It Really Means
1. Business Modeling for Database Design
2. The Costly Illusion: Normalization, Integrity and Performance
3. The Final NULL in the Coffin: A Relational Solution to Missing Data
4. The Key to Keys: A Matter of Identity
More forthcoming

www.dbdebunk.com
Articles on data fundamentals;
Debunkings of industry claims;
Online exchanges I participate in;
Contributions to other publishers;
Weekly Quotes & To Laugh or Cry?
  Industry material for which it is difficult to know which of the two reactions is warranted;
  Illustrates the poor state of foundation knowledge;
  Offers opportunity to test oneself on knowledge and comprehension of data fundamentals;