University of Florida’s dchecker: Software for ensuring semantic data integrity
Nicholas Rejack, MS 1, Christopher P. Barnes 1, Michael Conlon, PhD 2

1 Clinical and Translational Science Informatics and Technology, University of Florida
2 Clinical and Translational Science Institute, University of Florida

Introduction

Upkeep of data integrity requires constant vigilance. The problem of ensuring data integrity in database management systems (DBMS) is well understood, but this is not the case for semantic web triple data.

UF VIVO: a researcher networking application

The University of Florida has implemented VIVO, a semantic web application for researcher networking. Although VIVO will reject malformed RDF/XML that does not pass validation, it has relatively few restrictions on the data it will accept.

Types of data integrity

Unique identifiers must truly be unique per individual, a property defined as a book title must not hold chapter headings, people must not also be classed as organizations, and so forth. Ensuring data is properly defined semantically helps supplementary processes like automated semantic reasoning.

UF’s dchecker software

Dchecker is a Python script that runs a set of associated SPARQL queries daily. Queries can be added indefinitely to expand its capabilities. Some examples of data constraints checked:

- referential integrity: links between authors and their publications must be valid. Positions must be linked to people and organizations.
- domain integrity: numeric identifiers must consist only of digits.
- semantic integrity: unique identifiers must not be duplicated across people, or on a single person.
- data restrictions integrity: restricted data must not be exposed. People within sensitive organizations must be protected from public display.
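The daily run described above can be sketched roughly as follows. This is a minimal illustration and not dchecker's actual code: the `ex:` namespace, the property and class names, and the helper names `run_checks`, `format_report`, and `fake_count_query` are all assumptions made for the example. Each query counts offending records, so a non-zero count marks an error to be corrected.

```python
from typing import Callable, Dict

# Hypothetical namespace and terms; a real VIVO instance would use its own ontology.
PREFIX = "PREFIX ex: <http://example.org/ontology#>\n"

# Each check is a SPARQL query that counts offending records; zero means clean.
CHECKS: Dict[str, str] = {
    # semantic integrity: one identifier attached to two different individuals
    "duplicate_identifier": PREFIX + """
        SELECT (COUNT(DISTINCT ?id) AS ?n) WHERE {
          ?a ex:identifier ?id .
          ?b ex:identifier ?id .
          FILTER (?a != ?b)
        }""",
    # domain integrity: identifiers containing anything other than digits
    "non_numeric_identifier": PREFIX + """
        SELECT (COUNT(?id) AS ?n) WHERE {
          ?person ex:identifier ?id .
          FILTER (!REGEX(STR(?id), "^[0-9]+$"))
        }""",
    # referential integrity: authorships that link to no publication
    "authorship_without_publication": PREFIX + """
        SELECT (COUNT(?authorship) AS ?n) WHERE {
          ?authorship a ex:Authorship .
          FILTER NOT EXISTS { ?authorship ex:linkedPublication ?pub }
        }""",
}


def run_checks(checks: Dict[str, str], count_query: Callable[[str], int]) -> Dict[str, int]:
    """Send every query through count_query and collect the counts by check name."""
    return {name: count_query(query) for name, query in checks.items()}


def format_report(counts: Dict[str, int]) -> str:
    """Render one line per check, flagging non-zero counts as errors."""
    lines = []
    for name in sorted(counts):
        status = "OK" if counts[name] == 0 else "ERROR"
        lines.append(f"{name}: {counts[name]} [{status}]")
    return "\n".join(lines)


def fake_count_query(query: str) -> int:
    """Stand-in for a real SPARQL endpoint call so the sketch runs offline."""
    return 2 if "REGEX" in query else 0


report = format_report(run_checks(CHECKS, fake_count_query))
print(report)
```

Adding a check in this layout means adding one entry to `CHECKS`. In a real deployment `fake_count_query` would be replaced by a function that sends the query to the SPARQL endpoint and returns the single count it yields.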
| constraint | DBMS | semantic systems |
|---|---|---|
| referential integrity | foreign keys must reference a primary key in a parent table | references to other URIs must be valid |
| domain integrity | columns must be declared on a defined domain | data properties must hold proper data types (strings, ints, etc.) |
| semantic integrity | often defined identically to domain integrity | data must conform to the definitions inherent in the ontology being used |

Conclusion

Unlike DBMSs, where data can be normalized across many tables, semantic data on a particular subject collects on one URI. Future semantic data checking should consider the totality of facts collected on a URI to ensure semantic correctness: people should not have facts that would correspond to books, such as page numbers, for example. Future expansion of dchecker should be able to circumvent some of the constraints of the SPARQL query language and perform multiple parallel queries to compare the results. Inserting the problematic entries into the dchecker report would make data repair easier. In addition, rules for automated data correction would enable hands-free cleanup.

Fig. 1: example SPARQL queries in command line

Fig. 2: sample data quality report. Note the non-zero entries representing errors that need to be corrected.
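The conclusion's idea of checking the totality of facts collected on a URI could look something like the sketch below. It is an illustration under stated assumptions and is not part of dchecker: the triples are plain Python tuples, the `ex:` terms are made up, and the `FORBIDDEN` rule table is hypothetical. It returns the offending entries themselves, which is the kind of detail the conclusion suggests adding to the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical rule table: predicates that make no sense for a given class.
FORBIDDEN: Dict[str, Set[str]] = {
    "ex:Person": {"ex:pageStart", "ex:pageEnd"},
    "ex:Book": {"ex:hasPosition"},
}


def facts_by_subject(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group every (predicate, object) pair under the URI it describes."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return grouped


def find_violations(triples: Iterable[Triple]) -> List[Triple]:
    """Return (subject, class, predicate) for each fact that conflicts with the subject's class."""
    violations = []
    for subject, facts in facts_by_subject(triples).items():
        classes = {obj for predicate, obj in facts if predicate == "rdf:type"}
        predicates = {predicate for predicate, _ in facts}
        for cls in classes:
            for bad in predicates & FORBIDDEN.get(cls, set()):
                violations.append((subject, cls, bad))
    return sorted(violations)


triples = [
    ("ex:person1", "rdf:type", "ex:Person"),
    ("ex:person1", "ex:pageStart", "12"),  # a page number on a person: flagged
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:pageStart", "1"),     # a page number on a book: fine
]
violations = find_violations(triples)
print(violations)
```

Grouping by subject first means each URI is judged on all of its facts at once, which a single per-property SPARQL count does not do.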