Special Interest Activity Trio – A System for Integrated Management of Data, Uncertainty, and Lineage ITK478 Yan Cui.

Slides:



Advertisements
Similar presentations
Advanced SQL (part 1) CS263 Lecture 7.
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
LIVE A lineage-supported, versioned DBMS  Anish Das Sarma  Martin Theobald  Jennifer Widom.
SQL This presentation will cover: A Brief History of DBMS View in database MySQL installation.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Introduction to Structured Query Language (SQL)
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
Fundamentals, Design, and Implementation, 9/e Chapter 12 ODBC, OLE DB, ADO, and ASP.
Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio”
Introduction to Structured Query Language (SQL)
Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio”
Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University.
Database Systems More SQL Database Design -- More SQL1.
Introduction to Structured Query Language (SQL)
A Guide to MySQL 3. 2 Objectives Start MySQL and learn how to use the MySQL Reference Manual Create a database Change (activate) a database Create tables.
Concepts of Database Management Sixth Edition
Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom Stanford University.
Chapter 7 Managing Data Sources. ASP.NET 2.0, Third Edition2.
Kirkwood Center for Continuing Education Introduction to PHP and MySQL By Fred McClurg, Copyright © 2010 All Rights Reserved. 1.
Microsoft Access 2010 Chapter 7 Using SQL.
Advance Computer Programming Java Database Connectivity (JDBC) – In order to connect a Java application to a database, you need to use a JDBC driver. –
Introduction to SQL J.-S. Chou Assistant Professor.
Rationale Aspiring Database Developers should be able to efficiently query and maintain databases. This module will help students learn the Structured.
Session 5: Working with MySQL iNET Academy Open Source Web Development.
Database Design for DNN Developers Sebastian Leupold.
©Silberschatz, Korth and Sudarshan4.1Database System Concepts Chapter 4: SQL Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 3: Introduction.
PHP Programming with MySQL Slide 8-1 CHAPTER 8 Working with Databases and MySQL.
Introduction to Databases Chapter 7: Data Access and Manipulation.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Chapter 7 Working with Databases and MySQL PHP Programming with MySQL 2 nd Edition.
CHAPTER:14 Simple Queries in SQL Prepared By Prepared By : VINAY ALEXANDER ( विनय अलेक्सजेंड़र ) PGT(CS),KV JHAGRAKHAND.
Introduction to SEQUEL. What is SEQUEL? Acronym for Structural English Query Language Acronym for Structural English Query Language Standard language.
1 Working with MS SQL Server Textbook Chapter 14.
SQL/Lesson 4/Slide 1 of 45 Using Subqueries and Managing Databases Objectives In this lesson, you will learn to: *Use subqueries * Use subqueries with.
SQL (Chapter 2: Simple queries; Chapter 7 and 8: Nested and DML queries) Many of the examples in this document are based on the tables in the next slide.
NMED 3850 A Advanced Online Design January 12, 2010 V. Mahadevan.
Concepts of Database Management Seventh Edition
Kirkwood Center for Continuing Education Introduction to PHP and MySQL By Fred McClurg, Copyright © 2015, Fred McClurg, All Rights.
7 1 Chapter 7 Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Nitin Singh/AAO RTI ALLAHABAD 1 SQL Nitin Singh/AAO RTI ALLAHABAD 2 OBJECTIVES §What is SQL? §Types of SQL commands and their function §Query §Index.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 7 Introduction to Structured.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Concepts of Database Management Eighth Edition Chapter 3 The Relational Model 2: SQL.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
IS 230Lecture 6Slide 1 Lecture 7 Advanced SQL Introduction to Database Systems IS 230 This is the instructor’s notes and student has to read the textbook.
CpSc 462/662: Database Management Systems (DBMS) (TEXNH Approach) Relational Schema and SQL Queries James Wang.
ITEC 3220A Using and Designing Database Systems Instructor: Prof. Z. Yang Course Website: 3220a.htm
Chapter 5 : Integrity And Security  Domain Constraints  Referential Integrity  Security  Triggers  Authorization  Authorization in SQL  Views 
Agenda for Class 2/20/2014 Introduce Microsoft’s SQL Server database management system. Use the lab to discuss how to CREATE, DROP and populate (INSERT)
Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.
Web Programming MySql JDBC Web Programming.
LM 5 Introduction to SQL MISM 4135 Instructor: Dr. Lei Li.
 CONACT UC:  Magnific training   
Concepts of Database Management, Fifth Edition Chapter 3: The Relational Model 2: SQL.
Introduction to Database Programming with Python Gary Stewart
Big Data Yuan Xue CS 292 Special topics on.
3 A Guide to MySQL.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
SQL: Advanced Options, Updates and Views Lecturer: Dr Pavle Mogin
TRIO Data Uncertainty Lineage Data Model Query Language System
Trio A System for Data, Uncertainty, and Lineage
20761B 12: Using Set Operators Module 12   Using Set Operators.
Chapter 8 Working with Databases and MySQL
Database systems Lecture 3 – SQL + CRUD
SQL Fundamentals in Three Hours
The Trio System for Data, Uncertainty, and Lineage: Overview and Demo
Contents Preface I Introduction Lesson Objectives I-2
Chapter 8 Advanced SQL.
Presentation transcript:

Special Interest Activity Trio – A System for Integrated Management of Data, Uncertainty, and Lineage ITK478 Yan Cui

Agenda Understand Trio new database system System requirements of installing Trio system Source codes/packages for Trio system Procedure of Trio system installation Configuration of Trio System Experiment Trio DBMS Using TrioExplorer and TrioPlus (demo) Trio API Integrated Into Other Python Scripts Trio API and Translator (Python) Trio Query Language (TriQL) Structure (demo) Advantage and Disadvantage of using Trio DBMS Documenting Trio system report/bugs

Understand Trio new database system Basic concept of new Trio DBMS ◦ Trio is a new kind of database system (DBMS), which was developed by Stanford University Lab at Dec, ◦ An extended relational model called Uncertainty Lineage Database (ULDB), and also supports Trio’s query language Called TriQL [1]. ◦ Handles structured data, uncertainty of data, and data lineage together in a fully integrated manner. Trio System architecture (components) Uncertainty-Lineage Database (ULDB) TriQL: The Trio Query Language

Trio System architecture Four primary components of Trio DBMS ◦ command-line client (TrioPlus) ◦ TrioExplorer ◦ Trio API and translator (Python) ◦ standard relational DBMS (PostgreSQL) Uncertainty-Lineage of Database ◦ Encoded data table ◦ Lineage table ◦ Trio metadata ◦ Trio Stored Procedures

Uncertainty-Lineage Database (ULDB) Alternatives ‘?’ (Maybe) Annotations Numerical Confidences Lineage Sample: Drives (person, color, car) and Saw (witness, color, car) uncertainty tables with/no with confidence.

Uncertainty-Lineage Database (ULDB) Alternatives ◦ Definition: Alternatives are presenting uncertainty about the contents of a tuple [2] ◦ ‘||’ annotation ◦ Drives(person, color, car) and Saw (witness, color, car) uncertainty tables ◦ ‘Select * from Drives”

Uncertainty-Lineage Database (ULDB) ‘?’ (Maybe) Annotations ◦ Definition: ‘?’ annotation present the existence of a tuple on the x-tuple, also called maybe x-tuple [2] ◦ Drives(person, color, car) and Saw (witness, color, car) uncertainty tables ◦ ‘select * from Drives’

Uncertainty-Lineage Database (ULDB) Numerical Confidences ◦ Definition: Numerical confidence also was considered as probability [2]. ◦ Drives(person, color, car) and Saw (witness, color, car) uncertainty tables with confidence ◦ ‘select * from Drives’

Uncertainty-Lineage Database (ULDB) Lineage ◦ Definition: “recorded at the granularity of tuple alternatives: Lineage connects as x-tuple alternative to other x-tuple alternative.” in [2] ◦ Drives(person, color, car) and Saw (witness, color, car) uncertainty tables with confidence ◦ ‘select person from Drives’

TriQL: The Trio Query Language Two major parts of TriQL ◦ built-in functions and predicates for querying confidence values and lineage ◦ regular SQL syntax

System requirements of installation Operating systems - as Linux, Mac OS X, and Win-32 (XP, Vista, and 32-bit Server) PostgreSQL database (version 8.2.5, , , and ) - Linux and Win32 Python API - windows, Linux/Unix, Mac OS X, OS/2, and Amiga

Source codes/packages for Trio system Listing source codes Python 2.4 can be downloaded from Easy_install can be downloaded from and the file called ez_setup.py. Readline 1.7.win32 can be downloaded from ctypes win32-py2.4 can be download from PostgreSQL 8.1 can be downloaded from Graphviz 2.14 is the only version compatible with Trio API. It is available in PyGreSQL can be downloaded from Pylons can be downloaded from PLY 2.2 can be downloaded from PyParsing can be downloaded from PyDot can be downloaded from Trio API 1.0 can be downloaded from

Procedure of Trio system installation Python ◦ Python 2.4 windows version (python msi). ◦ Install Python in C:/Python directory ◦ Set path=c:/Python24; in environment variables Readline ◦ Readline-1.7.win32-py2.4.exe and install into Python directory Ctypes ◦ Ctypes win32-py2.4.exe and install into Python directory PostgreSQL ◦ PostgreSQL 8.1 windows version (postgresql-8.1.msi ) ◦ Install PostgreSQL 8.1 and set path C:\Program Files\PostgreSQL\8.1\bin; after completed the installation. Graphviz ◦ Graphviz 2.14 version, install to your workstation and set path C:\PROGRA~1\ATT\Graphviz\bin; in environment variables after completed the installation

Procedure of Trio system installation (cont) Easy_install - Download ez_setup.py in C:/ directory PyGreSQL – In command line, cd\ to c: directory, and run python ez_setup.py PyGreSQL to install components. Pylons – In command line, cd\ to c: directory, and run python ez_setup.py Pylons==0.9.5 to install Pylons. Set path c:\python24\Scripts in environment variables. PLY – In command line, cd\ to c: directory, and run python ez_setup.py Ply==2.2 or easy_install Ply==2.2. PyParsing – In command line, cd\ to c: directory, and run python ez_setup.py PyParsing. PyDot – download the source from website. Access to folder in command line, and then install manually by running ‘python setup.py install’. Trio API ◦ Download source code in any directory ◦ Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and renamed as triospi.dll

Procedure of Trio system installation (cont) PostgreSQL installation

Configuration of Trio System Windows superuser authentication to access PostgreSQL TrioExplorer TrioPlus

Configuration of Trio System Windows superuser authentication for PostgreSQL ◦ Open pgadmin III for PostgreSQL ◦ Create new login role  Role name ‘myname’ (as same as windows login account)  Set password (Password can be any) and check all role privileges and click ok. ◦ Create new database  Database name ‘myname’ (as same as username)  Owner is ‘myname’ and click ok ◦ Initialize Trio schema information  In Trio-1.0\setup, open setup.py with notepad to comment out the last three codes and put the following. os.system("psql %s %s < setup.sql" % (pgdbname, username)) os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username)) os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username))  Save the file, and at the command line, cd \Trio-1.0\setup, and run ‘python setup.py myname myname’

Configuration of Trio System (cont) pgadmin III for PostgreSQL 8.1

Configuration of Trio System (cont) TrioExplorer ◦ Make sure PostgreSQL is working. ◦ Running TrioExplore – Ensure path ‘c:\python24\Scripts;’ in environment variables. And double click ‘start_te_server.bat’ under Trio-1.0\explorer. ◦ At the command line, you are now prompted for an admin user login to PostgreSQL, which should have been created along with your PostgreSQL installation and which will be used by TrioExplorer to create new user roles and database instances. ◦ TrioExplorer should now be reachable from your browser using For new users can now press ‘Create a new user’ and create their own Trio login and database instances, which are then managed by the PostgreSQL server.

Configuration of Trio System (cont) TrioPlus ◦ Create new PostgreSQL user role and database instance  Run ‘createuser demo’  Run ‘createdb demo’, the name must be the same as username ◦ Initialize Trio schema information for new user by access as same as windows superuser authentication to access PostgreSQL. Use TrioExplorer will be easily just press ‘Create new role’ in Web. ◦ Connect to new Trio database using the command line clients by running ‘python trioplus.py –u demo –d demo –p’

Trio API Integrated Into Other Python Scripts TrioCnx ◦ TrioCnx(pgdb) – PyGreSQL connection pgdb ◦ cursor() – return a new TrioCursor object for the current connection. ◦ commint() – commits the current transaction. ◦ rollback() – performs a rollback for the current transaction. ◦ close() – closes the Trio connection TrioCursor ◦ execute(triql) - Executes a TriQL statement triql for this cursor object. ◦ fetchone() - Fetches a single XTuple object from the current cursor position. ◦ fetchall() - Fetches and returns a list of all XTuple objects beginning from the current cursor position.

Trio API Integrated Into Other Python Scripts (cont) XTuple ◦ len() - Returns the number of Alternative objects contained in this XTuple object. ◦ getAlternative(idx) - Returns the Alternative object at the designated index idx. ◦ getConfidence() - Returns the confidence value (if any) of this XTuple object as the sum of its Alternative objects' confidence values. ◦ getQuestionMark() - Returns whether this XTuple object has a question mark or not. Alternative  getLineage() – returns a list of immediate lineage information  traceLineage() – performs a transitive lineage traversal for this alternative back to the base data  getConfidence() - Returns the confidence value of this alternative.  computeConfidence() - Computes the confidence value

Trio API And Translator (Python) TriQL built-in functionsTriQL keywords Aggregate(), Alternative(), As(), Avg(), BinaryExpression(), Brackets(), Cascade(), ColumnList(), Column(), Command(), ComputeConfidences(), Conf(), ConfUniform(), ConfScaled(), Count(), CreateIndex(), CreateTableAs(), CreateTable(), CreateTempTableAs(), CreateTempTable(), DataType(), Dot(), DropIndex(), DropTableList(), DropTable(), Eavg(), Ecount(), EcountStar(), Emax(), Emin(), Empty(), Esum(), FromClause(), GroupAlts(), GroupbyClause(), GroupByKey(), HavingClause(), HorizontalSelect(), IdentifierList(), InsertList(), Insert(), InsertSpec(), Lineage(), Literal(), Lsum(), Max(), Maybe(), Maybe(), Min(), OrderBy(), Question(), SelectClause(), Select(), SelectOptions(), SetOperator(), Star(), Sum(), UnaryExpression(), UncertainSet(), UniformTable(), ViewTable(), WhereClause(), WithConfidences() 'all', 'and', 'any', 'as', 'avg', 'by', 'cascade', ‘compute', 'conf', 'confidences', 'count', 'create', 'distinct', 'drop', 'eavg', 'ecount', 'emax', 'emin', 'esum', 'except', 'exists', ‘flatten', 'float', ‘float4', 'float8', 'from', 'group', 'groupalts', 'having', 'in', 'index', 'insert', 'int', 'int4', 'int8', 'intersect', 'into', 'is', 'like', 'lineage', #'lavg', #'lcount', #'lmax', #'lmin', #'lsum', 'max', 'maybe', 'min', 'noconf', 'nolineage', 'nomaybe', 'not', 'null', 'on', 'or', 'order', 'scaled', 'scaledbyexp', 'select', 'sum', 'table', 'temporary', 'trio', 'trio_aid', 'trio_xid', 'uncertain', 'uniform', 'union', 'merged', 'values', 'varchar', 'view', 'where', 'with'

Demo TrioExplorer and TrioPlus Trio Query Language (TriQL) Structure ◦ Drop index/table ◦ Create Trio table/index ◦ TriQL language ◦ TriQL manual: ◦ Sample: TriQL ScriptTriQL Script

Current Supported for TriQL Supported DDL and Insert CommandsSupported Subset of TriQL Query Language  Create Table  Create Trio Table  Create Index  Drop Table - Including “Cascade” option to drop derived tables Drop Index Insert Into T Values o With or without alternatives, confidence values, and ? o T must be a base table Insert Into T o T must be a base table  Select-Project-Join queries Create Table T as  Select Distinct Union, Intersect, or Except of two subqueries (duplicate- eliminating) Union All of two subqueries Merged, Flatten, and GroupAlts  Horizontal subqueries in Where clause o Including shortcuts and aggregation/group-by/having Horizontal subqueries in Select clause o Including shortcuts and aggregation but not group- by/having  Conf() function with any number of tables or Conf(*) o except for Conf(*) in the SELECT clause in conjunction with DISTINCT or MERGED  Maybe() predicate  “Uniform ” in From clause “as conf” for query-defined result confidence values o Including “Uniform as conf” and “Scaled as conf” “Compute Confidences” at end of query Lineage() predicate o Not in horizontal subqueries o “= ⇒ ” abbreviation allowed NoLineage, NoConf, and NoMaybe

Table of TriQL contents ULDBsUncertainty attributes, maybe annotations and confidence values SQL over ULDBsSelection, projection, join, subqueries, duplicate- elimination, grouping and aggregation, aggregate variants, set operators, order by Flatten and GroupAltsFlatten is used to turn tuples with alternative values into regular tuples, while GroupAlts is used to create or restructure alternative values Horizontal subqeries: The [ ] construct[ ] in the where clause, [ ] with joins, Syntactic shortcuts in [ ], [ ] in the select clause, [ ] with Self-Joins Built-in Functions Conf() and Maybe() Multi-table conf() Result confidencesResult confidence evaluation, uniform and scaled result confidences, On-Demand confidence computation Built-in Predicate Lineage()The Lineage() predicate lets queries filter joined tuples based on whether they are related via lineage Options Nolineage, Noconf, and NoMaybeIndicate lineage, confidence values, and/or?'s should be omitted from query results Data modificationInsert statement, delete statement, update statement

Advantage and Disadvantage Advantage ◦ Open source and free support for any non-benefit users to experience new Trio DBMS ◦ Advanced components in relational DBMS ◦ Computing confidences ◦ Efficient, Convenient, safe, Multi-User storage of and access to, Massive, Persistent Disadvantage ◦ Time cost for query ◦ Dependency ◦ On development stage

Advantage and Disadvantage Disadvantage ◦ Time cost for query Using ‘SELECT attr-list FROM X 1, X 2,..., X n WHERE predicate’ as a query example in [6] for a comparison between relational database and ULDB. Over standard relational database:  For each tuple in cross-product of X 1, X 2,..., X n  Evaluate the predicate  If true, project attr-list to create result tuple Over ULDB:  For each tuple in cross-product of X 1, X 2,..., X n  Create “super tuple” T from all combinations of alternatives  Evaluate predicate on each alternative in T ; keep only the true ones  Project attr-list on each alternative to create result tuple  Details: ‘?’, lineage, confidences

Documenting Trio system report/bugs The install instruction in website indicated unclearly the version of Graphviz for Trio system. In Graphviz website only has version 2.16 but not compatable except version Graphviz version 2.14 is available for download in The windows authentication supperuser needs to be created first in the PostgreSQL in order to connect to database. After established the connection, TrioExplorer and TrioPlus can use the supperuser’s login and password as windows authentiction to access to database system. However, it doesn’t mention at all in the installation procedure on how to create this typle of new user. The only way to solve it is to use PostgreSQL->pgadmin III manually. After created the supperuser, I have to modify some codes in setup.py in Trio-1.0->setup directory in order to run ‘python setup.py –u myname –d myname –p’. TriQL query statements in there are many samples queries not working properly as desired.

References M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07), Pacific Grove, California, January Demonstration description.Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16, March 2006.An Introduction to ULDBs and the Trio System Trio: A System for integrated Management of Data, Uncertainty, and Lineage. Retrieved on November, 18, 2007 from PostgreSQL. Retrieved on November, 20, 2007 from Python. Retrieved on November, 15, 2007 from Trio: A System for Data, Uncertainty, and Lineage. given by Jennifer at various venues, Ppt.

Questions?