MystiQ
The HusQies*
*Nilesh Dalvi, Brian Harris, Chris Re, Dan Suciu, University of Washington



Outline
  Overview
  Demo / discussions
  Conclusions

MystiQ
  General-purpose probabilistic database system
  Motivation: manage imprecision in data

What MystiQ Does
  Tables stored in relational database
  Tables → Events (= Probabilistic tables)
  Expressive probabilistic model
  Maybe/Or tuples
  Views over events
  Confidences for views

What MystiQ Does
  Query semantics:
    – SQL: joins, distinct, aggregates/group-by
    – Point probabilities
    – Top-k answers, guaranteed ranking
  Query evaluation:
    – Safe plans
    – Monte Carlo simulation (Luby-Karp)
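The Monte Carlo bullet can be illustrated with a small sketch of a Karp-Luby style estimator. This is not MystiQ's code. It assumes that the event behind a query answer is a disjunction of conjunctions (a DNF) over independent Boolean tuple events, and the names `karp_luby`, `exact_prob`, `clauses`, and `p` are hypothetical.

```python
import random
from itertools import product


def clause_prob(clause, p):
    """Probability that every variable in a conjunctive clause is true."""
    prob = 1.0
    for v in clause:
        prob *= p[v]
    return prob


def karp_luby(clauses, p, samples=20000, seed=0):
    """Estimate P(C1 or C2 or ...), each clause a set of independent
    Boolean variables that must all be true (assumed model)."""
    rng = random.Random(seed)
    weights = [clause_prob(c, p) for c in clauses]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    hits = 0
    for _ in range(samples):
        # Pick a clause with probability proportional to its weight.
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample a world conditioned on clause i being satisfied.
        world = {v: (v in clauses[i]) or (rng.random() < p[v]) for v in p}
        # Count the sample only if i is the first satisfied clause,
        # so each satisfying world is counted exactly once.
        first = next(j for j, c in enumerate(clauses)
                     if all(world[v] for v in c))
        if first == i:
            hits += 1
    return total * hits / samples


def exact_prob(clauses, p):
    """Brute-force reference: sum the weights of all satisfying worlds."""
    names = sorted(p)
    answer = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if any(all(world[v] for v in c) for c in clauses):
            w = 1.0
            for v in names:
                w *= p[v] if world[v] else 1.0 - p[v]
            answer += w
    return answer


if __name__ == "__main__":
    # Hypothetical lineage of one answer tuple of a join query.
    p = {"x1": 0.3, "x2": 0.7, "y1": 0.5, "y2": 0.2}
    clauses = [{"x1", "y1"}, {"x1", "y2"}, {"x2", "y2"}]
    print("estimate:", karp_luby(clauses, p))
    print("exact:   ", exact_prob(clauses, p))
```

The brute-force function is only there to check the estimate on tiny inputs; it enumerates every possible world and does not scale.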

What MystiQ Does Not
  No syntax for popular probabilistic models
    – BNs, PRMs, rules with confidences
    – Can be expressed but indirectly
  No lineage
  No probabilities on continuous values

Using MystiQ
  Store data in RDBMS (demo: postgres)
  Write a configuration file
  Run SQL queries on MystiQ

Probabilistic Tables = Events
  Prod    Price  Color   Shape    prob
  Camera  19.99  Red     Round    0.3
                 Blue    Square   0.7
  Gizmo   255    Blue    Round    0.2
                 Blue    Square   0.1
                 Yellow  Pointed  0.4
  Product(prod,price,color,shape,prob)
  ProductEvent(prod,price,color,shape)
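A minimal sketch of how probabilities could be read off the Product table above. It rests on two assumptions that the slide does not state explicitly: the alternatives listed for one prod are mutually exclusive, and different prods are independent of each other. It also assumes that blank prod and price cells repeat the value from the row above. The function names are hypothetical and are not part of MystiQ.

```python
from collections import defaultdict

# Rows of Product(prod, price, color, shape, prob), copied from the slide.
# Assumption: blank prod/price cells repeat the value from the row above.
PRODUCT = [
    ("Camera", 19.99, "Red", "Round", 0.3),
    ("Camera", 19.99, "Blue", "Square", 0.7),
    ("Gizmo", 255, "Blue", "Round", 0.2),
    ("Gizmo", 255, "Blue", "Square", 0.1),
    ("Gizmo", 255, "Yellow", "Pointed", 0.4),
]


def per_product_prob(rows, predicate):
    """P(predicate holds) for each prod. Alternatives of one prod are
    assumed mutually exclusive, so their probabilities add up."""
    totals = defaultdict(float)
    for prod, _price, color, shape, prob in rows:
        totals[prod] += prob if predicate(color, shape) else 0.0
    return dict(totals)


def exists_prob(rows, predicate):
    """P(at least one prod satisfies predicate), assuming that
    different prods are independent of each other."""
    none = 1.0
    for p in per_product_prob(rows, predicate).values():
        none *= 1.0 - p
    return 1.0 - none


def absent_prob(rows):
    """P(a prod has no tuple at all) = 1 - sum of its alternatives."""
    present = per_product_prob(rows, lambda color, shape: True)
    return {prod: 1.0 - p for prod, p in present.items()}


if __name__ == "__main__":
    is_blue = lambda color, shape: color == "Blue"
    print(per_product_prob(PRODUCT, is_blue))
    print(exists_prob(PRODUCT, is_blue))
    print(absent_prob(PRODUCT))
```

Under these assumptions the Gizmo alternatives sum to 0.7, which leaves probability 0.3 that Gizmo has no tuple at all; this is one way to read the "Maybe" in Maybe/Or tuples.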

Configuration File
  Tables → Events (= Probabilistic tables)
  CREATE TABLE Product(prod, color, shape, prob)
  CREATE EVENT ProductEvent(prod) choice(color, shape) ON Product(prob)

Demo

Views
  Standard: Tables → Tables (→ Events)
  Probabilistic: Events → Events (later)

A BN in MystiQ
  Color  Shape  Weight
  Color  Shape   Weight  prob
  Red    Round   Light   0.3
                 Medium  0.7
                 Heavy   0.2
  Blue   Square  Light   0.1
                 Medium  0.4

Applying BN to a Table
  Prod    Color  Shape   Weight  prob
  Camera  Red    Round   Light   0.3
                         Medium  0.7
                         Heavy   0.2
  Camera  Blue   Square  Light   0.1
                         Medium  0.4
  Product(prod,price,color,shape,prob)
  ProductEvent(prod,price,color,shape)
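The table above looks like the result of joining a product table with the Weight table on (color, shape). The sketch below reproduces that join in plain Python. It assumes that Weight depends on Color and Shape, which the slide suggests but does not spell out, and it does not show how the join is written in MystiQ. The names `apply_cpt`, `PRODUCTS`, and `WEIGHT_GIVEN_COLOR_SHAPE` are hypothetical, and the probabilities are copied as printed on the slide.

```python
# Conditional table (color, shape, weight, prob), copied verbatim from the
# slide; the values are not re-normalized here.
WEIGHT_GIVEN_COLOR_SHAPE = [
    ("Red", "Round", "Light", 0.3),
    ("Red", "Round", "Medium", 0.7),
    ("Red", "Round", "Heavy", 0.2),
    ("Blue", "Square", "Light", 0.1),
    ("Blue", "Square", "Medium", 0.4),
]

# Hypothetical input rows (prod, color, shape) matching the slide's result.
PRODUCTS = [
    ("Camera", "Red", "Round"),
    ("Camera", "Blue", "Square"),
]


def apply_cpt(products, cpt):
    """Join products with the conditional table on (color, shape) and
    attach each possible weight together with its probability."""
    out = []
    for prod, color, shape in products:
        for c, s, weight, prob in cpt:
            if (c, s) == (color, shape):
                out.append((prod, color, shape, weight, prob))
    return out


if __name__ == "__main__":
    for row in apply_cpt(PRODUCTS, WEIGHT_GIVEN_COLOR_SHAPE):
        print(row)
```

A product whose (color, shape) pair has no entry in the conditional table simply produces no output rows in this sketch.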

Applications of ProbDB?
  Fuzzy object matching: IMDB + AMZN
  Information extraction
  What else?

Development
  Developed under a TGIF grant
  Free license (on request) for research institutions

Current/Future Work
  Constraints, data mappings
  Theory of conjunctive queries on probabilistic databases
  Cleaning of sensor data (w/ Balazinska)