MystiQ
  The HusQies*
  *Nilesh Dalvi, Brian Harris, Chris Re, Dan Suciu
  University of Washington
Outline
  Overview
  Demo / discussions
  Conclusions
MystiQ
  General-purpose probabilistic database system
  Motivation: manage imprecision in data
What MystiQ Does
  Tables stored in relational database
  Tables Events (= Probabilistic tables)
  Expressive probabilistic model
  Maybe/Or tuples
  Views over events
  Confidences for views
What MystiQ Does
  Query semantics:
    – SQL: joins, distinct, aggregates/group-by
    – Point probabilities
    – Top-k answers, guaranteed ranking
  Query evaluation
    – Safe plans
    – Monte Carlo simulation (Luby-Karp)
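The Monte Carlo option above can be illustrated with a small sketch of the Luby-Karp (also written Karp-Luby) coverage estimator. This is not MystiQ's code: it assumes the answer's event expression is a DNF formula over independent Boolean events with known probabilities, and the names `karp_luby` and `exact_dnf_probability` are hypothetical helpers introduced only for this illustration.

```python
import random
from itertools import product


def karp_luby(clauses, prob, samples=20000, seed=0):
    """Estimate P(C1 or C2 or ...), each clause a conjunction of independent events.

    clauses: iterable of collections of event names (assumed positive literals only)
    prob:    dict mapping event name -> probability that the event is true
    """
    rng = random.Random(seed)
    clauses = [frozenset(c) for c in clauses]
    variables = sorted(set().union(*clauses))

    # Weight of a clause = probability that all of its events are true.
    weights = []
    for clause in clauses:
        w = 1.0
        for v in clause:
            w *= prob[v]
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        return 0.0

    hits = 0
    for _ in range(samples):
        # Pick a clause proportionally to its weight ...
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # ... and sample a world in which that clause is forced to hold.
        world = {v: True if v in clauses[i] else rng.random() < prob[v]
                 for v in variables}
        # Count the sample only if i is the first clause the world satisfies,
        # so each satisfying world is counted once across all clauses.
        first = next(j for j, c in enumerate(clauses)
                     if all(world[v] for v in c))
        if first == i:
            hits += 1
    return total * hits / samples


def exact_dnf_probability(clauses, prob):
    """Brute-force reference: enumerate all worlds (small inputs only)."""
    variables = sorted(set().union(*map(set, clauses)))
    answer = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if any(all(world[v] for v in c) for c in clauses):
            weight = 1.0
            for v in variables:
                weight *= prob[v] if world[v] else 1.0 - prob[v]
            answer += weight
    return answer
```

For example, with clauses `[{"a", "b"}, {"b", "c"}]` and every event at probability 0.5, the exact value is 0.5 × 0.75 = 0.375, and the estimate should land close to it. The estimator samples only from worlds that satisfy some clause, which is why it stays accurate even when the overall probability is very small.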
What MystiQ Does Not
  No syntax for popular probabilistic models
    – BNs, PRMs, rules with confidences
    – Can be expressed but indirectly
  No lineage
  No probabilities on continuous values
Using MystiQ
  Store data in RDBMS (demo: postgres)
  Write a configuration file
  Run SQL queries on MystiQ
Probabilistic Tables = Events
  Prod    Price  Color   Shape    prob
  Camera  19.99  Red     Round    0.3
                 Blue    Square   0.7
  Gizmo   255    Blue    Round    0.2
                 Blue    Square   0.1
                 Yellow  Pointed  0.4
  Product(prod,price,color,shape,prob)
  ProductEvent(prod,price,color,shape)
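A minimal Python sketch of one way to read the table above. It assumes that rows sharing the same prod are mutually exclusive alternatives ("or" tuples), that different products are independent, and that probability mass missing from a product means the product may be absent ("maybe" tuple). The helper names are hypothetical and not part of MystiQ; the price column is left out because it does not affect the probabilities.

```python
from collections import defaultdict

# (prod, color, shape, prob) rows copied from the slide's table.
ROWS = [
    ("Camera", "Red", "Round", 0.3),
    ("Camera", "Blue", "Square", 0.7),
    ("Gizmo", "Blue", "Round", 0.2),
    ("Gizmo", "Blue", "Square", 0.1),
    ("Gizmo", "Yellow", "Pointed", 0.4),
]


def marginal(rows, prod, predicate):
    """P(the alternative chosen for `prod` satisfies predicate(color, shape)).

    Alternatives of one product are assumed disjoint, so probabilities add.
    """
    return sum(p for name, color, shape, p in rows
               if name == prod and predicate(color, shape))


def absent_probability(rows, prod):
    """P(`prod` has no tuple at all): the mass not covered by its alternatives."""
    return 1.0 - sum(p for name, _, _, p in rows if name == prod)


def prob_some_product(rows, predicate):
    """P(at least one product satisfies predicate), products assumed independent."""
    per_product = defaultdict(float)
    for name, color, shape, p in rows:
        if predicate(color, shape):
            per_product[name] += p
    none_matches = 1.0
    for p in per_product.values():
        none_matches *= 1.0 - p
    return 1.0 - none_matches
```

Under these assumptions, `marginal(ROWS, "Gizmo", lambda c, s: c == "Blue")` adds 0.2 and 0.1, and `prob_some_product(ROWS, lambda c, s: c == "Blue")` combines Camera (0.7) and Gizmo (0.3) as 1 − 0.3 × 0.7 = 0.79.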
Configuration File
  Tables Events (= Probabilistic tables)
  CREATE TABLE Product(prod, color, shape, prob)
  CREATE EVENT ProductEvent(prod) choice(color, shape) ON Product(prob)
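The event declaration above can be read as: group the rows of Product by the key prod, treat each (color, shape) pair as one alternative, and take its probability from the prob column. The Python sketch below follows that reading. It is an illustration under that assumption, not MystiQ's implementation; `build_events` and `PRODUCT_ROWS` are hypothetical names, and the check that each key's probabilities sum to at most 1 is an assumed sanity condition.

```python
from collections import defaultdict

# Rows of Product(prod, color, shape, prob), using the values from the earlier table.
PRODUCT_ROWS = [
    {"prod": "Camera", "color": "Red", "shape": "Round", "prob": 0.3},
    {"prod": "Camera", "color": "Blue", "shape": "Square", "prob": 0.7},
    {"prod": "Gizmo", "color": "Blue", "shape": "Round", "prob": 0.2},
    {"prod": "Gizmo", "color": "Blue", "shape": "Square", "prob": 0.1},
    {"prod": "Gizmo", "color": "Yellow", "shape": "Pointed", "prob": 0.4},
]


def build_events(rows, key, choice, prob):
    """Group rows into events: key tuple -> list of (alternative, probability).

    key:    column names identifying the event, e.g. ("prod",)
    choice: column names that vary between alternatives, e.g. ("color", "shape")
    prob:   name of the column holding each alternative's probability
    """
    events = defaultdict(list)
    for row in rows:
        p = row[prob]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p!r}")
        k = tuple(row[c] for c in key)
        alternative = tuple(row[c] for c in choice)
        events[k].append((alternative, p))

    # Alternatives of one event are exclusive, so their mass cannot exceed 1.
    for k, alternatives in events.items():
        total = sum(p for _, p in alternatives)
        if total > 1.0 + 1e-9:
            raise ValueError(f"alternatives for {k} sum to {total}")
    return dict(events)


events = build_events(PRODUCT_ROWS, key=("prod",),
                      choice=("color", "shape"), prob="prob")
```

Here `events[("Gizmo",)]` holds three alternatives whose probabilities sum to less than 1, which corresponds to a "maybe" tuple under this reading.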
Demo
Views
  Standard: Tables → Tables (→ Events)
  Probabilistic: Events → Events (later)
A BN in MystiQ
  [Figure: network over the nodes Color, Shape, Weight]
  Color  Shape   Weight  prob
  Red    Round   Light   0.3
                 Medium  0.7
                 Heavy   0.2
  Blue   Square  Light   0.1
                 Medium  0.4
Applying BN to a Table
  Prod    Color  Shape   Weight  prob
  Camera  Red    Round   Light   0.3
                         Medium  0.7
                         Heavy   0.2
  Camera  Blue   Square  Light   0.1
                         Medium  0.4
  Product(prod,price,color,shape,prob)
  ProductEvent(prod,price,color,shape)
Applications of ProbDB?
  Fuzzy object matching: IMDB + AMZN
  Information extraction
  What else?
Development
  Developed under a TGIF grant
  Free license (on request) for research institutions
Current/Future Work
  Constraints, data mappings
  Theory of conjunctive queries on ProbDB
  Cleaning of sensor data (w/ Balazinska)