Querying Factorized Probabilistic Triple Databases Denis Krompaß 1, Maximilian Nickel 2 and Volker Tresp 1,3 1 Department of Computer Science. Ludwig Maximilian.

Presentation transcript:

Querying Factorized Probabilistic Triple Databases
Denis Krompaß (1), Maximilian Nickel (2) and Volker Tresp (1,3)
(1) Department of Computer Science, Ludwig Maximilian University
(2) MIT, Cambridge and Istituto Italiano di Tecnologia
(3) Corporate Technology, Siemens AG

Outline
1. Introduction and Motivation
   - Knowledge Bases are Triple Stores
   - Exploiting Uncertainty in KBs
2. Constructing Probabilistic Knowledge Bases with the RESCAL Tensor-Factorization
3. Complex Querying of Factorized Probabilistic Knowledge Base Representations

Knowledge Bases are Triple Stores
A knowledge base represents facts of the world in a machine-readable form, as (SUBJECT, PREDICATE, OBJECT) triples.
Advantages (+):
- Machine readable
- Utilize background knowledge in applications
- Provide additional information (search)
Drawbacks (-):
- Many false facts
- Incomplete
- Only few attempts to represent triple uncertainty
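
The triple representation on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `TripleStore`, its methods and the example facts are hypothetical and do not come from the slides.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) facts (hypothetical helper)."""

    def __init__(self):
        # One set of (subject, object) pairs per predicate.
        self._by_predicate = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_predicate[predicate].add((subject, obj))

    def contains(self, subject, predicate, obj):
        # A deterministic store can only answer True or False.
        return (subject, obj) in self._by_predicate.get(predicate, set())

    def objects(self, subject, predicate):
        pairs = self._by_predicate.get(predicate, set())
        return {o for s, o in pairs if s == subject}


# Example facts chosen for illustration only.
store = TripleStore()
store.add("Jack", "knows", "Jane")
store.add("Jim", "friendOf", "Lucy")
```

A store like this has no notion of how certain a fact is, which is the gap the following slides address.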

Benefits of Exploiting Uncertainty in Knowledge Bases
Query: “Does Jack know someone who is a friend of Lucy?”

Deterministic Database (p ≥ 0.5), two relation tables:
  Subject  Object
  Jack     Jane
  Jim      Lucy

  Subject  Object
  Jack     Jane
  Jim      Lucy
Answer: “No, Jack does not know such a person.”

Probabilistic Tuple-Independent Database, two relation tables:
  Subject  Object  P    RndVar
  Jack     Jane    0.5  X1
  Jack     Lucy    0.1  X2
  Jack     Jim     0.1  X3
  Jane     Lucy    0.4  X4
  …        …       …    …

  Subject  Object  P    RndVar
  Jack     Jane    0.7  Y1
  Jack     Lucy    0.2  Y2
  Jack     Jim     0.4  Y3
  Jane     Lucy    0.4  Y4
  …        …       …    …
Answer: “Jack might know such a person with a probability of 63%.”*

* Assuming that every person knows, and is a friend of, themselves.
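
The sketch below illustrates how a deterministic table loses information compared with a tuple-independent probabilistic one. It assumes that the "p ≥ 0.5" label on the slide means a probability threshold; the slides do not say this explicitly. The function name `to_deterministic` is hypothetical, and only the four visible rows of the first table are used.

```python
# Probabilities copied from the visible rows of the first table on the slide;
# the rows hidden behind "…" are unknown and therefore omitted.
prob_table = {
    ("Jack", "Jane"): 0.5,
    ("Jack", "Lucy"): 0.1,
    ("Jack", "Jim"): 0.1,
    ("Jane", "Lucy"): 0.4,
}


def to_deterministic(table, threshold=0.5):
    """Keep only tuples whose probability reaches the threshold (assumed reading of p >= 0.5)."""
    return {pair for pair, p in table.items() if p >= threshold}


deterministic = to_deterministic(prob_table)
```

Under this reading, a tuple such as (Jane, Lucy) with probability 0.4 disappears entirely from the deterministic view, although it still contributes to the probabilistic answer.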

Challenges when Exploiting Uncertainty in Knowledge Bases
Query: “Does Jack know someone who is a friend of Lucy?”
- Query evaluation can be intractable for larger KBs:
  - Millions of entities
  - Thousands of relation types
  - Intractably many possible triples
- Unsafe queries can be intractable due to exponential complexity (#P).
- Safe queries have polynomial complexity, but can still lead to long query processing times.
Figure: Deterministic Database → Reintroducing Uncertainty (RESCAL) → Probabilistic Tuple-Independent Database → Complex Querying → “Jack might know such a person with a probability of 63%.”*
(The example tables are the same as on the previous slide.)

* Assuming that every person knows, and is a friend of, themselves.

Probabilistic KB Construction with RESCAL
Figure: the adjacency tensor of the deterministic KB is approximated by a product of three factors, X_k ≈ A × R_k × A^T. Figure labels: Adjacency Tensor; value scale 1.0, -∞, +∞.

Deterministic KB, two relation tables:
  Subject  Object
  Jack     Jane
  Jim      Lucy

  Subject  Object
  Jack     Jane
  Jim      Lucy

Reconstructed probabilistic KB, two relation tables:
  Subject  Object  P    RndVar
  Jack     Jane    0.5  X1
  Jack     Lucy    0.1  X2
  Jack     Jim     0.1  X3
  Jane     Lucy    0.7  X4
  …        …       …    …

  Subject  Object  P    RndVar
  Jack     Jane    0.6  Y1
  Jack     Lucy    0.2  Y2
  Jack     Jim     0.4  Y3
  Jane     Lucy    0.8  Y4
  …        …       …    …

Explicit representation: 1.6 × 10^11 triples
Factorized representation: 225 × 10^6 parameters

Query a triple: “Is Jack a friend of Lucy?”
“Jack might be a friend of Lucy with a probability of 0.1 %”

Denis Krompaß et al. Large-Scale Factorization of Type-Constrained Multi-Relational Data. DSAA 2014.
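
Querying a single triple from the factorized representation amounts to one bilinear product, a_s^T R_k a_o, where a_s and a_o are the latent vectors (rows of A) of subject and object, and R_k is the core matrix of the relation. The sketch below is a minimal illustration with hypothetical rank-2 values. Clipping the unbounded score into [0, 1] is an assumption made here for illustration, since the slides do not state how scores are mapped to probabilities.

```python
def rescal_score(a_subject, R_k, a_object):
    """Bilinear score a_s^T R_k a_o for one (subject, relation, object) triple."""
    return sum(
        a_subject[p] * R_k[p][q] * a_object[q]
        for p in range(len(a_subject))
        for q in range(len(a_object))
    )


def clip_to_unit_interval(score):
    # Assumption: clip the unbounded score into [0, 1]; the slides do not state the mapping.
    return min(1.0, max(0.0, score))


# Hypothetical rank-2 latent vectors and core matrix (not values from the slides).
A = {"Jack": [1.0, 0.0], "Lucy": [0.0, 1.0], "Jane": [0.5, 0.5]}
R_friend = [[0.0, 0.1], [0.3, 0.0]]

score = rescal_score(A["Jack"], R_friend, A["Lucy"])
probability = clip_to_unit_interval(score)
```

The cost of one lookup depends only on the rank of the factorization, not on the number of entities, which is why the factorized form can stand in for the much larger explicit table.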

Complex Querying: Pure Extensional Query Evaluation on Factorized PKBs
Example queries:
- “Does Jack know someone who is a friend of Lucy?”
- “Does Jack know a soccer player?”
Observations:
- For each existential quantifier, the independent-project rule has to be applied, which leads to nested loops.
- Complexity already scales cubically in the size of the database if the query is asked for all persons in the database.
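
For the first example query, the independent-project rule reads P(∃y: knows(Jack, y) ∧ friendOf(y, Lucy)) = 1 − Π_y (1 − P(knows(Jack, y)) · P(friendOf(y, Lucy))). The following sketch applies it to a tiny tuple-independent database. All probabilities and function names are hypothetical, and missing tuples are assumed to have probability 0, so the result is not meant to reproduce the 63% quoted earlier in the deck.

```python
def independent_project(knows, friend_of, subject, target, entities):
    """P(exists y: knows(subject, y) AND friend_of(y, target)) under tuple independence."""
    prob_no_witness = 1.0
    for y in entities:  # one loop per existential quantifier
        p_y = knows.get((subject, y), 0.0) * friend_of.get((y, target), 0.0)
        prob_no_witness *= 1.0 - p_y
    return 1.0 - prob_no_witness


def answer_for_all(knows, friend_of, entities):
    # Asking the query for every (subject, target) pair adds two outer loops
    # around the loop over y, which gives the cubic cost mentioned on the slide.
    return {
        (s, t): independent_project(knows, friend_of, s, t, entities)
        for s in entities
        for t in entities
    }


# Hypothetical probabilities (not the values from the slides).
knows = {("Jack", "Jane"): 0.5, ("Jack", "Jim"): 0.2}
friend_of = {("Jane", "Lucy"): 0.4, ("Jim", "Lucy"): 0.5}
entities = ["Jack", "Jane", "Jim", "Lucy"]

p = independent_project(knows, friend_of, "Jack", "Lucy", entities)
all_answers = answer_for_all(knows, friend_of, entities)
```

On a factorized KB every probability in the inner loop is itself a bilinear lookup, so the nested loops dominate the running time as the number of entities grows.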

Querying DBpedia-Music (44345 Entities, 7 Relations)
- “What songs or albums from the Pop-Rock genre are from musical artists that have/had a contract with Atlantic Records?”
- “Which musical artists from the Hip-Hop Music genre have/had a contract with Shady, Aftermath or Death Row Records?”
Processing time: < 2 seconds

Avoiding Independent-Project
Query: “Does Jack know a soccer player?”
Compound relation: knows(), soccer() → knowsSoccerPlayerOf()
1. Construct the deterministic compound relation X(*).
2. Approximate the latent representation of the compound relation X(*).
- A is already known from the past factorization of the initial KB.
Figure labels: Initial Deterministic KB, Factorized KB
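
The two steps can be sketched as follows. Step 1 builds the compound relation as a deterministic join of two relations on their shared variable. Step 2 is shown as an ordinary least-squares fit of a core matrix R for X(*) ≈ A R A^T with A held fixed, which is an assumption: the slides do not specify the solver. All names and data are hypothetical, and the closed-form 2×2 inverse restricts this toy version to rank 2.

```python
def compose(first, second, entities):
    """Step 1: deterministic compound relation as a join on the shared variable.

    X[i][j] = 1.0 if some y has (entity_i, y) in `first` and (y, entity_j) in `second`.
    """
    index = {e: i for i, e in enumerate(entities)}
    X = [[0.0] * len(entities) for _ in entities]
    for s, y in first:
        for y2, o in second:
            if y == y2:
                X[index[s]][index[o]] = 1.0
    return X


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def fit_core(A, X):
    """Step 2 (assumed solver): least-squares R for X ~ A R A^T, A fixed, rank 2 only."""
    At = transpose(A)
    (a, b), (c, d) = matmul(At, A)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    pinv = matmul(inv, At)  # (A^T A)^-1 A^T, valid when A has full column rank
    return matmul(matmul(pinv, X), transpose(pinv))


# Hypothetical toy data: three entities with rank-2 latent factors A.
entities = ["Jack", "Jane", "Lucy"]
knows = {("Jack", "Jane")}
friend_of = {("Jane", "Lucy")}
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

X_compound = compose(knows, friend_of, entities)
R_compound = fit_core(A, X_compound)
approx = matmul(matmul(A, R_compound), transpose(A))
```

Once a core matrix for the compound relation exists, the existential query becomes a single bilinear lookup per entity pair, so the loop over the intermediate variable is no longer needed at query time.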


Querying DBpedia-Music (44345 Entities, 7 Relations)
- “Which musical artists from the Hip-Hop music genre are associated with musical artists that have/had a contract with Interscope Records and are involved in an album whose first letter is a ‘T’?”
Pure Extensional Query Evaluation: processing not finished after 6 hours!
Exploiting Approximated Compound-Relations: 3.6 seconds
AUC:

Summary
- Uncertainty can be reintroduced into deterministic representations of knowledge bases with RESCAL.
- The factorized representation can be exploited for complex querying.
- Extensional query evaluation can be significantly accelerated by exploiting the RESCAL model (compound relations).

Questions?