Probabilistic Latent-Factor Database Models Denis Krompaß 1, Xueyan Jiang 1,Maximilian Nickel 2 and Volker Tresp 1,3 1 Department of Computer Science.

Slides:



Advertisements
Similar presentations
Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg.
Advertisements

A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
n-ary Relations and Their Applications
Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management Eytan Adar et al Presented by Xiao Hu CS491CXZ.
Date: 2014/05/06 Author: Michael Schuhmacher, Simon Paolo Ponzetto Source: WSDM’14 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Knowledge-based Graph Document.
Probabilistic Information Retrieval Part I: Survey Alexander Dekhtyar department of Computer Science University of Maryland.
INTRODUCTION TO MACHINE LEARNING Bayesian Estimation.
Representing and Querying Correlated Tuples in Probabilistic Databases
Multi-label Relational Neighbor Classification using Social Context Features Xi Wang and Gita Sukthankar Department of EECS University of Central Florida.
Social networks, in the form of bibliographies and citations, have long been an integral part of the scientific process. We examine how to leverage the.
Iowa State University Department of Computer Science Center for Computational Intelligence, Learning, and Discovery Harris T. Lin and Vasant Honavar. BigData2013.
Probabilistic Clustering-Projection Model for Discrete Data
Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process Chong Wang and David M. Blei NIPS 2009 Discussion led by Chunping Wang.
The Web of data with meaning... By Michael Griffiths.
Speaker:Benedict Fehringer Seminar:Probabilistic Models for Information Extraction by Dr. Martin Theobald and Maximilian Dylla Based on Richards, M., and.
Linked Sensor Data Harshal Patni, Cory Henson, Amit P. Sheth Ohio Center of Excellence in Knowledge enabled Computing (Kno.e.sis) Wright State University,
Ontologies and the Semantic Web by Ian Horrocks presented by Thomas Packer 1.
A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web by Livia Predoiu, Heiner Stuckenschmidt Institute of Computer Science,
Non-Negative Tensor Factorization with RESCAL Denis Krompaß 1, Maximilian Nickel 1, Xueyan Jiang 1 and Volker Tresp 1,2 1 Department of Computer Science.
Chen Cheng1, Haiqin Yang1, Irwin King1,2 and Michael R. Lyu1
C O R P O R A T E T E C H N O L O G Y Information & Communications Intelligent Autonomous Systems Infinite Hidden Relational Models Zhao Xu 1, Volker Tresp.
1 Learning Entity Specific Models Stefan Niculescu Carnegie Mellon University November, 2003.
Probabilistic Information Retrieval Part II: In Depth Alexander Dekhtyar Department of Computer Science University of Maryland.
Collaborative Ordinal Regression Shipeng Yu Joint work with Kai Yu, Volker Tresp and Hans-Peter Kriegel University of Munich, Germany Siemens Corporate.
Language Modeling Frameworks for Information Retrieval John Lafferty School of Computer Science Carnegie Mellon University.
Large-Scale Factorization of Type- Constrained Multi-Relational Data Denis Krompaß 1, Maximilian Nickel 2 and Volker Tresp 1,3 1 Department of Computer.
Scalable Text Mining with Sparse Generative Models
Causal Modeling for Anomaly Detection Andrew Arnold Machine Learning Department, Carnegie Mellon University Summer Project with Naoki Abe Predictive Modeling.
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
Copyright Antidot™ 1 Linked Enterprise Data LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT ISWC 2012 – BOSTON FABRICE LACROIX –
Cao et al. ICML 2010 Presented by Danushka Bollegala.
A NON-IID FRAMEWORK FOR COLLABORATIVE FILTERING WITH RESTRICTED BOLTZMANN MACHINES Kostadin Georgiev, VMware Bulgaria Preslav Nakov, Qatar Computing Research.
An approach to Intelligent Information Fusion in Sensor Saturated Urban Environments Charalampos Doulaverakis Centre for Research and Technology Hellas.
Knowledge representation
Tables to Linked Data Zareen Syed, Tim Finin, Varish Mulwad and Anupam Joshi University of Maryland, Baltimore County
Ensemble Solutions for Link-Prediction in Knowledge Graphs
Bayesian Sets Zoubin Ghahramani and Kathertine A. Heller NIPS 2005 Presented by Qi An Mar. 17 th, 2006.
Iowa State University Department of Computer Science Center for Computational Intelligence, Learning, and Discovery Harris Lin, Neeraj Koul, and Vasant.
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
第十讲 概率图模型导论 Chapter 10 Introduction to Probabilistic Graphical Models
Loris Bazzani*, Marco Cristani*†, Vittorio Murino*† Speaker: Diego Tosato* *Computer Science Department, University of Verona, Italy †Istituto Italiano.
Probabilistic Graphical Models for Semi-Supervised Traffic Classification Rotsos Charalampos, Jurgen Van Gael, Andrew W. Moore, Zoubin Ghahramani Computer.
Developing “Geo” Ontology Layers for Web Query Faculty of Design & Technology Conference David George, Department of Computing.
LOD for the Rest of Us Tim Finin, Anupam Joshi, Varish Mulwad and Lushan Han University of Maryland, Baltimore County 15 March 2012
Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni.
Natural Language Questions for the Web of Data Mohamed Yahya 1, Klaus Berberich 1, Shady Elbassuoni 2 Maya Ramanath 3, Volker Tresp 4, Gerhard Weikum 1.
Ch 8. Graphical Models Pattern Recognition and Machine Learning, C. M. Bishop, Revised by M.-O. Heo Summarized by J.W. Nam Biointelligence Laboratory,
Querying Factorized Probabilistic Triple Databases Denis Krompaß 1, Maximilian Nickel 2 and Volker Tresp 1,3 1 Department of Computer Science. Ludwig Maximilian.
Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg.
The Scientific Method SE Bio 2.B
Natural Language Questions for the Web of Data 1 Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany 2 Shady Elbassuoni.
Progress Report (Concept Extraction) Presented by: Mohsen Kamyar.
Using linked data to interpret tables Varish Mulwad September 14,
Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni.
Shridhar Bhalerao CMSC 601 Finding Implicit Relations in the Semantic Web.
Scalable Keyword Search on Large RDF Data. Abstract Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely.
Exploiting Relevance Feedback in Knowledge Graph Search
Matrix Factorization and its applications By Zachary 16 th Nov, 2010.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
Dynamic Query Forms for Database Queries. Abstract Modern scientific databases and web databases maintain large and heterogeneous data. These real-world.
Document Clustering with Prior Knowledge Xiang Ji et al. Document Clustering with Prior Knowledge. SIGIR 2006 Presenter: Suhan Yu.
TribeFlow Mining & Predicting User Trajectories Flavio Figueiredo Bruno Ribeiro Jussara M. AlmeidaChristos Faloutsos 1.
A Review of Relational Machine Learning for Knowledge Graphs CVML Reading Group Xiao Lin.
Probabilistic Data Management
Information Extraction from Wikipedia: Moving Down the Long Tail
Big Data Quality the next semantic challenge
ESWC’14 龚赛赛.
NJVR: The NanJing Vocabulary Repository
Enriching Structured Knowledge with Open Information
Michal Rosen-Zvi University of California, Irvine
Presentation transcript:

Probabilistic Latent-Factor Database Models Denis Krompaß 1, Xueyan Jiang 1,Maximilian Nickel 2 and Volker Tresp 1,3 1 Department of Computer Science. Ludwig Maximilian 2 University, Oettingenstraße 67, Munich, Germany MIT, Cambridge, MA and Istituto Italiano di Tecnologia, Genova, Italy 3 Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, Munich, Germany

Outline 1.Knowledge Bases Are Triple Graphs 2.Link/Triple Prediction With RESCAL 3.Probabilistic Database Models 4.Querying Factorized Multi-Graphs 5.Including Local Closed World Assumptions 6.Conclusion

Knowledge Bases are Triple Graphs  Linked Open Data (LOD) and large ontologies like DBpedia, Yago, Knowledge Graph are graph-based knowledge representations using light-weight ontologies, and are accessible to machine learners  Millions of entities, hundreds to thousands of relations  Millions to billions of facts about the world

Knowledge Bases are Triple Graphs  Linked Open Data (LOD) and large ontologies like DBpedia, Yago, Knowledge Graph are graph-based knowledge representations using light-weight ontologies, and are accessible to machine learners  Millions of entities, hundreds to thousands of relations  Millions to billions of facts about the world  They are all triple oriented and more or less follow the RDF standard  RDF: Resource Description Framework

Knowledge Bases are Triple Graphs  Linked Open Data (LOD) and large ontologies like DBpedia, Yago, Knowledge Graph are graph-based knowledge representations using light-weight ontologies, and are accessible to machine learners  Millions of entities, hundreds to thousands of relations  Millions to billions of facts about the world  They are all triple oriented and more or less follow the RDF standard  RDF: Resource Description Framework  Goal in Triple oriented Database is to predict the existence of a triple (new links in the graph)

Knowledge Bases are Triple Graphs  Linked Open Data (LOD) and large ontologies like DBpedia, Yago, Knowledge Graph are graph-based knowledge representations using light-weight ontologies, and are accessible to machine learners  Millions of entities, hundreds to thousands of relations  Millions to billions of facts about the world  They are all triple oriented and more or less follow the RDF standard  RDF: Resource Description Framework  Recent application: Google Knowledge Vault (Dong et al.)  Goal in Triple oriented Database is to predict the existence of a triple (new links in the graph)

Triple Prediction with RESCAL e.g. marriedTo(Jack,Jane) Nickel et al. ICML 2011, WWW 2012

Triple Prediction with RESCAL ranking 1.r k (e 3,e 5 ) 2.r k (e 2,e 3 ) 3.r k (e 6,e 5 ) 4.… e1e1 e2e2 e3e3 e4e4 e6e6 e7e7 e5e5 e1e1 e2e2 e3e3 e6e6 e7e7 e5e5 e4e4 Relation r k e.g. marriedTo(Jack,Jane) Nickel et al. ICML 2011, WWW 2012

Probabilistic Database Models  In some Data, quantities are attached to the links: – Rating of an user for a movie – The amount of times team A played against team B – Amount of medication for a patient ModelNationsKinshipUMLS RESCAL Bernoulli Poisson Multinomial ModelNationsKinshipUMLS RESCAL Binomial Poisson Multinomial RESCAL-P Binary DataCount Data

Local Closed World Assumptions  Large Knowledge Bases contain hundreds of relation types  Most of relation types only relate only subsets of entities present in the KB  Local Closed World Assumptions (LCA) marriedTo(Jack,Jane) marriedTo(Cat,Rome) Type Constraints for Relation Types √

Local Closed World Assumptions  Large Knowledge Bases contain hundreds of relation types  Most of relation types only relate only subsets of entities present in the KB  Local Closed World Assumptions (LCA) marriedTo(Jack,Jane) marriedTo(Cat,Rome) Type Constraints for Relation Types  RESCAL does not support Type Constraints √

Exploiting Type Constraints  Type Constraints can be introduced into the RESCAL factorization (Krompaß et al. DSAA 2014)  Type Constraints:  Given by the Knowledge Base  Or can be approximated with the observed data

Exploiting Type Constraints  Performance (prediction quality and runtime) is improved especially for large multi-relational datasets DBpedia Entities, 511 Relation Types, 16,945,046 triples, Sparsity: 6.52 × RESCAL+ Type Constraints RESCAL DBpedia-Music 311,474 Entities, 7 Relation Types, 1,006,283 triples

Querying Factorized Knowledge Bases  Probabilities of single triples can be inferred through RESCAL  Exploit theory of probabilistic databases for complex querying  Problem:  Extensional Query Evaluation can be very slow (even though it has polynomial time complexity)  Krompaß et al. ISWC 2014 (Best Research Paper Nominee)  We approximate views generated by independent project rules  We are able to construct factorized views which are very memory efficient

Querying Factorized Knowledge Bases „Which musical artists from the Hip Hop music genre have/had a contract with Shady, Aftermath or Death Row records?”

Higher Order Relations  In principle, we are not restricted to binary relations  Binary Relations:  N-ary Relations

Conclusion  When the data is complete and sparse, RESCAL Gaussian likelihood is most efficient  Applicable to binary and count data  The Gaussian likelihood model can compete with the more elegant likelihood cost functions  Model complexity can be reduced by considering Type Constraints

Questions?