Relational Representations
Daniel Lowd, University of Oregon
April 20, 2015

Caveats
The purpose of this talk is to inspire meaningful discussion. I may be completely wrong.
My background: Markov logic networks, probabilistic graphical models.

Q: Why relational representations?
A: To model relational data.

Relational Data
A relation is a set of n-tuples:
– Friends: {(Anna, Bob), (Bob, Anna), (Bob, Chris)}
– Smokes: {(Bob), (Chris)}
– Grade: {(Anna, CS612, Fall2012, “A+”), …}
Relations can be visualized as tables:

  Friends        Smokes
  Anna | Bob     Bob
  Bob  | Anna    Chris
  Bob  | Chris

We typically make the closed-world assumption: all tuples not listed are false.
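To make this concrete, here is a minimal Python sketch (the dictionary layout and function name are illustrative, not from the talk) of relations as sets of tuples under the closed-world assumption:

```python
# A toy relational database: relation name -> set of tuples.
db = {
    "Friends": {("Anna", "Bob"), ("Bob", "Anna"), ("Bob", "Chris")},
    "Smokes": {("Bob",), ("Chris",)},
}

def holds(db, relation, *args):
    """Closed-world assumption: any tuple not listed is false."""
    return args in db[relation]

print(holds(db, "Friends", "Anna", "Bob"))    # True: listed
print(holds(db, "Friends", "Anna", "Chris"))  # False: not listed, so assumed false
```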

Relational Knowledge
– First-order logic
– Description logic
– Logic programs
General form: a set of rules of the form “For every tuple of objects (x1, x2, …, xk), certain relationships hold.”
e.g., For every pair of objects (x, y), if Friends(x, y) is true then Friends(y, x) is true.
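As an illustration (the toy domain and helper name are assumptions), such a hard rule can be checked by testing the implication over every grounding:

```python
from itertools import product

# Check the hard rule: for every pair (x, y),
# Friends(x, y) implies Friends(y, x).
def rule_holds(friends, objects):
    return all(
        (x, y) not in friends or (y, x) in friends  # material implication
        for x, y in product(objects, repeat=2)
    )

objects = {"Anna", "Bob", "Chris"}
friends = {("Anna", "Bob"), ("Bob", "Anna"), ("Bob", "Chris")}
print(rule_holds(friends, objects))  # False: (Chris, Bob) is missing
```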

Statistical Relational Knowledge
Combines relational formalisms (first-order logic, description logic, logic programs) with probabilistic models (Bayesian networks, Markov networks, dependency networks).
General form: a set of rules of the form “For every tuple of objects (x1, x2, …, xk), certain relationships probably hold.” (Parametrized factors, or “parfactors”.)
e.g., For every pair of objects (x, y), if Friends(x, y) is true then Friends(y, x) is more likely.
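A minimal sketch in the spirit of Markov logic (not any particular system's syntax): score a world by counting true groundings of a weighted rule, so worlds that satisfy more groundings get higher probability. The weight w = 1.5 and the helper names are illustrative assumptions:

```python
import math
from itertools import product

# Soft version of the same rule: each true grounding of
# "Friends(x, y) => Friends(y, x)" adds weight w to the log-score,
# so symmetric worlds become more probable rather than required.
def world_score(friends, objects, w=1.5):
    n_true = sum(
        1
        for x, y in product(objects, repeat=2)
        if (x, y) not in friends or (y, x) in friends
    )
    return math.exp(w * n_true)  # unnormalized, as in a Markov network

objects = {"Anna", "Bob", "Chris"}
symmetric = {("Anna", "Bob"), ("Bob", "Anna")}
one_sided = {("Anna", "Bob"), ("Bob", "Chris")}
print(world_score(symmetric, objects) > world_score(one_sided, objects))  # True
```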

Applications and Datasets
What are the “killer apps” of relational learning?
They must be relational.

Graph or Network Data
Many kinds of networks:
– Social networks
– Interaction networks
– Citation networks
– Road networks
– Cellular pathways
– Computer networks
– Webgraph

Graph Mining
A well-established field within data mining.
Representation: nodes are objects, edges are relations.
Many problems and methods:
– Frequent subgraph mining
– Generative models to explain degree distribution and graph evolution over time
– Community discovery
– Collective classification
– Link prediction (see the sketch below)
– Clustering
What’s the difference between graph mining and relational learning?
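As one small, concrete instance of a problem from this list, here is a common-neighbors link predictor; the toy graph is invented for illustration:

```python
from collections import defaultdict

# Common-neighbors link prediction: score a candidate edge (u, v)
# by how many neighbors u and v already share.
edges = [("Anna", "Bob"), ("Bob", "Chris"), ("Anna", "Dave"), ("Dave", "Chris")]

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

def cn_score(u, v):
    return len(neighbors[u] & neighbors[v])

print(cn_score("Anna", "Chris"))  # 2: Bob and Dave are shared neighbors
```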

Social Network Analysis

Specialized vs. General Representations
In many domains, the best results come from more restricted, “specialized” representations and algorithms.
Specialized representations and algorithms:
– May represent key domain properties better
– Typically much more efficient
– E.g., stochastic block model, label propagation (sketched below), HITS
General representations:
– Can be applied to new and unusual domains
– Easier to define complex models
– Easier to modify and extend
– E.g., MLNs, PRMs, HL-MRFs, ProbLog, RBNs, PRISM, etc.
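To show how simple a specialized method can be, here is a minimal label propagation sketch; the graph, seed labels, and sweep count are all illustrative assumptions:

```python
from collections import Counter, defaultdict

# Each unlabeled node repeatedly adopts the majority label of its
# labeled neighbors; seed labels stay fixed.
edges = [("Anna", "Bob"), ("Anna", "Dave"), ("Bob", "Dave"), ("Chris", "Dave")]
seeds = {"Anna": "smoker", "Chris": "nonsmoker"}

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

labels = dict(seeds)
for _ in range(10):  # fixed sweeps; real implementations check convergence
    for node in list(neighbors):
        if node in seeds:
            continue
        votes = Counter(labels[n] for n in neighbors[node] if n in labels)
        if votes:
            labels[node] = votes.most_common(1)[0][0]

print(labels)  # Bob and Dave both end up labeled "smoker"
```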

Specializing and Unifying Representations
Many representations have been proposed over the years, each with its own advantages and disadvantages.
– How many do we need?
– Which comes first, representational power or algorithmic convenience?
– What are the right unifying frameworks?
– When should we resort to domain-specific representations?
– Which domain-specific ideas actually generalize to other domains?

Applications and Datasets
What are the “killer apps” of general relational learning?
They must be relational. They should probably be complex.

BioNLP Shared Task Workshop
Task: extract biomedical information from text.
In 2009, Riedel et al. won with a Markov logic network! They claimed that Markov logic contributed to their success: “Furthermore, the declarative nature of Markov Logic helped us to achieve these results with a moderate amount of engineering. In particular, we were able to tackle task 2 by copying the local formulae for event prediction and adding three global formulae.”
However, converting this problem to an MLN was non-trivial: “In future work we will therefore investigate means to extend Markov Logic (interpreter) in order to directly model event structure.”

BioNLP Shared Task Workshop
Task: extract biomedical information from text.
For 2011, Riedel and McCallum produced a more accurate model as a factor graph.
Is this a victory or a loss for relational learning?

Other NLP Tasks?
Hoifung Poon and Pedro Domingos obtained great NLP results with MLNs:
– “Joint Unsupervised Coreference Resolution with Markov Logic,” ACL
– “Unsupervised Semantic Parsing,” EMNLP (Best Paper Award)
– “Unsupervised Ontology Induction from Text,” ACL
…but Hoifung hasn’t used Markov logic in any of his follow-up work:
– “Probabilistic Frame Induction,” NAACL (with Jackie Cheung and Lucy Vanderwende)
– “Grounded Unsupervised Semantic Parsing,” ACL
– “Grounded Semantic Parsing for Complex Knowledge Extraction,” NAACL (with Ankur P. Parikh and Kristina Toutanova)

MLNs were successfully used to obtain state-of-the-art results on several NLP tasks. Why were they abandoned? Because it was easier to hand-code a custom solution as a log-linear model.
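For contrast, a hand-coded log-linear model might look like the following hypothetical sketch; the features, weights, and labels are invented for illustration and are not taken from any of the systems above:

```python
import math

# A hand-coded log-linear model: task-specific features and weights
# written directly, rather than generated from first-order rules.
def features(x, y):
    return {
        "trigger_in_sentence": 1.0 if x["trigger"] in x["sentence"] else 0.0,
        "label_is_regulation": 1.0 if y == "Regulation" else 0.0,
    }

weights = {"trigger_in_sentence": 2.0, "label_is_regulation": 0.5}

def score(x, y):
    """Unnormalized; divide by the sum over labels to get P(y | x)."""
    return math.exp(sum(weights[f] * v for f, v in features(x, y).items()))

x = {"sentence": "p53 activates transcription", "trigger": "activates"}
print(score(x, "Regulation"), score(x, "Binding"))
```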

Software
There are many good machine learning toolkits:
– Classification: scikit-learn, Weka
– SVMs: SVM-Light, LibSVM, LIBLINEAR
– Graphical models: BNT, FACTORIE
– Deep learning: Torch, Pylearn2, Theano
What’s the state of software for relational learning and inference?
– Frustrating.
– Are the implementations too primitive?
– Are the algorithms immature?
– Are the problems just inherently harder?

Hopeful Analogy: Neural Networks
In computer vision, specialized feature models (e.g., SIFT) outperformed general feature models (neural networks) for a long time. Recently, convolutional nets have become the best and are used everywhere for image recognition. What changed? More processing power and more data.
Specialized relational models are widely used. Is there a revolution in general relational learning waiting to happen?

Conclusion
Many kinds of relational data and models:
– Specialized relational models are clearly effective.
– General relational models have potential, but they haven’t taken off.
Questions:
– When can effective specialized representations become more general?
– What advances do we need for general-purpose methods to succeed?
– What “killer apps” should we be working on?