Joint Inference in Information Extraction Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with Pedro Domingos)


Problems of Pipeline Inference
AI systems typically use a pipeline architecture: inference is carried out in stages.
E.g., information extraction, natural language processing, speech recognition, vision, robotics.
Easy to assemble and low computational cost, but...
Errors accumulate along the pipeline.
No feedback from later stages to earlier ones.
Worse: often process one object at a time.

We Need Joint Inference
Do these two citations refer to the same paper?
S. Minton Integrating heuristics for constraint satisfaction problems: A case study. In AAAI Proceedings,
Minton, S(1993 b). Integrating heuristics for constraint satisfaction problems: A case study. In: Proceedings AAAI.
(The slide highlights the Author and Title fields; note the missing delimiter after "S. Minton" in the first citation.)

We Need Joint Inference
A topic of keen interest:
Natural language processing [Sutton et al., 2006]
Statistical relational learning [Getoor & Taskar, 2007]
Etc.
However...
Often very complex to set up.
Computational cost can be prohibitive.
Joint inference may hurt accuracy, e.g., [Sutton & McCallum, 2005].
State of the art: fully joint approaches are still rare, and even partly joint ones require much engineering.

Joint Inference in Information Extraction: State of the Art
Information extraction (IE) here: segmentation + entity resolution (ER).
Previous approaches are not fully joint:
[Pasula et al., 2003] (citation matching): performs segmentation in pre-processing.
[Wellner et al., 2004] (citation matching): only one-time feedback from ER to segmentation.
Both require much engineering.

Joint Inference in Information Extraction: Our Approach
We develop the first fully joint approach to IE:
Based on Markov logic.
Performs segmentation and entity resolution in a single inference process.
Applied to citation matching domains.
Outperforms previous approaches:
Joint inference improves accuracy.
Requires far less engineering effort.

Outline
Background: Markov logic
MLNs for joint citation matching
Experiments
Conclusion

Markov Logic
A general language capturing complexity and uncertainty.
A Markov logic network (MLN) is a set of pairs (F, w), where:
F is a formula in first-order logic
w is a real number
Together with a set of constants, it defines a Markov network with:
One node for each ground predicate
One feature for each ground formula F, with the corresponding weight w
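The semantics above can be illustrated with a tiny sketch: the probability of a world is proportional to the exponentiated sum of the weights of the ground formulas it satisfies. This is illustrative only; the toy atoms, rule, and weight below are assumptions, not part of the talk's citation-matching MLN.

```python
import math
from itertools import product

def world_weight(world, ground_formulas):
    """Unnormalized weight: exp(sum of weights of the ground formulas this world satisfies)."""
    return math.exp(sum(w for f, w in ground_formulas if f(world)))

def probability(world, all_worlds, ground_formulas):
    """P(world) = weight(world) / Z, normalizing over all possible worlds."""
    z = sum(world_weight(x, ground_formulas) for x in all_worlds)
    return world_weight(world, ground_formulas) / z

# Toy domain: two ground atoms A and B; one soft rule A => B with weight 1.5.
worlds = [dict(zip(("A", "B"), v)) for v in product([False, True], repeat=2)]
rules = [(lambda w: (not w["A"]) or w["B"], 1.5)]
p = probability({"A": True, "B": True}, worlds, rules)
```

Note that the rule is soft: worlds violating A => B are not impossible, just exponentially less likely, which is what lets an MLN trade off conflicting evidence during joint inference.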

Markov Logic
Open-source package: Alchemy (alchemy.cs.washington.edu)
Inference: MC-SAT [Poon & Domingos, 2006]
Weight learning: Voted perceptron [Singla & Domingos, 2005]

Outline
Background: Markov logic
MLNs for joint citation matching
Experiments
Conclusion

Citation Matching
Extract bibliographic records from citation lists.
Merge co-referent records.
A standard application of information extraction.
Here: extract authors, titles, venues.

Types and Predicates
Types:
token = {Minton, Hoifung, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}
Input (evidence) predicates:
Token(token, position, citation)
HasPunc(citation, position)
...
Output (query) predicates:
InField(position, field, citation)
SameCitation(citation, citation)

MLNs for Citation Matching
Isolated segmentation
Entity resolution
Joint segmentation: Jnt-Seg
Joint segmentation: Jnt-Seg-ER

Isolated Segmentation
Essentially a hidden Markov model (HMM):
Token(+t, p, c) ⇒ InField(p, +f, c)
InField(p, +f, c) ⇒ InField(p+1, +f, c)
Add boundary detection:
InField(p, +f, c) ∧ ¬HasPunc(c, p) ⇒ InField(p+1, +f, c)
Other rules: start with author, comma, etc.

Entity Resolution
Obvious starting point: the MLN for citation entity resolution of [Singla & Domingos, 2006], which assumes pre-segmented citations.
However, merging this MLN with segmentation gives poor results: entity resolution misleads segmentation.
InField(p, f, c) ∧ Token(t, p, c) ∧ InField(p', f, c') ∧ Token(t, p', c') ⇒ SameCitation(c, c')
An example of how joint inference can hurt.

Entity Resolution
Devise a specific predicate and rule to pass information between the stages:
SimilarTitle(citation, position, position, citation, position, position)
Same beginning trigram and ending unigram
No punctuation inside
Meets certain boundary conditions
SimilarTitle(c, p1, p2, c', p1', p2') ∧ SimilarVenue(c, c') ⇒ SameCitation(c, c')
This suffices to outperform the state of the art.
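A rough Python sketch of the SimilarTitle test is below. The function name, span convention, and exact boundary handling are assumptions; the slide only states the three criteria (same beginning trigram, same ending unigram, no punctuation inside).

```python
import string

PUNCT = set(string.punctuation)

def similar_title(tokens1, p1, p2, tokens2, q1, q2):
    """Do the token spans [p1, p2] and [q1, q2] look like the same title?
    Criteria from the slide: same beginning trigram, same ending unigram,
    and no punctuation tokens inside either span."""
    span1, span2 = tokens1[p1:p2 + 1], tokens2[q1:q2 + 1]
    if len(span1) < 3 or len(span2) < 3:
        return False
    if any(t in PUNCT for t in span1 + span2):
        return False
    return span1[:3] == span2[:3] and span1[-1] == span2[-1]

# The two Minton citations from the running example share this title span:
title = "integrating heuristics for constraint satisfaction problems a case study".split()
print(similar_title(title, 0, 8, title, 0, 8))
```

Because the check compares only the span edges, it is cheap enough to evaluate for all candidate span pairs during grounding.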

Joint Segmentation: Jnt-Seg
Leverage well-delimited citations to help segment more ambiguous ones.
JointInferenceCandidate(citation, position, citation)
S. Minton Integrating heuristics for constraint satisfaction problems: A case study. In AAAI Proceedings,
Minton, S(1993 b). Integrating heuristics for constraint satisfaction problems: A case study. In: Proceedings AAAI.

Joint Segmentation: Jnt-Seg
Augment the transition rule:
InField(p, +f, c) ∧ ¬HasPunc(c, p) ∧ ¬(∃c' JointInferenceCandidate(c, p, c')) ⇒ InField(p+1, +f, c)
This introduces a virtual boundary based on similar citations.

Joint Segmentation: Jnt-Seg-ER
Jnt-Seg can introduce wrong boundaries in dense citation lists.
Entity resolution can help: only consider co-referent pairs.
InField(p, +f, c) ∧ ¬HasPunc(c, p) ∧ ¬(∃c' (JointInferenceCandidate(c, p, c') ∧ SameCitation(c, c'))) ⇒ InField(p+1, +f, c)

MLNs for Joint Citation Matching
Jnt-Seg + ER or Jnt-Seg-ER + ER
18 predicates, 22 rules
A state-of-the-art system for citation matching:
Best entity resolution results to date
Both MLNs outperform isolated segmentation

Outline
Background: Markov logic
MLNs for joint citation matching
Experiments
Conclusion

Datasets
CiteSeer: 1563 citations, 906 clusters; four-fold cross-validation; largest fold: 0.3 million atoms, 0.4 million clauses.
Cora: 1295 citations, 134 clusters; three-fold cross-validation; largest fold: 0.26 million atoms, 0.38 million clauses.

Pre-Processing
Standard normalizations: case, stemming, etc.
Token matching: two tokens match if
They differ by at most one character, or
A bigram matches the unigram formed by concatenating its two tokens.
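The token-matching test above might be sketched as follows. The function names and the treatment of "differ by at most one character" as a single edit (substitution, insertion, or deletion) are assumptions; the slide gives only the two criteria.

```python
def within_one_edit(a, b):
    """True if a equals b or differs by one substitution, insertion, or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = sorted((a, b), key=len)
    # One deletion from the longer string must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def tokens_match(a, b, a_next=None, b_next=None):
    """Two tokens match if they differ by at most one character, or if a
    bigram (a token plus its successor) equals the other side's single token."""
    if within_one_edit(a, b):
        return True
    if a_next is not None and a + a_next == b:
        return True
    if b_next is not None and b + b_next == a:
        return True
    return False
```

The bigram clause handles tokenization noise such as "data base" in one citation versus "database" in another, a common artifact of OCR and inconsistent citation styles.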

Entity Resolution: CiteSeer (Cluster Recall)

Entity Resolution: Cora
Approach / Pairwise Recall / Pairwise Precision / Cluster Recall
Fellegi-Sunter
Joint MLN
(The numeric results were not captured in this transcript.)

Segmentation: CiteSeer (F1), Potential Records

Segmentation: CiteSeer (F1), All Records

Segmentation: Cora (F1), Potential Records

Segmentation: Cora (F1), All Records

Segmentation: Error Reduction by Joint Inference

Cora       Author  Title  Venue
All        24%     7%     6%
Potential  67%     56%    17%

CiteSeer   Author  Title  Venue
All        6%      3%     0%
Potential  35%     44%    8%

Conclusion
Joint inference attracts great interest today, yet is very difficult to perform effectively.
We propose the first fully joint approach to information extraction, using Markov logic.
Successfully applied to citation matching:
Best results to date on CiteSeer and Cora.
Joint inference consistently improves accuracy.
Uses Markov logic and its general, efficient algorithms.
Far less engineering than Pasula et al. and Wellner et al.

Future Work
Learn features and rules for joint inference (e.g., statistical predicate invention, structure learning).
General principles for joint inference.
Apply joint inference to other NLP tasks.