IE with Dictionaries Cohen & Sarawagi

Announcements
Current statistics:
– days with unscheduled student talks: 2
– students with unscheduled student talks: 0
– Projects are due: 4/28 (last day of class)
– Additional requirement: draft (for comments) no later than 4/21

Finding names you know about
Problem: given a dictionary of names, find them in text
– Important task in many settings (biology, link analysis, ...)
– Exact match is unlikely to work perfectly, due to nicknames (Will Cohen), abbreviations (William C), misspellings (Willaim Chen), polysemous words (June, Bill), etc.
– In informal text it sometimes works very poorly
– The problem is similar to the record linkage (aka data cleaning, de-duping, merge-purge, ...) problem of finding duplicate records across heterogeneous databases

Finding names you know about
Technical problem:
– It is hard to combine state-of-the-art similarity metrics (as used in record linkage) with a state-of-the-art NER system, due to a representational mismatch: opening up the box, modern NER systems don't really know anything about names...

IE as Sequential Word Classification
Example: "Yesterday Pedro Domingos spoke this example sentence."  Person name: Pedro Domingos
A trained IE system models the relative probability of labeled sequences of words. To classify, find the most likely state sequence for the given words; any words generated by the designated "person name" state are extracted as a person name.
[Figure: HMM with states for person name, location name, and background]

IE as Sequential Word Classification
Modern IE systems use a rich representation for words, and clever probabilistic models of how labels interact in a sequence, but they do not explicitly represent the names extracted.
[Figure: graphical model over words w_{t-1}, w_t, w_{t+1} and labels O_{t-1}, O_t, O_{t+1}; example word features: identity of word, ends in "-ski", is capitalized, is part of a noun phrase, is in a list of city names, is under node X in WordNet, is in bold font, is indented, is in hyperlink anchor, last person name was female, next two words are "and Associates", part of noun phrase is "Wisniewski"]
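As a minimal sketch of the kind of word-level feature function this slide describes; the feature names and helpers here are illustrative, not taken from the paper:

def word_features(words, t):
    """Illustrative word-level features in the spirit of the list above:
    word identity, capitalization, suffix, and simple context features."""
    w = words[t]
    feats = {
        "word=" + w.lower(): 1.0,
        "is_capitalized": float(w[:1].isupper()),
        "ends_in_-ski": float(w.lower().endswith("ski")),
    }
    # Context: identity of the neighbouring words, when they exist.
    if t > 0:
        feats["prev_word=" + words[t - 1].lower()] = 1.0
    if t + 1 < len(words):
        feats["next_word=" + words[t + 1].lower()] = 1.0
    return feats

# Example: features for "Wisniewski" in a short sentence.
print(word_features("Yesterday Mr. Wisniewski spoke".split(), 2))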

Semi-Markov models for IE (with Sunita Sarawagi, IIT Bombay)
– Train on sequences of labeled segments, not labeled words: s = (start, end, label)
– Build a probability model of segment sequences, not word sequences
– Define features f of segments, e.g. f(s) = words x_t ... x_u, length, previous words, case information, ..., distance to a known name
– (Approximately) optimize the feature weights on training data to maximize the score of the correct segmentation (objective sketched below)
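The formula after "maximize:" on the slide is an image that did not survive the transcript. As a hedged reconstruction from the surrounding definitions, the segment-level score being optimized has the usual linear form over segment features:

% Hedged reconstruction; the slide's actual formula is not in the transcript.
% For a segmentation S = (s_1, ..., s_p) of x, with s_j = (t_j, u_j, y_j):
\[
  \mathbf{W} \cdot \mathbf{F}(\mathbf{x}, S) \;=\; \sum_{j=1}^{p} \mathbf{W} \cdot \mathbf{f}(j, \mathbf{x}, S)
\]
% and training (approximately) chooses W so that the correct segmentation of
% each training sequence scores higher than the alternatives.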

Details: Semi-Markov model

Segments vs tagging
Example: "Fred please stop by my office this afternoon"
– Tagging view (one label y_t per word x_t, features f(x_t, y_t)):
  Fred/Person  please/other  stop/other  by/other  my/Loc  office/Loc  this/other  afternoon/Time
– Segment view (one labeled segment (t, u, y) per phrase, features f(x_j, y_j)):
  [Fred]Person  [please stop by]other  [my office]Loc  [this]other  [afternoon]Time
  with t_1 = u_1 = 1;  t_2 = 2, u_2 = 4;  t_3 = 5, u_3 = 6;  t_4 = u_4 = 7;  t_5 = u_5 = 8
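A minimal sketch of the two encodings of this example, with an illustrative helper that expands a segmentation back into per-word tags (the data structures here are mine, not from the paper):

# Two encodings of the same sentence, mirroring the example above.
words = "Fred please stop by my office this afternoon".split()

# Tagging view: one label per word.
tags = ["Person", "other", "other", "other", "Loc", "Loc", "other", "Time"]

# Segment view: (start, end, label), 1-based and inclusive as on the slide.
segments = [(1, 1, "Person"), (2, 4, "other"), (5, 6, "Loc"),
            (7, 7, "other"), (8, 8, "Time")]

def segments_to_tags(segments, n):
    """Expand a segmentation back into per-word tags (illustrative helper)."""
    out = [None] * n
    for start, end, label in segments:
        for i in range(start, end + 1):
            out[i - 1] = label
    return out

assert segments_to_tags(segments, len(words)) == tags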

Details: Semi-Markov model

Conditional Semi-Markov models
– CMM: a conditional Markov model over word tags
– CSMM: a conditional semi-Markov model over labeled segments
(both conditional forms are sketched below)
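The formulas for the two models are images missing from the transcript; this is a hedged reconstruction based on the standard locally normalized definitions, not the slide's exact notation:

% Hedged reconstruction of the two conditional models.
% CMM: locally normalized conditional model over word tags y_1..y_n:
\[
  P(\mathbf{y} \mid \mathbf{x}) = \prod_{t=1}^{n}
    \frac{\exp\big(\mathbf{W}\cdot\mathbf{f}(t,\mathbf{x},y_{t-1},y_t)\big)}{Z(\mathbf{x},y_{t-1},t)}
\]
% CSMM: the analogous model over labeled segments s_j = (t_j, u_j, y_j):
\[
  P(S \mid \mathbf{x}) = \prod_{j=1}^{p}
    \frac{\exp\big(\mathbf{W}\cdot\mathbf{f}(j,\mathbf{x},s_{j-1},s_j)\big)}{Z(\mathbf{x},s_{j-1},j)}
\]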

A training algorithm for CSMMs (1)
Review: Collins' perceptron training algorithm. Repeatedly decode each training sentence with Viterbi under the current weights, compare the Viterbi tags to the correct tags, and update the weights toward the correct tagging.
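A minimal sketch of that training loop, assuming a whole-sequence feature function and a feature-based Viterbi decoder (in the spirit of the HMM Viterbi sketched after the "Viterbi for HMMs" slide); the function names and signatures are illustrative:

from collections import defaultdict

def collins_perceptron(train_data, feature_fn, viterbi_decode, labels, epochs=5):
    """Sketch of Collins' structured perceptron for tagging.

    train_data: list of (words, gold_tags) pairs.
    feature_fn(words, tags) -> dict of feature counts for the whole tagging.
    viterbi_decode(words, weights, labels) -> highest-scoring tag sequence.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold_tags in train_data:
            pred_tags = viterbi_decode(words, weights, labels)
            if pred_tags != gold_tags:
                # Additive update: promote the features of the correct tagging,
                # demote the features of the Viterbi (predicted) tagging.
                for feat, count in feature_fn(words, gold_tags).items():
                    weights[feat] += count
                for feat, count in feature_fn(words, pred_tags).items():
                    weights[feat] -= count
    return dict(weights)

The voted (or averaged) variant used on the following slides additionally keeps the intermediate weight vectors and votes or averages them at prediction time, rather than using only the final weights.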

A training algorithm for CSMMs (2)
Variant of Collins' perceptron training algorithm: a voted perceptron learner for T_TRANS, with a Viterbi-like decoding step.

A training algorithm for CSMMs (3)
Variant of Collins' perceptron training algorithm: a voted perceptron learner for T_TRANS, with a Viterbi-like decoding step.

A training algorithm for CSMMs (3)
Variant of Collins' perceptron training algorithm: a voted perceptron learner for T_SEGTRANS, with a Viterbi-like decoding step.

Viterbi for HMMs
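The slide's derivation is a figure not captured in the transcript. As a hedged illustration, a standard log-space Viterbi decoder for an HMM looks like this (variable names are mine):

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Standard Viterbi decoding for an HMM, in log space.

    obs: list of observed words.
    log_start[s], log_trans[s][s2], log_emit[s][word]: log-probabilities.
    Returns the highest-probability state sequence.
    """
    UNK = -1e9  # log-probability stand-in for unseen emissions
    V = [{s: log_start[s] + log_emit[s].get(obs[0], UNK) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                              key=lambda pair: pair[1])
            V[t][s] = score + log_emit[s].get(obs[t], UNK)
            back[t][s] = prev
    # Trace back the best path from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))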

Viterbi for SMM
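Again the slide content is a figure; here is a hedged sketch of the segment-level (semi-Markov) Viterbi recursion, where the best score at each position considers every segment of length up to max_len ending there. The segment_score signature is illustrative:

def semi_markov_viterbi(words, labels, segment_score, max_len=4):
    """Sketch of Viterbi over segments: best[i][y] is the best score of a
    segmentation of words[:i] whose last segment has label y.

    segment_score(start, end, label, prev_label, words) -> float (illustrative).
    """
    n = len(words)
    NEG = float("-inf")
    START = "<start>"
    best = [{y: NEG for y in labels} for _ in range(n + 1)]
    best[0] = {START: 0.0}
    back = [dict() for _ in range(n + 1)]   # back[i][y] = (segment start, prev label)
    for i in range(1, n + 1):
        for length in range(1, min(max_len, i) + 1):
            j = i - length                  # candidate segment = words[j:i]
            for prev, prev_score in best[j].items():
                if prev_score == NEG:
                    continue
                for y in labels:
                    score = prev_score + segment_score(j, i, y, prev, words)
                    if score > best[i][y]:
                        best[i][y] = score
                        back[i][y] = (j, prev)
    # Trace back the best labeled segmentation.
    y = max(best[n], key=best[n].get)
    segments, i = [], n
    while i > 0:
        j, prev = back[i][y]
        segments.append((j + 1, i, y))      # 1-based inclusive, as in the example
        i, y = j, prev
    return list(reversed(segments))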

Sample CSMM features

Experimental results
Baseline algorithms:
– HMM-VP/1: tags are "in entity", "other"
– HMM-VP/4: tags are "begin entity", "end entity", "continue entity", "unique", "other"
– SMM-VP: every feature f(w) has versions for "f(w) true for some w in the segment that is the first (last, any) word of the segment"
– Dictionaries, used as in Borthwick (feature forms sketched below):
  HMM-VP/1: f_D(w) = "word w is in D"
  HMM-VP/4: f_D,begin(w) = "word w begins an entity in D", etc.
– Dictionary lookup
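A minimal sketch of the Borthwick-style word-level dictionary features listed above, plus the segment-level analogue that scores a whole candidate segment against the dictionary with a string-similarity function; the helper names and the similarity argument are illustrative, not from the paper:

def word_dict_features(word, dict_words, dict_first_words):
    """Word-level dictionary features in the Borthwick style described above.
    dict_words / dict_first_words are illustrative precomputed sets."""
    w = word.lower()
    return {
        "f_D": float(w in dict_words),                  # HMM-VP/1 style
        "f_D_begin": float(w in dict_first_words),      # HMM-VP/4 style
    }

def segment_dict_feature(segment_words, dict_entries, similarity):
    """Segment-level analogue: compare the whole candidate segment against the
    dictionary with a string-similarity function (e.g. Jaro-Winkler or TF-IDF
    cosine); `similarity` is a stand-in, not a specific function from the paper."""
    candidate = " ".join(segment_words).lower()
    return {"dict_sim": max(similarity(candidate, e.lower()) for e in dict_entries)}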

Datasets used
Small training sets (10% of the available data) were used in the experiments.

Results

Results: varying history

Results: changing the dictionary

Results: vs CRF