March 2011. Local and Global Algorithms for Disambiguation to Wikipedia. Lev Ratinov 1, Dan Roth 1, Doug Downey 2, Mike Anderson 3. 1 University of Illinois at Urbana-Champaign, 2 Northwestern University, 3 Rexonomy.


Page 1 March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov 1, Dan Roth 1, Doug Downey 2, Mike Anderson 3 1 University of Illinois at Urbana-Champaign 2 Northwestern University 3 Rexonomy

Information overload 2

Organizing knowledge 3 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Cross-document co-reference resolution 4 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Reference resolution: (disambiguation to Wikipedia) 5 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

The “reference” collection has structure 6 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II. Used_In Is_a Succeeded Released

Analysis of Information Networks 7 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Here – Wikipedia as a knowledge resource … but we can use other resources 8 Used_In Is_a Succeeded Released

Talk outline: High-level algorithmic approach (bi-partite graph matching with global and local inference); Local inference (experiments & results); Global inference (experiments & results); Results, conclusions; Demo. 9

Problem formulation - matching/ranking problem 10 Text Document(s)—News, Blogs,… Wikipedia Articles

Local approach 11  Γ is a solution to the problem  A set of pairs (m,t)  m: a mention in the document  t: the matched Wikipedia Title Text Document(s)—News, Blogs,… Wikipedia Articles

Local approach 12  Γ is a solution to the problem  A set of pairs (m,t)  m: a mention in the document  t: the matched Wikipedia Title Local score of matching the mention to the title Text Document(s)—News, Blogs,… Wikipedia Articles

Local + Global: using the Wikipedia structure 13 A “global” term – evaluating how good the structure of the solution is. Text Document(s)—News, Blogs,… Wikipedia Articles

Can be reduced to an NP-hard problem 14 Text Document(s)—News, Blogs,… Wikipedia Articles
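The combined objective above can be written as Γ* = argmax_Γ Σ_(m,t)∈Γ φ(m,t) + Σ_(i<j) ψ(t_i, t_j). A brute-force solver makes the intractability concrete: it must enumerate every joint assignment, which is exponential in the number of mentions. A minimal sketch, with made-up φ and ψ for the running Chicago example:

```python
from itertools import product

def best_assignment(candidates, phi, psi):
    """Exact inference: try all |T|^|M| joint assignments."""
    best, best_score = None, float("-inf")
    for gamma in product(*candidates):
        score = sum(phi(m, t) for m, t in enumerate(gamma))
        score += sum(psi(gamma[i], gamma[j])
                     for i in range(len(gamma))
                     for j in range(i + 1, len(gamma)))
        if score > best_score:
            best, best_score = gamma, score
    return best

# Made-up scores: two "Chicago"-like mentions; the coherence term
# rewards picking the same sense for both.
phi = lambda m, t: {"city": 0.6, "font": 0.3, "band": 0.1}[t]
psi = lambda a, b: 1.0 if a == b else 0.0
print(best_assignment([["city", "font"], ["city", "band"]], phi, psi))
# -> ('city', 'city')
```

With |T| candidates per mention and |M| mentions the loop runs |T|^|M| times, which is why the full objective is intractable and the next slide relaxes it.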

A tractable variation 15 1. Invent a surrogate solution Γ’; disambiguate each mention independently. 2. Evaluate the structure based on pair-wise coherence scores Ψ(t_i, t_j). Text Document(s)—News, Blogs,… Wikipedia Articles
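The two-step relaxation on this slide can be sketched directly; φ, ψ, and the candidate lists below are toy stand-ins, not the paper's actual scores:

```python
def disambiguate(candidates, phi, psi):
    """Two-step approximation: (1) build a surrogate solution Gamma'
    by disambiguating each mention independently with the local score
    phi; (2) re-score each candidate against Gamma' using the pairwise
    coherence psi. Linear in the number of mentions, not exponential."""
    surrogate = [max(cands, key=lambda t: phi(m, t))
                 for m, cands in enumerate(candidates)]
    final = []
    for m, cands in enumerate(candidates):
        others = [t for i, t in enumerate(surrogate) if i != m]
        final.append(max(cands, key=lambda t: phi(m, t) +
                         sum(psi(t, o) for o in others)))
    return final

# Toy scores: locally, mention 1 slightly prefers "band", but
# coherence with mention 0's "city" choice flips it to "city".
phi = lambda m, t: {(0, "city"): 0.6, (0, "font"): 0.5,
                    (1, "band"): 0.55, (1, "city"): 0.5}[(m, t)]
psi = lambda a, b: 0.2 if a == b else 0.0
print(disambiguate([["city", "font"], ["band", "city"]], phi, psi))
```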

Talk outline: High-level algorithmic approach (bi-partite graph matching with global and local inference); Local inference (experiments & results); Global inference (experiments & results); Results, conclusions; Demo. 16

I. Baseline: P(Title|Surface Form) 17 P(Title|”Chicago”)
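The baseline can be estimated from how often each surface form links to each title in Wikipedia anchor text. A sketch with invented counts (the real statistics come from the full Wikipedia link graph):

```python
from collections import Counter, defaultdict

# Hypothetical anchor statistics: each (surface form, linked title)
# pair stands for one hyperlink harvested from Wikipedia text.
anchors = (
    [("Chicago", "Chicago_city")] * 700
    + [("Chicago", "Chicago_band")] * 250
    + [("Chicago", "Chicago_font")] * 50
)

counts = defaultdict(Counter)
for surface, title in anchors:
    counts[surface][title] += 1

def p_title_given_surface(title, surface):
    """Baseline score: fraction of this surface form's links
    that point to this title."""
    total = sum(counts[surface].values())
    return counts[surface][title] / total if total else 0.0

print(p_title_given_surface("Chicago_city", "Chicago"))  # 0.7
```

With these toy counts the baseline alone would always link "Chicago" to the city, which is exactly why the context and text scores on the next slides are needed.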

II. Context(Title) 18 Context(Charcoal)+= “a font called __ is used to”

III. Text(Title) 19 Just the text of the page (one per title)

Putting it all together City vs. Font: ( , , ) Band vs. Font: ( , , ) Training a ranking SVM: Consider all title pairs. Train a ranker on the pairs (learn to prefer the correct solution). Inference = knockout tournament. Key: abstracts over the text – learns which scores are important. 20 Score table: columns Score_Baseline, Score_Context, Score_Text; rows Chicago_city, Chicago_font, Chicago_band.
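The knockout-tournament inference mentioned above reduces to repeated pairwise comparisons with the learned ranker. A sketch, using a toy score table in place of the trained SVM:

```python
def knockout(candidates, prefer):
    """Knockout-tournament inference: the current winner plays each
    remaining candidate in turn; prefer(a, b) is the learned pairwise
    ranker (True when it prefers a over b)."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        if prefer(challenger, winner):
            winner = challenger
    return winner

# Toy ranker: prefer whichever candidate has the higher overall score.
scores = {"Chicago_city": 0.9, "Chicago_font": 1.2, "Chicago_band": 0.4}
prefer = lambda a, b: scores[a] > scores[b]
print(knockout(["Chicago_city", "Chicago_font", "Chicago_band"], prefer))
# -> Chicago_font
```

Only |T| - 1 pairwise calls are needed per mention, so inference stays cheap even with many candidate titles.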

Example: font or city? 21 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font)

Lexical matching 22 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font) Cosine similarity, TF-IDF weighting
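The lexical matching step can be sketched with a small stdlib-only TF-IDF implementation; whitespace tokenization and raw-count term frequency are simplifications of what a real Wikifier would use:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF profiles: tf = raw count, idf = log(N / df)."""
    n = len(docs)
    toks = [Counter(d.lower().split()) for d in docs]
    df = Counter()
    for t in toks:
        for w in t:
            df[w] += 1
    return [{w: c * math.log(n / df[w]) for w, c in t.items()} for t in toks]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Two font-flavored texts and one city-flavored text for the
# Chicago example; "chicago" itself gets idf 0 and stops mattering.
docs = ["macintosh menu font chicago",
        "chicago font menu macintosh",
        "chicago city illinois lake"]
vecs = tfidf_vectors(docs)
```

Note how the IDF weighting automatically discounts the ambiguous word "chicago" (it appears in every document), so similarity is driven by the disambiguating context words.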

Ranking – font vs. city 23 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font)

Train a ranking SVM 24 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font) (0.5, 0.2, 0.1, 0.8) (0.3, 0.2, 0.3, 0.5) [(0.2, 0, -0.2, 0.3), -1]
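The training pairs shown above are built by subtracting the feature vectors of two candidates and labelling the difference by which candidate is correct. A sketch (the sign convention here is illustrative; the slide labels the shown difference -1):

```python
def pairwise_examples(correct, others):
    """Build ranking-SVM training data: one binary example per
    (correct, wrong) candidate pair, represented as the difference
    of their feature vectors."""
    examples = []
    for other in others:
        diff = tuple(c - o for c, o in zip(correct, other))
        examples.append((diff, +1))                      # correct - wrong
        examples.append((tuple(-d for d in diff), -1))   # wrong - correct
    return examples

# The two feature vectors from the slide.
ex = pairwise_examples((0.5, 0.2, 0.1, 0.8), [(0.3, 0.2, 0.3, 0.5)])
```

A linear SVM trained on such differences learns a weight vector w such that w·x_correct > w·x_wrong, i.e. a pairwise preference function usable in the knockout tournament.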

Scaling issues – one of our key contributions 25 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font)

Scaling issues 26 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font) This stuff is big, and is loaded into the memory from the disk

Improving performance 27 It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Text(Chicago_city), Context(Chicago_city) Text(Chicago_font), Context(Chicago_font) Rather than computing TF-IDF weighted cosine similarity, we want to train a classifier on the fly. But due to the aggressive feature pruning, we choose PrTFIDF.

Performance (local only): ranking accuracy 28 Table columns: Dataset; Baseline (solvable); +Local TF-IDF (solvable); +Local PrTFIDF (solvable). Rows: ACE, MSN News, AQUAINT, Wikipedia Test.

Talk outline: High-level algorithmic approach (bi-partite graph matching with global and local inference); Local inference (experiments & results); Global inference (experiments & results); Results, conclusions; Demo. 29

Co-occurrence(Title_1, Title_2) 30 The city senses of Boston and Chicago appear together often.

Co-occurrence(Title_1, Title_2) 31 Rock music and albums appear together often.

Global ranking How to approximate the “global semantic context” of the document? (What is Γ’?) Use only non-ambiguous mentions for Γ’. Use the top baseline disambiguation for NER surface forms. Use the top baseline disambiguation for all the surface forms. How to define relatedness between two titles? (What is Ψ?) 32

Ψ: pair-wise relatedness between two titles: Normalized Google Distance; Pointwise Mutual Information. 33
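Both measures can be computed from Wikipedia in-link sets, in the Milne & Witten style of relatedness; the sets below are invented for illustration:

```python
import math

def ngd(inlinks_a, inlinks_b, n_titles):
    """Normalized Google Distance over in-link sets:
    smaller = more related, no overlap = infinitely distant."""
    ab = len(inlinks_a & inlinks_b)
    if ab == 0:
        return float("inf")
    a, b = len(inlinks_a), len(inlinks_b)
    return ((math.log(max(a, b)) - math.log(ab)) /
            (math.log(n_titles) - math.log(min(a, b))))

def pmi(inlinks_a, inlinks_b, n_titles):
    """Pointwise mutual information of co-linking: larger = more related."""
    ab = len(inlinks_a & inlinks_b)
    if ab == 0:
        return 0.0
    return ((ab / n_titles) /
            ((len(inlinks_a) / n_titles) * (len(inlinks_b) / n_titles)))

# Hypothetical in-link sets (article ids): the city senses share
# many in-links, the font shares none with either.
boston = {1, 2, 3, 4}
chicago_city = {2, 3, 4, 5}
chicago_font = {6, 7}
print(ngd(boston, chicago_city, 1000) < ngd(boston, chicago_font, 1000))
```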

What is the best Γ’? (ranker accuracy, solvable mentions) 34 Table columns: Dataset; Baseline; Baseline + Lexical; Baseline + Global (Unambiguous); Baseline + Global (NER); Baseline + Global (All Mentions). Rows: ACE, MSN News, AQUAINT, Wikipedia Test.

Results – ranker accuracy (solvable mentions) 35 Table columns: Dataset; Baseline; Baseline + Lexical; Baseline + Global (Unambiguous); Baseline + Global (NER); Baseline + Global (All Mentions). Rows: ACE, MSN News, AQUAINT, Wikipedia Test.

Results: Local + Global 36 Table columns: Dataset; Baseline; Baseline + Lexical; Baseline + Lexical + Global. Rows: ACE, MSN News, AQUAINT, Wikipedia Test.

Talk outline: High-level algorithmic approach (bi-partite graph matching with global and local inference); Local inference (experiments & results); Global inference (experiments & results); Results, conclusions; Demo. 37

Conclusions: Dealing with a very large-scale knowledge acquisition and extraction problem. State-of-the-art algorithmic tools that exploit the content & structure of the network. Formulated a framework for local & global reference resolution and disambiguation into knowledge networks. Proposed local and global algorithms: state-of-the-art performance. Addressed scaling, a major issue. Identified key remaining challenges (next slide). 38

We want to know what we don’t know Not dealt with well in the literature: “As Peter Thompson, a 16-year-old hunter, said…” “Dorothy Byrne, a state coordinator for the Florida Green Party…” We train a separate SVM classifier to identify such cases. The features are: all the baseline, lexical and semantic scores of the top candidate; the score assigned to the top candidate by the ranker; the “confidence” of the ranker on the top candidate with respect to the second-best disambiguation; the Good-Turing probability of out-of-Wikipedia occurrence for the mention. Limited success; future research. 39
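The abstain decision can be sketched as a post-hoc classifier over the ranker's output; the two features here (top score and margin over the runner-up) are a reduced, hypothetical version of the slide's feature list:

```python
from collections import namedtuple

Candidate = namedtuple("Candidate", "title score")

def link_or_null(ranked, is_oow):
    """ranked: candidates sorted by ranker score, best first.
    is_oow: the separate out-of-Wikipedia classifier; here it sees
    only the top score and the margin over the runner-up."""
    top = ranked[0]
    runner_up = ranked[1].score if len(ranked) > 1 else 0.0
    features = (top.score, top.score - runner_up)
    return None if is_oow(features) else top.title

# Toy OOW rule standing in for the trained SVM: abstain when the
# ranker's margin is tiny (i.e., it cannot really decide).
is_oow = lambda f: f[1] < 0.05
print(link_or_null([Candidate("Peter_Thompson", 0.31),
                    Candidate("Peter_Thomson", 0.30)], is_oow))  # None
```

Returning None instead of a low-confidence title is what lets the system refuse to link mentions that have no Wikipedia page at all.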

Comparison to the previous state of the art (all mentions, including OOW) 40 Table columns: Dataset; Baseline; Milne & Witten; Our System (GLOW). Rows: ACE, MSN News, AQUAINT, Wikipedia Test.

Demo 41