
Slide 1: Local and Global Algorithms for Disambiguation to Wikipedia
Lev Ratinov (1), Dan Roth (1), Doug Downey (2), Mike Anderson (3)
(1) University of Illinois at Urbana-Champaign, (2) Northwestern University, (3) Rexonomy
March 2011

Slide 2: Information overload

Slide 3: Organizing knowledge
Running example: It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Slide 4: Cross-document co-reference resolution (same running "Chicago" example as slide 3)

Slide 5: Reference resolution (disambiguation to Wikipedia) (same running example)

Slide 6: The "reference" collection has structure (same running example)
(Figure: links among the Wikipedia pages, labeled Used_In, Is_a, Succeeded, Released.)

Slide 7: Analysis of information networks (same running example)

Slide 8: Here, Wikipedia serves as the knowledge resource, but we can use other resources.
(Figure: link structure labeled Used_In, Is_a, Succeeded, Released.)

Slide 9: Talk outline
- High-level algorithmic approach: bipartite-graph matching with global and local inference
- Local inference: experiments & results
- Global inference: experiments & results
- Results, conclusions
- Demo

Slide 10: Problem formulation: a matching/ranking problem
(Figure: text documents (news, blogs, ...) matched against Wikipedia articles.)

Slide 11: Local approach
- Γ is a solution to the problem: a set of pairs (m, t)
- m: a mention in the document
- t: the matched Wikipedia title

Slide 12: Local approach (cont.): each pair (m, t) is scored by the local score of matching the mention to the title.
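The purely local decision rule on these two slides, picking for each mention the best-scoring title independently of all other mentions, can be sketched as follows. This is a minimal illustration; the function names, candidate lists, and scores are invented, not the authors' code.

```python
def local_disambiguate(mentions, candidates, phi):
    """Return {mention: best title} using only local scores phi(m, t)."""
    solution = {}
    for m in mentions:
        # argmax over this mention's candidate titles, independent of
        # every other mention in the document
        solution[m] = max(candidates[m], key=lambda t: phi(m, t))
    return solution

# Toy example: two mentions with hand-made local scores.
scores = {
    ("Chicago", "Chicago_city"): 0.9,
    ("Chicago", "Chicago_font"): 0.3,
    ("Boston", "Boston_city"): 0.8,
}
phi = lambda m, t: scores[(m, t)]
cands = {"Chicago": ["Chicago_city", "Chicago_font"],
         "Boston": ["Boston_city"]}
result = local_disambiguate(["Chicago", "Boston"], cands, phi)
```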

Slide 13: Local + global: using the Wikipedia structure. A "global" term evaluates how good the structure of the solution is.
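Combining the local scores with the global term gives a single objective; in the accompanying paper's notation (φ for the local mention-title score, ψ for the global coherence of the chosen titles), the problem sketched on this slide is:

```latex
\Gamma^{*} = \arg\max_{\Gamma}
  \left[ \sum_{(m_i, t_i) \in \Gamma} \phi(m_i, t_i) + \psi(\Gamma) \right]
```

This rendering is reconstructed from the slide's description and the symbols Γ and Ψ used later in the talk.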

Slide 14: The joint local + global objective can be reduced to an NP-hard problem.

Slide 15: A tractable variation
1. Invent a surrogate solution Γ′ by disambiguating each mention independently.
2. Evaluate the structure based on pairwise coherence scores Ψ(t_i, t_j).
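The two steps above can be sketched as the following two-stage procedure. The interface is assumed, not the authors' implementation; `phi`, `psi`, and the toy scores are illustrative.

```python
def two_stage_disambiguate(mentions, candidates, phi, psi, w=1.0):
    # Stage 1: surrogate solution Gamma' from local scores only.
    surrogate = {m: max(candidates[m], key=lambda t: phi(m, t))
                 for m in mentions}
    # Stage 2: re-pick each title by local score plus pairwise
    # coherence with the surrogate titles of all *other* mentions.
    final = {}
    for m in mentions:
        def score(t):
            coherence = sum(psi(t, surrogate[o])
                            for o in mentions if o != m)
            return phi(m, t) + w * coherence
        final[m] = max(candidates[m], key=score)
    return final

# Toy data: locally, "Chicago" slightly prefers the band sense, but
# coherence with the unambiguous "Boston" city sense flips it.
phi_scores = {("Chicago", "Chicago_city"): 0.4,
              ("Chicago", "Chicago_band"): 0.5,
              ("Boston", "Boston_city"): 0.9}
phi = lambda m, t: phi_scores[(m, t)]
psi = lambda a, b: 0.5 if {a, b} == {"Chicago_city", "Boston_city"} else 0.0
cands = {"Chicago": ["Chicago_city", "Chicago_band"],
         "Boston": ["Boston_city"]}
result = two_stage_disambiguate(["Chicago", "Boston"], cands, phi, psi)
```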

Slide 16: Talk outline (repeated as a section divider; next: local inference)

Slide 17: I. Baseline: P(Title | surface form), e.g. P(Title | "Chicago")
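A minimal sketch of how such a P(Title | surface form) table could be estimated from Wikipedia anchor-link counts. The counts below are invented to reproduce the 0.99 figure used later in the talk.

```python
from collections import Counter

def commonness(anchor_counts):
    """anchor_counts: {surface form: Counter(title -> link count)}.
    Returns {surface form: {title: P(title | surface form)}}."""
    probs = {}
    for surface, counts in anchor_counts.items():
        total = sum(counts.values())
        probs[surface] = {t: c / total for t, c in counts.items()}
    return probs

# Toy counts: out of 1000 links anchored "Chicago", 990 point to the
# city article, 9 to the band, 1 to the font.
toy = {"Chicago": Counter({"Chicago_city": 990,
                           "Chicago_band": 9,
                           "Chicago_font": 1})}
p = commonness(toy)
```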

Slide 18: II. Context(Title), e.g. Context(Charcoal) += "a font called __ is used to"

Slide 19: III. Text(Title): just the text of the page (one document per title)

Slide 20: Putting it all together

                 Score (Baseline)   Score (Context)   Score (Text)
Chicago_city     0.99               0.01              0.03
Chicago_font     0.0001             0.2               0.01
Chicago_band     0.001              0.001             0.02

City vs. font: (0.99 vs. 0.0001, 0.01 vs. 0.2, 0.03 vs. 0.01)
Band vs. font: (0.001 vs. 0.0001, 0.001 vs. 0.2, 0.02 vs. 0.01)

Training a ranking SVM:
- Consider all title pairs.
- Train a ranker on the pairs (learn to prefer the correct solution).
- Inference = knockout tournament.
- Key: abstracts over the text; learns which scores are important.
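The knockout-tournament inference in the bullets above can be sketched as follows. The linear `prefer` function is a stand-in for the trained ranking SVM; the feature values are taken from the slide's table.

```python
def knockout(candidates, features, prefer):
    """candidates: list of titles; features: {title: feature tuple};
    prefer(f_a, f_b) -> True if the first candidate should win.
    The survivor of successive pairwise comparisons is returned."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        if not prefer(features[winner], features[challenger]):
            winner = challenger
    return winner

# Stand-in linear ranker over (baseline, context, text) scores.
weights = (1.0, 1.0, 1.0)
dot = lambda w, f: sum(wi * fi for wi, fi in zip(w, f))
prefer = lambda fa, fb: dot(weights, fa) >= dot(weights, fb)

feats = {"Chicago_city": (0.99, 0.01, 0.03),
         "Chicago_font": (0.0001, 0.2, 0.01),
         "Chicago_band": (0.001, 0.001, 0.02)}
best = knockout(list(feats), feats, prefer)
```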

Slide 21: Example: font or city? (running example sentence)
Compare the mention's context against Text(Chicago_city), Context(Chicago_city) and against Text(Chicago_font), Context(Chicago_font).

Slide 22: Lexical matching: score the mention's context against each candidate's Text and Context using cosine similarity with TF-IDF weighting.
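A self-contained sketch of TF-IDF-weighted cosine similarity as used for the lexical matching step. The three toy "documents" (mention context, font page, city page) and the tokenization are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf * idf} dict
    per document, with idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vecs

def cosine(u, v):
    num = sum(u[t] * v.get(t, 0.0) for t in u)
    den = (math.sqrt(sum(x * x for x in u.values())) *
           math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

docs = [
    ["chicago", "font", "menu"],      # mention context
    ["font", "menu", "typeface"],     # stand-in for Text(Chicago_font)
    ["city", "illinois", "chicago"],  # stand-in for Text(Chicago_city)
]
vecs = tfidf_vectors(docs)
```

On this toy data the mention context is closer to the font page than to the city page, matching the intended disambiguation.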

Slide 23: Ranking: font vs. city (running example)
The two candidates' four-score feature vectors: (0.5, 0.2, 0.1, 0.8) and (0.3, 0.2, 0.3, 0.5).

Slide 24: Train a ranking SVM (running example)
From the candidates' feature vectors (0.5, 0.2, 0.1, 0.8) and (0.3, 0.2, 0.3, 0.5), form their difference as one training example: [(0.2, 0, -0.2, 0.3), -1].
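The slide's training example can be built as below: a ranking SVM is trained on feature-vector differences labeled by which candidate is correct. The vectors come from the slide; the sign convention (-1 when the first candidate is the wrong one) is an assumption consistent with the slide's label.

```python
def pairwise_example(f_a, f_b, a_correct):
    """One ranking-SVM training pair: the difference f_a - f_b,
    labeled +1 if candidate a is the correct title, else -1."""
    diff = tuple(a - b for a, b in zip(f_a, f_b))
    return diff, (1 if a_correct else -1)

first = (0.5, 0.2, 0.1, 0.8)   # feature vector of one candidate
second = (0.3, 0.2, 0.3, 0.5)  # feature vector of the other
# Assume the second candidate is the correct one, as the slide's
# label -1 suggests.
diff, label = pairwise_example(first, second, a_correct=False)
```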

Slide 25: Scaling issues, one of our key contributions (running example)

Slide 26: Scaling issues (cont.): the Text(Title) and Context(Title) data are large, and they are loaded into memory from disk.

Slide 27: Improving performance: rather than computing TF-IDF-weighted cosine similarity, we want to train a classifier on the fly. But because of the aggressive feature pruning, we choose PrTFIDF.

Slide 28: Performance (local only): ranking accuracy on solvable mentions

Dataset          Baseline   +Local TFIDF   +Local PrTFIDF
ACE              94.05      95.67          96.21
MSN News         81.91      84.04          85.10
AQUAINT          93.19      94.38          95.57
Wikipedia Test   85.88      92.76          93.59

Slide 29: Talk outline (repeated as a section divider; next: global inference)

Slide 30: Co-occurrence(Title1, Title2): the city senses of Boston and Chicago appear together often.

Slide 31: Co-occurrence(Title1, Title2): rock music and albums appear together often.

Slide 32: Global ranking
How do we approximate the "global semantic context" of the document? (What is Γ′?)
- Use only non-ambiguous mentions for Γ′.
- Use the top baseline disambiguation for NER surface forms.
- Use the top baseline disambiguation for all surface forms.
How do we define relatedness between two titles? (What is Ψ?)

Slide 33: Ψ, pairwise relatedness between two titles:
- Normalized Google Distance (NGD)
- Pointwise Mutual Information (PMI)
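Both measures can be computed from the in-link sets of the two Wikipedia titles. A sketch with invented in-link sets: NGD follows the standard Normalized Google Distance formula, and the PMI variant here is one common formulation, not necessarily the paper's exact one.

```python
import math

def ngd(links_a, links_b, n_titles):
    """Normalized Google Distance over in-link sets (lower = closer)."""
    a, b = len(links_a), len(links_b)
    ab = len(links_a & links_b)
    if ab == 0:
        return float("inf")
    return ((math.log(max(a, b)) - math.log(ab)) /
            (math.log(n_titles) - math.log(min(a, b))))

def pmi(links_a, links_b, n_titles):
    """Log-ratio of observed co-linking to what independence would
    predict (higher = more related)."""
    ab = len(links_a & links_b)
    if ab == 0:
        return float("-inf")
    return math.log(ab * n_titles / (len(links_a) * len(links_b)))

# Toy in-link sets over a 100-page "Wikipedia": the two city pages
# share several incoming links, the font page shares almost none.
boston = {1, 2, 3, 4, 5, 6}
chicago_city = {4, 5, 6, 7, 8, 9}
chicago_font = {1}
```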

Slide 34: What is the best Γ′? (ranker accuracy, solvable mentions)

Dataset          Baseline   +Global (Unambiguous)   +Global (NER)   +Global (All Mentions)
ACE              94.05      94.56                   96.21           96.75
MSN News         81.91      84.46                   84.04           88.51
AQUAINT          93.19      95.40                   94.04           95.91
Wikipedia Test   85.88      89.67                   89.59           89.79

Slide 35: Results: ranker accuracy (solvable mentions)

Dataset          Baseline   +Lexical   +Global (All Mentions)
ACE              94.05      96.21      96.75
MSN News         81.91      85.10      88.51
AQUAINT          93.19      95.57      95.91
Wikipedia Test   85.88      93.59      89.79

Slide 36: Results: local + global

Dataset          Baseline   +Lexical   +Lexical+Global
ACE              94.05      96.21      97.83
MSN News         81.91      85.10      87.02
AQUAINT          93.19      95.57      94.38
Wikipedia Test   85.88      93.59      94.18

Slide 37: Talk outline (repeated as a section divider; next: results, conclusions, demo)

Slide 38: Conclusions
Dealing with a very large-scale knowledge acquisition and extraction problem, with state-of-the-art algorithmic tools that exploit the content and structure of the network:
- Formulated a framework for local & global reference resolution and disambiguation into knowledge networks.
- Proposed local and global algorithms with state-of-the-art performance.
- Addressed scaling, a major issue.
- Identified key remaining challenges (next slide).

Slide 39: We want to know what we don't know
Not dealt with well in the literature:
- "As Peter Thompson, a 16-year-old hunter, said..."
- "Dorothy Byrne, a state coordinator for the Florida Green Party..."
We train a separate SVM classifier to identify such cases. The features are:
- All the baseline, lexical, and semantic scores of the top candidate.
- The score assigned to the top candidate by the ranker.
- The "confidence" of the ranker in the top candidate relative to the second-best disambiguation.
- The Good-Turing probability of an out-of-Wikipedia occurrence for the mention.
Limited success; future research.
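The feature vector described in the bullets above might be assembled as below. Every name here is hypothetical, and the Good-Turing out-of-Wikipedia probability is passed in precomputed rather than derived.

```python
def linkability_features(top_scores, ranker_score, runner_up_score,
                         p_out_of_wiki):
    """top_scores: (baseline, lexical, semantic) scores of the top
    candidate. The margin over the runner-up serves as the ranker's
    confidence; p_out_of_wiki is a precomputed Good-Turing estimate."""
    margin = ranker_score - runner_up_score
    return list(top_scores) + [ranker_score, margin, p_out_of_wiki]

# Toy values only, to show the shape of the feature vector.
feats = linkability_features((0.9, 0.4, 0.2), 1.3, 0.6, 0.05)
```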

Slide 40: Comparison to the previous state of the art (all mentions, including out-of-Wikipedia)

Dataset          Baseline   Milne & Witten   Our system (GLOW)
ACE              69.52      72.76            77.25
MSN News         72.83      68.49            74.88
AQUAINT          82.64      83.61            83.94
Wikipedia Test   81.77      80.32            90.54

Slide 41: Demo

