Page 1 INARC Report Dan Roth, UIUC March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov & Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign


INARC Activities I: Dan Roth, UIUC

I1.1: Fundamentals of Context-aware Real-time Data Fusion
- Advances in learning & inference for Constrained Conditional Models (CCMs): a computational framework for learning and inference with interdependent variables in constrained settings.
- Formulating information fusion as CCMs.
- Preliminary theoretical and experimental work on information fusion.

Key publications:
- R. Samdani and D. Roth, Efficient Learning for Constrained Structured Prediction, submitted.
- G. Kundu, D. Roth and R. Samdani, Constrained Classification Models for Information Fusion, submitted.
- M. Chang, M. Connor and D. Roth, The Necessity of Combining Adaptation Methods, EMNLP'10.
- M. Chang, V. Srikumar, D. Goldwasser and D. Roth, Structured Output Learning with Indirect Supervision, ICML'10.
- M. Chang, D. Goldwasser, D. Roth and V. Srikumar, Discriminative Learning over Constrained Latent Representations, NAACL'10.

INARC Activities II: Dan Roth, UIUC

I3.2: Modeling and Mining of Text-Rich Information Networks
- Large heterogeneous information networks of structured and unstructured data.
- State-of-the-art algorithmic tools for knowledge acquisition and information extraction, using the content & structure of the network.
- Make use of both the explicit network structure and hidden 'ontological' structure (e.g., category structure).
- Acquire and extract information from heterogeneous information networks when data is noisy, volatile, uncertain, and incomplete.

Key publications:
- Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, Local and Global Algorithms for Disambiguation to Wikipedia, ACL'11.
- Q. Do and D. Roth, Constraints based Taxonomic Relation Classification, EMNLP'10.
- Y. Chan and D. Roth, Exploiting Background Knowledge for Relation Extraction, COLING'10.
- Y. Chan and D. Roth, Exploiting Syntactico-Semantic Structures for Relation Extraction, ACL'11.
- J. Pasternack and D. Roth, Knowing What to Believe (when you already know something), COLING'10.
- J. Pasternack and D. Roth, Generalized Fact-Finding, WWW'10.
- J. Pasternack and D. Roth, Comprehensive Trust Metrics for Information Networks, Army Science Conference'10.

Page 4 INARC Report Dan Roth, UIUC March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov & Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign

Information overload

Organizing knowledge

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Cross-document co-reference resolution

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Reference resolution (disambiguation to Wikipedia)

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

The "reference" collection has structure

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

(Figure: titles connected by relation edges Used_In, Is_a, Succeeded, Released)

Analysis of Information Networks

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Here – Wikipedia as a knowledge resource… but we can use other resources

(Figure: relation edges Used_In, Is_a, Succeeded, Released)

Talk outline
- High-level algorithmic approach
  - Bi-partite graph matching with global and local inference
- Local inference
  - Experiments & results
- Global inference
  - Experiments & results
- Results, conclusions
- Demo

Problem formulation: a matching/ranking problem

(Figure: mentions in text documents – news, blogs, … – matched against Wikipedia articles)

Local approach

- Γ is a solution to the problem: a set of pairs (m, t), where m is a mention in the document and t is the matched Wikipedia title.
- Each pair receives a local score for matching the mention to the title.

Local + Global: using the Wikipedia structure

A "global" term evaluates how good the structure of the solution is.
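Written out, the combined objective from the last two slides is the following; this is a reconstruction following the Ratinov et al. (ACL'11) formulation, with φ the local mention–title score and ψ the global term scoring the structure of the solution:

```latex
\Gamma^{*} \;=\; \arg\max_{\Gamma} \; \sum_{(m_i,\, t_i) \in \Gamma} \phi(m_i, t_i) \;+\; \psi(\Gamma)
```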

Can be reduced to an NP-hard problem

A tractable variation

1. Invent a surrogate solution Γ′: disambiguate each mention independently.
2. Evaluate the structure based on pairwise coherence scores Ψ(t_i, t_j).
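With the surrogate solution fixed, each mention can be disambiguated independently. A reconstruction of the resulting per-mention decision, with Γ′ the surrogate title set and Ψ the pairwise coherence score from the slide:

```latex
t_i^{*} \;=\; \arg\max_{t} \; \phi(m_i, t) \;+\; \sum_{t' \in \Gamma'} \Psi(t, t')
```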

Talk outline High-level algorithmic approach.  Bi-partite graph matching with global and local inference. Local Inference.  Experiments & Results Global Inference.  Experiments & Results Results, Conclusions Demo 19

I. Baseline: P(Title | Surface Form)

Example: P(Title | "Chicago")
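The baseline can be sketched as follows; the anchor-text counts below are invented for illustration, and the title strings are hypothetical page identifiers (in the real system these statistics come from Wikipedia hyperlinks):

```python
from collections import Counter

# Toy anchor statistics: how often each surface form appears as anchor
# text linking to each title. Counts are invented for illustration.
anchor_counts = {
    "Chicago": Counter({"Chicago_(city)": 9000,
                        "Chicago_(band)": 700,
                        "Chicago_(typeface)": 30}),
}

def baseline_scores(surface):
    """Estimate P(title | surface form) from anchor-text counts."""
    counts = anchor_counts.get(surface, Counter())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

scores = baseline_scores("Chicago")
best = max(scores, key=scores.get)   # the city sense dominates the prior
```

With no other evidence, this prior always picks the city, which is exactly why the context and text scores on the next slides are needed.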

II. Context(Title)

Example: Context(Charcoal) += "a font called __ is used to"

III. Text(Title)

Just the text of the page (one per title).

Putting it all together

City vs. Font: ( , , )
Band vs. Font: ( , , )

Training a ranking SVM:
- Consider all title pairs.
- Train a ranker on the pairs (learn to prefer the correct solution).
- Inference = knockout tournament.
- Key: abstracts over the text – learns which scores are important.

(Table: Score_Baseline, Score_Context, Score_Text for Chicago_city, Chicago_font, Chicago_band)
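A minimal sketch of the knockout-tournament inference described above, assuming invented feature scores and hand-picked linear weights (the real system learns the weights with a ranking SVM):

```python
# Each candidate title has (baseline, context, text) scores; all numbers
# and the weight vector below are invented for illustration.
candidate_scores = {
    "Chicago_(city)":     (0.90, 0.10, 0.15),
    "Chicago_(band)":     (0.07, 0.20, 0.10),
    "Chicago_(typeface)": (0.03, 0.60, 0.55),
}
weights = (0.2, 1.0, 1.0)   # assumed learned weights: context/text dominate

def prefers(a, b):
    """Ranker decision: does title a beat title b? Scores the
    feature-difference vector with the linear weights."""
    da, db = candidate_scores[a], candidate_scores[b]
    margin = sum(w * (x - y) for w, x, y in zip(weights, da, db))
    return margin > 0

def knockout(candidates):
    """Single-elimination tournament: the current winner plays each
    remaining candidate in turn."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        if prefers(challenger, winner):
            winner = challenger
    return winner

best = knockout(list(candidate_scores))
```

With context-heavy weights, the typeface sense beats the high-prior city sense, mirroring the font example on the following slides.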

Example: font or city?

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)

Lexical matching

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)
Cosine similarity, TF-IDF weighting.
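A minimal sketch of this lexical-matching step, assuming tiny invented stand-ins for Text(title) and for the mention's surrounding context:

```python
import math
from collections import Counter

# Invented stand-ins for Text(title); real vectors come from Wikipedia pages.
docs = {
    "Chicago_(city)":     "chicago is a city on lake michigan in illinois",
    "Chicago_(typeface)": "chicago is a typeface font used in the classic macintosh menu",
}
context = "the standard classic macintosh menu font with that thick diagonal"

def tfidf(tokens, idf):
    tf = Counter(tokens)
    return {w: tf[w] * idf.get(w, 0.0) for w in tf}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Document frequencies over the candidate texts plus the context.
corpus = [d.split() for d in docs.values()] + [context.split()]
vocab = {w for toks in corpus for w in toks}
idf = {w: math.log(len(corpus) / sum(w in toks for toks in corpus)) + 1.0
       for w in vocab}

ctx_vec = tfidf(context.split(), idf)
sims = {t: cosine(ctx_vec, tfidf(d.split(), idf)) for t, d in docs.items()}
```

The font page shares "classic macintosh menu font" with the context, so its similarity dominates despite the city's higher prior.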

Ranking – font vs. city

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)

Train a ranking SVM

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)
Feature vectors for the two candidates: (0.5, 0.2, 0.1, 0.8) and (0.3, 0.2, 0.3, 0.5)
Training example (difference vector, label): [(0.2, 0, -0.2, 0.3), -1]
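The training-example construction on this slide can be sketched directly; the helper name `pairwise_example` is ours, and the feature vectors are the ones shown above:

```python
# A ranking SVM is trained on feature-difference vectors: for a pair of
# candidates, the vector (a - b) gets label +1 if a is the correct title
# and -1 otherwise. Vectors below are the ones from the slide.

def pairwise_example(features_a, features_b, a_is_correct):
    diff = tuple(round(x - y, 10) for x, y in zip(features_a, features_b))
    return diff, (1 if a_is_correct else -1)

first  = (0.5, 0.2, 0.1, 0.8)   # one candidate's feature scores
second = (0.3, 0.2, 0.3, 0.5)   # the other candidate's feature scores

# In this passage the second candidate is the correct sense,
# so the (first - second) difference vector is labeled -1.
example = pairwise_example(first, second, a_is_correct=False)
```

Training a linear SVM on such pairs learns a weight vector that prefers the correct title in every pairwise comparison, which is exactly what the knockout-tournament inference needs.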

Scaling issues – one of our key contributions

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)

Scaling issues

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)
This stuff is big, and is loaded into memory from disk.

Improving performance

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N". Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released in mid-1997.

Candidates: Text(Chicago_city), Context(Chicago_city); Text(Chicago_font), Context(Chicago_font)
Rather than computing TF-IDF weighted cosine similarity, we want to train a classifier on the fly. But due to the aggressive feature pruning, we choose PrTFIDF.

Performance (local only): ranking accuracy

Dataset         | Baseline (solvable) | +Local TFIDF (solvable) | +Local PrTFIDF (solvable)
ACE             |                     |                         |
MSN News        |                     |                         |
AQUAINT         |                     |                         |
Wikipedia Test  |                     |                         |
(accuracy values were not preserved in the transcript)

Talk outline
- High-level algorithmic approach
  - Bi-partite graph matching with global and local inference
- Local inference
  - Experiments & results
- Global inference
  - Experiments & results
- Results, conclusions
- Demo

Co-occurrence(Title_1, Title_2)

The city senses of Boston and Chicago appear together often.

Co-occurrence(Title_1, Title_2)

Rock music and albums appear together often.

Global ranking

How do we approximate the "global semantic context" of the document? (What is Γ′?)
- Use only non-ambiguous mentions for Γ′.
- Use the top baseline disambiguation for NER surface forms.
- Use the top baseline disambiguation for all surface forms.

How do we define relatedness between two titles? (What is Ψ?)

Ψ: pairwise relatedness between two titles
- Normalized Google Distance (NGD)
- Pointwise Mutual Information (PMI)
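Both relatedness measures can be sketched over Wikipedia in-link sets; the link sets, the page total W, and the exact normalizations here are illustrative assumptions (the paper's definitions may differ in detail):

```python
import math

W = 1000  # assumed total number of Wikipedia pages

# Invented in-link sets: integers stand in for the pages linking to a title.
inlinks = {
    "Chicago_(city)":     set(range(0, 300)),
    "Boston":             set(range(100, 350)),
    "Chicago_(typeface)": set(range(900, 920)),
}

def ngd(a, b):
    """Normalized Google Distance over in-link sets (smaller = more related)."""
    A, B = inlinks[a], inlinks[b]
    inter = len(A & B)
    if inter == 0:
        return float("inf")
    num = math.log(max(len(A), len(B))) - math.log(inter)
    den = math.log(W) - math.log(min(len(A), len(B)))
    return num / den

def pmi(a, b):
    """Pointwise mutual information of co-linking (larger = more related)."""
    A, B = inlinks[a], inlinks[b]
    inter = len(A & B)
    if inter == 0:
        return 0.0
    return math.log((inter / W) / ((len(A) / W) * (len(B) / W)))

city_boston = ngd("Chicago_(city)", "Boston")
city_font = ngd("Chicago_(city)", "Chicago_(typeface)")
```

Under these toy sets the two city senses share many in-links and come out closely related, while the typeface shares none with the city, matching the intuition on the co-occurrence slides.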

What is the best Γ′? (ranker accuracy, solvable mentions)

Dataset         | Baseline | Baseline+Lexical | Baseline+Global Unambiguous | Baseline+Global NER | Baseline+Global, All Mentions
ACE             |          |                  |                             |                     |
MSN News        |          |                  |                             |                     |
AQUAINT         |          |                  |                             |                     |
Wikipedia Test  |          |                  |                             |                     |
(accuracy values were not preserved in the transcript)

Results – ranker accuracy (solvable mentions)

Dataset         | Baseline | Baseline+Lexical | Baseline+Global Unambiguous | Baseline+Global NER | Baseline+Global, All Mentions
ACE             |          |                  |                             |                     |
MSN News        |          |                  |                             |                     |
AQUAINT         |          |                  |                             |                     |
Wikipedia Test  |          |                  |                             |                     |
(accuracy values were not preserved in the transcript)

Results: Local + Global

Dataset         | Baseline | Baseline+Lexical | Baseline+Lexical+Global
ACE             |          |                  |
MSN News        |          |                  |
AQUAINT         |          |                  |
Wikipedia Test  |          |                  |
(accuracy values were not preserved in the transcript)

Talk outline
- High-level algorithmic approach
  - Bi-partite graph matching with global and local inference
- Local inference
  - Experiments & results
- Global inference
  - Experiments & results
- Results, conclusions
- Demo

Conclusions

- Dealt with a very large-scale knowledge acquisition and extraction problem, with state-of-the-art algorithmic tools that exploit the content & structure of the network.
- Formulated a framework for local & global reference resolution and disambiguation into knowledge networks.
- Proposed local and global algorithms with state-of-the-art performance.
- Addressed scaling, a major issue.
- Identified key remaining challenges (next slide).

Future: We want to know what we don't know

Not dealt with well in the literature:
- "As Peter Thompson, a 16-year-old hunter, said…"
- "Dorothy Byrne, a state coordinator for the Florida Green Party…"

We train a separate SVM classifier to identify such cases. The features are:
- All the baseline, lexical, and semantic scores of the top candidate.
- The score assigned to the top candidate by the ranker.
- The "confidence" of the ranker in the top candidate with respect to the second-best disambiguation.
- The Good-Turing probability of an out-of-Wikipedia occurrence for the mention.

Limited success so far; future research.
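A hedged sketch of that classifier's input and decision: every number, the weight vector, and the function names below are invented for illustration; the real system trains an SVM on these features.

```python
def nil_features(local_scores, ranker_top, ranker_second, p_oow):
    """Assemble the out-of-Wikipedia (NIL) features listed above."""
    return [
        *local_scores,               # baseline, lexical, semantic scores
        ranker_top,                  # ranker score of the top candidate
        ranker_top - ranker_second,  # confidence margin over second-best
        p_oow,                       # Good-Turing P(out-of-Wikipedia)
    ]

def is_out_of_wikipedia(features, weights, bias=0.0, threshold=0.0):
    """Linear decision: positive score means 'we don't know this entity'."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > threshold

# Invented example: a confident link (high scores, wide margin, low P(OOW)).
feats = nil_features([0.8, 0.7, 0.6],
                     ranker_top=2.1, ranker_second=0.4, p_oow=0.05)
weights = [-0.5, -0.5, -0.5, -0.4, -0.8, 3.0]   # assumed learned weights
linked = not is_out_of_wikipedia(feats, weights)
```

The design point is that the NIL decision reuses the ranker's own outputs (scores and margin) as features rather than requiring new evidence about the mention.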

Comparison to the previous state of the art (all mentions, including out-of-Wikipedia)

Dataset         | Baseline | Milne & Witten | Our System (GLOW)
ACE             |          |                |
MSN News        |          |                |
AQUAINT         |          |                |
Wikipedia Test  |          |                |
(accuracy values were not preserved in the transcript)

Demo