Global and Local Wikification (GLOW) in TAC KBP Entity Linking Shared Task 2011
Lev Ratinov, Dan Roth

This research is supported by the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA C-0181 and by the Army Research Laboratory (ARL) under agreement W911NF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, ARL, or the US government.

[Figure: the GLOW pipeline on the TAC query (ID=2012, Form=“Ford”, Text=“The Ford Presidential Library is named after President Gerald Ford”). 1) MENTION IDENTIFICATION marks “The [Ford]m1 Presidential Library is named after President [Gerald Ford]m2”; the mentions carry the score pairs (m1, 0.1, -0.1) and (m2, 0.2, 0.7); 3) GLOW OUTPUT RECONCILIATION and QUERY MAPPING return Gerald Ford (president) from the KBP TAC knowledge base, whose entries include Michael Jordan (basketball), Michael Jackson (singer), and Gerald Ford (president).]

[Figure: example documents about the Chicago typeface, e.g. “Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid …”, and the facts aggregated from them: is a Macintosh font; has a distinctive N; used in Mac OS 7.6.]

GLOW problem formulation: bipartite matching
- Γ* is a solution to the problem: a set of mention-title pairs (m, t).
- The local matching quality of each pair is evaluated with Φ(m, t).
- The global structure is evaluated using (a) pairwise coherence scores Ψ(t_i, t_j) between titles and (b) an approximate solution Γ'. Γ' allows the mentions to be disambiguated independently while still taking the global structure into account.
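As a hedged illustration of how Γ' lets each mention be disambiguated independently while still seeing the global structure, the Python sketch below implements our reading of the relaxed objective from the ACL 2011 GLOW paper (not a formula stated on this poster): each mention m_i independently picks the title t_i maximizing Φ(m_i, t_i) + Σ_{t_j ∈ Γ'} Ψ(t_i, t_j). Here Γ' is assumed to be the set of titles chosen by a first, purely local pass. The functions `local_pass`, `global_pass`, and `psi`, and all scores, are hypothetical toy stand-ins, not GLOW's learned features.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inputs (not GLOW's learned features):
#   cands[m]      -> candidate Wikipedia titles for mention m
#   phi[(m, t)]   -> local matching score Phi(m, t)
#   coh[(ti, tj)] -> pairwise title coherence Psi(ti, tj), treated as symmetric
Candidates = Dict[str, List[str]]
LocalScores = Dict[Tuple[str, str], float]
Coherence = Dict[Tuple[str, str], float]


def psi(coh: Coherence, a: str, b: str) -> float:
    """Symmetric lookup of Psi(a, b); unseen title pairs score 0 (an assumption)."""
    return coh.get((a, b), coh.get((b, a), 0.0))


def local_pass(cands: Candidates, phi: LocalScores) -> Dict[str, str]:
    """Approximate solution Gamma': pick each mention's best title by Phi alone."""
    return {m: max(ts, key=lambda t: phi[(m, t)]) for m, ts in cands.items()}


def global_pass(cands: Candidates, phi: LocalScores, coh: Coherence) -> Dict[str, str]:
    """Disambiguate each mention independently, adding coherence with Gamma'."""
    gamma_prime = local_pass(cands, phi)
    result: Dict[str, str] = {}
    for m, ts in cands.items():
        # Titles that the first pass chose for the *other* mentions.
        context = [t for other, t in gamma_prime.items() if other != m]

        def score(t: str) -> float:
            # Phi(m, t) + sum over t_j in Gamma' of Psi(t, t_j)
            return phi[(m, t)] + sum(psi(coh, t, tj) for tj in context)

        result[m] = max(ts, key=score)
    return result


cands = {
    "Ford": ["Ford Motor Company", "Gerald Ford"],
    "Presidential Library": ["Presidential library"],
}
phi = {
    ("Ford", "Ford Motor Company"): 0.6,
    ("Ford", "Gerald Ford"): 0.4,
    ("Presidential Library", "Presidential library"): 0.9,
}
coh = {("Gerald Ford", "Presidential library"): 0.5}

print(local_pass(cands, phi))        # locally, "Ford" prefers the company
print(global_pass(cands, phi, coh))  # coherence with Gamma' flips it to the president
```

Excluding a mention's own first-pass title from its context is a design choice of this sketch; it keeps a mention from simply reinforcing its local guess.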
Vision: aggregate information about an entity from multiple documents. For example, one document describes the Chicago typeface as follows: “It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ‘N’.” Aggregated across documents, such statements yield facts like: is a Macintosh font; has a distinctive N; used in Mac OS 7.6.

Task methodology: map queries to a TAC entity database. Each query gives an ID, a name form, and a text in which the name occurs, e.g. (ID=2017, Form=“Michael”, Text=“This video shows Michael Jackson performing Billie Jean”).

Our approach: use the GLOW “disambiguation to Wikipedia” system (L. Ratinov, D. Downey, M. Anderson, and D. Roth, “Local and Global Algorithms for Disambiguation to Wikipedia,” ACL 2011).

[Figure: 2) GLOW DISAMBIGUATION of the TAC query (ID=2012, Form=“Ford”, Text=“The Ford Presidential Library is named after President Gerald Ford”) against entries of the KBP TAC knowledge base, such as Michael Jordan (basketball), Michael Jackson (singer), and Gerald Ford (president).]

1) MENTION IDENTIFICATION
We explored two strategies:
- Simple Query Identification (SIQI): mark the expressions in the text that match the query form exactly.
- Named Entity Query Identification (NEQI): identify the named entities in the text that match the query form approximately, and normalize their spelling using Wikipedia (this poster illustrates NEQI). This is similar to query expansion.
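The two mention identification strategies can be contrasted in a short Python sketch. It rests on several assumptions not stated on the poster: SIQI is read as a whole-word verbatim match; NEQI operates on named entities that some NER step has already extracted; "approximate match" is taken to mean that every token of the query form appears in the entity (ignoring case); and spelling normalization uses a hypothetical `aliases` table from surface forms to Wikipedia titles. The functions `siqi` and `neqi` are illustrative names only.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def siqi(text: str, query_form: str) -> List[Span]:
    """SIQI sketch: spans where the query form occurs verbatim as whole words."""
    pattern = r"\b" + re.escape(query_form) + r"\b"
    return [(mt.start(), mt.end()) for mt in re.finditer(pattern, text)]


def neqi(entities: List[str], query_form: str, aliases: Dict[str, str]) -> List[str]:
    """NEQI sketch over pre-extracted named entities (hypothetical NER output).

    "Approximate match" is assumed to mean that every token of the query form
    appears in the entity, ignoring case; `aliases` is a hypothetical
    surface-form -> Wikipedia-title table used to normalize spelling.
    """
    query_tokens = set(query_form.lower().split())
    matched = [e for e in entities if query_tokens <= set(e.lower().split())]
    return [aliases.get(e, e) for e in matched]


text = "This video shows Michael Jackson performing Billie Jean"
print([text[s:e] for s, e in siqi(text, "Michael")])  # only the bare "Michael"
print(neqi(["Michael Jackson", "Billie Jean"], "Michael",
           {"Michael Jackson": "Michael_Jackson"}))  # expanded to the full name
```

On the “Michael” query, SIQI marks only the bare first name, while NEQI expands it to the full entity “Michael Jackson”, which is the query-expansion effect described above. The `\b` anchors assume the query form begins and ends with word characters.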
Experiments and Results (TAC 2011 Test Data)
Conclusions:
1) A “disambiguation to Wikipedia” system can be applied directly to the TAC KBP Entity Linking task; we did not train our system on TAC data.
2) NEQI mention identification gains 4 B³ F1 points over SIQI.
3) All reasonable output reconciliation policies performed comparably.

3) GLOW OUTPUT RECONCILIATION
Given a set of mentions linked to the query, we need to provide a single Wikipedia title. However, each mention can be assigned a different title. We use the ranker scores and the linker scores to make the decision:
- Which mentions to use: the “with linker” strategy discards mentions assigned a negative linker score (meaning the objective function increases if these mentions are mapped to NULL); the “no linker” strategy uses all mentions.
- Which title to choose: the single best-matching title is selected by ranker scores. The “Max” strategy uses the single mention with the highest ranker score; the “Sum” strategy sums the ranker scores of all mentions assigned to the same title.
In the figure on the left, we illustrate the four resulting strategies along with the mentions they use and the resulting ranker scores for each title. Hollow circles indicate discarded mentions; full circles indicate mentions that contribute to the final title ranking scores.
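The four reconciliation policies come from crossing two choices: with or without the linker filter, and Max or Sum over ranker scores. A minimal Python sketch, assuming each linked mention is represented by its assigned title, ranker score, and linker score. The `Mention` class, the `reconcile` function, the toy scores, and returning `None` (NIL) when no mention survives the filter are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    title: str     # Wikipedia title assigned to this mention
    ranker: float  # ranker score of that title
    linker: float  # linker score; negative means mapping to NULL scores higher


def reconcile(mentions: List[Mention], use_linker: bool, mode: str) -> Optional[str]:
    """Pick a single title for the query; mode is "max" or "sum"."""
    if use_linker:
        # "with linker": discard mentions with a negative linker score.
        mentions = [m for m in mentions if m.linker >= 0]
    if not mentions:
        return None  # assumption: no surviving mention -> no title (NIL)
    if mode == "max":
        # "Max": the title of the single highest-ranked mention.
        return max(mentions, key=lambda m: m.ranker).title
    if mode == "sum":
        # "Sum": add up ranker scores per title, then take the best title.
        totals: Dict[str, float] = defaultdict(float)
        for m in mentions:
            totals[m.title] += m.ranker
        return max(totals, key=totals.get)
    raise ValueError(f"unknown mode: {mode!r}")


mentions = [
    Mention("Ford Motor Company", ranker=0.5, linker=-0.2),
    Mention("Gerald Ford", ranker=0.3, linker=0.4),
    Mention("Gerald Ford", ranker=0.3, linker=0.1),
]
for use_linker in (False, True):
    for mode in ("max", "sum"):
        print(use_linker, mode, reconcile(mentions, use_linker, mode))
```

On this toy input only “no linker” with “Max” returns the company: a single high-ranked mention with a negative linker score wins Max, while Sum and the linker filter each favor the title supported by several mentions.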