Overview of the TAC2013 Knowledge Base Population Evaluation: English Slot Filling
Mihai Surdeanu, with a lot of help from: Hoa Dang, Joe Ellis, Heng Ji, and Ralph Grishman

Introduction
Slot filling (SF): extract values of specified attributes for a given entity from a large collection of natural language texts.
This was the 5th year for the KBP SF evaluation.
A few new things this year.

New: Annotation Guidelines
per:title
– Titles at different organizations are different
– Mitt Romney: CEO at Bain Capital, CEO at Bain & Company, CEO at 2002 Winter Olympics → different fillers!
per:employee_of + per:member_of = per:employee_or_member_of
Entities in metadata can be used as query input or output
– Consider post authors as filler candidates

New: Provenance and Justification
Exact provenance and justification required
– Up to two mentions for slot/filler provenance
– Up to two sentences for justification

New: Provenance and Justification
Text: "Michelle Obama started her career as a corporate lawyer specializing in marketing and intellectual property. Michelle met Barack Obama when she was employed as a corporate attorney with the law firm Sidley Austin. She married him in 1992."
Query entity: Michelle Obama; slot: per:spouse
Output:
– Entity provenance: "She", "Michelle Obama"
– Filler provenance: "him", "Barack Obama"
– Justification: "She married him in 1992."
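
To make the required output concrete, here is a minimal Python sketch of one way to represent such a response; the class and field names are illustrative assumptions, not the official KBP submission format (which references documents and character offsets).

    # Illustrative sketch only: field names are assumptions for exposition,
    # not the official KBP submission format.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SFResponse:
        query_entity: str                # e.g. "Michelle Obama"
        slot: str                        # e.g. "per:spouse"
        filler: str                      # e.g. "Barack Obama"
        entity_provenance: List[str] = field(default_factory=list)  # up to two mentions
        filler_provenance: List[str] = field(default_factory=list)  # up to two mentions
        justification: List[str] = field(default_factory=list)      # up to two sentences

    response = SFResponse(
        query_entity="Michelle Obama",
        slot="per:spouse",
        filler="Barack Obama",
        entity_provenance=["She", "Michelle Obama"],
        filler_provenance=["him", "Barack Obama"],
        justification=["She married him in 1992."],
    )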

New: Source Corpus
– One million documents from Gigaword
– One million web documents (similar to 2012)
– ~100,000 documents from web discussion fora
Released as a single corpus for convenience

Scoring Metric
Each non-NIL response is assessed as: Correct, ineXact, Redundant, or Wrong
– Justification contains >2 sentences → W
– Provenance and/or justification incomplete → W
– Filler string incomplete or includes extraneous material → X
– Text spans justify the extraction and the filler is exact:
  – Filler exists in the KB → R
  – Filler does not exist in the KB → C
Credit given for C and R
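
As a rough illustration of how these rules compose, here is a minimal Python sketch of the assessment decision; the boolean inputs (spans_justify_extraction, filler_is_exact, filler_in_kb) are assumptions standing in for decisions that human assessors actually make.

    # Minimal sketch of the per-response assessment rules above.
    # In the real evaluation these judgments are made by human assessors;
    # the boolean inputs here are illustrative assumptions.
    def assess(response, spans_justify_extraction, filler_is_exact, filler_in_kb):
        """Return 'C', 'X', 'R', or 'W' for a non-NIL response."""
        if len(response.justification) > 2:
            return "W"   # justification contains more than two sentences
        if not (response.entity_provenance and response.filler_provenance
                and response.justification):
            return "W"   # provenance and/or justification incomplete
        if not spans_justify_extraction:
            return "W"   # cited text does not support the extraction
        if not filler_is_exact:
            return "X"   # filler incomplete or contains extraneous material
        return "R" if filler_in_kb else "C"   # redundant if already in the KB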

Scoring Metric
Precision, recall, and F1 computed considering C and R fillers as correct
Recall is tricky
– Gold keys constructed from:
  – System outputs judged as correct
  – A manual key prepared independently by LDC annotators (also serves as a fair performance ceiling)
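
A minimal sketch, assuming a pooled gold key and a list of assessor judgments, of the scoring just described; for simplicity it ignores equivalence classes among fillers, which the real scorer handles.

    # Illustrative scoring sketch: C and R judgments both count as correct.
    # num_gold_fillers is the size of the pooled gold key (correct system
    # outputs plus the independent LDC key); equivalence classes are ignored.
    def score(judgments, num_gold_fillers):
        correct = sum(1 for j in judgments if j in ("C", "R"))
        precision = correct / len(judgments) if judgments else 0.0
        recall = correct / num_gold_fillers if num_gold_fillers else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall > 0 else 0.0)
        return precision, recall, f1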

PARTICIPANTS

Participants

Participation Trends Number of teams who submitted at least one SF run

Participation Trends Number of SF submissions

RESULTS

The Task Was Harder This Year
Harder:
– Stricter scoring
– More complex queries, with a more uniform slot distribution
Easier:
– Extracting redundant fillers is somewhat easier

Overall Results

Official scores are generally higher than diagnostic scores: redundant fillers are easier to extract. Is that why they are already in the KB?

Overall Results Harder task: this score was 81.4 in 2012.

Overall Results Increased performance: 6 systems over 30 F1. Last year, there were only 2.

Overall Results Increased performance: median of 15.7 F1 (last year: 9.9 F1).

Overall Results Perspective: We are at 54% of human performance

Distribution of Slots in Evaluation Queries …

… Harder data: 13 slots needed to cover 60% of the data, and some of these are hard. In 2011, only 7 slots were needed to cover 60% of the data.

Results with Lenient Scoring

Not directly comparable with the official scores due to collapsing of per:title

Results with Lenient Scoring System bug?

Results with Lenient Scoring About the same as the official scores. If systems identify the correct docs, they can extract correct offsets.

Results with Lenient Scoring These are much higher for some systems.

Results with Lenient Scoring Extracted fillers from documents not in the source corpus.

Results with Lenient Scoring Inferred relations not explicitly stated in text.

Technology
Most successful approaches combine distant supervision (DS) with rules
– DS models with built-in noise reduction (Stanford)
– Rule-based and DS extractors combined via a combiner (Stanford, NYU, BIT) (Saarland)
[slide diagram: Rules + DS → Combiner]
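
As a purely illustrative sketch (not any particular team's system), one simple way to realize the rules-plus-DS combiner pattern is to trust rule extractions and back off to confident DS predictions; the confidence cutoff is an assumed parameter.

    # Illustrative combiner sketch: prefer rule-based extractions, then add
    # distant-supervision (DS) predictions above an assumed confidence cutoff.
    def combine(rule_preds, ds_preds, ds_threshold=0.5):
        """rule_preds: iterable of (entity, slot, filler) tuples.
        ds_preds: iterable of (entity, slot, filler, confidence) tuples."""
        combined = set(rule_preds)
        for entity, slot, filler, conf in ds_preds:
            if conf >= ds_threshold:
                combined.add((entity, slot, filler))
        return combined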

Technology
KBP system based on OpenIE (Washington)
– Extracted tuples (Arg1, Rel, Arg2) from the KBP corpus
– Manually written rules to map these tuples to KBP relations (see the sketch after this slide)
Bootstrapping dependency-based patterns (Beijing University of Posts and Telecommunications)
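
A minimal Python sketch of what such manually written tuple-to-relation rules could look like; the rule table below is invented for illustration and is not the Washington team's actual rule set.

    # Hypothetical mapping rules from OpenIE relation phrases to KBP slots.
    # These entries are illustrative assumptions, not the actual system's rules.
    RELATION_RULES = {
        "was born in": "per:city_of_birth",
        "is married to": "per:spouse",
        "works for": "per:employee_or_member_of",
    }

    def map_tuple(arg1, rel, arg2):
        """Map an OpenIE (Arg1, Rel, Arg2) tuple to an (entity, slot, filler) triple."""
        slot = RELATION_RULES.get(rel.lower().strip())
        if slot is None:
            return None   # no rule covers this OpenIE relation phrase
        return (arg1, slot, arg2)

    print(map_tuple("Michelle Obama", "is married to", "Barack Obama"))
    # -> ('Michelle Obama', 'per:spouse', 'Barack Obama')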

Technology
Unsupervised clustering of patterns (UPC)
Combining observed and unlabeled data through matrix factorization (UMass)
Inferring new relations from the stated ones using first-order logic rules learned via Bayesian Logic Programs (BLPs) (UT); a toy example follows
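
To make the inference idea concrete, here is a toy Python sketch that applies two hand-picked first-order rules to extracted triples; the UT system learns such rules with BLPs, so these two example rules are assumptions for illustration only.

    # Illustrative inference sketch: hand-picked first-order rules applied to
    # extracted triples. These example rules are assumptions, not learned rules.
    def infer(triples):
        inferred = set(triples)
        for subj, slot, obj in triples:
            if slot == "per:spouse":
                inferred.add((obj, "per:spouse", subj))    # spouse is symmetric
            if slot == "per:children":
                inferred.add((obj, "per:parents", subj))   # child/parent are inverses
        return inferred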

Conclusions
Positive trends
– Most popular SF evaluation to date
– Best performance on average
Things that need more work
– Still at ~50% of human performance
– Participant retention rate lower than 50%
Reduce the barrier of entry: offer preprocessed data?
– Sentences containing the query entity and a known filler, for training
– Sentences containing the query entity, for testing
– NLP annotations