Text Analysis Conference Knowledge Base Population 2013 Hoa Trang Dang National Institute of Standards and Technology Sponsored by:

Presentation transcript:

TAC KBP Goals
Goal: Populate a knowledge base (KB) with information about entities as found in a collection of source documents, following a specified schema for the KB
KBP: Focus on augmenting an existing KB. Decompose KBP into two tasks
▫Entity-Linking: Link each given named entity mention to a node in the reference KB (or create a new node)
▫Slot-Filling: Learn attributes about target entities from the source documents and add new information about the entity to the reference KB
KBP 2012: Combine entity-linking and slot-filling to build a KB from scratch -> Cold Start
KBP 2013:
▫Conversational, informal data (discussion fora)
▫Temporal constraints for Slot Filling (2011 pilot)
▫Sentiment analysis for Slot Filling

TAC KBP 2013 Track Participants
Track coordinators
▫Hoa Dang (Slot Filler Validation)
▫Jim Mayfield (Entity Linking, Cold Start KBP)
▫Margaret Mitchell (Sentiment Slot Filling)
▫Mihai Surdeanu (English Slot Filling and Temporal Slot Filling)
LDC linguistic resource providers: Joe Ellis, Jeremy Getman, Justin Mott, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, Jonathan Wright
Coordinators emeritus: Ralph Grishman, Heng Ji
Advisor: Boyan Onyshkevych
45 Teams
▫14 countries (21 USA, 9 China, 3 Spain, 2 Germany, …)

6 (8) TAC KBP 2013 Tracks
Entity-Linking
▫English
▫Chinese
▫Spanish
Slot-Filling (English)
▫Regular
▫Sentiment
▫Temporal
▫Slot Filler Validation Task
Cold Start (English)

Entity Linking and Slot Filling Tracks
Goal: Augment a reference knowledge base (KB) with info about query entities (PER, ORG, GPE) as found in a diverse collection of documents
Reference KB: Oct 2008 Wikipedia snapshot. Each KB node corresponds to a Wikipedia page and contains:
▫Infobox
▫Wiki_text (free text not in infobox)
English source documents:
▫1M News docs
▫1M Web docs
▫99K Discussion Forum docs (threads)
Chinese source documents: 2M news, 800K Web
Spanish source documents: 900K news

Entity-Linking Evaluation Results
English
▫Participants: 26 teams
▫Highest F1: (0.730 in 2012)
▫Median F1: (0.536 in 2012)
Chinese
▫Participants: 4 teams
▫Highest F1: 0.622 (0.740 in 2012)
▫Median F1: 0.619 (0.617 in 2012)
Spanish
▫Participants: 3 teams
▫Highest F1: 0.709 (0.641 in 2012)
▫Median F1: 0.651 (0.612 in 2012)

Regular Slot Filling Evaluation Results
Participants: 18 teams
Human F1: (0.814 in 2012)
Highest System F1: (0.517 in 2012)
2nd Highest System F1: 0.339 (0.296 in 2012)
Median System F1: 0.150 (0.099 in 2012)

Sentiment Slot Filling Track
Sentiment analysis for KBP:
▫Holder (PER, ORG, GPE)
▫Target (PER, ORG, GPE)
▫Polarity (positive, negative)
Implemented as regular slot filling, with a different set of slots
▫{per,org,gpe}:positive-towards
▫{per,org,gpe}:negative-towards
▫{per,org,gpe}:positive-from
▫{per,org,gpe}:negative-from
Participants: 3 teams
Evaluation results:
▫Human F1: 0.727
▫Highest System F1: 0.132
▫Median System F1: 0.014
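The brace notation in the slot list above is shorthand for one slot per entity type. A minimal sketch of that expansion, assuming the prefix names the type of the query entity; the constant and function names here are illustrative and not part of the track definition:

```python
from itertools import product

# Entity types and sentiment relations as listed on the slide.
ENTITY_TYPES = ("per", "org", "gpe")
SENTIMENT_RELATIONS = (
    "positive-towards",
    "negative-towards",
    "positive-from",
    "negative-from",
)


def sentiment_slots():
    """Expand {per,org,gpe}:<relation> into explicit slot names."""
    return [f"{etype}:{rel}" for etype, rel in product(ENTITY_TYPES, SENTIMENT_RELATIONS)]


if __name__ == "__main__":
    for slot in sentiment_slots():
        print(slot)
```

Three entity types times four relations gives twelve slot names in total.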

Temporal Slot Filling Track
Find tightest temporal constraints [T1 T2 T3 T4] on a given relation
▫Relation is true for a period beginning between T1 and T2
▫Relation is true for a period ending between T3 and T4
Participants: 5 teams
Evaluation results:
▫Human Accuracy:
▫Highest System Accuracy:
▫Median System Accuracy: 0.148
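The four-tuple can be read as two ranges: one bounding the start of the relation and one bounding its end. A minimal sketch of that reading, assuming dates are Python `date` objects and that `None` stands for an unknown (unbounded) endpoint; the class name, the helper names, and the `None` convention are assumptions made for illustration, not the track's official representation or scorer:

```python
from datetime import date
from typing import NamedTuple, Optional


class TemporalConstraint(NamedTuple):
    """Four-tuple [T1 T2 T3 T4]; None marks an unknown (unbounded) endpoint."""
    t1: Optional[date]  # earliest possible start
    t2: Optional[date]  # latest possible start
    t3: Optional[date]  # earliest possible end
    t4: Optional[date]  # latest possible end


def _within(value: date, low: Optional[date], high: Optional[date]) -> bool:
    """True if low <= value <= high, treating None as no bound."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def is_consistent(c: TemporalConstraint, start: date, end: date) -> bool:
    """Check whether a concrete [start, end] period satisfies the constraint."""
    return _within(start, c.t1, c.t2) and _within(end, c.t3, c.t4)


if __name__ == "__main__":
    # Toy constraint: started some time in 2001, ended no earlier than 2005.
    c = TemporalConstraint(date(2001, 1, 1), date(2001, 12, 31), date(2005, 1, 1), None)
    print(is_consistent(c, date(2001, 6, 1), date(2009, 3, 1)))
    print(is_consistent(c, date(2002, 6, 1), date(2009, 3, 1)))
```

A tighter constraint is one where the two ranges are narrower, so fewer concrete periods pass this check.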

Slot Filler Validation Track (SFV)
Task: Determine whether or not a candidate slot filler is correct
Objective: Improve precision without excessive reduction of recall
Participants: 5 teams
Some SFV runs had an overwhelmingly positive impact on individual SF runs!
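The stated objective is a trade-off: discarding candidate fillers raises precision only if mostly wrong ones are removed, and every correct filler discarded lowers recall. A minimal sketch using the standard precision, recall, and F1 definitions; the counts are invented toy numbers, and this is not the official TAC scorer:

```python
def precision_recall_f1(correct_returned: int, returned: int, gold: int):
    """Standard precision, recall, and F1 from raw counts."""
    precision = correct_returned / returned if returned else 0.0
    recall = correct_returned / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy slot-filling run: 10 fillers returned, 4 correct, 8 correct fillers exist.
    before = precision_recall_f1(correct_returned=4, returned=10, gold=8)
    # Hypothetical validator rejects 6 fillers: 5 wrong ones and 1 correct one.
    after = precision_recall_f1(correct_returned=3, returned=4, gold=8)
    print("before validation: P=%.3f R=%.3f F1=%.3f" % before)
    print("after validation:  P=%.3f R=%.3f F1=%.3f" % after)
```

In this toy case precision rises from 0.400 to 0.750 while recall drops from 0.500 to 0.375, and F1 still improves from about 0.444 to 0.500.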

Cold Start KBP Track
Goal: Build a KB from scratch, containing all targeted info about all entities as found in a relatively closed domain corpus of documents
KB schema: same entity types and slots as regular slot-filling task
Source document collection:
▫50K Web pages from small-town publications (from TREC KBA document stream)
Required capabilities:
▫Entity-linking: Grounding all named entity mentions in docs to KB nodes
▫Slot-filling: Learning attributes about all named entities
Post-submission evaluation queries traverse KB starting from a single entity node (entity mention):
▫0-hop: Find all children of Michael Jordan
▫1-hop: Find date of birth of each of the children of Michael Jordan
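The two query types can be pictured as walks over the submitted KB: a 0-hop query reads one slot of the starting entity, and a 1-hop query reads a second slot on each filler the first step returned. A minimal sketch, assuming the KB is a nested dictionary from entity ID to slot name to fillers; the entity IDs, the dates, and the function names are invented for illustration and do not reflect the track's submission format:

```python
from typing import Dict, List

KB = Dict[str, Dict[str, List[str]]]

# Toy KB with made-up entity IDs and values.
toy_kb: KB = {
    "E1": {"per:children": ["E2", "E3"]},
    "E2": {"per:date_of_birth": ["1990-01-01"]},
    "E3": {"per:date_of_birth": ["1992-06-15"]},
}


def zero_hop(kb: KB, entity: str, slot: str) -> List[str]:
    """Return the fillers of one slot on the starting entity."""
    return list(kb.get(entity, {}).get(slot, []))


def one_hop(kb: KB, entity: str, first_slot: str, second_slot: str) -> Dict[str, List[str]]:
    """Follow first_slot, then read second_slot on each intermediate entity."""
    return {
        intermediate: zero_hop(kb, intermediate, second_slot)
        for intermediate in zero_hop(kb, entity, first_slot)
    }


if __name__ == "__main__":
    print(zero_hop(toy_kb, "E1", "per:children"))
    print(one_hop(toy_kb, "E1", "per:children", "per:date_of_birth"))
```

A 1-hop answer can only be right if the 0-hop step found the intermediate entity, so errors in entity linking or in the first slot carry over to the second hop.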

Cold Start Evaluation Results (Preliminary)
Participants: 3 teams
0-hop queries:
▫Highest F1: (0.497 in 2012)
1-hop queries:
▫Highest F1: (0.255 in 2012)
Combined 0-hop and 1-hop F1
▫Highest F1: (~0.352 in 2012)

TAC KBP Discussion/Planning Sessions
Monday, November 18 (2:15-3:10pm):
▫English Slot Filling
▫Slot Filler Validation
▫Temporal Slot Filling?
▫+Spanish Slot Filling?
▫+Event identification and argument extraction?
Tuesday, November 19 (3:00-4:00pm):
▫Cold Start
▫English Entity Linking (as queries in Cold Start framework?)
▫Cross-Lingual Spanish and Chinese Entity Linking + Discussion forum