Published by Amice Ramsey. Modified over 9 years ago.
1
Overview of the KBP 2013 Slot Filler Validation Track
Hoa Trang Dang
National Institute of Standards and Technology
2
Slot Filler Validation (SFV) Track
Goals
▫Allow teams without a full slot-filling system to participate and focus on answer validation rather than document retrieval
▫Evaluate the contribution of RTE systems on KBP slot-filling
▫Allow teams to experiment with system voting and global
SFV input:
▫Candidate slot filler
▫Possibly additional information about candidate slot fillers
SFV output:
▫Binary classification (Correct / Incorrect) of each candidate slot filler
Can only improve precision, not recall, of full slot-filling systems
Evaluation metric depends on SFV use case and availability of additional information about candidate fillers
TAC RTE KBP Validation task (2011)
TAC KBP Slot Filler Validation task (2012)
3
TAC RTE KBP Validation task (2011)
Each slot filler returned by SF systems yields 1 RTE evaluation pair, where:
▫T is the entire document supporting the slot filler
▫H is a set of synonymous sentences, representing different realizations of the slot filler
4
Use Case 1: SFV as Textual Entailment (2011)
SFV input:
▫All regular English slot filling input (slot definitions, queries, source documents)
▫Individual candidate slot fillers (filler, provenance)
Local Approach:
▫Generic textual entailment: H is the relation implied by the candidate slot filler (e.g., “Barack Obama has lived in Chicago”), T is the provenance (entire document, or smaller regions defined by justification offsets)
▫Tailored textual entailment: train on different slot types; could be a validation module for a full slot filling system.
Evaluation:
▫F score on entire pool of candidate slot fillers (unique slot filler, provenance)
▫Baseline: All T’s classified as entailing the corresponding H: P=R=percentage of entailing pairs in the pooled SF responses
▫Weak baseline, easily beaten by all SFV systems; not a direct measure of utility of SFV to SF
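The generic local approach can be illustrated with a small lexical sketch: H is the sentence implied by the candidate filler, T is its provenance text, and the candidate is labeled Correct when enough of H's words appear in T. This is only an illustration under stated assumptions, not any participant's system; the tokenizer, the function names, and the 0.6 threshold are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_overlap(hypothesis, text):
    """Fraction of distinct hypothesis tokens that also occur in the text."""
    h = set(tokens(hypothesis))
    t = set(tokens(text))
    return len(h & t) / len(h) if h else 0.0


def cosine_similarity(hypothesis, text):
    """Cosine similarity between bag-of-words count vectors of H and T."""
    h = Counter(tokens(hypothesis))
    t = Counter(tokens(text))
    dot = sum(h[w] * t[w] for w in h)
    norm_h = math.sqrt(sum(v * v for v in h.values()))
    norm_t = math.sqrt(sum(v * v for v in t.values()))
    return dot / (norm_h * norm_t) if norm_h and norm_t else 0.0


def validate(hypothesis, provenance, threshold=0.6):
    """Label a candidate from word overlap alone (threshold is an assumption)."""
    if word_overlap(hypothesis, provenance) >= threshold:
        return "Correct"
    return "Incorrect"


H = "Barack Obama has lived in Chicago"
T_GOOD = "Barack Obama lived in Chicago for many years before moving to Washington."
T_BAD = "Chicago is a large city in Illinois."

print(validate(H, T_GOOD))  # 5 of 6 hypothesis tokens are covered
print(validate(H, T_BAD))   # only 2 of 6 hypothesis tokens are covered
```

A purely lexical check like this ignores negation, coreference, and slot semantics, which is one reason a tailored, slot-specific approach may be preferable.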
5
Use Case 2: SFV impact on single SF systems
SFV input:
▫All regular English slot filling input (slot definitions, queries, source documents)
▫Individual candidate slot fillers (filler, provenance, confidence)
Broken out into individual slot filling runs
Global Approach:
▫System Voting, leveraging features across multiple SF runs
Evaluation:
▫Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
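The filter-then-rescore evaluation can be sketched as follows. This is a simplification: the official English SF scorer handles details (such as equivalence classes of fillers) that are omitted here, and the data layout, function names, and toy values are hypothetical.

```python
def f1_score(num_correct, num_returned, num_gold):
    """F1 from counts; returns 0.0 when precision and recall are both zero."""
    precision = num_correct / num_returned if num_returned else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_run(fillers, assessments, num_gold):
    """Score one run; assessments maps filler id -> True if judged correct."""
    num_correct = sum(1 for f in fillers if assessments[f])
    return f1_score(num_correct, len(fillers), num_gold)


def sfv_impact(fillers, sfv_labels, assessments, num_gold):
    """Return (F1 before, F1 after, change) for one SF run under an SFV filter."""
    kept = [f for f in fillers if sfv_labels[f] == "Correct"]
    before = score_run(fillers, assessments, num_gold)
    after = score_run(kept, assessments, num_gold)
    return before, after, after - before


# Toy run: four returned fillers, two of them right, four gold fillers overall.
fillers = ["a", "b", "c", "d"]
assessments = {"a": True, "b": True, "c": False, "d": False}
sfv_labels = {"a": "Correct", "b": "Correct", "c": "Incorrect", "d": "Correct"}

before, after, change = sfv_impact(fillers, sfv_labels, assessments, num_gold=4)
print(before, after, change)
```

Because the filter can only remove fillers, the count of correct fillers never rises, so recall cannot improve; any gain in F1 comes from precision.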
7
Slot Filler Validation (SFV) 2012
SFV input:
▫All regular English slot filling input (slot definitions, queries, source documents)
▫Individual candidate slot fillers (filler, provenance, confidence)
Broken out into individual slot filling runs
▫System profile for each SF run
▫Preliminary assessment of 10% of KBP 2013 Slot Filling queries
SFV output:
▫Binary classification (Correct / Incorrect) of each candidate slot filler
Evaluation:
▫Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
One SFV submission; it decreased F1 of almost all SF runs except the poorest-performing SF runs.
8
Slot Filler Validation (SFV) 2013
SFV input:
▫All regular English slot filling input (slot definitions, queries, source documents)
▫Individual candidate slot fillers (filler, provenance, confidence)
Broken out into individual slot filling runs
SFV output:
▫Binary classification (Correct / Incorrect) of each candidate slot filler
Evaluation:
▫Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
9
Slot Filler Validation (SFV) 2013
SFV input:
▫All regular English slot filling input (slot definitions, queries, source documents)
▫Individual candidate slot fillers (filler, provenance, confidence)
Broken out into individual slot filling runs
▫System profile for each SF run
▫Preliminary assessment of 10% of KBP 2013 Slot Filling queries
SFV output:
▫Binary classification (Correct / Incorrect) of each candidate slot filler
Evaluation:
▫Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
▫Score only on the 90% of KBP 2013 slot filling queries that didn’t have preliminary assessments released as part of SFV input
10
SF System Profile
SF Team ranks in KBP 2009-2012
Did the system extract fillers from the KBP 2013 source corpus?
Do the Confidence Values have meaning?
Is the Confidence Value a probability?
Tools or methods for:
▫Query expansion
▫Document retrieval
▫Sentence retrieval
▫NER nominal tagging
▫Coreference resolution
▫Third-party relation/event extraction
▫Dependency/Constituent parsing
▫POS tagging
▫Chunking
▫Main slot filling algorithm
▫Learning algorithm
▫Ensemble model
▫External resources
11
Slot Filler Validation Teams and Approaches
BIT: Beijing Institute of Technology [local]
▫Generic RTE approach based on word overlap, cosine similarity, and token edit distance
Stanford: Stanford University [local]
▫Based on Stanford’s full slot-filling system, especially component for checking consistency and validity of candidate fillers
UI_CCG: University of Illinois at Urbana-Champaign [local]
▫Tailored RTE approach; check candidate for slot-specific constraints
jhuapl: Johns Hopkins University Applied Physics Laboratory [weak global]
▫Consider only the confidence value associated with each candidate filler and aggregate confidence values across systems.
RPI_BLENDER: Rensselaer Polytechnic Institute [strong global]
▫Based on RPI_BLENDER full slot-filling system (like Stanford), but also leveraged full set of SFV input (including SF system profile and preliminary assessments) to rank systems and apply tier-specific filtering.
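To make the "weak global" idea concrete, the sketch below sums the confidence values that different SF runs assign to the same (query, slot, filler) triple and keeps a candidate when the total clears a threshold. It is not jhuapl's actual method, whose details the slides do not give; the summation rule, the threshold, the case-insensitive filler matching, the run names, and the query id are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_confidence(runs):
    """Sum confidences per (query, slot, filler) across all runs.

    runs maps a run id to a list of (query_id, slot, filler, confidence).
    Fillers are matched case-insensitively, which is an assumption.
    """
    totals = defaultdict(float)
    for responses in runs.values():
        for query_id, slot, filler, confidence in responses:
            totals[(query_id, slot, filler.lower())] += confidence
    return totals


def classify(runs, threshold=1.0):
    """Label every response in every run from its aggregated confidence."""
    totals = aggregate_confidence(runs)
    labels = {}
    for run_id, responses in runs.items():
        for query_id, slot, filler, _ in responses:
            key = (query_id, slot, filler.lower())
            verdict = "Correct" if totals[key] >= threshold else "Incorrect"
            labels[(run_id,) + key] = verdict
    return labels


# Hypothetical runs and query id, for illustration only.
runs = {
    "runA": [
        ("SF_Q1", "per:cities_of_residence", "Chicago", 0.7),
        ("SF_Q1", "per:cities_of_residence", "Boston", 0.4),
    ],
    "runB": [
        ("SF_Q1", "per:cities_of_residence", "chicago", 0.6),
    ],
}

labels = classify(runs)
for key, verdict in sorted(labels.items()):
    print(key, verdict)
```

Summing raw confidences treats every system's scores as comparable, which the system profile questions ("Do the Confidence Values have meaning?") suggest is not always safe; weighting runs by reliability would be a natural refinement.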
13
Impact of RPI_BLENDER2 SFV on SF Runs

Top 10 SF runs
SF Run          F1 of original SF run   Change in F1 after applying SFV filter
lsv1            0.371212                 0.012212
lsv5            0.368462                 0.025411
lsv3            0.367438                 0.029463
ARPANI1         0.364683                -0.01695
lsv4            0.363441                 0.041238
RPI_BLENDER3    0.336694                 0.025749
RPI_BLENDER1    0.333909                 0.027718
lsv2            0.333333                 0.008259
RPI_BLENDER5    0.332866                 0.017108
PRIS20133       0.327384                 0.021544

Negatively impacted SF runs
SF Run          F1 of original SF run   Change in F1 after applying SFV filter
NYU1            0.253842                -0.00105
UWashington1    0.184026                -0.011544
UWashington2    0.156271                -0.004999
UWashington3    0.140677                -0.013133
SAFT_KRes3      0.134615                -0.004458
CMUML3          0.098274                -0.002241
TALP_UPC3       0.036237                -0.007019
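The table can be tallied with a few lines of code. The sketch assumes the second numeric column is the change in F1 rather than the filtered F1 itself (several values are negative, which an F1 score cannot be); the numbers are copied from the slide, and the function name is hypothetical.

```python
# (run, F1 of original run, change in F1 after the SFV filter), from the slide.
ROWS = [
    ("lsv1", 0.371212, 0.012212),
    ("lsv5", 0.368462, 0.025411),
    ("lsv3", 0.367438, 0.029463),
    ("ARPANI1", 0.364683, -0.01695),
    ("lsv4", 0.363441, 0.041238),
    ("RPI_BLENDER3", 0.336694, 0.025749),
    ("RPI_BLENDER1", 0.333909, 0.027718),
    ("lsv2", 0.333333, 0.008259),
    ("RPI_BLENDER5", 0.332866, 0.017108),
    ("PRIS20133", 0.327384, 0.021544),
    ("NYU1", 0.253842, -0.00105),
    ("UWashington1", 0.184026, -0.011544),
    ("UWashington2", 0.156271, -0.004999),
    ("UWashington3", 0.140677, -0.013133),
    ("SAFT_KRes3", 0.134615, -0.004458),
    ("CMUML3", 0.098274, -0.002241),
    ("TALP_UPC3", 0.036237, -0.007019),
]


def summarize(rows):
    """Return filtered F1 per run plus the runs that were helped and hurt."""
    filtered = {run: round(f1 + delta, 6) for run, f1, delta in rows}
    helped = [run for run, _, delta in rows if delta > 0]
    hurt = [run for run, _, delta in rows if delta < 0]
    return filtered, helped, hurt


filtered, helped, hurt = summarize(ROWS)
print(len(helped), "runs helped;", len(hurt), "runs hurt")
print("largest gain:", max(ROWS, key=lambda row: row[2])[0])
```

Under that reading, nine of the ten highest-scoring runs shown gain F1 from the filter, while every lower-scoring run shown loses a little.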
14
Conclusion
Leveraging global features boosts scores of individual SF runs… if done discriminately
▫Don’t treat all slot filling systems the same
Even weak global features (e.g. raw confidence values) may help in some cases
Caveat: other evaluation metrics are also valid, depending on use case.
▫RTE KBP validation (2011) metric may be appropriate if the goal is to make assessment more efficient