Jan 4th, 2013 Event Extraction Using Distant Supervision Kevin Reschke
Event Extraction News Corpus (“… Delta Flight 14 crashed in Mississippi killing 40 …”) to Knowledge Base
Event Extraction 1) Generate Candidates: Run Named Entity Recognition on relevant docs. Example: Flight 14 crashed in Mississippi. 2) Classify Mentions. Features: (Unigram:Mississippi) (NEType:Location) (PrevWord:in) (ObjectOf:crashed) Label: CrashSite 3) Aggregate Labels. Final Label: CrashSite
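The feature set on this slide can be sketched roughly as follows. This is a minimal illustration, not the system's actual feature extractor: the function name `mention_features`, the `<START>` padding token, and the idea that NE types and the syntactic governor arrive precomputed from an upstream tagger and parser are all assumptions.

```python
from typing import Dict, List, Optional


def mention_features(
    tokens: List[str],
    ne_types: List[str],
    index: int,
    governor: Optional[str] = None,
) -> Dict[str, str]:
    """Build slide-style features for the candidate mention at tokens[index].

    ne_types and governor are hypothetical inputs, assumed to come from an
    NER tagger and a dependency parser that are not shown here.
    """
    features = {
        "Unigram": tokens[index],
        "NEType": ne_types[index],
        # Assumed padding token when the mention starts the sentence.
        "PrevWord": tokens[index - 1] if index > 0 else "<START>",
    }
    if governor is not None:
        features["ObjectOf"] = governor
    return features


tokens = ["Flight", "14", "crashed", "in", "Mississippi", "."]
ne_types = ["O", "O", "O", "O", "Location", "O"]
features = mention_features(tokens, ne_types, index=4, governor="crashed")
print(features)
```

For the slide's example sentence this yields the four features listed for the mention "Mississippi".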
Training a Mention Classifier Need labeled training data. Problems: - Expensive - Does not scale Example: One year after [USAir] Operator [Flight 11] FlightNumber crashed in [Toronto] CrashSite, families of the [200] Fatalities victims attended a memorial service in [Vancouver] NIL.
Distant Supervision Solution: Use known events to automatically label training data. Training Knowledge-Base One year after [USAir] Operator [Flight 11] FlightNumber crashed in [Toronto] CrashSite, families of the [200] Fatalities victims attended a memorial service in [Vancouver] NIL.
Distant Supervision (High Level) Begin with a set of known facts. Use this set to automatically label training instances from a corpus. Train and classify (handle noise)
Distant Supervision for Relation Extraction Slot filling for named entity relations. Mintz et al. (ACL); Surdeanu et al. (TAC-KBP). Example: Company:,,,, etc. Known relations: founder_of(Steve Jobs, Apple) Noisy Labeling Rule: Slot value and entity name must be in the same sentence. 1. (+) Apple co-founder Steve Jobs passed away yesterday. 2. (-) Steve Jobs delivered the Stanford commencement address. 3. (+) Steve Jobs was fired from Apple in
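The sentence-level noisy labeling rule can be sketched in a few lines. This is an illustration only: the function name `label_sentences` and the plain substring matching are assumptions, and the third example sentence is shortened from the slide.

```python
from typing import List, Tuple


def label_sentences(
    sentences: List[str], entity: str, slot_value: str
) -> List[Tuple[str, bool]]:
    """Mark a sentence positive when it contains both the entity name and
    the slot value (simple substring match, assumed for illustration)."""
    return [(s, entity in s and slot_value in s) for s in sentences]


sentences = [
    "Apple co-founder Steve Jobs passed away yesterday.",
    "Steve Jobs delivered the Stanford commencement address.",
    "Steve Jobs was fired from Apple.",
]
# Known relation: founder_of(Steve Jobs, Apple)
labeled = label_sentences(sentences, entity="Apple", slot_value="Steve Jobs")
for sentence, is_positive in labeled:
    print("(+)" if is_positive else "(-)", sentence)
```

Sentence 3 is labeled positive even though it does not express founder_of, which is the kind of noise the rule introduces.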
Distant Supervision for Event Extraction Sentence-level labeling rule doesn’t work. 1. Events lack proper names. “The crash of USAir Flight 11” 2. Slot values occur separately from names. The plane went down in central Texas. 10 died and 30 were injured in yesterday’s tragic incident.
Automatic Labeling: Event Extraction Solution: Document-Level Noisy Labeling Rule. Heuristic: Use Flight Number as proxy for event name. Labeling Rule: Slot value and Flight Number must appear in the same document. Training Fact: {, } …Flight 11 crash Sunday… …The plane went down in [Toronto] CrashSite …
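A rough sketch of the document-level rule, under stated assumptions: the function name `label_document`, exact string matching between mentions and slot values, and the choice to produce no training instances for documents that lack the flight number are all illustrative, not taken from the slides. The example document is constructed from the slide's fragments.

```python
from typing import Dict, List, Tuple


def label_document(
    doc_text: str,
    mentions: List[str],
    flight_number: str,
    known_slots: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Label candidate mentions in one document against one known event.

    known_slots maps slot name -> known value for the event.
    """
    # The flight number acts as a proxy for the event name: documents that
    # never mention it are assumed irrelevant and yield no instances.
    if flight_number not in doc_text:
        return []
    value_to_slot = {value: slot for slot, value in known_slots.items()}
    # Mentions matching a known slot value get that slot; the rest get NIL.
    return [(m, value_to_slot.get(m, "NIL")) for m in mentions]


doc = (
    "Flight 11 crash Sunday. The plane went down in Toronto. "
    "A memorial service was held in Vancouver."
)
known = {"FlightNumber": "Flight 11", "CrashSite": "Toronto"}
instances = label_document(doc, ["Flight 11", "Toronto", "Vancouver"], "Flight 11", known)
print(instances)
```

Note that "Toronto" is labeled CrashSite even though its sentence never names the flight, which is what the sentence-level rule could not do.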
Evaluation: 80 plane crashes from Wikipedia infoboxes. Training set: 32; Dev set: 8; Test set: 40 Corpus: Newswire data from 1989 – present.
Automatic Labeling 38,000 training instances; 39% noise. Examples: Good: At least 52 people survived the crash of the Boeing 737. Bad: First envisioned in 1964, the Boeing 737 entered service in 1968.
Extraction Models Local Model Train and classify each mention independently. Pipeline Model Classify sequentially; use previous label as feature. Captures dependencies between labels. E.g., Passengers and Crew go together: “4 crew and 200 passengers were on board.” Joint Model Searn Algorithm (Daumé III et al., 2009). Jointly models all mentions in a sentence.
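The pipeline model's control flow might look like the sketch below. The hand-written `toy_classifier` is purely a stand-in for a trained mention classifier, and the feature names `NextWord` and `PrevLabel` and the `<START>` token are assumptions made for this example.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]


def classify_pipeline(
    mentions: List[Features], classify: Callable[[Features], str]
) -> List[str]:
    """Classify mentions left to right, feeding each predicted label to the
    next decision as a PrevLabel feature."""
    labels: List[str] = []
    prev_label = "<START>"  # assumed start-of-sentence marker
    for feats in mentions:
        label = classify(dict(feats, PrevLabel=prev_label))
        labels.append(label)
        prev_label = label
    return labels


def toy_classifier(feats: Features) -> str:
    """Hypothetical rule-based stand-in for a trained classifier."""
    if feats["NextWord"] == "crew":
        return "Crew"
    # The previous label makes a following number likely to be Passengers.
    if feats["PrevLabel"] == "Crew" and feats["NEType"] == "Number":
        return "Passengers"
    return "NIL"


# Mentions from: "4 crew and 200 passengers were on board."
mentions = [
    {"Unigram": "4", "NEType": "Number", "NextWord": "crew"},
    {"Unigram": "200", "NEType": "Number", "NextWord": "passengers"},
]
labels = classify_pipeline(mentions, toy_classifier)
print(labels)
```

The local model would correspond to calling the classifier on each mention without the `PrevLabel` feature.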
Results
Label Aggregation: Exhaustive Aggregation [Figure: four mentions of the value “Four”]
Label Aggregation: Noisy-OR Key idea: Classifier gives us a distribution over labels for each mention (example mention: Stockholm). Compute Noisy-OR for each label. If Noisy-OR > threshold, use label.
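Noisy-OR aggregation can be sketched as below, using the standard definition: for a given label, one minus the product over mentions of (1 - p), where p is the classifier's probability for that label on each mention. The probabilities, the 0.7 threshold, and the choice to skip NIL are assumptions made for illustration; the slides give no concrete values.

```python
from collections import defaultdict
from typing import Dict, List


def noisy_or(mention_distributions: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-mention label probabilities for one candidate value."""
    # remaining[label] tracks the probability that no mention supports the label.
    remaining: Dict[str, float] = defaultdict(lambda: 1.0)
    for dist in mention_distributions:
        for label, p in dist.items():
            remaining[label] *= 1.0 - p
    return {label: 1.0 - r for label, r in remaining.items()}


def aggregate_labels(
    mention_distributions: List[Dict[str, float]], threshold: float
) -> List[str]:
    """Keep every non-NIL label whose Noisy-OR score exceeds the threshold."""
    scores = noisy_or(mention_distributions)
    return sorted(l for l, s in scores.items() if l != "NIL" and s > threshold)


# Two hypothetical mentions of "Stockholm" with made-up classifier outputs.
stockholm = [
    {"CrashSite": 0.5, "NIL": 0.5},
    {"CrashSite": 0.6, "NIL": 0.4},
]
scores = noisy_or(stockholm)
final_labels = aggregate_labels(stockholm, threshold=0.7)
print(scores["CrashSite"], final_labels)
```

Neither mention alone clears the assumed threshold, but together they do, which is the point of aggregating over mentions.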
Results: Noisy-OR
Next Step Compare Distant Supervision with state-of-the-art supervised approach (Huang & Riloff, ACL-2011). MUC-4 Shared Task: Terrorist Attacks. Slot Template:,,,, Distant Supervision Source: rorist_incidents Short summaries of several hundred terrorist attacks.