Preliminaries CSCI-GA.2591


1 Preliminaries CSCI-GA.2591
  NYU
  Ralph Grishman

2 Goal and Approach
  What are the limitations in extracting knowledge from text?
  Approach:
    start with a skeleton MR system
    enhance individual components
    enhance ensemble
      estimate confidence of components
      estimate confidence of combined system
      use domain model (Markov Logic Network)
    learn as we go
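A minimal sketch of what "estimate confidence of combined system" could mean in the simplest case. It assumes, purely for illustration, that component errors are independent, so the pipeline's confidence is the product of the component confidences. The stage names and numbers are hypothetical and do not come from the slides; the course's actual approach uses a domain model (Markov Logic Network) rather than this independence assumption.

```python
import math


def combined_confidence(component_confidences):
    """Combine per-component confidences into one pipeline confidence.

    Assumption (not from the slides): component errors are independent,
    so the combined confidence is the product of the individual values.
    """
    return math.prod(component_confidences)


# Hypothetical confidences for three pipeline stages.
stages = {"ne": 0.9, "coreference": 0.8, "relations": 0.7}
overall = combined_confidence(stages.values())
```

Under this assumption the combined confidence can never exceed that of the weakest component, which is one reason errors compound in a pipeline and joint inference is attractive.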

3 Requirements
  provenance: addressed by using Tipster arch, UIMA
  speed: fast enough for rapid development
  scaling … develop algorithms of time linear or n log n and use DB-based system (we won't)
  capture domain constraints: rule-based inference
  capture uncertainty: MLN
  enable joint inference: MLN
  domain adaptive: emphasize task-specific components
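The provenance requirement above can be illustrated with a Tipster-style stand-off annotation, in which the document text is never modified and every annotation records the character span it was derived from. This is a minimal sketch, not the actual Tipster, UIMA, or Jet API; the class names, the `kind` and `cat` fields, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the document text
    end: int    # exclusive character offset


@dataclass
class Annotation:
    kind: str                 # annotation type, e.g. "ENAMEX" (hypothetical)
    span: Span                # where in the text this annotation came from
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, kind, start, end, **features):
        """Attach a stand-off annotation; the text itself is left untouched."""
        ann = Annotation(kind, Span(start, end), features)
        self.annotations.append(ann)
        return ann

    def text_of(self, ann):
        """Recover the source text behind an annotation (its provenance)."""
        return self.text[ann.span.start:ann.span.end]


doc = Document("Ralph Grishman teaches at NYU.")
person = doc.annotate("ENAMEX", 0, 14, cat="PERSON")
org = doc.annotate("ENAMEX", 26, 29, cat="ORGANIZATION")
```

Because each annotation keeps its span, any extracted entity, relation, or event built on top of it can be traced back to the exact characters that support it.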

4 Schedule
  1. preliminaries (Tipster, Jet-lite, ACE); sentence segmentation
  2. NE
  3. coreference
  4. XD coreference; brief plan reports
  5. relations
  6. event
  7. time
  8. reports on components
  9. joint inference: opportunities
  10. prob graphical models: beam search, belief prop.
  11. Alchemy; domain models
  12. KBP systems
  13. self-learning
  14. project reports

5 ACE
  Our system will be designed to read a document and extract the entities, relations, and events.
  To train and evaluate our system, we need a corpus annotated with this information.
  We will use the ACE 2005 corpus and the domain of national and international news (very broad).

6 Domains
  News domain is very broad, hard to model
  Difficult to see impact of domain model on language analysis
  Time permitting, may use a second, narrower model
    football game reports
    hurricane news
    … suggestions?

7 Corpora
  ACE 2005: 300 kw
    defines classes of relations and events
    widely used benchmark
    6 genres
    being augmented by ERE annotation
  Penn Treebank for sentences and POS
  Reuters for NE annotation
  OntoNotes for coreference and word sense training

