1
Preliminaries CSCI-GA.2591
NYU Preliminaries CSCI-GA.2591 Ralph Grishman
2
Goal and Approach
What are the limitations in extracting knowledge from text?
Approach:
- start with a skeleton MR system
- enhance individual components
- enhance ensemble
- estimate confidence of components
- estimate confidence of combined system
- use domain model (Markov Logic Network)
- learn as we go
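One simple way to relate component confidence to combined-system confidence is sketched below. It assumes, purely for illustration, that the stages err independently, so the pipeline confidence is the product of the stage confidences. The stage names and numbers are hypothetical, and the slides do not say which estimator the course uses.

```python
import math


def combined_confidence(stage_confidences):
    """Naive pipeline confidence: product of per-stage confidences.

    Assumes stages fail independently, which is a simplification;
    joint inference exists precisely because this often does not hold.
    """
    return math.prod(stage_confidences.values())


# Hypothetical per-component confidences for one extracted fact.
stages = {"ne": 0.9, "coreference": 0.8, "relation": 0.7}
overall = combined_confidence(stages)
print(f"combined confidence: {overall:.3f}")
```

The product shrinks quickly as stages are added, which is one motivation for enhancing the ensemble rather than only the individual components.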
3
Requirements
- provenance: addressed by using Tipster architecture, UIMA
- speed: fast enough for rapid development
- scaling: … develop algorithms of linear or n log n time and use DB-based system (we won't)
- capture domain constraints: rule-based inference
- capture uncertainty: MLN
- enable joint inference: MLN
- domain adaptive: emphasize task-specific components
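The provenance requirement can be illustrated with standoff annotation in the Tipster style, where each annotation records a type, a character span in the original text, and a set of features. The sketch below is a minimal illustration, not the Jet or UIMA API: the class names, the `source` feature, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the document text
    end: int    # exclusive character offset


@dataclass
class Annotation:
    type: str   # e.g. "sentence" or "entity" (illustrative labels)
    span: Span
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, type_, start, end, **features):
        """Attach an annotation without modifying the underlying text."""
        ann = Annotation(type_, Span(start, end), features)
        self.annotations.append(ann)
        return ann

    def text_of(self, ann):
        """Recover the exact source text an annotation was derived from."""
        return self.text[ann.span.start:ann.span.end]


doc = Document("Maria Lopez visited Boston.")
person = doc.annotate("entity", 0, 11, subtype="PERSON", source="hypothetical-NE-tagger")
place = doc.annotate("entity", 20, 26, subtype="GPE", source="hypothetical-NE-tagger")
print(doc.text_of(person), "|", doc.text_of(place))
```

Because the text is never edited, every extracted item can be traced back to its offsets in the source document, and later components can add annotations alongside earlier ones.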
4
Schedule
1. preliminaries (Tipster, jet-lite, ACE); sentence segmentation
2. NE
3. coreference
4. XD coreference; brief plan reports
5. relations
6. event
7. time
8. reports on components
9. joint inference: opportunities
10. prob graphical models: beam search, belief prop.
11. Alchemy; domain models
12. KBP systems
13. self-learning
14. project reports
5
ACE
- Our system will be designed to read a document and extract the entities, relations, and events.
- To train and evaluate our system, we need a corpus annotated with this information.
- We will use the ACE 2005 corpus and the domain of national and international news (very broad).
6
Domains
- News domain is very broad, hard to model
- Difficult to see impact of domain model on language analysis
- Time permitting, may use a second, narrower model
  - football game reports
  - hurricane news
  - …
  - suggestions?
7
Corpora
- ACE 2005: 300 kw
  - defines classes of relations and events
  - widely used benchmark
  - 6 genres
  - being augmented by ERE annotation
- Penn Tree Bank for sentences and POS
- Reuters for NE annotation
- OntoNotes for coreference and word sense training