Lance Ramshaw (with Ralph Weischedel) BBN

2 Ontobank Coreference
Part of the multi-site Ontobank effort
  – Intended to combine with word-sense and propositional structure to capture lexical semantics
Target coreference types:
  – Names, nominals, pronouns
      Attributes not corefed (“Bush is the president.”)
      Generic or underspecified nominals are not corefed, though pronouns may still refer to them
  – Appositives
      These can then be treated like copulas
  – Temporal expressions
  – Definite references to events
Intended for broad coverage
  – Initial testing being done on Penn Treebank data

3 Examples
Names/Nominals/Pronouns
  – Elco Industries Inc. said it expects net income in the year ending June 30, 1990, to fall below a recent analyst's estimate of $1.65 a share. The Rockford, Ill. maker of fasteners also said it expects to post sales in the current fiscal year that are “slightly above” fiscal 1989 sales of $155 million.
Appositive constructions (heads and attributes)
  – the PhacoFlex intraocular lens, the first foldable silicone lens available for cataract surgery
Events/Verbs
  – Sales of passenger cars grew 22%. The strong growth followed year-to-year increases.
Temporal expressions
  – John spent three years in jail. In that time …
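
One way to picture the examples on this slide is as chains of mentions, each chain carrying a link type. The sketch below is illustrative only: the class names `Mention` and `Chain`, the `kind` labels, and the `link` labels are hypothetical and are not the project's actual annotation format. The grouping of mentions reflects a plain reading of the example sentences.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Mention:
    text: str
    kind: str  # hypothetical labels: "name", "nominal", "pronoun", "event", "temporal"


@dataclass
class Chain:
    link: str  # hypothetical labels: "identity" or "apposition"
    mentions: List[Mention] = field(default_factory=list)


# Names / nominals / pronouns: four mentions of the same company.
elco = Chain("identity", [
    Mention("Elco Industries Inc.", "name"),
    Mention("it", "pronoun"),
    Mention("The Rockford, Ill. maker of fasteners", "nominal"),
    Mention("it", "pronoun"),
])

# Appositive construction: two adjacent noun phrases linked by apposition.
lens = Chain("apposition", [
    Mention("the PhacoFlex intraocular lens", "nominal"),
    Mention("the first foldable silicone lens available for cataract surgery", "nominal"),
])

# Event: a verb and a later definite reference to the same event.
growth = Chain("identity", [
    Mention("grew", "event"),
    Mention("The strong growth", "nominal"),
])

# Temporal expression and a later reference to the same period.
jail_time = Chain("identity", [
    Mention("three years", "temporal"),
    Mention("that time", "temporal"),
])

chains = [elco, lens, growth, jail_time]
```

Keeping apposition as a separate link type mirrors the slide's separate treatment of appositives, so they can later be handled like copulas rather than merged into identity chains.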

4 Results
Corpus: WSJ
  – Mention extents extracted automatically from the Treebank trees
Annotated so far: ~300K words (4 annotators)
Annotation speed: ~4500 words/hour
Double-annotation and adjudication of 100K words: ~60 hours
Interannotator agreement (using MUC coref measure)
  – Coreference:
      ~84% measured between annotators
      ~90% between annotator and the adjudicated version
  – Apposition:
      ~90% measured between annotators
      ~94% between annotator and the adjudicated version
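
For readers unfamiliar with the MUC coreference measure, here is a minimal sketch of the standard link-based score, under stated assumptions. Each annotator's output is taken to be a list of chains, and each chain a set of mention identifiers. The function names `muc_score` and `_link_counts` are hypothetical. The slide does not say how the two directions were combined into one agreement figure; reporting F1, which is symmetric between the two annotators, is an assumption here.

```python
from typing import Hashable, List, Set, Tuple

Chain = Set[Hashable]


def _link_counts(key: List[Chain], response: List[Chain]) -> Tuple[int, int]:
    """Return (recovered links, total links) for the key chains given the response."""
    # Map every response mention to the index of its response chain.
    mention_to_chain = {}
    for idx, chain in enumerate(response):
        for mention in chain:
            mention_to_chain[mention] = idx

    recovered = total = 0
    for chain in key:
        if len(chain) < 2:
            continue  # a singleton contributes no links
        # Response chains that intersect this key chain...
        touched = {mention_to_chain[m] for m in chain if m in mention_to_chain}
        # ...plus each mention missing from the response, counted as its own part.
        missing = sum(1 for m in chain if m not in mention_to_chain)
        parts = len(touched) + missing
        recovered += len(chain) - parts
        total += len(chain) - 1
    return recovered, total


def muc_score(key: List[Chain], response: List[Chain]) -> Tuple[float, float, float]:
    """Return (recall, precision, F1) of the response measured against the key."""
    r_num, r_den = _link_counts(key, response)
    p_num, p_den = _link_counts(response, key)  # precision swaps the roles
    recall = r_num / r_den if r_den else 0.0
    precision = p_num / p_den if p_den else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1


# Toy example: annotator A links four mentions; annotator B splits them in two.
annotator_a = [{"m1", "m2", "m3", "m4"}]
annotator_b = [{"m1", "m2"}, {"m3", "m4"}]
recall, precision, f1 = muc_score(annotator_a, annotator_b)
```

In the toy example, B recovers two of A's three links (recall 2/3) and proposes no link that A lacks (precision 1), so the F1 is 0.8. Swapping the two annotators swaps recall and precision but leaves F1 unchanged.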