Diana McCarthy Erasmus Mundus Visiting Scholar Saarland University


1 STS: under the hood
Diana McCarthy, Erasmus Mundus Visiting Scholar, Saarland University

2 Proposal: Annotation with Alignments
Annotate alignments so that we can see where the similarity lies and the rationale for the scores.
Sub-alignments: look for consensus on sub-parts.
Alignments are annotated with a relation, e.g. (just for brainstorming purposes):
= (equivalence/substitutable)
!= (contradiction)
→ (entailment)
- (missing)
Extra-propositional: speculation/certainty, sentiment
Assign a category (relation) and a score to the whole text pair or to sub-alignments.
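One way to hold such annotations in memory is sketched below. This is only an illustration of the proposal: the class and field names (`Relation`, `Alignment`, `PairAnnotation`) are assumptions made here, not part of any STS tooling, and the extra-propositional categories are left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Relation(Enum):
    """Relation symbols from the brainstormed legend on this slide."""
    EQUIVALENCE = "="
    CONTRADICTION = "!="
    ENTAILMENT = "→"
    MISSING = "-"


@dataclass
class Alignment:
    """One aligned sub-part; all names here are hypothetical."""
    ident: str                      # shared id, e.g. "A", "B", "C"
    relation: Relation
    reference_text: str
    candidate_text: str
    score: Optional[float] = None   # None when the annotator gives no score


@dataclass
class PairAnnotation:
    """A text pair with an optional overall score and its sub-alignments."""
    reference: str
    candidate: str
    score: Optional[float] = None
    alignments: List[Alignment] = field(default_factory=list)


# Illustrative use with one sub-alignment taken from the pilot-task example.
example = PairAnnotation(
    reference="The new system costs between $1.1 million and $22 million, "
              "depending on configuration.",
    candidate="The system is priced from US$1.1 million to $22.4 million, "
              "depending on configuration.",
    score=4.2,
    alignments=[
        Alignment("D", Relation.EQUIVALENCE, "costs", "is priced", 5.0),
        Alignment("X", Relation.MISSING, "new", ""),
    ],
)
```

Keeping `score` optional on both levels reflects the idea that a category and score can be attached either to the whole pair or to sub-alignments.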

3 Annotation (examples from Microsoft Research Paraphrase Corpus: STS pilot task)
Reference: The new system costs between $1.1 million and $22 million, depending on configuration.
Candidate: The system is priced from US$1.1 million to $22.4 million, depending on configuration.

4 Annotation
(Please note: the 1-5 scores are off the top of my head, assigned before seeing the guidelines, and are for illustrative purposes only.)
Reference: The new system costs between $1.1 million and $22 million, depending on configuration.
Candidate (score 4.2): The system is priced from US$1.1 million to $22.4 million, depending on configuration.
A good starting point, but we want to look inside the box.
Brainstorming only: all annotations were done by myself in 20 minutes before coming to the workshop.

5 Annotation with Alignments: (brainstorming purposes)
Reference: [=A The [-X new] system][=B [=D costs] between [=C $1.1 million and $22 million], depending on configuration.]
Candidate (score 4.2): [=A.4.2 The [-X] system] [=B.4 [=D.5 is priced] from [=C.4 US$1.1 million to $22.4 million], depending on configuration.]
Idea: mark alignments between reference and candidate, with a category (equivalence =, contradiction !=, entailment →, missing -, etc.) and a score.
Alignments may overlap, so they do not necessarily form neat trees.
We may also get non-contiguous sections, which we can mark with the same id (A, B, C, etc.).
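The bracket notation on this slide can be read mechanically. The sketch below is a minimal parser under stated assumptions: a label is a relation symbol (`=`, `!=` or `-`), an alphabetic id, and an optional `.score`; brackets are properly nested. The names `LABEL`, `Span` and `parse_alignments` are hypothetical, and overlapping (non-nested) or non-contiguous alignments would need a richer format than this handles.

```python
import re
from typing import List, NamedTuple, Optional

# Label grammar assumed here: "[" + relation + id + optional ".score".
LABEL = re.compile(r"\[(!=|=|-)([A-Za-z]+)(?:\.(\d+(?:\.\d+)?))?\s*")


class Span(NamedTuple):
    relation: str
    ident: str
    score: Optional[float]
    text: str


def parse_alignments(annotated: str) -> List[Span]:
    """Return labelled spans (innermost first) from a bracketed sentence."""
    spans: List[Span] = []
    stack = []   # open spans: (relation, ident, score, start offset in plain)
    plain = []   # characters of the sentence with the markup stripped
    i = 0
    while i < len(annotated):
        m = LABEL.match(annotated, i)
        if m:
            score = float(m.group(3)) if m.group(3) else None
            stack.append((m.group(1), m.group(2), score, len(plain)))
            i = m.end()
        elif annotated[i] == "]":
            if not stack:
                raise ValueError("unmatched closing bracket")
            relation, ident, score, start = stack.pop()
            # Collapse whitespace left behind by empty inner spans like [-X].
            text = " ".join("".join(plain[start:]).split())
            spans.append(Span(relation, ident, score, text))
            i += 1
        else:
            plain.append(annotated[i])
            i += 1
    if stack:
        raise ValueError("unclosed bracket")
    return spans


candidate = ("[=A.4.2 The [-X] system] [=B.4 [=D.5 is priced] from "
             "[=C.4 US$1.1 million to $22.4 million], "
             "depending on configuration.]")
for span in parse_alignments(candidate):
    print(span.ident, span.relation, span.score, repr(span.text))
```

Because spans are emitted when their closing bracket is reached, inner alignments such as `X` and `D` appear before the spans that contain them.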

6 Annotation with Alignments: (brainstorming purposes)
Reference: [=A The hearing occurred a day after the Pentagon for the first time singled out an officer, Dallager, for not addressing the scandal.]
Candidate (score 4.9): [=A.4.9 The hearing came one day after the Pentagon for the first time singled out an officer - Dallager - for failing to address the scandal.]
To save effort for annotators and systems, we could avoid aligning everything: add sub-alignments only where the match does not cover everything, or where a subpart differs from the whole by category or score.

7 Annotation with Alignments
Reference: [=C U.S.] prosecutors [=B have arrested more than 130 individuals] and have [=D [=F seized] [-Y more than] $17 million [-X]] in a continuing crackdown on [=E Internet fraud [-Z and abuse].]
Candidate (score ?): [=B.5 More than 130 people have been arrested] and [=D.3 [-Y] $17 million [-X worth of property] [=F.5 seized]] in an [=E Internet fraud [-Z]] sweep announced Friday by three [=C.5 U.S.] government agencies.
Annotators should be allowed to leave parts without annotation; "don't know" is important. Also allow comments on any item.
Could supplement with confidence? (Probably derived from the number of annotators, or given only in system output.)
Could weight according to salience of word, modifier or predicate, syntactic relation, or order in the sentence (new information comes towards the end). All depends on the goal.
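To make the weighting idea concrete, the sketch below combines sub-alignment scores into one pair score as a weighted mean. This is one possible reading, not a method proposed in the talk: the function name `weighted_pair_score` and the weights are hypothetical, unscored ("don't know") parts are simply skipped, and missing (-) spans are ignored, all of which are assumptions.

```python
from typing import Iterable, Optional, Tuple


def weighted_pair_score(
    items: Iterable[Tuple[Optional[float], float]]
) -> Optional[float]:
    """Weighted mean of (score, weight) pairs, skipping unscored items.

    Returns None when nothing was scored, so "don't know" stays visible
    instead of being turned into an arbitrary number.
    """
    total = 0.0
    weight_sum = 0.0
    for score, weight in items:
        if score is None:       # annotator left this part unscored
            continue
        total += score * weight
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else None


# Sub-alignment scores from the candidate on this slide; the weights are
# made up for illustration (here the main predicate B counts double).
sub_alignments = [
    (5.0, 2.0),   # B: "More than 130 people have been arrested"
    (3.0, 1.0),   # D: "$17 million ... seized"
    (5.0, 1.0),   # F: "seized"
    (5.0, 1.0),   # C: "U.S."
    (None, 1.0),  # E: "Internet fraud" (no score given on the slide)
]
pair_score = weighted_pair_score(sub_alignments)
```

Whether salience, syntactic role or position should drive the weights is exactly the open question on the slide; the function only shows where such weights would plug in.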

8 Annotation with Alignments:
Reference: [=A The company] [!=C didn't detail [-specD] [=B the costs of the replacement and repairs]].
Candidate (score 4.9): But [=A.5 company officials] [!=C [-specD expect] [=B.5 the costs of the replacement work] to run into the millions of dollars.]
Mark speculation somehow, where it is missing or of a different type or level.
Speculation does not hold between a pair of aligned units, but it should perhaps be marked to ensure that the extra-propositional information is the same.

9 Components
Need for semantic and non-semantic components (syntax, pragmatics, extra-propositional, extra-linguistic).
They are interleaved, but components could provide scores on sub-components, just as annotators can.
Systems mark confidence and the components used on sub-alignments, with categories (equivalence, contradicts, entails, speculation).
We can learn the interaction, rather than assume it a priori.
Sampling is really important, especially if we want the thin tail rather than just the fat head! (Steedman, ACL dinner 2007)
However, it all depends on your goal/practical requirement.

